{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T12:10:03Z","timestamp":1772280603171,"version":"3.50.1"},"reference-count":37,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,1,19]],"date-time":"2022-01-19T00:00:00Z","timestamp":1642550400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Detecting objects with a small representation in images is a challenging task, especially when the style of the images is very different from recent photos, which is the case for cultural heritage datasets. This problem is commonly known as few-shot object detection and is still a new field of research. This article presents a simple and effective method for black box few-shot object detection that works with all the current state-of-the-art object detection models. We also present a new dataset called MMSD for medieval musicological studies that contains five classes and 693 samples, manually annotated by a group of musicology experts. Due to the significant diversity of styles and considerable disparities between the artistic representations of the objects, our dataset is more challenging than the current standards. We evaluate our method on YOLOv4 (m\/s), (Mask\/Faster) RCNN, and ViT\/Swin-t. We present two methods of benchmarking these models based on the overall data size and the worst-case scenario for object detection. The experimental results show that our method always improves object detector results compared to traditional transfer learning, regardless of the underlying architecture.<\/jats:p>","DOI":"10.3390\/jimaging8020018","type":"journal-article","created":{"date-parts":[[2022,1,19]],"date-time":"2022-01-19T08:20:57Z","timestamp":1642580457000},"page":"18","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Few-Shot Object Detection: Application to Medieval Musicological Studies"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1527-4314","authenticated-orcid":false,"given":"Bekkouch Imad Eddine","family":"Ibrahim","sequence":"first","affiliation":[{"name":"Sorbonne Center for Artificial Intelligence, Sorbonne University, 75005 Paris, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3775-1495","authenticated-orcid":false,"given":"Victoria","family":"Eyharabide","sequence":"additional","affiliation":[{"name":"STIH Laboratory, Sorbonne University, 75005 Paris, France"}]},{"given":"Val\u00e9rie","family":"Le Page","sequence":"additional","affiliation":[{"name":"IReMus Laboratory, Sorbonne University, 75002 Paris, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6939-4739","authenticated-orcid":false,"given":"Fr\u00e9d\u00e9ric","family":"Billiet","sequence":"additional","affiliation":[{"name":"IReMus Laboratory, Sorbonne University, 75002 Paris, France"}]}],"member":"1968","published-online":{"date-parts":[[2022,1,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Bekkouch, I.E.I., Eyharabide, V., and Billiet, F. (2021, January 18\u201322). Dual Training for Transfer Learning: Application on Medieval Studies. Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China.","DOI":"10.1109\/IJCNN52387.2021.9534426"},{"key":"ref_2","unstructured":"Arai, K. (2022). Adversarial Domain Adaptation for Medieval Instrument Recognition. Intelligent Systems and Applications, Springer International Publishing."},{"key":"ref_3","first-page":"153","article-title":"Multi-agent shape models for hip landmark detection in MR scans","volume":"11596","author":"Landman","year":"2021","journal-title":"Medical Imaging 2021: Image Processing"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"42424","DOI":"10.1109\/ACCESS.2021.3066041","article-title":"Adversarial Reconstruction Loss for Domain Generalization","volume":"9","author":"Bekkouch","year":"2021","journal-title":"IEEE Access"},{"key":"ref_5","unstructured":"Wu, B., Xu, C., Dai, X., Wan, A., Zhang, P., Yan, Z., Tomizuka, M., Gonzalez, J., Keutzer, K., and Vajda, P. (2020). Visual Transformers: Token-based Image Representation and Processing for Computer Vision. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., and Darrell, T. (2019, January 27\u201328). Few-shot object detection via feature reweighting. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00851"},{"key":"ref_7","unstructured":"Wang, X., Huang, T.E., Darrell, T., Gonzalez, J.E., and Yu, F. (2020). Frustratingly Simple Few-Shot Object Detection. arXiv."},{"key":"ref_8","unstructured":"Wang, Y., and Yao, Q. (2019). Few-Shot Learning: A Survey. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Vanschoren, J. (2019). Meta-learning. Automated Machine Learning, Springer.","DOI":"10.1007\/978-3-030-05318-5_2"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Batanina, E., Bekkouch, I.E.I., Youssry, Y., Khan, A., Khattak, A.M., and Bortnikov, M. (2019, January 6\u20139). Domain Adaptation for Car Accident Detection in Videos. Proceedings of the 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), Istanbul, Turkey.","DOI":"10.1109\/IPTA.2019.8936124"},{"key":"ref_11","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_12","first-page":"91","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"28","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on COMPUTER Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_17","unstructured":"Farhadi, A., and Redmon, J. (2018). Yolov3: An incremental improvement. Computer Vision and Pattern Recognition, Springer."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yakovlev, K., Bekkouch, I.E.I., Khan, A.M., and Khattak, A.M. (2020, January 3\u20134). Abstraction-Based Outlier Detection for Image Data. Proceedings of the SAI Intelligent Systems Conference, London, UK.","DOI":"10.1007\/978-3-030-55180-3_40"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1109\/TNNLS.2020.3027667","article-title":"Anomaly Detection Based on Zero-Shot Outlier Synthesis and Hierarchical Feature Distillation","volume":"33","author":"Rivera","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ibrahim, B.I., Nicolae, D.C., Khan, A., Ali, S.I., and Khattak, A. (2020, January 17\u201319). VAE-GAN Based Zero-Shot Outlier Detection. Proceedings of the 2020 4th International Symposium on Computer Science and Intelligent Control (ISCSIC 2020), Tyne, UK.","DOI":"10.1145\/3440084.3441180"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Choi, H.T., Lee, H.J., Kang, H., Yu, S., and Park, H.H. (2021). SSD-EMB: An Improved SSD Using Enhanced Feature Map Block for Object Detection. Sensors, 21.","DOI":"10.3390\/s21082842"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11063-020-10197-9","article-title":"An evaluation of retinanet on indoor object detection for blind and visually impaired persons assistance navigation","volume":"51","author":"Afif","year":"2020","journal-title":"Neural Process. Lett."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_24","first-page":"10347","article-title":"Training data-efficient image transformers & distillation through attention","volume":"139","author":"Meila","year":"2021","journal-title":"Proceedings of the 38th International Conference on Machine Learning, Online, 18\u201324 July 2021"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Li, K., Huang, Z., Cheng, Y.C., and Lee, C.H. (2014, January 4\u20139). A maximal figure-of-merit learning approach to maximizing mean average precision with deep neural network based classifiers. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6854454"},{"key":"ref_26","unstructured":"Chen, W., Liu, Y., Kira, Z., Wang, Y.F., and Huang, J. (2019). A Closer Look at Few-shot Classification. arXiv."},{"key":"ref_27","unstructured":"Finn, C., Abbeel, P., and Levine, S. (2017, January 6\u201311). Model-agnostic meta-learning for fast adaptation of deep networks. Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Eyharabide, V., Bekkouch, I.E.I., and Constantin, N.D. (2021). Knowledge Graph Embedding-Based Domain Adaptation for Musical Instrument Recognition. Computers, 10.","DOI":"10.3390\/computers10080094"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Elgammal, A., Kang, Y., and Den Leeuw, M. (2018, January 2\u20137). Picasso, matisse, or a fake? Automated analysis of drawings at the stroke level for attribution and authentication. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11313"},{"key":"ref_30","unstructured":"Xu, Z., Wilber, M., Fang, C., Hertzmann, A., and Jin, H. (2018). Learning from multi-domain artistic images for arbitrary style transfer. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sabatelli, M., Kestemont, M., Daelemans, W., and Geurts, P. (2018, January 14). Deep transfer learning for art classification problems. Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany.","DOI":"10.1007\/978-3-030-11012-3_48"},{"key":"ref_32","first-page":"206","article-title":"Application of pattern recognition in detection of buried archaeological sites based on analysing environmental variables, Khorramabad Plain, West Iran","volume":"8","author":"Sharafi","year":"2016","journal-title":"J. Archaeol. Sci. Rep."},{"key":"ref_33","first-page":"31","article-title":"Learning to look at LiDAR: The use of R-CNN in the automated detection of archaeological objects in LiDAR data from the Netherlands","volume":"2","author":"Lambers","year":"2019","journal-title":"J. Comput. Appl. Archaeol."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Bekkouch, I.E.I., Youssry, Y., Gafarov, R., Khan, A., and Khattak, A.M. (2019). Triplet Loss Network for Unsupervised Domain Adaptation. Algorithms, 12.","DOI":"10.3390\/a12050096"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Kaselimi, M., Doulamis, N., Doulamis, A., Voulodimos, A., and Protopapadakis, E. (2019, January 12\u201317). Bayesian-optimized Bidirectional LSTM Regression Model for Non-intrusive Load Monitoring. Proceedings of the ICASSP 2019\u20132019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8683110"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., and Sculley, D. (2017, January 13\u201317). Google vizier: A service for black-box optimization. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada.","DOI":"10.1145\/3097983.3098043"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Feurer, M., and Hutter, F. (2019). Hyperparameter optimization. Automated Machine Learning, Springer.","DOI":"10.1007\/978-3-030-05318-5_1"}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/8\/2\/18\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:03:56Z","timestamp":1760133836000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/8\/2\/18"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,19]]},"references-count":37,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["jimaging8020018"],"URL":"https:\/\/doi.org\/10.3390\/jimaging8020018","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,19]]}}}