{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T17:46:36Z","timestamp":1778175996341,"version":"3.51.4"},"reference-count":52,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2024,10,18]],"date-time":"2024-10-18T00:00:00Z","timestamp":1729209600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>State-of-the-art object detection models need large and diverse datasets for training. As these are hard to acquire for many practical applications, training images from simulation environments gain more and more attention. A problem arises as deep learning models trained on simulation images usually have problems generalizing to real-world images shown by a sharp performance drop. Definite reasons and influences for this performance drop are not yet found. While previous work mostly investigated the influence of the data as well as the use of domain adaptation, this work provides a novel perspective by investigating the influence of the object detection model itself. Against this background, first, a corresponding measure called sim-to-real generalizability is defined, comprising the capability of an object detection model to generalize from simulation training images to real-world evaluation images. Second, 12 different deep learning-based object detection models are trained and their sim-to-real generalizability is evaluated. The models are trained with a variation of hyperparameters resulting in a total of 144 trained and evaluated versions. The results show a clear influence of the feature extractor and offer further insights and correlations. 
They open up future research on investigating influences on the sim-to-real generalizability of deep learning-based object detection models as well as on developing feature extractors that have better sim-to-real generalizability capabilities.<\/jats:p>","DOI":"10.3390\/jimaging10100259","type":"journal-article","created":{"date-parts":[[2024,10,18]],"date-time":"2024-10-18T06:46:52Z","timestamp":1729234012000},"page":"259","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Investigating the Sim-to-Real Generalizability of Deep Learning Object Detection Models"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5559-5481","authenticated-orcid":false,"given":"Joachim","family":"R\u00fcter","sequence":"first","affiliation":[{"name":"German Aerospace Center (DLR), Institute of Flight Systems, 38108 Braunschweig, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2928-1710","authenticated-orcid":false,"given":"Umut","family":"Durak","sequence":"additional","affiliation":[{"name":"German Aerospace Center (DLR), Institute of Flight Systems, 38108 Braunschweig, Germany"},{"name":"Institute of Computer Science, Clausthal University of Technology, 38678 Clausthal-Zellerfeld, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8287-2376","authenticated-orcid":false,"given":"Johann C.","family":"Dauer","sequence":"additional","affiliation":[{"name":"German Aerospace Center (DLR), Institute of Flight Systems, 38108 Braunschweig, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,10,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Richter, S.R., Vineet, V., Roth, S., and Koltun, V. (2016, January 11\u201314). Playing for Data: Ground Truth from Computer Games. 
Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46475-6_7"},{"key":"ref_2","unstructured":"Wrenninge, M., and Unger, J. (2018). Synscapes: A Photorealistic Synthetic Dataset for Street Scene Parsing. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Gaidon, A., Wang, Q., Cabon, Y., and Vig, E. (2016, January 27\u201330). Virtual Worlds as Proxy for Multi-Object Tracking Analysis. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.470"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Bondi, E., Dey, D., Kapoor, A., Piavis, J., Shah, S., Fang, F., Dilkina, B., Hannaford, R., Iyer, A., and Joppa, L. (2018, January 20\u201322). AirSim-W: A Simulation Environment for Wildlife Conservation with UAVs. Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies, San Jose, CA, USA.","DOI":"10.1145\/3209811.3209880"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Hinniger, C., and R\u00fcter, J. (2023). Synthetic Training Data for Semantic Segmentation of the Environment from UAV Perspective. Aerospace, 10.","DOI":"10.3390\/aerospace10070604"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kiefer, B., Ott, D., and Zell, A. (2022, January 21\u201325). Leveraging Synthetic Data in Object Detection on Unmanned Aerial Vehicles. Proceedings of the 26th International Conference on Pattern Recognition (ICPR), Montr\u00e9al, QC, Canada.","DOI":"10.1109\/ICPR56361.2022.9956710"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Krump, M., Ru\u00df, M., and St\u00fctz, P. (2019, January 29\u201331). Deep learning algorithms for vehicle detection on UAV platforms: First investigations on the effects of synthetic training. 
Proceedings of the International Conference on Modelling and Simulation for Autonomous Systems (MESAS), Palermo, Italy.","DOI":"10.1007\/978-3-030-43890-6_5"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Krump, M., and St\u00fctz, P. (2022, January 20\u201321). UAV Based Vehicle Detection on Real and Synthetic Image Pairs: Performance Differences and Influence Analysis of Context and Simulation Parameters. Proceedings of the International Conference on Modelling and Simulation for Autonomous Systems (MESAS), Prague, Czech Republic.","DOI":"10.1007\/978-3-030-98260-7_1"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Deisenroth, M.P., Faisal, A.A., and Ong, C.S. (2020). Mathematics for Machine Learning, Cambridge University Press.","DOI":"10.1017\/9781108679930"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.neucom.2018.05.083","article-title":"Deep visual domain adaptation: A survey","volume":"312","author":"Wang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1109\/MSP.2014.2347059","article-title":"Visual domain adaptation: A survey of recent advances","volume":"32","author":"Patel","year":"2015","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_12","unstructured":"European Aviation Safety Agency (EASA), and Deadalean AG (2020). Concepts of Design Assurance for Neural Networks (CoDANN), Technical Report."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ros, G., Sellart, L., Materzynska, J., Vazquez, D., and Lopez, A.M. (2016, January 27\u201330). The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.352"},{"key":"ref_14","unstructured":"Johnson-Roberson, M., Barto, C., Mehta, R., Sridhar, S.N., Rosaen, K., and Vasudevan, R. 
(2017, May 29\u2013June 3). Driving in the Matrix: Can virtual worlds replace human-generated annotations for real world tasks? Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore."},{"key":"ref_15","unstructured":"Nowruzi, F.E., Kapoor, P., Kolhatkar, D., Hassanat, F.A., Laganiere, R., and Rebut, J. (2019). How much real data do we actually need: Analyzing object detection performance using synthetic and real data. arXiv."},{"key":"ref_16","unstructured":"Laux, L., Schirmer, S., Schopferer, S., and Dauer, J.C. (2022, January 22). Build Your Own Training Data\u2014Synthetic Data for Object Detection in Aerial Images. Proceedings of the 4th Workshop on Avionics Systems and Software Engineering, Virtual."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Konen, K., and Hecking, T. (2021, January 1\u20133). Increased Robustness of Object Detection on Aerial Image Datasets using Simulated Imagery. Proceedings of the IEEE Fourth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Laguna Hills, CA, USA.","DOI":"10.1109\/AIKE52691.2021.00007"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Shah, S., Dey, D., Lovett, C., and Kapoor, A. (2018). AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. arXiv, Available online: https:\/\/microsoft.github.io\/AirSim\/.","DOI":"10.1007\/978-3-319-67361-5_40"},{"key":"ref_19","unstructured":"Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V. (2017, January 13\u201315). CARLA: An Open Urban Driving Simulator. Proceedings of the 1st Annual Conference on Robot Learning, Mountain View, CA, USA."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Qiu, W., Zhong, F., Zhang, Y., Qiao, S., Xiao, Z., Kim, T.S., and Wang, Y. (2017, January 23\u201327). UnrealCV: Virtual Worlds for Computer Vision. 
Proceedings of the 25th ACM International Conference on Multimedia (MM\u201917), Mountain View, CA, USA.","DOI":"10.1145\/3123266.3129396"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Beery, S., Liu, Y., Morris, D., Piavis, J., Kapoor, A., Joshi, N., Meister, M., and Perona, P. (2020, January 1\u20135). Synthetic Examples Improve Generalization for Rare Classes. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093570"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"R\u00fcter, J., Maienschein, T., Schirmer, S., Schopferer, S., and Torens, C. (2024). Filling the Gaps: Using Synthetic Low-Altitude Aerial Images to Increase Operational Design Domain Coverage. Sensors, 24.","DOI":"10.3390\/s24041144"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Shermeyer, J., Hossler, T., van Etten, A., Hogan, D., Lewis, R., and Kim, D. (2021, January 5\u20139). RarePlanes: Synthetic Data Takes Flight. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Virtual.","DOI":"10.1109\/WACV48630.2021.00025"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"R\u00fcter, J., and Schmidt, R. (2023, January 17\u201319). Using Only Synthetic Images to Train a Drogue Detector for Aerial Refueling. Proceedings of the International Conference on Modelling and Simulation for Autonomous Systems (MESAS), Palermo, Italy.","DOI":"10.1007\/978-3-031-71397-2_25"},{"key":"ref_25","unstructured":"Kar, A., Prakash, A., Liu, M.Y., Cameracci, E., Yuan, J., Rusiniak, M., Acuna, D., Torralba, A., and Fidler, S. (2019, October 27\u2013November 2). Meta-Sim: Learning to Generate Synthetic Datasets. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Krump, M., and St\u00fctz, P. (2020, January 21). 
UAV Based Vehicle Detection with Synthetic Training: Identification of Performance Factors Using Image Descriptors and Machine Learning. Proceedings of the International Conference on Modelling and Simulation for Autonomous Systems (MESAS), Prague, Czech Republic.","DOI":"10.1007\/978-3-030-70740-8_5"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Saleh, F.S., Aliakbarian, M.S., Salzmann, M., Petersson, L., and Alvarez, J.M. (2018, January 8\u201314). Effective Use of Synthetic Data for Urban Scene Semantic Segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01216-8_6"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1016\/j.neucom.2020.01.085","article-title":"Recent advances in deep learning for object detection","volume":"396","author":"Wu","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"3212","DOI":"10.1109\/TNNLS.2018.2876865","article-title":"Object detection with deep learning: A review","volume":"30","author":"Zhao","year":"2019","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_31","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). SSD: Single shot multibox detector. 
Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_34","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (2019, October 27\u2013November 2). FCOS: Fully Convolutional One-Stage Object Detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_35","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zagoruyko, S., and Komodakis, N. (2016). Wide residual networks. arXiv.","DOI":"10.5244\/C.30.87"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Doll\u00e1r, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_39","unstructured":"Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., and Vasudevan, V. (2019, October 27\u2013November 2). Searching for mobilenetv3. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_40","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201323). Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q.V. (2019, January 15\u201320). Mnasnet: Platform-aware neural architecture search for mobile. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00293"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. 
Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland. Available online: https:\/\/cocodataset.org\/.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_46","unstructured":"Wehrtechnische Dienststelle f\u00fcr Luftfahrzeuge und Luftfahrtger\u00e4t der Bundeswehr (WTD 61) Images of Air-to-Air Refueling Kindly Provided to German Aerospace Center (DLR) for Research Purposes. Unpublished Work."},{"key":"ref_47","first-page":"8026","article-title":"Pytorch: An imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_49","unstructured":"Kingma, D.P., and Ba, J. (2017). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., and Kalinin, A.A. (2020). Albumentations: Fast and flexible image augmentations. Information, 11.","DOI":"10.3390\/info11020125"},{"key":"ref_51","unstructured":"Loshchilov, I., and Hutter, F. (2019). Decoupled Weight Decay Regularization. arXiv."},{"key":"ref_52","unstructured":"Wilson, A.C., Roelofs, R., Stern, M., Srebro, N., and Recht, B. (2018). The Marginal Value of Adaptive Gradient Methods in Machine Learning. 
arXiv."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/10\/10\/259\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:16:03Z","timestamp":1760112963000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/10\/10\/259"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,18]]},"references-count":52,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2024,10]]}},"alternative-id":["jimaging10100259"],"URL":"https:\/\/doi.org\/10.3390\/jimaging10100259","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,18]]}}}