{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T16:42:32Z","timestamp":1764175352291,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2023,3,16]],"date-time":"2023-03-16T00:00:00Z","timestamp":1678924800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"SEGULA Technologies\u2019 collaboration with IRSEEM"},{"name":"ANRT (Association Nationale de la Recherche et de la Technologie)\u2019s CIFRE program"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Three-dimensional (3D) real-time object detection and tracking is an important task in the case of autonomous vehicles and road and railway smart mobility, in order to allow them to analyze their environment for navigation and obstacle avoidance purposes. In this paper, we improve the efficiency of 3D monocular object detection by using dataset combination and knowledge distillation, and by creating a lightweight model. Firstly, we combine real and synthetic datasets to increase the diversity and richness of the training data. Then, we use knowledge distillation to transfer the knowledge from a large, pre-trained model to a smaller, lightweight model. Finally, we create a lightweight model by selecting the combinations of width, depth &amp; resolution in order to reach a target complexity and computation time. Our experiments showed that using each method improves either the accuracy or the efficiency of our model with no significant drawbacks. Using all these approaches is especially useful for resource-constrained environments, such as self-driving cars and railway systems.<\/jats:p>","DOI":"10.3390\/s23063197","type":"journal-article","created":{"date-parts":[[2023,3,17]],"date-time":"2023-03-17T02:59:26Z","timestamp":1679021966000},"page":"3197","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Improving the Efficiency of 3D Monocular Object Detection and Tracking for Road and Railway Smart Mobility"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4165-5789","authenticated-orcid":false,"given":"Alexandre","family":"Evain","sequence":"first","affiliation":[{"name":"Univ Rouen Normandie, Normandie Univ, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0313-3859","authenticated-orcid":false,"given":"Antoine","family":"Mauri","sequence":"additional","affiliation":[{"name":"Univ Rouen Normandie, Normandie Univ, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"given":"Fran\u00e7ois","family":"Garnier","sequence":"additional","affiliation":[{"name":"Univ Rouen Normandie, Normandie Univ, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"given":"Messmer","family":"Kounouho","sequence":"additional","affiliation":[{"name":"Univ Rouen Normandie, Normandie Univ, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6230-2966","authenticated-orcid":false,"given":"Redouane","family":"Khemmar","sequence":"additional","affiliation":[{"name":"Univ Rouen Normandie, Normandie Univ, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8933-4714","authenticated-orcid":false,"given":"Madjid","family":"Haddad","sequence":"additional","affiliation":[{"name":"SEGULA 
Technologies, 19 Rue d\u2019Arras, 92000 Nanterre, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1078-5043","authenticated-orcid":false,"given":"R\u00e9mi","family":"Boutteau","sequence":"additional","affiliation":[{"name":"Univ Rouen Normandie, INSA Rouen Normandie, Universit\u00e9 Le Havre Normandie, Normandie Univ, LITIS UR 4108, 76000 Rouen, France"}]},{"given":"S\u00e9bastien","family":"Breteche","sequence":"additional","affiliation":[{"name":"SEGULA Technologies, 19 Rue d\u2019Arras, 92000 Nanterre, France"}]},{"given":"Sofiane","family":"Ahmedali","sequence":"additional","affiliation":[{"name":"IBISC, Evry-Val-d\u2019Essonne University, Universit\u00e9 Paris-Saclay, 91080 \u00c9vry-Courcouronnes, France"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1007\/s11554-022-01202-6","article-title":"Lightweight convolutional neural network for real-time 3D object detection in road and railway environments","volume":"19","author":"Mauri","year":"2022","journal-title":"J. Real-Time Image Process."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S.K., Girshick, R.B., and Farhadi, A. (2016, June 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_3","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_4","unstructured":"Bochkovskiy, A., Wang, C., and Liao, H.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_5","unstructured":"Jocher, G., Chaurasia, A., Stoken, A., Borovec, J., NanoCode012, Kwon, Y., Xie, T., Michael, K., Fang, J., and Imyhxy (2023, March 13). Ultralytics\/yolov5: v6.2\u2014YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai Integrations. Available online: https:\/\/zenodo.org\/record\/7002879#.ZBMIUHYo9PY."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, Y., Wang, L., and Liu, M. (2021, May 30\u2013June 5). YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection. Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9561423"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Mousavian, A., Anguelov, D., Flynn, J., and Kosecka, J. (2017, July 21\u201326). 3D Bounding Box Estimation Using Deep Learning and Geometry. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.597"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"919","DOI":"10.1109\/LRA.2021.3052442","article-title":"Ground-aware Monocular 3D Object Detection for Autonomous Driving","volume":"6","author":"Liu","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_9","unstructured":"Wang, C., Yeh, I., and Liao, H.M. (2021). You Only Learn One Representation: Unified Network for Multiple Tasks. arXiv."},{"key":"ref_10","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). YOLOX: Exceeding YOLO Series in 2021. arXiv."},{"key":"ref_11","unstructured":"Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., Ke, Z., Li, Q., Cheng, M., and Nie, W. (2022). YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. 
arXiv."},{"key":"ref_12","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2022). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, T., Pang, J., and Lin, D. (2022). Monocular 3D Object Detection with Depth from Motion. arXiv.","DOI":"10.1007\/978-3-031-20077-9_23"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Qin, Z., and Li, X. (2022). MonoGround: Detecting Monocular 3D Objects from the Ground. arXiv.","DOI":"10.1109\/CVPR52688.2022.00377"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Huang, K.C., Wu, T.H., Su, H.T., and Hsu, W.H. (2022). MonoDTR: Monocular 3D Object Detection with Depth-Aware Transforme. arXiv.","DOI":"10.1109\/CVPR52688.2022.00398"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhou, Y., He, Y., Zhu, H., Wang, C., Li, H., and Jiang, Q. (2021). Monocular 3D Object Detection: An Extrinsic Parameter Free Approach. arXiv.","DOI":"10.1109\/CVPR46437.2021.00747"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Lu, J., and Zhou, J. (2021). Objects are Different: Flexible Monocular 3D Object Detection. arXiv.","DOI":"10.1109\/CVPR46437.2021.00330"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Li, P., Zhao, H., Liu, P., and Cao, F. (2020). RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving. arXiv.","DOI":"10.1007\/978-3-030-58580-8_38"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, Z., Zhou, D., Lu, F., Fang, J., and Zhang, L. (2021). AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection. arXiv.","DOI":"10.1109\/ICCV48922.2021.01535"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., and Beijbom, O. (2020, January 13\u201319). nuScenes: A multimodal dataset for autonomous driving. Proceedings of the CVPR, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The Cityscapes Dataset for Semantic Urban Scene Understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Li, F.-F. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. 
Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., and Doll\u00e1r, P. (2014). Microsoft COCO: Common Objects in Context. arXiv.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_26","unstructured":"Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V. (2017, November 13\u201315). CARLA: An Open Urban Driving Simulator. Proceedings of the 1st Annual Conference on Robot Learning, Mountain View, CA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ros, G., Sellart, L., Materzynska, J., Vazquez, D., and Lopez, A.M. (2016, June 27\u201330). The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.352"},{"key":"ref_28","first-page":"102","article-title":"Playing for Data: Ground Truth from Computer Games","volume":"Volume 9906","author":"Leibe","year":"2016","journal-title":"Proceedings of the European Conference on Computer Vision (ECCV)"},{"key":"ref_29","unstructured":"Kr\u00e4henb\u00fchl, P. (2018, June 18\u201323). Free Supervision from Video Games. Proceedings of the CVPR, Salt Lake City, UT, USA."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Khemmar, R., Mauri, A., Dulompont, C., Gajula, J., Vauchey, V., Haddad, M., and Boutteau, R. (2022). Road and railway smart mobility: A high-definition ground truth hybrid dataset. Sensors, 22.","DOI":"10.3390\/s22103922"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zendel, O., Murschitz, M., Zeilinger, M., Steininger, D., Abbasi, S., and Beleznai, C. (2019, June 15\u201320). RailSem19: A Dataset for Semantic Rail Scene Understanding. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00161"},{"key":"ref_32","unstructured":"Harb, J., R\u00e9b\u00e9na, N., Chosidow, R., Roblin, G., Potarusov, R., and Hajri, H. (2020). FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Bewley, A., Ge, Z., Ott, L., Ramos, F., and Upcroft, B. (2016, September 25\u201328). Simple Online and Realtime Tracking. Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7533003"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wojke, N., Bewley, A., and Paulus, D. (2017, September 17\u201320). Simple Online and Realtime Tracking with a Deep Association Metric. Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8296962"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Meinhardt, T., Kirillov, A., Leal-Taix\u00e9, L., and Feichtenhofer, C. (2022, June 19\u201320). TrackFormer: Multi-Object Tracking with Transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00864"},{"key":"ref_36","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the Knowledge in a Neural Network. 
arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1789","DOI":"10.1007\/s11263-021-01453-z","article-title":"Knowledge Distillation: A Survey","volume":"129","author":"Gou","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_38","unstructured":"Asif, U., Tang, J., and Harrer, S. (2019). Ensemble Knowledge Distillation for Learning Improved and Efficient Network. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Mirzadeh, S.I., Farajtabar, M., Li, A., Levine, N., Matsukawa, A., and Ghasemzadeh, H. (2020). Improved Knowledge Distillation via Teacher Assistant. Proc. AAAI Conf. Artif. Intell., 34.","DOI":"10.1609\/aaai.v34i04.5963"},{"key":"ref_40","first-page":"120","article-title":"The OpenCV Library","volume":"25","author":"Bradski","year":"2000","journal-title":"Dr. Dobb\u2019s J. Softw. Tools"},{"key":"ref_41","first-page":"6105","article-title":"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks","volume":"97","author":"Tan","year":"2019","journal-title":"PMLR"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Koonce, B. (2021). Convolutional Neural Networks with Swift for Tensorflow, Springer.","DOI":"10.1007\/978-1-4842-6168-2"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1109\/MCSE.2007.55","article-title":"Matplotlib: A 2D graphics environment","volume":"9","author":"Hunter","year":"2007","journal-title":"Comput. Sci. Eng."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/6\/3197\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:57:09Z","timestamp":1760122629000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/6\/3197"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,16]]},"references-count":43,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["s23063197"],"URL":"https:\/\/doi.org\/10.3390\/s23063197","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,3,16]]}}}