{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T03:33:28Z","timestamp":1771558408193,"version":"3.50.1"},"reference-count":34,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2020,1,18]],"date-time":"2020-01-18T00:00:00Z","timestamp":1579305600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In core computer vision tasks, we have witnessed significant advances in object detection, localisation and tracking. However, there are currently no methods to detect, localize and track objects in road environments while taking into account real-time constraints. In this paper, our objective is to develop a deep learning multi-object detection and tracking technique applied to road smart mobility. Firstly, we propose an effective detector based on YOLOv3, which we adapt to our context. Subsequently, to successfully localize the detected objects, we put forward an adaptive method aiming to extract 3D information, i.e., depth maps. To do so, a comparative study is carried out taking into account two approaches: Monodepth2 for monocular vision and MADNet for stereoscopic vision. These approaches are then evaluated over datasets containing depth information in order to discern the solution that performs best in real-time conditions. Object tracking is necessary in order to mitigate the risks of collisions. Unlike traditional tracking approaches, which require target initialization beforehand, our approach consists of using information from object detection and distance estimation to initialize targets and to track them later. Expressly, we propose here to improve the SORT approach for 3D object tracking. We introduce an extended Kalman filter to better estimate the position of objects. 
Extensive experiments carried out on the KITTI dataset prove that our proposal outperforms state-of-the-art approaches.<\/jats:p>","DOI":"10.3390\/s20020532","type":"journal-article","created":{"date-parts":[[2020,1,20]],"date-time":"2020-01-20T04:27:09Z","timestamp":1579494429000},"page":"532","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":50,"title":["Deep Learning for Real-Time 3D Multi-Object Detection, Localisation, and Tracking: Application to Smart Mobility"],"prefix":"10.3390","volume":"20","author":[{"given":"Antoine","family":"Mauri","sequence":"first","affiliation":[{"name":"Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6230-2966","authenticated-orcid":false,"given":"Redouane","family":"Khemmar","sequence":"additional","affiliation":[{"name":"Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4037-2880","authenticated-orcid":false,"given":"Benoit","family":"Decoux","sequence":"additional","affiliation":[{"name":"Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5360-2185","authenticated-orcid":false,"given":"Nicolas","family":"Ragot","sequence":"additional","affiliation":[{"name":"Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5130-4798","authenticated-orcid":false,"given":"Romain","family":"Rossi","sequence":"additional","affiliation":[{"name":"Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"given":"Rim","family":"Trabelsi","sequence":"additional","affiliation":[{"name":"Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000 Rouen, 
France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1078-5043","authenticated-orcid":false,"given":"R\u00e9mi","family":"Boutteau","sequence":"additional","affiliation":[{"name":"Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"given":"Jean-Yves","family":"Ertaud","sequence":"additional","affiliation":[{"name":"Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2494-8595","authenticated-orcid":false,"given":"Xavier","family":"Savatier","sequence":"additional","affiliation":[{"name":"Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000 Rouen, France"}]}],"member":"1968","published-online":{"date-parts":[[2020,1,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Mukojima, H., Deguchi, D., Kawanishi, Y., Ide, I., Murase, H., Ukai, M., Nagamine, N., and Nakasone, R. (2016, January 25\u201328). Moving camera background-subtraction for obstacle detection on railway tracks. Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7533104"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yanan, S., Hui, Z., Li, L., and Hang, Z. (December, January 30). Rail Surface Defect Detection Method Based on YOLOv3 Deep Learning Networks. Proceedings of the 2018 IEEE Chinese Automation Congress (CAC), Xi\u2019an, China.","DOI":"10.1109\/CAC.2018.8623082"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Khemmar, R., Gouveia, M., Decoux, B., and Ertaud, J.Y. (2019, January 18\u201322). Real Time Pedestrian and Object Detection and Tracking-based Deep Learning. Application to Drone Visual Tracking. 
Proceedings of the International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen, Czechia.","DOI":"10.24132\/CSRN.2019.2902.2.5"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Chen, Z., Khemmar, R., Decoux, B., Atahouet, A., and Ertaud, J.Y. (2019, January 22\u201324). Real Time Object Detection, Tracking, and Distance and Motion Estimation based on Deep Learning: Application to Smart Mobility. Proceedings of the 2019 Eighth International Conference on Emerging Security Technologies (EST), Colchester, UK.","DOI":"10.1109\/EST.2019.8806222"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Yang, S., and Baum, M. (2017, January 5\u20139). Extended Kalman filter for extended object tracking. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952985"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016). Ssd: Single shot multibox detector. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_7","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 24\u201327). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. 
Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_10","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_11","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1377","DOI":"10.1109\/TIM.2007.900126","article-title":"Real-time tree-foliage surface estimation using a ground laser scanner","volume":"56","author":"Tresanchez","year":"2007","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"786403","DOI":"10.1117\/12.872313","article-title":"Harmonic distortion free distance estimation in ToF camera","volume":"Volume 7864","author":"Kang","year":"2011","journal-title":"Three-Dimensional Imaging, Interaction, and Measurement"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Godard, C., Mac Aodha, O., and Brostow, G.J. (2017, January 21\u201326). Unsupervised monocular depth estimation with left-right consistency. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.699"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ciaparrone, G., S\u00e1nchez, F.L., Tabik, S., Troiano, L., Tagliaferri, R., and Herrera, F. (2019). Deep Learning in Video Multi-Object Tracking: A Survey. Neurocomputing, in press.","DOI":"10.1016\/j.neucom.2019.11.023"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wojke, N., Bewley, A., and Paulus, D. (2017, January 17\u201320). Simple online and realtime tracking with a deep association metric. 
Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8296962"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2008\/246309","article-title":"Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics","volume":"2008","author":"Bernardin","year":"2008","journal-title":"J. Image Video Process."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision meets robotics: The KITTI dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_19","unstructured":"Cordts, M., Omran, M., Ramos, S., Scharw\u00e4chter, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2020, January 16). The Cityscapes Dataset. Available online: https:\/\/www.visinf.tu-darmstadt.de\/media\/visinf\/vi_papers\/2015\/cordts-cvprws.pdf."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","article-title":"The pascal visual object classes challenge: A retrospective","volume":"111","author":"Everingham","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft coco: Common objects in context. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_23","unstructured":"Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., and Duerig, T. (2018). 
The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ragot, N., Khemmar, R., Pokala, A., Rossi, R., and Ertaud, J.Y. (2019, January 22\u201324). Benchmark of Visual SLAM Algorithms: ORB-SLAM2 vs RTAB-Map. Proceedings of the 2019 Eighth International Conference on Emerging Security Technologies (EST), Colchester, UK.","DOI":"10.1109\/EST.2019.8806213"},{"key":"ref_25","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_26","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (2016, January 5\u201310). R-fcn: Object detection via region-based fully convolutional networks. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_27","unstructured":"Redmon, J. (2020, January 16). Darknet: Open Source Neural Networks in C. Available online: http:\/\/pjreddie.com\/darknet\/."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhou, T., Brown, M., Snavely, N., and Lowe, D.G. (2017, January 21\u201326). Unsupervised learning of depth and ego-motion from video. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.700"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Tosi, F., Aleotti, F., Poggi, M., and Mattoccia, S. (2019, January 16\u201320). Learning monocular depth estimation infusing traditional stereo knowledge. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01003"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Godard, C., Mac Aodha, O., Firman, M., and Brostow, G. (2018). Digging into self-supervised monocular depth estimation. 
arXiv.","DOI":"10.1109\/ICCV.2019.00393"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1431","DOI":"10.1109\/TIP.2008.925372","article-title":"Cost aggregation and occlusion handling with WLS in stereo matching","volume":"17","author":"Min","year":"2008","journal-title":"IEEE Trans. Image Process."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Tonioni, A., Tosi, F., Poggi, M., Mattoccia, S., and Stefano, L.D. (2019, January 15\u201321). Real-time self-adaptive deep stereo. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00028"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Kalman (1960). A New Approach to Linear Filtering and Prediction Problems. Trans. ASME J. Basic Eng., 82, 35\u201345.","DOI":"10.1115\/1.3662552"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zendel, O., Murschitz, M., Zeilinger, M., Steininger, D., Abbasi, S., and Beleznai, C. (2019, January 16\u201320). RailSem19: A Dataset for Semantic Rail Scene Understanding. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00161"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/2\/532\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T13:43:55Z","timestamp":1760363035000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/2\/532"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,18]]},"references-count":34,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2020,1]]}},"alternative-id":["s20020532"],"URL":"https:\/\/doi.org\/10.3390\/s20020532","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1,18]]}}}