{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T10:29:28Z","timestamp":1771064968102,"version":"3.50.1"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2022,2,11]],"date-time":"2022-02-11T00:00:00Z","timestamp":1644537600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,2,11]],"date-time":"2022-02-11T00:00:00Z","timestamp":1644537600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003032","name":"anrt","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003032","id-type":"DOI","asserted-by":"crossref"}]},{"name":"segula technologies"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Real-Time Image Proc"],"published-print":{"date-parts":[[2022,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>For smart mobility, and autonomous vehicles (AV), it is necessary to have a very precise perception of the environment to guarantee reliable decision-making, and to be able to extend the results obtained for the road sector to other areas such as rail. To this end, we introduce a new single-stage monocular real-time 3D object detection convolutional neural network (CNN) based on YOLOv5, dedicated to smart mobility applications for both road and rail environments. To perform the 3D parameter regression, we replace YOLOv5\u2019s anchor boxes with our hybrid anchor boxes. Our method is available in different model sizes such as YOLOv5: small, medium, and large. 
The new model that we propose is optimized for real-time embedded constraints (lightweight, speed, and accuracy) that takes advantage of the improvement brought by split attention (SA) convolutions called small split attention model (Small-SA). To validate our CNN model, we also introduce a new virtual dataset for both road and rail environments by leveraging the video game Grand Theft Auto V (GTAV). We provide extensive results of our different models on both KITTI and our own GTAV datasets. Through our results, we show that our method is the fastest available 3D object detection with accuracy results close to state-of-the-art methods on the KITTI road dataset. We further demonstrate that the pre-training process on our GTAV virtual dataset improves the accuracy on real datasets such as KITTI, thus allowing our method to obtain an even greater accuracy than state-of-the-art approaches with 16.16% 3D average precision on hard car detection with inference time of 11.1 ms\/image on an RTX 3080 GPU.<\/jats:p>","DOI":"10.1007\/s11554-022-01202-6","type":"journal-article","created":{"date-parts":[[2022,2,11]],"date-time":"2022-02-11T08:02:33Z","timestamp":1644566553000},"page":"499-516","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Lightweight convolutional neural network for real-time 3D object detection in road and railway 
environments"],"prefix":"10.1007","volume":"19","author":[{"given":"A.","family":"Mauri","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6230-2966","authenticated-orcid":false,"given":"R.","family":"Khemmar","sequence":"additional","affiliation":[]},{"given":"B.","family":"Decoux","sequence":"additional","affiliation":[]},{"given":"M.","family":"Haddad","sequence":"additional","affiliation":[]},{"given":"R.","family":"Boutteau","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,2,11]]},"reference":[{"issue":"5","key":"1202_CR1","doi-asserted-by":"publisher","first-page":"4340","DOI":"10.1109\/TGRS.2020.3016820","volume":"59","author":"D Hong","year":"2020","unstructured":"Hong, D., Gao, L., Yokoya, N., Yao, J., Chanussot, J., Du, Q., Zhang, B.: More diverse means better: multimodal deep learning meets remote-sensing imagery classification. IEEE Trans. Geosci. Remote Sens. 59(5), 4340\u20134354 (2020)","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"1202_CR2","doi-asserted-by":"crossref","unstructured":"Hong, D., Han, Z., Yao, J., Gao, L., Zhang, B., Plaza, A., Chanussot, J.: Spectralformer: Rethinking hyperspectral image classification with transformers. IEEE Trans. Geosci. Remote Sens. (2021)","DOI":"10.1109\/TGRS.2021.3130716"},{"key":"1202_CR3","doi-asserted-by":"publisher","unstructured":"Jocher, G., Stoken, A., J.\u00a0B. et\u00a0al., ultralytics\/yolov5: v4.0 - nn.SiLU() activations, Weights & Biases logging, PyTorch Hub integration (2021). [Online]. Available: https:\/\/doi.org\/10.5281\/zenodo.4418161","DOI":"10.5281\/zenodo.4418161"},{"key":"1202_CR4","unstructured":"Cordts, M., Omran, M., Ramos, S., Scharw\u00e4chter, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset. 
In: CVPR Workshop on the Future of Datasets in Vision, vol.\u00a02 (2015)"},{"issue":"1","key":"1202_CR5","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","volume":"111","author":"M Everingham","year":"2015","unstructured":"Everingham, M., Eslami, S.A., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98\u2013136 (2015)","journal-title":"Int. J. Comput. Vis."},{"issue":"11","key":"1202_CR6","doi-asserted-by":"publisher","first-page":"1231","DOI":"10.1177\/0278364913491297","volume":"32","author":"A Geiger","year":"2013","unstructured":"Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the kitti dataset. Int. J. Robot. Res. 32(11), 1231\u20131237 (2013)","journal-title":"Int. J. Robot. Res."},{"key":"1202_CR7","doi-asserted-by":"crossref","unstructured":"Caesar, H., Bankiti, V., Lang, A.\u00a0H., Vora, S., Liong, V.\u00a0E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: \u201cnuscenes: A multimodal dataset for autonomous driving,\u201d In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 11 621\u201311 631, (2020)","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"1202_CR8","doi-asserted-by":"crossref","unstructured":"Singh, G., Akrigg, S., Di\u00a0Maio, M., Fontana, V., Alitappeh, R.\u00a0J., Saha, S., Jeddisaravi, K., Yousefi, F., Culley, J., Nicholson, T., et\u00a0al.: Road: The road event awareness dataset for autonomous driving. arXiv preprint arXiv:2102.11585, (2021)","DOI":"10.1109\/TPAMI.2022.3150906"},{"key":"1202_CR9","doi-asserted-by":"crossref","unstructured":"Chang, M.-F., Lambert, J., Sangkloy, P., Singh, J., Bak, S., Hartnett, A., Wang, D., Carr, P., Lucey, S., Ramanan, D., et\u00a0al.: Argoverse: 3d tracking and forecasting with rich maps. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 
8748\u20138757 (2019)","DOI":"10.1109\/CVPR.2019.00895"},{"key":"1202_CR10","unstructured":"Kesten, R., Usman, M., Houston, J., Pandya, T., Nadhamuni, K., Ferreira, A., Yuan, M., Low, B., Jain, A., Ondruska, P., et\u00a0al.: Lyft level 5 av dataset 2019. https:\/\/level-5.global\/data\/ (2019)"},{"key":"1202_CR11","doi-asserted-by":"crossref","unstructured":"Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., Caine, B., et\u00a0al.: Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 2446\u20132454 (2020)","DOI":"10.1109\/CVPR42600.2020.00252"},{"key":"1202_CR12","doi-asserted-by":"crossref","unstructured":"Zendel, O., Murschitz, M., Zeilinger, M., Steininger, D., Abbasi, S., Beleznai, C.: Railsem19: A dataset for semantic rail scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 0 (2019)","DOI":"10.1109\/CVPRW.2019.00161"},{"key":"1202_CR13","unstructured":"Harb, J., R\u00e9b\u00e9na, N., Chosidow, R., Roblin, G., Potarusov, R., Hajri, H.: \u201cFrsign: A large-scale traffic light dataset for autonomous trains,\u201d CoRR, vol. abs\/2002.05665, (2020). [Online]. Available: arXiv:2002.05665"},{"key":"1202_CR14","unstructured":"Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: An open urban driving simulator. In: Proceedings of the 1st Annual Conference on Robot Learning, pp. 1\u201316, (2017)"},{"key":"1202_CR15","doi-asserted-by":"crossref","unstructured":"Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.\u00a0M.: The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 
3234\u20133243 (2016)","DOI":"10.1109\/CVPR.2016.352"},{"key":"1202_CR16","doi-asserted-by":"crossref","unstructured":"Richter, S.\u00a0R., Vineet, V., Roth, S., Koltun, V.: Playing for data: Ground truth from computer games. In: European Conference on Computer Vision (ECCV), ser. LNCS, B.\u00a0Leibe, J.\u00a0Matas, N.\u00a0Sebe, and M.\u00a0Welling, Eds., vol. 9906. Springer International Publishing, pp. 102\u2013118, (2016)","DOI":"10.1007\/978-3-319-46475-6_7"},{"key":"1202_CR17","doi-asserted-by":"crossref","unstructured":"Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., Urtasun, R.: Monocular 3d object detection for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2147\u20132156 (2016)","DOI":"10.1109\/CVPR.2016.236"},{"key":"1202_CR18","doi-asserted-by":"crossref","unstructured":"Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp. 1440\u20131448 (2015)","DOI":"10.1109\/ICCV.2015.169"},{"key":"1202_CR19","doi-asserted-by":"crossref","unstructured":"Mousavian, A., Anguelov, D., Flynn, J., Kosecka, J.: 3d bounding box estimation using deep learning and geometry. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 7074\u20137082 (2017)","DOI":"10.1109\/CVPR.2017.597"},{"key":"1202_CR20","doi-asserted-by":"crossref","unstructured":"Cai, Z., Fan, Q., Feris, R.\u00a0S., Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection (2016)","DOI":"10.1007\/978-3-319-46493-0_22"},{"key":"1202_CR21","doi-asserted-by":"crossref","unstructured":"Chabot, F., Chaouch, M., Rabarisoa, J., Teuliere, C., Chateau, T.: Deep manta: a coarse-to-fine many-task network for joint 2d and 3d vehicle analysis from monocular image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 
2040\u20132049 (2017)","DOI":"10.1109\/CVPR.2017.198"},{"key":"1202_CR22","doi-asserted-by":"crossref","unstructured":"Xu, B., Chen, Z.: Multi-level fusion based 3d object detection from monocular images. In: 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 2345\u20132353 (2018)","DOI":"10.1109\/CVPR.2018.00249"},{"key":"1202_CR23","doi-asserted-by":"crossref","unstructured":"Hu, H.-N., Cai, Q.-Z., Wang, D., Lin, J., Sun, M., Krahenbuhl, P., Darrell, T., Yu, F.: Joint monocular 3d vehicle detection and tracking. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 5390\u20135399 (2019)","DOI":"10.1109\/ICCV.2019.00549"},{"key":"1202_CR24","doi-asserted-by":"crossref","unstructured":"Liu, Z., Wu, Z., T\u00f3th, R.: Smoke: single-stage monocular 3d object detection via keypoint estimation (2020)","DOI":"10.1109\/CVPRW50498.2020.00506"},{"key":"1202_CR25","doi-asserted-by":"crossref","unstructured":"Qin, Z., Wang, J., Lu, Y.: Monogrnet: a general framework for monocular 3d object detection (2021)","DOI":"10.1109\/TPAMI.2021.3074363"},{"key":"1202_CR26","doi-asserted-by":"crossref","unstructured":"Brazil, G., Liu, X.: M3d-rpn: Monocular 3d region proposal network for object detection. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 9287\u20139296 (2019)","DOI":"10.1109\/ICCV.2019.00938"},{"issue":"2","key":"1202_CR27","doi-asserted-by":"publisher","first-page":"919","DOI":"10.1109\/LRA.2021.3052442","volume":"6","author":"Y Liu","year":"2021","unstructured":"Liu, Y., Yixuan, Y., Liu, M.: Ground-aware monocular 3d object detection for autonomous driving. IEEE Robot Autom. Lett. 6(2), 919\u2013926 (2021)","journal-title":"IEEE Robot Autom. 
Lett."},{"key":"1202_CR28","doi-asserted-by":"crossref","unstructured":"Li, P., Zhao, H., Liu, P., Cao, F.: Rtm3d: Real-time monocular 3d detection from object keypoints for autonomous driving (2020)","DOI":"10.1007\/978-3-030-58580-8_38"},{"key":"1202_CR29","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015)","DOI":"10.1109\/CVPR.2016.90"},{"key":"1202_CR30","doi-asserted-by":"crossref","unstructured":"Yu, F., Wang, D., Shelhamer, E., Darrell, T.: Deep layer aggregation (2019)","DOI":"10.1109\/CVPR.2018.00255"},{"key":"1202_CR31","unstructured":"UM and F.\u00a0C. for Autonomous\u00a0Vehicles (2021) umautobots\/gtavisionexport. https:\/\/github.com\/umautobots\/GTAVisionExport, Online; Accessed July (2021)"},{"key":"1202_CR32","unstructured":"Jotrius. (2015) Railroad engineer. https:\/\/www.gta5-mods.com\/scripts\/railroad-engineer, Online; Accessed July (2021)"},{"issue":"8","key":"1202_CR33","doi-asserted-by":"publisher","first-page":"145","DOI":"10.3390\/jimaging7080145","volume":"7","author":"A Mauri","year":"2021","unstructured":"Mauri, A., Khemmar, R., Decoux, B., Haddad, M., Boutteau, R.: Real-time 3d multi-object detection and localization based on deep learning for road and railway smart mobility. J Imaging 7(8), 145 (2021)","journal-title":"J Imaging"},{"key":"1202_CR34","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770\u2013778 (2016)","DOI":"10.1109\/CVPR.2016.90"},{"key":"1202_CR35","unstructured":"Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Zhang, Z., Lin, H., Sun, Y., He, T., Mueller, J., Manmatha, R. Li, M., Smola, A.\u00a0J.: Resnest: Split-attention networks. 
arXiv:2004.08955"},{"key":"1202_CR36","unstructured":"Chen, X., Kundu, K., Zhu, Y., Berneshawi, A.\u00a0G., Ma, H., Fidler, S., Urtasun, R.: 3d object proposals for accurate object class detection. In: Advances in Neural Information Processing Systems. Citeseer, pp. 424\u2013432 (2015)"},{"key":"1202_CR37","unstructured":"Bochkovskiy, A., Wang, C., Liao, H.\u00a0M.: Yolov4: Optimal speed and accuracy of object detection. arXiv:2004.10934"},{"key":"1202_CR38","doi-asserted-by":"crossref","unstructured":"Smith, L.\u00a0N., Topin, N.: Super-convergence: Very fast training of neural networks using large learning rates. In: Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, vol. 11006. International Society for Optics and Photonics, p. 1100612 (2019)","DOI":"10.1117\/12.2520589"},{"key":"1202_CR39","doi-asserted-by":"crossref","unstructured":"Simonelli, A., Bulo, S.\u00a0R., Porzi, L., L\u00f3pez-Antequera, M., Kontschieder, P.: Disentangling monocular 3d object detection. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 1991\u20131999 (2019)","DOI":"10.1109\/ICCV.2019.00208"},{"key":"1202_CR40","unstructured":"Redmon, J., Farhadi, A.: Yolov3: An incremental improvement. 
arXiv:1804.02767 (2018)"}],"container-title":["Journal of Real-Time Image Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11554-022-01202-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11554-022-01202-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11554-022-01202-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,19]],"date-time":"2022-05-19T07:13:45Z","timestamp":1652944425000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11554-022-01202-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,11]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,6]]}},"alternative-id":["1202"],"URL":"https:\/\/doi.org\/10.1007\/s11554-022-01202-6","relation":{},"ISSN":["1861-8200","1861-8219"],"issn-type":[{"value":"1861-8200","type":"print"},{"value":"1861-8219","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,11]]},"assertion":[{"value":"29 July 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 January 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 February 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}