{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:44:53Z","timestamp":1760233493924,"version":"build-2065373602"},"reference-count":33,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2021,1,23]],"date-time":"2021-01-23T00:00:00Z","timestamp":1611360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"This work was partially financially supported by the Government of the Russian Federation","award":["(Grant 08-08)"],"award-info":[{"award-number":["(Grant 08-08)"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>In this paper, an EKF (Extended Kalman Filter)-based algorithm is proposed to estimate 3D position and velocity components of different cars in a scene by fusing the semantic information and car model, extracted from successive frames, with camera motion parameters. First, a 2D virtual image of the scene is made using prior knowledge of the 3D Computer Aided Design (CAD) models of the detected cars and their predicted positions. Then, a discrepancy, i.e., distance, between the actual image and the virtual image is calculated. The 3D position and the velocity components are recursively estimated by minimizing the discrepancy using EKF. 
The experiments on the KITTI dataset show good performance of the proposed algorithm with a position estimation error up to 3\u20135% at 30 m and velocity estimation error up to 1 m\/s.<\/jats:p>","DOI":"10.3390\/rs13030388","type":"journal-article","created":{"date-parts":[[2021,1,25]],"date-time":"2021-01-25T09:59:40Z","timestamp":1611568780000},"page":"388","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Towards Semantic SLAM: 3D Position and Velocity Estimation by Fusing Image Semantic Information with Camera Motion Parameters for Traffic Scene Analysis"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7504-7581","authenticated-orcid":false,"given":"Mostafa","family":"Mansour","sequence":"first","affiliation":[{"name":"Faculty of Information Technology and Communication Sciences, Tampere University, 33720 Tampere, Finland"},{"name":"Department of Information and Navigation Systems, ITMO University, 197101 St. Petersburg, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2617-3156","authenticated-orcid":false,"given":"Pavel","family":"Davidson","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology and Communication Sciences, Tampere University, 33720 Tampere, Finland"},{"name":"Huawei Technologies Co., Ltd., Edinburgh EH9 3BF, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3640-3760","authenticated-orcid":false,"given":"Oleg","family":"Stepanov","sequence":"additional","affiliation":[{"name":"Department of Information and Navigation Systems, ITMO University, 197101 St. Petersburg, Russia"},{"name":"CONCERN CSRI \u201cElektropribor\u201d, JSC, 197046 St. 
Petersburg, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1158-6951","authenticated-orcid":false,"given":"Robert","family":"Pich\u00e9","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology and Communication Sciences, Tampere University, 33720 Tampere, Finland"}]}],"member":"1968","published-online":{"date-parts":[[2021,1,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1109\/TRO.2016.2624754","article-title":"Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age","volume":"32","author":"Cadena","year":"2016","journal-title":"IEEE Trans. Robot."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Mur-Artal, R., and Tardos, J. (2015, January 13\u201317). Probabilistic Semi-Dense Mapping from Highly Accurate Feature-Based Monocular SLAM. Proceedings of the Robotics: Science and Systems XI, Sapienza University of Rome, Rome, Italy.","DOI":"10.15607\/RSS.2015.XI.041"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Campos, C.E.M., Elvira, R., Rodr\u00edguez, J., Montiel, J.M.M., and Tard\u00f3s, J.D. (2020). ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM. arXiv.","DOI":"10.1109\/TRO.2021.3075644"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C., and Berg, A.C. (2015). SSD: Single Shot MultiBox Detector. 
arXiv.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Chabot, F., Chaouch, M., Rabarisoa, J., Teuli\u00e8re, C., and Chateau, T. (2017). Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image. arXiv.","DOI":"10.1109\/CVPR.2017.198"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Kundu, A., Li, Y., and Rehg, J.M. (2018, January 18\u201323). 3D-RCNN: Instance-level 3D Object Reconstruction via Render-and-Compare. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00375"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wu, D., Zhuang, Z., Xiang, C., Zou, W., and Li, X. (2019, January 16\u201317). 6D-VNet: End-To-End 6-DoF Vehicle Pose Estimation From Monocular RGB Images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00163"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Song, X., Wang, P., Zhou, D., Zhu, R., Guan, C., Dai, Y., Su, H., Li, H., and Yang, R. (2018). ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving. arXiv.","DOI":"10.1109\/CVPR.2019.00560"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Manhardt, F., Kehl, W., and Gaidon, A. (2018). ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape. arXiv.","DOI":"10.1109\/CVPR.2019.00217"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"He, T., and Soatto, S. (2019). Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors. arXiv.","DOI":"10.1609\/aaai.v33i01.33018409"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Qin, Z., Wang, J., and Lu, Y. (2018). MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization. 
arXiv.","DOI":"10.1609\/aaai.v33i01.33018851"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Barabanau, I., Artemov, A., Burnaev, E., and Murashkin, V. (2019). Monocular 3D Object Detection via Geometric Reasoning on Keypoints. arXiv.","DOI":"10.5220\/0009102506520659"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Ansari, J.A., Sharma, S., Majumdar, A., Murthy, J.K., and Krishna, K.M. (2018). The Earth ain\u2019t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera. arXiv.","DOI":"10.1109\/IROS.2018.8593698"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2013). Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_17","unstructured":"Ren, S., He, K., Girshick, R.B., and Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv."},{"key":"ref_18","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Bowman, S.L., Atanasov, N., Daniilidis, K., and Pappas, G.J. (June, January 29). Probabilistic data association for semantic SLAM. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989203"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Doherty, K., Fourie, D., and Leonard, J. (2019, January 20\u201324). Multimodal Semantic SLAM with Probabilistic Data Association. 
Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794244"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Doherty, K., Baxter, D., Schneeweiss, E., and Leonard, J.J. (2019). Probabilistic Data Association via Mixture Models for Robust Semantic SLAM. arXiv.","DOI":"10.1109\/ICRA40945.2020.9197382"},{"key":"ref_22","unstructured":"Davison, A.J. (2018). FutureMapping: The Computational Structure of Spatial AI Systems. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Lepetit, V., Moreno-Noguer, F., and Fua, P. (2009). EPnP: An accurate O(n) solution to the PnP problem. Int. J. Comput. Vis., 81.","DOI":"10.1007\/s11263-008-0152-6"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1109\/TAES.1970.310128","article-title":"Estimating Optimal Tracking Filter Performance for Manned Maneuvering Targets","volume":"AES-6","author":"Singer","year":"1970","journal-title":"IEEE Trans. Aerosp. Electron. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Bar-Shalom, Y., Li, X., and Kirubarajan, T. (2001). Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software, John Wiley & Sons Ltd.","DOI":"10.1002\/0471221279"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Nebylov, A., and Watson, J. (2016). Optimal and Sub-Optimal Filtering in Integrated Navigation Systems. Aerospace Navigation Systems, John Wiley & Sons Ltd.","DOI":"10.1002\/9781119163060"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sabattini, L., Levratti, A., Venturi, F., Amplo, E., Fantuzzi, C., and Secchi, C. (2012, January 5\u20137). Experimental comparison of 3D vision sensors for mobile robot localization for industrial application: Stereo-camera and RGB-D sensor. 
Proceedings of the 2012 12th International Conference on Control Automation Robotics & Vision (ICARCV), Guangzhou, China.","DOI":"10.1109\/ICARCV.2012.6485264"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_29","unstructured":"Multi-Object Tracking Benchmark (2020, August 10). The KITTI Vision Benchmark Suite. Available online: http:\/\/www.cvlibs.net\/datasets\/kitti\/eval_tracking.php."},{"key":"ref_30","unstructured":"OXTS (2015). RT v2 GNSS-Aided Inertial Measurement Systems, Oxford Technical Solutions Limited. Revision 180221."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1134\/S2075108719030064","article-title":"Depth estimation with ego-motion assisted monocular camera","volume":"10","author":"Mansour","year":"2019","journal-title":"Gyroscopy Navig."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Davidson, P., Mansour, M., Stepanov, O., and Pich\u00e9, R. (2019, January 27\u201329). Depth estimation from motion parallax: Experimental evaluation. Proceedings of the 26th Saint Petersburg International Conference on Integrated Navigation Systems (ICINS), Saint Petersburg, Russia.","DOI":"10.23919\/ICINS.2019.8769338"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Mansour, M., Davidson, P., Stepanov, O., and Pich\u00e9, R. (2019). Relative Importance of Binocular Disparity and Motion Parallax for Depth Estimation: A Computer Vision Approach. 
Remote Sens., 11.","DOI":"10.3390\/rs11171990"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/3\/388\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:14:25Z","timestamp":1760159665000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/3\/388"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,23]]},"references-count":33,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["rs13030388"],"URL":"https:\/\/doi.org\/10.3390\/rs13030388","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2021,1,23]]}}}