{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,26]],"date-time":"2025-12-26T03:48:26Z","timestamp":1766720906005,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2021,2,12]],"date-time":"2021-02-12T00:00:00Z","timestamp":1613088000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we study deep learning approaches for monocular visual odometry (VO). Deep learning solutions have shown to be effective in VO applications, replacing the need for highly engineered steps, such as feature extraction and outlier rejection in a traditional pipeline. We propose a new architecture combining ego-motion estimation and sequence-based learning using deep neural networks. We estimate camera motion from optical flow using Convolutional Neural Networks (CNNs) and model the motion dynamics using Recurrent Neural Networks (RNNs). The network outputs the relative 6-DOF camera poses for a sequence, and implicitly learns the absolute scale without the need for camera intrinsics. The entire trajectory is then integrated without any post-calibration. We evaluate the proposed method on the KITTI dataset and compare it with traditional and other deep learning approaches in the literature.<\/jats:p>","DOI":"10.3390\/s21041313","type":"journal-article","created":{"date-parts":[[2021,2,12]],"date-time":"2021-02-12T18:45:00Z","timestamp":1613155500000},"page":"1313","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":28,"title":["Leveraging Deep Learning for Visual Odometry Using Optical Flow"],"prefix":"10.3390","volume":"21","author":[{"given":"Tejas","family":"Pandey","sequence":"first","affiliation":[{"name":"Intel Research &amp; Development, W23 CX68 Leixlip, Ireland"}]},{"given":"Dexmont","family":"Pena","sequence":"additional","affiliation":[{"name":"Intel Research &amp; Development, W23 CX68 Leixlip, Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4556-8348","authenticated-orcid":false,"given":"Jonathan","family":"Byrne","sequence":"additional","affiliation":[{"name":"Intel Research &amp; Development, W23 CX68 Leixlip, Ireland"}]},{"given":"David","family":"Moloney","sequence":"additional","affiliation":[{"name":"Intel Research &amp; Development, W23 CX68 Leixlip, Ireland"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1109\/MRA.2011.943233","article-title":"Visual Odometry [Tutorial]","volume":"18","author":"Scaramuzza","year":"2011","journal-title":"IEEE Robot. Autom. Mag."},{"unstructured":"Yang, C., Mark, M., and Larry, M. (2005, January 12). Visual odometry on the Mars Exploration Rovers. Proceedings of the 2005 IEEE International Conference on Systems, Man and Cybernetics, Waikoloa, HI, USA.","key":"ref_2"},{"unstructured":"Corke, P., Strelow, D., and Singh, S. (October, January 28). Omnidirectional visual odometry for a planetary rover. Proceedings of the 2004 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), Sendai, Japan.","key":"ref_3"},{"doi-asserted-by":"crossref","unstructured":"Howard, A. (2008, January 22\u201326). Real-time stereo visual odometry for autonomous ground vehicles. Proceedings of the 2008 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Nice, France.","key":"ref_4","DOI":"10.1109\/IROS.2008.4651147"},{"doi-asserted-by":"crossref","unstructured":"Wang, R., Schworer, M., and Cremers, D. (2017). Stereo DSO: Large-Scale Direct Sparse Visual Odometry With Stereo Cameras. arXiv.","key":"ref_5","DOI":"10.1109\/ICCV.2017.421"},{"unstructured":"Christensen, H., and Khatib, O. (2011). Visual odometry and mapping for autonomous flight using an RGB-D camera. Robotics Research, Springer.","key":"ref_6"},{"doi-asserted-by":"crossref","unstructured":"Kerl, C., Sturm, J., and Cremers, D. (2013, January 6\u201310). Robust odometry estimation for RGB-D cameras. Proceedings of the 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany.","key":"ref_7","DOI":"10.1109\/ICRA.2013.6631104"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1177\/0278364914554813","article-title":"Keyframe-based visual\u2013inertial odometry using nonlinear optimization","volume":"34","author":"Leutenegger","year":"2015","journal-title":"Int. J. Robot. Res."},{"doi-asserted-by":"crossref","unstructured":"Bloesch, M., Omari, S., Hutter, M., and Siegwart, R. (October, January 28). Robust visual inertial odometry using a direct EKF-based approach. Proceedings of the 2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany.","key":"ref_9","DOI":"10.1109\/IROS.2015.7353389"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2878","DOI":"10.1109\/LRA.2018.2846813","article-title":"Challenges in Monocular Visual Odometry: Photometric Calibration, Motion Bias, and Rolling Shutter Effect","volume":"3","author":"Yang","year":"2018","journal-title":"IEEE Robot. Autom. Lett."},{"unstructured":"Nister, D., Naroditsky, O., and Bergen, J. (July, January 27). Visual odometry. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004, Washington, DC, USA.","key":"ref_11"},{"doi-asserted-by":"crossref","unstructured":"Mur-Artal, R., Montiel, J.M.M., and Tard\u00f3s, J.D. (2015). ORB-SLAM: A Versatile and Accurate Monocular SLAM System. arXiv.","key":"ref_12","DOI":"10.1109\/TRO.2015.2463671"},{"doi-asserted-by":"crossref","unstructured":"Geiger, A., Ziegler, J., and Stiller, C. (2011, January 5\u20139). StereoScan: Dense 3d reconstruction in real-time. Proceedings of the 2011 IEEE Intelligent Vehicles Symposium (IV), Baden, Germany.","key":"ref_13","DOI":"10.1109\/IVS.2011.5940405"},{"unstructured":"Harris, C., and Stephens, M. (September, January 31). A combined corner and edge detector. Proceedings of the Alvey Vision Conference, Manchester, UK.","key":"ref_14"},{"unstructured":"Engel, J., Koltun, V., and Cremers, D. (2016). Direct Sparse Odometry. arXiv.","key":"ref_15"},{"unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv.","key":"ref_16"},{"doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv.","key":"ref_17","DOI":"10.1109\/CVPR.2016.90"},{"doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). Going Deeper with Convolutions. arXiv.","key":"ref_18","DOI":"10.1109\/CVPR.2015.7298594"},{"unstructured":"Sabour, S., Frosst, N., and Hinton, G.E. (2017). Dynamic Routing Between Capsules. arXiv.","key":"ref_19"},{"doi-asserted-by":"crossref","unstructured":"Pan, X., Shi, J., Luo, P., Wang, X., and Tang, X. (2017). Spatial As Deep: Spatial CNN for Traffic Scene Understanding. arXiv.","key":"ref_20","DOI":"10.1609\/aaai.v32i1.12301"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1109\/MRA.2012.2182810","article-title":"Visual Odometry: Part II: Matching, Robustness, Optimization, and Applications","volume":"19","author":"Fraundorfer","year":"2012","journal-title":"IEEE Robot. Autom. Mag."},{"doi-asserted-by":"crossref","unstructured":"Forster, C., Pizzoli, M., and Scaramuzza, D. (June, January 31). SVO: Fast semi-direct monocular visual odometry. Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China.","key":"ref_22","DOI":"10.1109\/ICRA.2014.6906584"},{"doi-asserted-by":"crossref","unstructured":"Fleet, D., Pajdla, T., Schiele, B., and Tuytelaars, T. (2014). LSD-SLAM: Large-scale direct monocular SLAM. Computer Vision\u2014ECCV 2014, Springer.","key":"ref_23","DOI":"10.1007\/978-3-319-10599-4"},{"doi-asserted-by":"crossref","unstructured":"Wang, S., Clark, R., Wen, H., and Trigoni, N. (2017). DeepVO: Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks. arXiv.","key":"ref_24","DOI":"10.1109\/ICRA.2017.7989236"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"94118","DOI":"10.1109\/ACCESS.2019.2926350","article-title":"MagicVO: An End-to-End Hybrid CNN and Bi-LSTM Method for Monocular Visual Odometry","volume":"7","author":"Jiao","year":"2019","journal-title":"IEEE Access"},{"doi-asserted-by":"crossref","unstructured":"Muller, P., and Savakis, A. (2017, January 24\u201331). Flowdometry: An Optical Flow and Deep Learning Based Approach to Visual Odometry. Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA.","key":"ref_26","DOI":"10.1109\/WACV.2017.75"},{"doi-asserted-by":"crossref","unstructured":"Parisotto, E., Chaplot, D.S., Zhang, J., and Salakhutdinov, R. (2018). Global Pose Estimation with an Attention-based Recurrent Network. arXiv.","key":"ref_27","DOI":"10.1109\/CVPRW.2018.00061"},{"doi-asserted-by":"crossref","unstructured":"Fischer, P., Dosovitskiy, A., Ilg, E., H\u00e4usser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning Optical Flow with Convolutional Networks. arXiv.","key":"ref_28","DOI":"10.1109\/ICCV.2015.316"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/LRA.2015.2505717","article-title":"Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation","volume":"1","author":"Costante","year":"2016","journal-title":"IEEE Robot. Autom. Lett."},{"doi-asserted-by":"crossref","unstructured":"Costante, G., and Ciarfuglia, T.A. (2017). LS-VO: Learning Dense Optical Subspace for Robust Visual Odometry Estimation. arXiv.","key":"ref_30","DOI":"10.1109\/LRA.2018.2803211"},{"doi-asserted-by":"crossref","unstructured":"Mur-Artal, R., and Tard\u00f3s, J.D. (2016). ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras. arXiv.","key":"ref_31","DOI":"10.1109\/TRO.2017.2705103"},{"unstructured":"Sener, O., and Koltun, V. (2018). Multi-Task Learning as Multi-Objective Optimization. arXiv.","key":"ref_32"},{"doi-asserted-by":"crossref","unstructured":"Jiang, H., Sun, D., Jampani, V., Yang, M., Learned-Miller, E.G., and Kautz, J. (2018). Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation. arXiv.","key":"ref_33","DOI":"10.1109\/CVPR.2018.00938"},{"unstructured":"(2021, January 27). DLSS 2.0\u2014Image Reconstruction for Real-Time Rendering with Deep Learning. Available online: https:\/\/www.nvidia.com\/en-us\/on-demand\/session\/gtcsj20-s22698\/.","key":"ref_34"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1007\/BF01420984","article-title":"Performance of Optical Flow Techniques","volume":"12","author":"Barron","year":"1994","journal-title":"Int. J. Comput. Vision"},{"doi-asserted-by":"crossref","unstructured":"Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., and Brox, T. (2016). FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. arXiv.","key":"ref_36","DOI":"10.1109\/CVPR.2017.179"},{"doi-asserted-by":"crossref","unstructured":"Hui, T., Tang, X., and Loy, C.C. (2018). LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation. arXiv.","key":"ref_37","DOI":"10.1109\/CVPR.2018.00936"},{"doi-asserted-by":"crossref","unstructured":"Sun, D., Yang, X., Liu, M., and Kautz, J. (2017). PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. arXiv.","key":"ref_38","DOI":"10.1109\/CVPR.2018.00931"},{"doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","key":"ref_39","DOI":"10.1109\/CVPR.2012.6248074"},{"doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013). Vision meets Robotics: The KITTI Dataset. Int. J. Robot. Res. (IJRR).","key":"ref_40","DOI":"10.1177\/0278364913491297"},{"unstructured":"O\u2019Malley, T., Bursztein, E., Long, J., Chollet, F., Jin, H., and Invernizzi, L. (2020, August 05). Keras Tuner. Available online: https:\/\/github.com\/keras-team\/keras-tuner.","key":"ref_41"},{"doi-asserted-by":"crossref","unstructured":"Graves, A., Mohamed, A., and Hinton, G.E. (2013). Speech Recognition with Deep Recurrent Neural Networks. arXiv.","key":"ref_42","DOI":"10.1109\/ICASSP.2013.6638947"},{"unstructured":"Lipton, Z.C. (2015). A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv.","key":"ref_43"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/4\/1313\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:23:18Z","timestamp":1760160198000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/4\/1313"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,12]]},"references-count":43,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["s21041313"],"URL":"https:\/\/doi.org\/10.3390\/s21041313","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,2,12]]}}}