{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T16:15:59Z","timestamp":1772554559571,"version":"3.50.1"},"reference-count":43,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2024,8,12]],"date-time":"2024-08-12T00:00:00Z","timestamp":1723420800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we proposed Mix-VIO, a monocular and binocular visual-inertial odometry, to address the issue where conventional visual front-end tracking often fails under dynamic lighting and image blur conditions. Mix-VIO adopts a hybrid tracking approach, combining traditional handcrafted tracking techniques with Deep Neural Network (DNN)-based feature extraction and matching pipelines. The system employs deep learning methods for rapid feature point detection, while integrating traditional optical flow methods and deep learning-based sparse feature matching methods to enhance front-end tracking performance under rapid camera motion and environmental illumination changes. In the back-end, we utilize sliding window and bundle adjustment (BA) techniques for local map optimization and pose estimation. We conduct extensive experimental validations of the hybrid feature extraction and matching methods, demonstrating the system\u2019s capability to maintain optimal tracking results under illumination changes and image blur.<\/jats:p>","DOI":"10.3390\/s24165218","type":"journal-article","created":{"date-parts":[[2024,8,12]],"date-time":"2024-08-12T11:23:46Z","timestamp":1723461826000},"page":"5218","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Mix-VIO: A Visual Inertial Odometry Based on a Hybrid Tracking Strategy"],"prefix":"10.3390","volume":"24","author":[{"given":"Huayu","family":"Yuan","sequence":"first","affiliation":[{"name":"School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7308-3244","authenticated-orcid":false,"given":"Ke","family":"Han","sequence":"additional","affiliation":[{"name":"School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China"}]},{"given":"Boyang","family":"Lou","sequence":"additional","affiliation":[{"name":"School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1109\/TRO.2016.2624754","article-title":"Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age","volume":"32","author":"Cadena","year":"2016","journal-title":"IEEE Trans. Robot."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"104992","DOI":"10.1016\/j.engappai.2022.104992","article-title":"A review of visual SLAM methods for autonomous driving vehicles","volume":"114","author":"Cheng","year":"2022","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Yuan, H., Wu, C., Deng, Z., and Yin, J. (2022). Robust Visual Odometry Leveraging Mixture of Manhattan Frames in Indoor Environments. 
Sensors, 22.","DOI":"10.3390\/s22228644"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011, January 6\u201313). ORB: An efficient alternative to SIFT or SURF. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126544"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1255","DOI":"10.1109\/TRO.2017.2705103","article-title":"Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras","volume":"33","year":"2017","journal-title":"IEEE Trans. Robot."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1147","DOI":"10.1109\/TRO.2015.2463671","article-title":"ORB-SLAM: A versatile and accurate monocular SLAM system","volume":"31","author":"Montiel","year":"2015","journal-title":"IEEE Trans. Robot."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1874","DOI":"10.1109\/TRO.2021.3075644","article-title":"Orb-slam3: An accurate open-source library for visual, visual\u2013inertial, and multimap slam","volume":"37","author":"Campos","year":"2021","journal-title":"IEEE Trans. Robot."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1109\/TPAMI.2017.2658577","article-title":"Direct sparse odometry","volume":"40","author":"Engel","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1408","DOI":"10.1109\/LRA.2021.3140129","article-title":"DM-VIO: Delayed marginalization visual-inertial odometry","volume":"7","author":"Cremers","year":"2022","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1004","DOI":"10.1109\/TRO.2018.2853729","article-title":"Vins-mono: A robust and versatile monocular visual-inertial state estimator","volume":"34","author":"Qin","year":"2018","journal-title":"IEEE Trans. Robot."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1023\/B:VISI.0000011205.11775.fd","article-title":"Lucas-kanade 20 years on: A unifying framework","volume":"56","author":"Baker","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1157","DOI":"10.1177\/0278364915620033","article-title":"The EuRoC micro aerial vehicle datasets","volume":"35","author":"Burri","year":"2016","journal-title":"Int. J. Robot. Res."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1052","DOI":"10.1177\/0278364920938439","article-title":"The UMA-VI dataset: Visual\u2013inertial odometry in low-textured and dynamic illumination environments","volume":"39","author":"Jaenal","year":"2020","journal-title":"Int. J. Robot. Res."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Klein, G., and Murray, D. (2007, January 13\u201316). Parallel tracking and mapping for small AR workspaces. Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan.","DOI":"10.1109\/ISMAR.2007.4538852"},{"key":"ref_15","unstructured":"Viswanathan, D.G. (2009, January 6\u20138). Features from accelerated segment test (fast). 
Proceedings of the 10th Workshop on Image Analysis for Multimedia Interactive Services, London, UK."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1109\/TRO.2016.2623335","article-title":"SVO: Semidirect visual odometry for monocular and multicamera systems","volume":"33","author":"Forster","year":"2016","journal-title":"IEEE Trans. Robot."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Xu, H., Yang, C., and Li, Z. (2020, January 18\u201321). OD-SLAM: Real-time localization and mapping in dynamic environment through multi-sensor fusion. Proceedings of the 2020 5th International Conference on Advanced Robotics and Mechatronics (ICARM), Shenzhen, China.","DOI":"10.1109\/ICARM49381.2020.9195374"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mourikis, A.I., and Roumeliotis, S.I. (2007, January 10\u201314). A multi-state constraint Kalman filter for vision-aided inertial navigation. Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Roma, Italy.","DOI":"10.1109\/ROBOT.2007.364024"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Leutenegger, S., Furgale, P., Rabaud, V., Chli, M., Konolige, K., and Siegwart, R. (2013). Keyframe-based visual-inertial slam using nonlinear optimization. Robotis Science and Systems (RSS) 2013, MIT Press.","DOI":"10.15607\/RSS.2013.IX.037"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Clark, R., Wang, S., Wen, H., Markham, A., and Trigoni, N. (2017, January 4\u20139). Vinet: Visual-inertial odometry as a sequence-to-sequence learning problem. Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11215"},{"key":"ref_21","unstructured":"Teed, Z., Lipson, L., and Deng, J. (2024). Deep patch visual odometry. Adv. Neural Inf. Process. Syst., 36."},{"key":"ref_22","unstructured":"Tang, C., and Tan, P. (2018). Ba-net: Dense bundle adjustment network. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"(2021). Image matching from handcrafted to deep features: A survey. Int. J. Comput. Vis., 129, 23\u201379.","DOI":"10.1007\/s11263-020-01359-2"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Balntas, V., Lenc, K., Vedaldi, A., and Mikolajczyk, K. (2017, January 21\u201326). HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.410"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Verdie, Y., Yi, K., Fua, P., and Lepetit, V. (2015, January 7\u201312). Tilde: A temporally invariant learned detector. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299165"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yi, K.M., Trulls, E., Lepetit, V., and Fua, P. (2016, January 11\u201314). Lift: Learned invariant feature transform. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46466-4_28"},{"key":"ref_27","unstructured":"Ono, Y., Trulls, E., Fua, P., and Yi, K.M. (2018, January 3\u20138). LF-Net: Learning local features from images. 
Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_28","first-page":"2","article-title":"Sift-the scale invariant feature transform","volume":"2","author":"Lowe","year":"2004","journal-title":"Int. J"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"DeTone, D., Malisiewicz, T., and Rabinovich, A. (2018, January 18\u201323). Superpoint: Self-supervised interest point detection and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00060"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"108403","DOI":"10.1016\/j.measurement.2020.108403","article-title":"A deep-learning real-time visual SLAM system based on multi-task feature extraction network and self-supervised feature points","volume":"168","author":"Li","year":"2021","journal-title":"Measurement"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sarlin, P.-E., DeTone, D., Malisiewicz, T., and Rabinovich, A. (2020, January 14\u201319). Superglue: Learning feature matching with graph neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00499"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Lindenberger, P., Sarlin, P.-E., and Pollefeys, M. (2023, January 2\u20136). Lightglue: Local feature matching at light speed. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.01616"},{"key":"ref_33","unstructured":"Xu, H., Liu, P., Chen, X., and Shen, S. (2022). D2SLAM: Decentralized and Distributed Collaborative Visual-inertial SLAM System for Aerial Swarm. arXiv."},{"key":"ref_34","unstructured":"Shi, J. (1994, January 21\u201323). Good features to track. Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Xu, K., Hao, Y., Yuan, S., Wang, C., and Xie, L. (2023, January 1\u20135). Airvo: An illumination-robust point-line visual odometry. Proceedings of the 2023 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA.","DOI":"10.1109\/IROS55552.2023.10341914"},{"key":"ref_36","first-page":"722","article-title":"LSD: A fast line segment detector with a false detection control","volume":"32","author":"Jakubowicz","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","unstructured":"Liu, P., Feng, C., Xu, Y., Ning, Y., Xu, H., and Shen, S. (2024). OmniNxt: A Fully Open-source and Compact Aerial Robot with Omnidirectional Visual Perception. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1016\/j.robot.2017.03.018","article-title":"Multi-camera visual SLAM for autonomous navigation of micro aerial vehicles","volume":"93","author":"Yang","year":"2017","journal-title":"Robot. Auton. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Liu, Z., Shi, D., Li, R., and Yang, S. (2023). ESVIO: Event-based stereo visual-inertial odometry. Sensors, 23.","DOI":"10.3390\/s23041998"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"3661","DOI":"10.1109\/LRA.2023.3269950","article-title":"Esvio: Event-based stereo visual inertial odometry","volume":"8","author":"Chen","year":"2023","journal-title":"IEEE Robot. Autom. 
Lett."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Schubert, D., Goll, T., Demmel, N., Usenko, V., St\u00fcckler, J., and Cremers, D. (2018, January 1\u20135). The TUM VI benchmark for evaluating visual-inertial odometry. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593419"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1177\/0278364917728574","article-title":"Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback","volume":"36","author":"Bloesch","year":"2017","journal-title":"Int. J. Robot. Res."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1109\/TRO.2019.2899783","article-title":"PL-SLAM: A stereo SLAM system through the combination of points and line segments","volume":"35","author":"Moreno","year":"2019","journal-title":"IEEE Trans. Robot."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/16\/5218\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:35:26Z","timestamp":1760110526000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/16\/5218"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,12]]},"references-count":43,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["s24165218"],"URL":"https:\/\/doi.org\/10.3390\/s24165218","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,12]]}}}