{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T16:07:33Z","timestamp":1772035653738,"version":"3.50.1"},"reference-count":50,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2023,2,13]],"date-time":"2023-02-13T00:00:00Z","timestamp":1676246400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"UK\u2019s Engineering and Physical Sciences Research Council (EPSRC) Programme","award":["EP\/S016813\/1"],"award-info":[{"award-number":["EP\/S016813\/1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Image tracking and retrieval strategies are of vital importance in visual Simultaneous Localization and Mapping (SLAM) systems. For most state-of-the-art systems, hand-crafted features and bag-of-words (BoW) algorithms are the common solutions. Recent research reports the vulnerability of these traditional algorithms in complex environments. To replace these methods, this work proposes HFNet-SLAM, an accurate and real-time monocular SLAM system built on the ORB-SLAM3 framework incorporated with deep convolutional neural networks (CNNs). This work provides a pipeline of feature extraction, keypoint matching, and loop detection fully based on features from CNNs. The performance of this system has been validated on public datasets against other state-of-the-art algorithms. The results reveal that the HFNet-SLAM achieves the lowest errors among systems available in the literature. Notably, the HFNet-SLAM obtains an average accuracy of 2.8 cm in EuRoC dataset in pure visual configuration. Besides, it doubles the accuracy in medium and large environments in TUM-VI dataset compared with ORB-SLAM3. Furthermore, with the optimisation of TensorRT technology, the entire system can run in real-time at 50 FPS.<\/jats:p>","DOI":"10.3390\/s23042113","type":"journal-article","created":{"date-parts":[[2023,2,14]],"date-time":"2023-02-14T01:41:06Z","timestamp":1676338866000},"page":"2113","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["HFNet-SLAM: An Accurate and Real-Time Monocular SLAM System with Deep Features"],"prefix":"10.3390","volume":"23","author":[{"given":"Liming","family":"Liu","sequence":"first","affiliation":[{"name":"Department of Automatic Control and Systems Engineering, The University of Sheffield, Sheffield S10 2TN, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4204-4020","authenticated-orcid":false,"given":"Jonathan M.","family":"Aitken","sequence":"additional","affiliation":[{"name":"Department of Automatic Control and Systems Engineering, The University of Sheffield, Sheffield S10 2TN, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1874","DOI":"10.1109\/TRO.2021.3075644","article-title":"Orb-slam3: An accurate open-source library for visual, visual\u2013inertial, and multimap slam","volume":"37","author":"Campos","year":"2021","journal-title":"IEEE Trans. Robot."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1004","DOI":"10.1109\/TRO.2018.2853729","article-title":"Vins-mono: A robust and versatile monocular visual-inertial state estimator","volume":"34","author":"Qin","year":"2018","journal-title":"IEEE Trans. Robot."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1177\/0278364914554813","article-title":"Keyframe-based visual\u2013inertial odometry using nonlinear optimization","volume":"34","author":"Leutenegger","year":"2015","journal-title":"Int. J. Robot. Res."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Bujanca, M., Shi, X., Spear, M., Zhao, P., Lennox, B., and Luj\u00e1n, M. (October, January 27). Robust SLAM Systems: Are We There Yet?. Proceedings of the 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic.","DOI":"10.1109\/IROS51168.2021.9636814"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Evans, M.H., Aitken, J.M., and Anderson, S.R. (2021, January 6\u201310). Assessing the feasibility of monocular visual simultaneous localization and mapping for live sewer pipes: A field robotics study. Proceedings of the 2021 IEEE 20th International Conference on Advanced Robotics (ICAR), Ljubljana, Slovenia.","DOI":"10.1109\/ICAR53236.2021.9659486"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"140173","DOI":"10.1109\/ACCESS.2021.3115981","article-title":"Simultaneous localization and mapping for inspection robots in water and sewer pipe networks: A review","volume":"9","author":"Aitken","year":"2021","journal-title":"IEEE Access"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Macario Barros, A., Michel, M., Moline, Y., Corre, G., and Carrel, F. (2022). A comprehensive survey of visual slam algorithms. Robotics, 11.","DOI":"10.3390\/robotics11010024"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Yi, K.M., Trulls, E., Lepetit, V., and Fua, P. (2016, January 11\u201314). Lift: Learned invariant feature transform. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46466-4_28"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"DeTone, D., Malisiewicz, T., and Rabinovich, A. (2018, January 18\u201322). Superpoint: Self-supervised interest point detection and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00060"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Schonberger, J.L., Hardmeier, H., Sattler, T., and Pollefeys, M. (2017, January 21\u201326). Comparative evaluation of hand-crafted and learned local features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.736"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/s11263-020-01359-2","article-title":"Image matching from handcrafted to deep features: A survey","volume":"129","author":"Ma","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1109\/TRO.2012.2197158","article-title":"Bags of binary words for fast place recognition in image sequences","volume":"28","author":"Tardos","year":"2012","journal-title":"IEEE Trans. Robot."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Arshad, S., and Kim, G.W. (2021). Role of deep learning in loop closure detection for visual and lidar slam: A survey. Sensors, 21.","DOI":"10.3390\/s21041243"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"103470","DOI":"10.1016\/j.robot.2020.103470","article-title":"Loop closure detection using supervised and unsupervised deep neural networks for monocular SLAM systems","volume":"126","author":"Memon","year":"2020","journal-title":"Robot. Auton. Syst."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Merrill, N., and Huang, G. (2018). Lightweight unsupervised deep loop closure. arXiv.","DOI":"10.15607\/RSS.2018.XIV.032"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., Cadena, C., Siegwart, R., and Dymczyk, M. (2019, January 15\u201320). From coarse to fine: Robust hierarchical localization at large scale. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01300"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Davison, A.J. (2003, January 13\u201316). Real-time simultaneous localisation and mapping with a single camera. Proceedings of the Computer Vision, IEEE International Conference on. IEEE Computer Society, Nice, France.","DOI":"10.1109\/ICCV.2003.1238654"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mourikis, A.I., and Roumeliotis, S.I. (2007, January 10\u201314). A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation. Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Rome, Italy.","DOI":"10.1109\/ROBOT.2007.364024"},{"key":"ref_19","unstructured":"Chen, Y., Chen, Y., and Wang, G. (2019). Bundle adjustment revisited. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Li, R., Wang, S., Long, Z., and Gu, D. (2018, January 21\u201325). Undeepvo: Monocular visual odometry through unsupervised deep learning. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8461251"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, S., Clark, R., Wen, H., and Trigoni, N. (June, January 29). Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989236"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3577","DOI":"10.1109\/TIE.2020.2982096","article-title":"Deepslam: A robust monocular slam system with unsupervised deep learning","volume":"68","author":"Li","year":"2020","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"23772","DOI":"10.1109\/ACCESS.2021.3050617","article-title":"RDS-SLAM: Real-time dynamic SLAM using semantic segmentation methods","volume":"9","author":"Liu","year":"2021","journal-title":"IEEE Access"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1612","DOI":"10.1007\/s11431-020-1582-8","article-title":"Monocular depth estimation based on deep learning: An overview","volume":"63","author":"Zhao","year":"2020","journal-title":"Sci. China Technol. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011, January 6\u201313). ORB: An efficient alternative to SIFT or SURF. Proceedings of the 2011 IEEE International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126544"},{"key":"ref_26","unstructured":"Shi, J. (1994, January 21\u201323). Good features to track. Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Leutenegger, S., Chli, M., and Siegwart, R.Y. (2011, January 6\u201313). BRISK: Binary robust invariant scalable keypoints. Proceedings of the 2011 IEEE International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126542"},{"key":"ref_28","unstructured":"Revaud, J., Weinzaepfel, P., De Souza, C., Pion, N., Csurka, G., Cabon, Y., and Humenberger, M. (2019). R2D2: Repeatable and reliable detector and descriptor. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., and Sattler, T. (2019, January 15\u201320). D2-net: A trainable cnn for joint description and detection of local features. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00828"},{"key":"ref_30","first-page":"14254","article-title":"DISK: Learning local features with policy gradient","volume":"33","author":"Tyszkiewicz","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","unstructured":"Ono, Y., Trulls, E., Fua, P., and Yi, K.M. (2018, January 3\u20138). LF-Net: Learning local features from images. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_32","first-page":"3505","article-title":"GCNv2: Efficient correspondence prediction for real-time SLAM","volume":"4","author":"Tang","year":"2019","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_33","unstructured":"Kang, R., Shi, J., Li, X., Liu, Y., and Liu, X. (2019). DF-SLAM: A deep-learning enhanced visual SLAM system based on deep local features. arXiv."},{"key":"ref_34","unstructured":"Nister, D., and Stewenius, H. (2006, January 17\u201322). Scalable recognition with a vocabulary tree. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201906), New York, NY, USA."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"3051","DOI":"10.1109\/LRA.2018.2849609","article-title":"ibow-lcd: An appearance-based loop-closure detection approach using incremental bags of binary words","volume":"3","author":"Ortiz","year":"2018","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhang, X., Su, Y., and Zhu, X. (2017, January 7\u20138). Loop closure detection for visual SLAM systems using convolutional neural network. Proceedings of the 2017 IEEE 23rd International Conference on Automation and Computing (ICAC), Huddersfield, UK.","DOI":"10.23919\/IConAC.2017.8082072"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., and Sivic, J. (2016, January 27\u201330). NetVLAD: CNN architecture for weakly supervised place recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.572"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Hausler, S., Garg, S., Xu, M., Milford, M., and Fischer, T. (2021, January 19\u201325). Patch-netvlad: Multi-scale fusion of locally-global descriptors for place recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01392"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"103505","DOI":"10.1016\/j.robot.2020.103505","article-title":"Multi-camera visual SLAM for off-road navigation","volume":"128","author":"Yang","year":"2020","journal-title":"Robot. Auton. Syst."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"103813","DOI":"10.1016\/j.robot.2021.103813","article-title":"Learning whole-image descriptors for real-time loop detection and kidnap recovery under large viewpoint difference","volume":"143","author":"Kuse","year":"2021","journal-title":"Robot. Auton. Syst."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Cao, B., Araujo, A., and Sim, J. (2020, January 23\u201328). Unifying deep local and global features for image search. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58565-5_43"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201323). Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Li, D., Shi, X., Long, Q., Liu, S., Yang, W., Wang, F., Wei, Q., and Qiao, F. (2020\u201324, January 24). DXSLAM: A robust and efficient visual SLAM system with deep features. Proceedings of the 2020 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA.","DOI":"10.1109\/IROS45743.2020.9340907"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1109\/LES.2021.3087707","article-title":"Deep learning inference parallelization on heterogeneous processors with tensorrt","volume":"14","author":"Jeong","year":"2021","journal-title":"IEEE Embed. Syst. Lett."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"948","DOI":"10.1109\/TC.1972.5009071","article-title":"Some computer organizations and their effectiveness","volume":"100","author":"Flynn","year":"1972","journal-title":"IEEE Trans. Comput."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1145\/358669.358692","article-title":"Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography","volume":"24","author":"Fischler","year":"1981","journal-title":"Commun. ACM"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1157","DOI":"10.1177\/0278364915620033","article-title":"The EuRoC micro aerial vehicle datasets","volume":"35","author":"Burri","year":"2016","journal-title":"Int. J. Robot. Res."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Schubert, D., Goll, T., Demmel, N., Usenko, V., St\u00fcckler, J., and Cremers, D. (2018, January 1\u20135). The TUM VI benchmark for evaluating visual-inertial odometry. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593419"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Sturm, J., Engelhard, N., Endres, F., Burgard, W., and Cremers, D. (2012, January 7\u201312). A benchmark for the evaluation of RGB-D SLAM systems. Proceedings of the 2012 IEEE\/RSJ international conference on intelligent robots and systems, Vilamoura-Algarve, Portugal.","DOI":"10.1109\/IROS.2012.6385773"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1363","DOI":"10.1109\/TRO.2020.2991614","article-title":"Direct sparse mapping","volume":"36","author":"Zubizarreta","year":"2020","journal-title":"IEEE Trans. Robot."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/4\/2113\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:33:48Z","timestamp":1760121228000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/4\/2113"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,13]]},"references-count":50,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["s23042113"],"URL":"https:\/\/doi.org\/10.3390\/s23042113","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,13]]}}}