{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T11:31:30Z","timestamp":1771932690886,"version":"3.50.1"},"reference-count":31,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2026,2,16]],"date-time":"2026-02-16T00:00:00Z","timestamp":1771200000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Yunnan Province Key Research and Development Program\/Yunnan Province Science and Technology Departmen","award":["202503AA080023"],"award-info":[{"award-number":["202503AA080023"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>With the rapid evolution of Digital Twins and Embodied AI, achieving fast, dense, and high-precision 3D perception in unknown environments has become paramount. However, existing Visual SLAM paradigms face a critical dilemma: geometry-based methods often fail in texture-less areas due to feature scarcity, while learning-based approaches frequently suffer from scale drift and unphysical deformations. To bridge this gap, we propose VGGT-Geo, a novel SLAM system that synergizes generative priors from Large Foundation Models with multi-modal geometric optimization. Distinguishing itself from simple cascaded architectures, we construct a Probabilistic Geometric Fusion framework, consisting of (1) Generative Warm-start, leveraging the holistic scene understanding capabilities of the VGGT, (2) Confidence-Aware Optimization to extract dense features via DINOv3 and predict their confidence map, and (3) a Multi-Modal Constraint Closure that fuses point-line features and metric depth priors to constrain rotational Degrees of Freedom in Manhattan Worlds. We conducted systematic evaluations on TUM, Replica, Tanks and Temples, and a challenging self-collected dataset featuring extreme lighting and texture-less walls. Experimental results demonstrate that VGGT-Geo exhibits superior robustness and accuracy in unseen environments. On our most challenging dataset, it achieves an Absolute Trajectory Error of 4\u20135 cm and a Relative Rotation Error of 0.79\u00b0, outperforming current state-of-the-art methods by approximately 50% in trajectory accuracy. This study validates that synergizing the intuition of Large Foundation Models with geometric rigor is a viable path toward next-generation robust SLAM.<\/jats:p>","DOI":"10.3390\/ijgi15020085","type":"journal-article","created":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T09:22:46Z","timestamp":1771320166000},"page":"85","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["VGGT-Geo: Probabilistic Geometric Fusion of Visual Geometry Grounded Transformer Priors for Robust Dense Indoor SLAM"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-6271-3709","authenticated-orcid":false,"given":"Kai","family":"Qin","sequence":"first","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}]},{"given":"Jing","family":"Li","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"International Research Center of Big Data for Sustainable Development Goals, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8766-0487","authenticated-orcid":false,"given":"Sisi","family":"Zlatanova","sequence":"additional","affiliation":[{"name":"GRID, School of Built Environment, University of New South Wales, Sydney, NSW 2033, Australia"}]},{"given":"Haitao","family":"Wu","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}]},{"given":"Hao","family":"Wu","sequence":"additional","affiliation":[{"name":"National Geomatics Center of China, Beijing 100830, China"}]},{"given":"Yin","family":"Gao","sequence":"additional","affiliation":[{"name":"National Geomatics Center of China, Beijing 100830, China"}]},{"given":"Dingjie","family":"Zhou","sequence":"additional","affiliation":[{"name":"Yunnan Provincial Institute of Surveying and Mapping, Kunming 650011, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-0083-2162","authenticated-orcid":false,"given":"Yuchen","family":"Li","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8277-5853","authenticated-orcid":false,"given":"Sizhe","family":"Shen","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9972-0815","authenticated-orcid":false,"given":"Xiangjun","family":"Qu","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}]},{"given":"Zhenxin","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Resource Environment and Tourism, Capital Normal University, Beijing 100048, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0872-8645","authenticated-orcid":false,"given":"Banghui","family":"Yang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"National Engineering Research Center for Geoinformatics, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}]},{"given":"Shicheng","family":"Xu","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}]}],"member":"1968","published-online":{"date-parts":[[2026,2,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Fehr, M., Furrer, F., Dryanovski, I., Sturm, J., Gilitschenski, I., Siegwart, R., and Cadena, C. (June, January 29). TSDF-based change detection for consistent long-term dense reconstruction and dynamic object discovery. Proceedings of the 2017 IEEE International Conference on Robotics and automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989614"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yan, L., Hu, X., Zhao, L., Chen, Y., Wei, P., and Xie, H. (2022). DGS-SLAM: A fast and robust RGBD SLAM in dynamic environments combined by geometric and semantic information. Remote Sens., 14.","DOI":"10.3390\/rs14030795"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1002\/rob.22248","article-title":"MVS-SLAM: Enhanced multiview geometry for improved semantic RGBD SLAM in dynamic environment","volume":"41","author":"Islam","year":"2024","journal-title":"J. Field Robot."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1145\/3503250","article-title":"Nerf: Representing scenes as neural radiance fields for view synthesis","volume":"65","author":"Mildenhall","year":"2021","journal-title":"Commun. ACM"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1145\/3592433","article-title":"3D Gaussian splatting for real-time radiance field rendering","volume":"42","author":"Kerbl","year":"2023","journal-title":"ACM Trans. Graph."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wang, S., Leroy, V., Cabon, Y., Chidlovskii, B., and Revaud, J. (2024, January 16\u201322). Dust3r: Geometric 3d vision made easy. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01956"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Wang, H., and Agapito, L. (2024). 3d reconstruction with spatial memory. arXiv.","DOI":"10.1109\/3DV66043.2025.00013"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Liu, Y., Dong, S., Wang, S., Yin, Y., Yang, Y., Fan, Q., and Chen, B. (2025, January 11\u201315). Slam3r: Real-time dense scene reconstruction from monocular rgb videos. Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA.","DOI":"10.1109\/CVPR52734.2025.01552"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Agarwal, S., Snavely, N., Seitz, S.M., and Szeliski, R. (2010, January 5\u201311). Bundle adjustment in the large. Proceedings of the European Conference on Computer Vision, Heraklion, Crete.","DOI":"10.1007\/978-3-642-15552-9_3"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"103109","DOI":"10.1016\/j.autcon.2020.103109","article-title":"Indoor 3D reconstruction from point clouds for optimal routing in complex buildings to support disaster management","volume":"113","author":"Nikoohemat","year":"2020","journal-title":"Autom. Constr."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., and Novotny, D. (2025, January 11\u201315). Vggt: Visual geometry grounded transformer. Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA.","DOI":"10.1109\/CVPR52734.2025.00499"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1007\/s13042-010-0001-0","article-title":"Understanding bag-of-words model: A statistical framework","volume":"1","author":"Zhang","year":"2010","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011, January 6\u201313). ORB: An efficient alternative to SIFT or SURF. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126544"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1147","DOI":"10.1109\/TRO.2015.2463671","article-title":"ORB-SLAM: A versatile and accurate monocular SLAM system","volume":"31","author":"Montiel","year":"2015","journal-title":"IEEE Trans. Robot."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"122","DOI":"10.2478\/msr-2013-0021","article-title":"A Comparative Study of SIFT and its Variants","volume":"13","author":"Wu","year":"2013","journal-title":"Meas. Sci. Rev."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Murai, R., Dexheimer, E., and Davison, A.J. (2025, January 11\u201315). MASt3R-SLAM: Real-time dense SLAM with 3D reconstruction priors. Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA.","DOI":"10.1109\/CVPR52734.2025.01556"},{"key":"ref_17","unstructured":"Maggio, D., Lim, H., and Carlone, L. (2025). Vggt-slam: Dense rgb slam optimized on the sl (4) manifold. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Coughlan, J.M., and Yuille, A.L. (1999, January 20\u201325). Manhattan world: Compass direction from a single image by bayesian inference. Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece.","DOI":"10.1109\/ICCV.1999.790349"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1037\/h0054588","article-title":"Degrees of freedom","volume":"31","author":"Walker","year":"1940","journal-title":"J. Educ. Psychol."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Pumarola, A., Vakhitov, A., Agudo, A., Sanfeliu, A., and Moreno-Noguer, F. (June, January 29). PL-SLAM: Real-time monocular visual SLAM with points and lines. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989522"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Engel, J., Sch\u00f6ps, T., and Cremers, D. (2014, January 6\u201312). LSD-SLAM: Large-scale direct monocular SLAM. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10605-2_54"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wang, R., Xu, S., Dai, C., Xiang, J., Deng, Y., Tong, X., and Yang, J. (2025, January 11\u201315). Moge: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision. Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA.","DOI":"10.1109\/CVPR52734.2025.00496"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1109\/89.279278","article-title":"Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains","volume":"2","author":"Gauvain","year":"1994","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_24","unstructured":"Xu, Y., Xu, W., Cheung, K.-Y.K., and Tu, Z. (2021, January 10\u201317). LETR: Line Transformers for Joint End-to-End Line Segment Detection and Description. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Schubert, D., Goll, T., Demmel, N., Usenko, V., St\u00fcckler, J., and Cremers, D. (2018, January 1\u20135). The TUM VI benchmark for evaluating visual-inertial odometry. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593419"},{"key":"ref_26","unstructured":"Straub, J., Whelan, T., Ma, L., Chen, Y., Wijmans, E., Green, S., Engel, J.J., Mur-Artal, R., Ren, C., and Verma, S. (2019). The replica dataset: A digital replica of indoor spaces. arXiv."},{"key":"ref_27","first-page":"78","article-title":"Tanks and temples: Benchmarking large-scale scene reconstruction","volume":"36","author":"Knapitsch","year":"2017","journal-title":"ACM Trans. Graph. ToG"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"103755","DOI":"10.1016\/j.robot.2021.103755","article-title":"ColMap: A memory-efficient occupancy grid mapping framework","volume":"142","author":"Fisher","year":"2021","journal-title":"Robot. Auton. Syst."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Scaramuzza, D. (2018, January 1\u20135). A tutorial on quantitative trajectory evaluation for visual (-inertial) odometry. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593941"},{"key":"ref_30","unstructured":"Sim\u00e9oni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khalidov, V., Szafraniec, M., Yi, S., and Ramamonjisoa, M. (2025). Dinov3. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"326","DOI":"10.1109\/TRO.2024.3502198","article-title":"Fast-livo2: Fast, direct lidar-inertial-visual odometry","volume":"41","author":"Zheng","year":"2024","journal-title":"IEEE Trans. Robot."}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/15\/2\/85\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T10:56:05Z","timestamp":1771930565000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/15\/2\/85"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,16]]},"references-count":31,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["ijgi15020085"],"URL":"https:\/\/doi.org\/10.3390\/ijgi15020085","relation":{},"ISSN":["2220-9964"],"issn-type":[{"value":"2220-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,16]]}}}