{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T00:18:40Z","timestamp":1758845920144,"version":"3.44.0"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T00:00:00Z","timestamp":1756857600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0"},{"start":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T00:00:00Z","timestamp":1756857600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2025,10]]},"DOI":"10.1007\/s40747-025-02058-6","type":"journal-article","created":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T07:53:35Z","timestamp":1756886015000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["MAGSLAM: multi-modal adaptive generator-based semantic SLAM for enhanced robustness in dynamic environments"],"prefix":"10.1007","volume":"11","author":[{"given":"Lei","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Xiaohan","family":"Yu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,9,3]]},"reference":[{"key":"2058_CR1","doi-asserted-by":"crossref","unstructured":"Huang H, Li L, Cheng H, Yeung SK (2024) Photo-slam: Real-time simultaneous localization and photorealistic mapping for monocular stereo and rgb-d cameras, in Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 
21\u00a0584\u201321\u00a0593)","DOI":"10.1109\/CVPR52733.2024.02039"},{"issue":"1","key":"2058_CR2","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1109\/MRA.2022.3228492","volume":"30","author":"T Deng","year":"2023","unstructured":"Deng T, Xie H, Wang J, Chen W (2023) Long-term visual simultaneous localization and mapping: Using a bayesian persistence filter-based global map prediction. IEEE Robot Autom Mag 30(1):36\u201349","journal-title":"IEEE Robot Autom Mag"},{"key":"2058_CR3","doi-asserted-by":"crossref","unstructured":"Cheng J, Zhang L, Chen Q, Hu X, Cai J (2022) A review of visual slam methods for autonomous driving vehicles. Eng Appl Artif Intell 114:104992","DOI":"10.1016\/j.engappai.2022.104992"},{"key":"2058_CR4","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2024.104634","volume":"174","author":"M Zhong","year":"2024","unstructured":"Zhong M, Hong C, Jia Z, Wang C, Wang Z (2024) Dynatm-slam: Fast filtering of dynamic feature points and object-based localization in dynamic indoor environments. Robot Auton Syst 174:104634","journal-title":"Robot Auton Syst"},{"key":"2058_CR5","doi-asserted-by":"crossref","unstructured":"Lin Z, Zhang Q, Tian Z, Yu P, Lan J (2024) Dpl-slam: enhancing dynamic point-line slam through dense semantic methods. IEEE Sens J","DOI":"10.1109\/JSEN.2024.3373892"},{"key":"2058_CR6","doi-asserted-by":"crossref","unstructured":"Wei H, Zhang T, Zhang L, Zhong L (2024) Point-state segmentation for dynamic scenes via probability model and orientation consistency. IEEE Trans Ind Inform","DOI":"10.1109\/TII.2024.3369709"},{"key":"2058_CR7","doi-asserted-by":"crossref","unstructured":"Wang Y, Tian Y, Chen J, Xu K, Ding X (2024) A survey of visual slam in dynamic environment: The evolution from geometric to semantic approaches. 
IEEE Trans Instrum Meas","DOI":"10.1109\/TIM.2024.3420374"},{"key":"2058_CR8","doi-asserted-by":"crossref","unstructured":"Wang K, Guo J, Chen K, Lu J (2025) An in-depth examination of slam methods: Challenges, advancements, and applications in complex scenes for autonomous driving. IEEE Trans Intell Transp Syst","DOI":"10.1109\/TITS.2025.3545479"},{"key":"2058_CR9","doi-asserted-by":"crossref","unstructured":"Wang K, Yao X, Ma N, Ran G (2024) Plmot-slam: a point-line features fusion slam system with moving object tracking. Vis Comput, pp. 1\u201319","DOI":"10.1007\/s00371-024-03677-9"},{"issue":"10","key":"2058_CR10","doi-asserted-by":"publisher","DOI":"10.1088\/1361-6501\/ad5b0e","volume":"35","author":"C Gong","year":"2024","unstructured":"Gong C, Sun Y, Zou C, Jiang D, Huang L, Tao B (2024) Sfd-slam: a novel dynamic rgb-d slam based on saliency region detection. Meas Sci Technol 35(10):106304","journal-title":"Meas Sci Technol"},{"key":"2058_CR11","doi-asserted-by":"crossref","unstructured":"Qin L, Wu C, Chen Z, Kong X, Lv Z, Zhao Z (2024) Rso-slam: A robust semantic visual slam with optical flow in complex dynamic environments. IEEE Trans Intell Transp Syst","DOI":"10.1109\/TITS.2024.3402241"},{"key":"2058_CR12","doi-asserted-by":"crossref","unstructured":"Qi H, Chen X, Yu Z, Li C, Shi Y, Zhao Q, Huang Q (2024) Semantic-independent dynamic slam based on geometric re-clustering and optical flow residuals. IEEE Trans Circ Syst Video Technol","DOI":"10.1109\/TCSVT.2024.3496489"},{"issue":"4","key":"2058_CR13","doi-asserted-by":"publisher","first-page":"4076","DOI":"10.1109\/LRA.2018.2860039","volume":"3","author":"B Bescos","year":"2018","unstructured":"Bescos B, F\u00e1cil JM, Civera J, Neira J (2018) Dynaslam: Tracking, mapping, and inpainting in dynamic scenes. 
IEEE Robot Autom Lett 3(4):4076\u20134083","journal-title":"IEEE Robot Autom Lett"},{"key":"2058_CR14","doi-asserted-by":"crossref","unstructured":"Yu C, Liu Z, Liu XJ, Xie F, Yang Y, Wei Q, Fei Q (2018) Ds-slam: A semantic visual slam towards dynamic environments, in 2018 IEEE\/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp. 1168\u20131174","DOI":"10.1109\/IROS.2018.8593691"},{"key":"2058_CR15","unstructured":"Deng Z, Yang Z, Chen C, Zeng C, Meng Y, Yang B (2024) Planesam: Multimodal plane instance segmentation using the segment anything model, arXiv preprint arXiv:2410.16545"},{"key":"2058_CR16","doi-asserted-by":"crossref","unstructured":"Wu ZF, Huang L, Wang W, Wei Y, Liu Y (2024) Multigen: Zero-shot image generation from multi-modal prompts, in European Conference on Computer Vision. Springer, pp. 297\u2013313","DOI":"10.1007\/978-3-031-73242-3_17"},{"key":"2058_CR17","doi-asserted-by":"crossref","unstructured":"Hou X, Xing J, Qian Y, Guo Y, Xin S, Chen J, Tang K, Wang M, Jiang Z, Liu L et\u00a0al (2024) Sdstrack: Self-distillation symmetric adapter learning for multi-modal visual object tracking, in Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 26\u00a0551\u201326\u00a0561","DOI":"10.1109\/CVPR52733.2024.02507"},{"issue":"4","key":"2058_CR18","doi-asserted-by":"publisher","first-page":"5391","DOI":"10.1007\/s40747-024-01408-0","volume":"10","author":"W Zhang","year":"2024","unstructured":"Zhang W, Guo Y, Niu L, Li P, Wan Z, Shao F, Nian C, Farrukh FUD, Zhang D, Zhang C et al (2024) Lp-slam: language-perceptive rgb-d slam framework exploiting large language model. 
Complex Intell Syst 10(4):5391\u20135409","journal-title":"Complex Intell Syst"},{"key":"2058_CR19","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2024.124918","volume":"255","author":"Y Li","year":"2024","unstructured":"Li Y, Shen H, Fu Y, Wang K (2024) A method of dense point cloud slam based on improved yolov8 and fused with orb-slam3 to cope with dynamic environments. Expert Syst Appl 255:124918","journal-title":"Expert Syst Appl"},{"issue":"5","key":"2058_CR20","doi-asserted-by":"publisher","first-page":"1255","DOI":"10.1109\/TRO.2017.2705103","volume":"33","author":"R Mur-Artal","year":"2017","unstructured":"Mur-Artal R, Tard\u00f3s JD (2017) Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Trans Rob 33(5):1255\u20131262","journal-title":"IEEE Trans Rob"},{"issue":"6","key":"2058_CR21","doi-asserted-by":"publisher","first-page":"1874","DOI":"10.1109\/TRO.2021.3075644","volume":"37","author":"C Campos","year":"2021","unstructured":"Campos C, Elvira R, Rodr\u00edguez JJG, Montiel JM, Tard\u00f3s JD (2021) Orb-slam3: An accurate open-source library for visual, visual-inertial, and multimap slam. IEEE Trans Rob 37(6):1874\u20131890","journal-title":"IEEE Trans Rob"},{"issue":"11","key":"2058_CR22","doi-asserted-by":"publisher","first-page":"19929","DOI":"10.1109\/TITS.2022.3175656","volume":"23","author":"KA Tsintotas","year":"2022","unstructured":"Tsintotas KA, Bampis L, Gasteratos A (2022) The revisiting problem in simultaneous localization and mapping: A survey on visual loop closure detection. IEEE Trans Intell Transp Syst 23(11):19929\u201319953","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"2058_CR23","doi-asserted-by":"crossref","unstructured":"He W, Lu Z, Liu X, Xu Z, Zhang J, Yang C, Geng L (2024) A real-time and high precision hardware implementation of ransac algorithm for visual slam achieving mismatched feature point pair elimination. Regular Papers. 
IEEE Trans Circ Syst I","DOI":"10.1109\/TCSI.2024.3422082"},{"key":"2058_CR24","doi-asserted-by":"crossref","unstructured":"Zhang H, Peng J, Yang Q (2024) Pr-slam: Parallel real-time dynamic slam method based on semantic segmentation. IEEE Access","DOI":"10.1109\/ACCESS.2024.3373308"},{"issue":"4","key":"2058_CR25","first-page":"529","volume":"44","author":"Z Zheng","year":"2024","unstructured":"Zheng Z, Su K, Lin S, Fu Z, Yang C (2024) Development of vision-based slam: from traditional methods to multimodal fusion. Robot Intell Autom 44(4):529\u2013548","journal-title":"Robot Intell Autom"},{"key":"2058_CR26","doi-asserted-by":"crossref","unstructured":"Wu Z, Zheng J, Ren X, Vasluianu F-A, Ma C, Paudel DP, Van Gool L, Timofte R (2024) Single-model and any-modality for video object tracking, in Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 19\u00a0156\u201319\u00a0166","DOI":"10.1109\/CVPR52733.2024.01812"},{"issue":"4","key":"2058_CR27","doi-asserted-by":"publisher","first-page":"163","DOI":"10.3390\/biomimetics7040163","volume":"7","author":"F Long","year":"2022","unstructured":"Long F, Ding L, Li J (2022) Dgflow-slam: a novel dynamic environment rgb-d slam without prior semantic knowledge based on grid segmentation of scene flow. Biomimetics 7(4):163","journal-title":"Biomimetics"},{"key":"2058_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TIM.2023.3326234","volume":"72","author":"S Cheng","year":"2022","unstructured":"Cheng S, Sun C, Zhang S, Zhang D (2022) Sg-slam: A real-time rgb-d visual slam toward dynamic scenes with semantic and geometric information. 
IEEE Trans Instrum Meas 72:1\u201312","journal-title":"IEEE Trans Instrum Meas"},{"key":"2058_CR29","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2025.110068","volume":"143","author":"KI Rashid","year":"2025","unstructured":"Rashid KI, Yang C, Huang C (2025) Dynamic context-aware high-resolution network for semi-supervised semantic segmentation. Eng Appl Artif Intell 143:110068","journal-title":"Eng Appl Artif Intell"},{"issue":"5","key":"2058_CR30","doi-asserted-by":"publisher","first-page":"716","DOI":"10.1002\/tee.24004","volume":"19","author":"J Li","year":"2024","unstructured":"Li J, Wei Q, Cui X, Jiang B, Li S, Liu J (2024) Yvg-slam: dynamic feature removal slam algorithm without a priori assumptions based on object detection and view geometry. IEEJ Trans Electr Electron Eng 19(5):716\u2013725","journal-title":"IEEJ Trans Electr Electron Eng"},{"issue":"12","key":"2058_CR31","doi-asserted-by":"publisher","first-page":"13210","DOI":"10.1109\/JSEN.2023.3270534","volume":"23","author":"J He","year":"2023","unstructured":"He J, Li M, Wang Y, Wang H (2023) Ovd-slam: An online visual slam for dynamic environments. IEEE Sens J 23(12):13210\u201313219","journal-title":"IEEE Sens J"},{"key":"2058_CR32","doi-asserted-by":"crossref","unstructured":"Wang Y, Tian Y, Chen J, Chen C, Xu K, Ding X (2024) Mssd-slam: Multi-feature semantic rgb-d inertial slam with structural regularity for dynamic environments. IEEE Trans Instrum Meas","DOI":"10.1109\/TIM.2024.3509541"},{"key":"2058_CR33","doi-asserted-by":"crossref","unstructured":"Chen G, Wang Z, Dong W, Alonso-Mora J (2025) Particle-based instance-aware semantic occupancy mapping in dynamic environments. IEEE Trans Robot","DOI":"10.1109\/TRO.2025.3526084"},{"key":"2058_CR34","doi-asserted-by":"crossref","unstructured":"Hu Z, Qi W, Ding K, Qi H, Zhao Y, Zhang X, Wang M (2025) Optimized feature points and keyframe methods for vslam in high-dynamic indoor environments. 
IEEE Trans Intell Transp Syst","DOI":"10.1109\/TITS.2024.3520177"},{"key":"2058_CR35","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2024.124474","volume":"255","author":"QU Islam","year":"2024","unstructured":"Islam QU, Ibrahim H, Chin PK, Lim K, Abdullah MZ, Khozaei F (2024) Advancing real-world visual slam: Integrating adaptive segmentation with dynamic object detection for enhanced environmental perception. Expert Syst Appl 255:124474","journal-title":"Expert Syst Appl"},{"issue":"3","key":"2058_CR36","doi-asserted-by":"publisher","first-page":"2847","DOI":"10.1109\/TITS.2023.3284228","volume":"25","author":"G Li","year":"2023","unstructured":"Li G, Fan H, Jiang G, Jiang D, Liu Y, Tao B, Yun J (2023) Rgbd-slam based on object detection with two-stream yolov4-mobilenetv3 in autonomous driving. IEEE Trans Intell Transp Syst 25(3):2847\u20132857","journal-title":"IEEE Trans Intell Transp Syst"},{"issue":"1","key":"2058_CR37","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1007\/s00138-024-01629-w","volume":"36","author":"R Zheng","year":"2025","unstructured":"Zheng R, Ren Y, Zhou Q, Ye Y, Zeng H (2025) Cross transformer for lidar-based loop closure detection. Mach Vis Appl 36(1):2","journal-title":"Mach Vis Appl"},{"key":"2058_CR38","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijmultiphaseflow.2024.104970","volume":"180","author":"Z Ren","year":"2024","unstructured":"Ren Z, Li D, Zhou W, Li Z, Wang H, Liu J, Li Y, Khoo BC (2024) Gas-liquid mass-transfer characteristics during dissolution and evolution in quasi-static and dynamic processes. Int J Multiph Flow 180:104970","journal-title":"Int J Multiph Flow"},{"key":"2058_CR39","doi-asserted-by":"crossref","unstructured":"Sturm J, Engelhard N, Endres F, Burgard W, Cremers D (2012) A benchmark for the evaluation of rgb-d slam systems, in 2012 IEEE\/RSJ international conference on intelligent robots and systems, IEEE, pp. 
573\u2013580","DOI":"10.1109\/IROS.2012.6385773"},{"key":"2058_CR40","doi-asserted-by":"crossref","unstructured":"Palazzolo E, Behley J, Lottes P, Giguere P, Stachniss C (2019) Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals, in 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 7855\u20137862","DOI":"10.1109\/IROS40897.2019.8967590"},{"key":"2058_CR41","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2020.103632","volume":"133","author":"D-C Hoang","year":"2020","unstructured":"Hoang D-C, Lilienthal AJ, Stoyanov T (2020) Object-rpe: Dense 3d reconstruction and pose estimation with convolutional neural networks. Robot Auton Syst 133:103632","journal-title":"Robot Auton Syst"},{"key":"2058_CR42","unstructured":"Liu A, Feng B, Wang B, Wang B, Liu B, Zhao C, Dengr C, Ruan C, Dai D, Guo D et\u00a0al (2024) Deepseek-v2: A strong, economical, and efficient mixture-of-experts language model, arXiv preprint arXiv:2405.04434"},{"key":"2058_CR43","unstructured":"Li P, An Z, Abrar S, Zhou L (2025) Large language models for multi-robot systems: A survey, arXiv preprint arXiv:2502.03814"},{"key":"2058_CR44","doi-asserted-by":"crossref","unstructured":"Zhang W, Guo Y, Niu L, Li P, Zhang C, Wan Z, Yan J, Farrukh FUD, Zhang D (2023) Lp-slam: Language-perceptive rgb-d slam system based on large language model, arXiv preprint arXiv:2303.10089","DOI":"10.1007\/s40747-024-01408-0"},{"key":"2058_CR45","doi-asserted-by":"crossref","unstructured":"Li H, Yu S, Zhang S, Tan G (2024) Resolving loop closure confusion in repetitive environments for visual slam through ai foundation models assistance, in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 
6657\u20136663","DOI":"10.1109\/ICRA57147.2024.10610083"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-025-02058-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-025-02058-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-025-02058-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T13:34:45Z","timestamp":1758807285000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-025-02058-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,3]]},"references-count":45,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10]]}},"alternative-id":["2058"],"URL":"https:\/\/doi.org\/10.1007\/s40747-025-02058-6","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2025,9,3]]},"assertion":[{"value":"20 March 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 August 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 September 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Multi-modal Fusion: The proposed framework fuses RGB and depth data using a multi-modal prompt generator (MPG) and a feature 
adapter (MFA), leading to highly accurate semantic segmentation with minimal additional parameters. Adaptive Motion Handling: A novel motion-level initialization strategy, coupled with cross-frame motion propagation, effectively differentiates dynamic elements from static scene components, thereby reducing dynamic disturbances. Robust Pose Optimization: Integration of a weighted static constraint into the pose refinement process ensures enhanced localization accuracy even in challenging, dynamic environments. Comprehensive Validation: Extensive experiments on both TUM RGB-D and Bonn RGB-D datasets confirm the system\u2019s superior performance in both global trajectory alignment and local motion consistency, paving the way for robust SLAM applications in real-world dynamic scenarios.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Key Innovations"}}],"article-number":"437"}}