{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T14:03:16Z","timestamp":1760623396940,"version":"build-2065373602"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["62522219"],"award-info":[{"award-number":["62522219"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Major Program of Xiangjiang Laboratory","award":["23XJ01009"],"award-info":[{"award-number":["23XJ01009"]}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["62325211, 62132021, 62372457, 62572477"],"award-info":[{"award-number":["62325211, 62132021, 62372457, 62572477"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Young Elite Scientists Sponsorship Program by CAST","award":["2023QNRC001"],"award-info":[{"award-number":["2023QNRC001"]}]},{"DOI":"10.13039\/501100004761","name":"Natural Science Foundation of Hunan Province of China","doi-asserted-by":"crossref","award":["2021RC3071, 2022RC1104"],"award-info":[{"award-number":["2021RC3071, 2022RC1104"]}],"id":[{"id":"10.13039\/501100004761","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100018537","name":"National Science and Technology Major Project","doi-asserted-by":"crossref","award":["2022ZD0115302"],"award-info":[{"award-number":["2022ZD0115302"]}],"id":[{"id":"10.13039\/501100018537","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["61379052"],"award-info":[{"award-number":["61379052"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100019048","name":"Science Foundation of Ministry of Education of China","doi-asserted-by":"crossref","award":["2018A02002"],"award-info":[{"award-number":["2018A02002"]}],"id":[{"id":"10.13039\/501100019048","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100019092","name":"Natural Science Foundation for Distinguished Young Scholars of Hunan Province","doi-asserted-by":"crossref","award":["14JJ1026"],"award-info":[{"award-number":["14JJ1026"]}],"id":[{"id":"10.13039\/501100019092","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:p>\n            The introduction of the neural implicit representation has notably propelled the advancement of online dense reconstruction techniques. Compared to traditional explicit representations, such as TSDF, it substantially improves the mapping completeness and memory efficiency. However, the lack of reconstruction details and the time-consuming learning of neural representations hinder the widespread application of neural-based methods to large-scale online reconstruction. We introduce RemixFusion, a novel residual-based mixed representation for scene reconstruction and camera pose estimation dedicated to high-quality and large-scale online RGB-D reconstruction. In particular, we propose a residual-based map representation comprised of an explicit coarse TSDF grid and an implicit neural module that produces residuals representing fine-grained details to be added to the coarse grid. Such mixed representation allows for detail-rich reconstruction with bounded time and memory budget, contrasting with the overly-smoothed results by the purely implicit representations, thus paving the way for high-quality camera tracking. Furthermore, we extend the residual-based representation to handle multi-frame joint pose optimization via bundle adjustment (BA). In contrast to the existing methods, which optimize poses directly, we opt to optimize pose changes. Combined with a novel technique for adaptive gradient amplification, our method attains better optimization convergence and global optimality. Furthermore, we adopt a local moving volume to factorize the whole mixed scene representation with a divide-and-conquer design to facilitate efficient online learning in our residual-based framework. Extensive experiments demonstrate that our method surpasses all state-of-the-art ones, including those based either on explicit or implicit representations, in terms of the accuracy of both mapping and tracking on large-scale scenes. Project page can be found at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/lanlan96.github.io\/RemixFusion\/\">https:\/\/lanlan96.github.io\/RemixFusion\/<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3769007","type":"journal-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T11:35:06Z","timestamp":1758281706000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["RemixFusion: Residual-based Mixed Representation for Large-scale Online RGB-D Reconstruction"],"prefix":"10.1145","volume":"45","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0546-355X","authenticated-orcid":false,"given":"Yuqing","family":"Lan","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2838-8601","authenticated-orcid":false,"given":"Chenyang","family":"Zhu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5927-5426","authenticated-orcid":false,"given":"Shuaifeng","family":"Zhi","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9459-293X","authenticated-orcid":false,"given":"Jiazhao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Peking University","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8786-8500","authenticated-orcid":false,"given":"Zhoufeng","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6057-1089","authenticated-orcid":false,"given":"Renjiao","family":"Yi","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2913-4016","authenticated-orcid":false,"given":"Yijie","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9054-0216","authenticated-orcid":false,"given":"Kai","family":"Xu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, National University of Defense Technology","place":["Changsha, China"]},{"name":"Xiangjiang Laboratory","place":["Changsha, China"]}]}],"member":"320","published-online":{"date-parts":[[2025,10,13]]},"reference":[{"key":"e_1_3_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00619"},{"key":"e_1_3_2_3_1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Bian Jia-Wang","year":"2024","unstructured":"Jia-Wang Bian, Wenjing Bian, Victor Adrian Prisacariu, and Philip Torr. 2024. PoRF: Pose residual field for accurate neural surface reconstruction. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_4_1","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2013.IX.035"},{"key":"e_1_3_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2021.3075644"},{"key":"e_1_3_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01565"},{"key":"e_1_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19824-3_20"},{"key":"e_1_3_2_8_1","doi-asserted-by":"crossref","unstructured":"Chi-Ming Chung Yang-Che Tseng Ya-Ching Hsu Xiang-Qian Shi Yun-Hung Hua Jia-Fong Yeh Wen-Chin Chen Yi-Ting Chen and Winston H. Hsu. 2023. Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE 9400\u20139406.","DOI":"10.1109\/ICRA48891.2023.10160950"},{"key":"e_1_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/237170.237269"},{"key":"e_1_3_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.261"},{"key":"e_1_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3054739"},{"key":"e_1_3_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3054739"},{"key":"e_1_3_2_13_1","first-page":"180","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Ha Seongbo","year":"2024","unstructured":"Seongbo Ha, Jiung Yeon, and Hyeonwoo Yu. 2024. RGBD GS-ICP SLAM. In Proceedings of the European Conference on Computer Vision. Springer, 180\u2013197."},{"key":"e_1_3_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_15_1","unstructured":"Jiarui Hu Mao Mao Hujun Bao Guofeng Zhang and Zhaopeng Cui. 2023. CP-SLAM: collaborative neural point-based SLAM. In Proceedings of the 37th International Conference on Neural Information Processing Systems. 39429\u201339442."},{"key":"e_1_3_2_16_1","doi-asserted-by":"publisher","unstructured":"Binbin Huang Zehao Yu Anpei Chen Andreas Geiger and Shenghua Gao. 2024b. 2D Gaussian Splatting for Geometrically Accurate Radiance Fields. In Proceedings of the SIGGRAPH 2024 Conference Papers. ACM 1\u201311. DOI:10.1145\/3641519.3657428","DOI":"10.1145\/3641519.3657428"},{"key":"e_1_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02039"},{"key":"e_1_3_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2047196.2047270"},{"key":"e_1_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01670"},{"key":"e_1_3_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02018"},{"key":"e_1_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV.2013.9"},{"key":"e_1_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592433"},{"key":"e_1_3_2_23_1","unstructured":"Jin-Hwa Kim Sang-Woo Lee Donghyun Kwak Min-Oh Heo Jeonghee Kim Jung-Woo Ha and Byoung-Tak Zhang. 2016. Multimodal residual learning for visual QA. In Proceedings of the 30th International Conference on Neural Information Processing Systems. 361\u2013369."},{"key":"e_1_3_2_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.09.001"},{"key":"e_1_3_2_25_1","first-page":"34","volume-title":"Proceedings of the Conference on Robot Learning","author":"Koestler Lukas","year":"2022","unstructured":"Lukas Koestler, Nan Yang, Niclas Zeller, and Daniel Cremers. 2022. Tandem: Tracking and dense mapping in real-time using deep multi-view stereo. In Proceedings of the Conference on Robot Learning. PMLR, 34\u201345."},{"key":"e_1_3_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00817"},{"key":"e_1_3_2_27_1","unstructured":"Lingjie Liu Jiatao Gu Kyaw Zaw Lin Tat-Seng Chua and Christian Theobalt. 2020. Neural sparse voxel fields. In Proceedings of the 34th International Conference on Neural Information Processing Systems. 15651\u201315663."},{"key":"e_1_3_2_28_1","doi-asserted-by":"crossref","unstructured":"Yunxuan Mao Xuan Yu Zhuqing Zhang Kai Wang Yue Wang Rong Xiong and Yiyi Liao. 2024. NGEL-SLAM: Neural implicit representation-based global consistent low-latency SLAM system. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE 6952\u20136958.","DOI":"10.1109\/ICRA57147.2024.10611269"},{"key":"e_1_3_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01708"},{"key":"e_1_3_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503250"},{"key":"e_1_3_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530127"},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341156"},{"key":"e_1_3_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2017.2705103"},{"key":"e_1_3_2_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-31438-4_36"},{"key":"e_1_3_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508363.2508374"},{"key":"e_1_3_2_36_1","doi-asserted-by":"crossref","unstructured":"Zhexi Peng Tianjia Shao Yong Liu Jingke Zhou Yin Yang Jingdong Wang and Kun Zhou. 2024. Rtg-slam: Real-time 3d reconstruction at scale using gaussian splatting. In Proceedings of the SIGGRAPH 2024 Conference Papers. ACM 1\u201311.","DOI":"10.1145\/3641519.3657455"},{"key":"e_1_3_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01086"},{"key":"e_1_3_2_38_1","doi-asserted-by":"crossref","unstructured":"Antoni Rosinol Marcus Abate Yun Chang and Luca Carlone. 2020. Kimera: An open-source library for real-time metric-semantic localization and mapping. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE 1689\u20131696.","DOI":"10.1109\/ICRA40945.2020.9196885"},{"key":"e_1_3_2_39_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.26.112"},{"key":"e_1_3_2_40_1","doi-asserted-by":"crossref","unstructured":"Erik Sandstr\u00f6m Yue Li Luc Van Gool and Martin R. Oswald. 2023. Point-SLAM: Dense neural point cloud-based SLAM. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 18433\u201318444.","DOI":"10.1109\/ICCV51070.2023.01690"},{"key":"e_1_3_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00022"},{"key":"e_1_3_2_42_1","unstructured":"Julian Straub Thomas Whelan Lingni Ma Yufan Chen Erik Wijmans Simon Green Jakob J. Engel Raul Mur-Artal Carl Ren Shobhit Verma et\u00a0al. 2019. The replica dataset: A digital replica of indoor spaces. arXiv:1906.05797. Retrieved from https:\/\/arxiv.org\/abs\/1906.05797"},{"key":"e_1_3_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6385773"},{"key":"e_1_3_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00617"},{"key":"e_1_3_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00807"},{"key":"e_1_3_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3618363"},{"key":"e_1_3_2_47_1","unstructured":"Zachary Teed and Jia Deng. 2021. DROID-SLAM: Deep visual SLAM for monocular stereo and RGB-D cameras. In Proceedings of the 35th International Conference on Neural Information Processing Systems. 16558\u201316569."},{"key":"e_1_3_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01277"},{"key":"e_1_3_2_49_1","unstructured":"Peng Wang Lingjie Liu Yuan Liu Christian Theobalt Taku Komura and Wenping Wang. 2021. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In Proceedings of the 35th International Conference on Neural Information Processing Systems. 27171\u201327183."},{"key":"e_1_3_2_50_1","volume-title":"Proceedings of the RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras","author":"Whelan Thomas","year":"2012","unstructured":"Thomas Whelan, Michael Kaess, Maurice Fallon, Hordur Johannsson, John Leonard, and John McDonald. 2012. Kintinuous: Spatially extended kinectfusion. In Proceedings of the RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras."},{"key":"e_1_3_2_51_1","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2015.XI.001"},{"key":"e_1_3_2_52_1","doi-asserted-by":"crossref","unstructured":"Thomas Whelan Renato F. Salas-Moreno Ben Glocker Andrew J. Davison and Stefan Leutenegger. 2016. ElasticFusion: Real-time dense SLAM and light source estimation. The International Journal of Robotics Research 35 14 (2016) 1697\u20131716.","DOI":"10.1177\/0278364916669237"},{"key":"e_1_3_2_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19824-3_7"},{"key":"e_1_3_2_54_1","unstructured":"Qiangeng Xu Zexiang Xu Julien Philip Sai Bi Zhixin Shu Kalyan Sunkavalli and Ulrich Neumann. 2022b. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 5438\u20135448."},{"key":"e_1_3_2_55_1","doi-asserted-by":"crossref","unstructured":"Yabin Xu Liangliang Nan Laishui Zhou Jun Wang and Charlie C. L. Wang. 2022a. HRBF-fusion: Accurate 3D reconstruction from RGB-D data using on-the-fly implicits. ACM Transactions on Graphics (TOG) 41 3 (2022) 1\u201319.","DOI":"10.1145\/3516521"},{"key":"e_1_3_2_56_1","doi-asserted-by":"crossref","unstructured":"Xingrui Yang Hai Li Hongjia Zhai Yuhang Ming Yuqian Liu and Guofeng Zhang. 2022. Vox-fusion: Dense tracking and mapping with voxel-based neural implicit representation. In Proceedings of the 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE 499\u2013507.","DOI":"10.1109\/ISMAR55827.2022.00066"},{"key":"e_1_3_2_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2022.3208503"},{"key":"e_1_3_2_58_1","doi-asserted-by":"crossref","unstructured":"Jiazhao Zhang Chenyang Zhu Lintao Zheng and Kai Xu. 2021. ROSEFusion: Random optimization for online dense reconstruction under fast camera motion. ACM Transactions on Graphics (TOG) 40 4 (2021) 1\u201317.","DOI":"10.1145\/3450626.3459676"},{"key":"e_1_3_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00345"},{"key":"e_1_3_2_60_1","doi-asserted-by":"crossref","unstructured":"Liyuan Zhu Yue Li Erik Sandstr\u00f6m Shengyu Huang Konrad Schindler and Iro Armeni. 2025. Loopsplat: Loop closure by registering 3d gaussian splats. In Proceedings of the 2025 International Conference on 3D Vision (3DV). IEEE 156\u2013167.","DOI":"10.1109\/3DV66043.2025.00020"},{"key":"e_1_3_2_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01245"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3769007","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:31:50Z","timestamp":1760365910000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3769007"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,13]]},"references-count":60,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,2,28]]}},"alternative-id":["10.1145\/3769007"],"URL":"https:\/\/doi.org\/10.1145\/3769007","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"type":"print","value":"0730-0301"},{"type":"electronic","value":"1557-7368"}],"subject":[],"published":{"date-parts":[[2025,10,13]]},"assertion":[{"value":"2024-08-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-20","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}