{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,20]],"date-time":"2026-07-20T13:02:41Z","timestamp":1784552561223,"version":"3.55.0"},"publisher-location":"New York, NY, USA","reference-count":56,"publisher":"ACM","funder":[{"name":"NSFC","award":["62431015"],"award-info":[{"award-number":["62431015"]}]},{"DOI":"10.13039\/501100003399","name":"Science and Technology Commission of Shanghai Municipality","doi-asserted-by":"publisher","award":["No.24511106200"],"award-info":[{"award-number":["No.24511106200"]}],"id":[{"id":"10.13039\/501100003399","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012656","name":"Shanghai Key Laboratory of Digital Media Processing and Transmission","doi-asserted-by":"publisher","award":["22DZ2229005"],"award-info":[{"award-number":["22DZ2229005"]}],"id":[{"id":"10.13039\/501100012656","id-type":"DOI","asserted-by":"publisher"}]},{"name":"111 project","award":["BP0719010"],"award-info":[{"award-number":["BP0719010"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,27]]},"DOI":"10.1145\/3746027.3758232","type":"proceedings-article","created":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T07:37:21Z","timestamp":1761377841000},"page":"12882-12889","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene Reconstruction"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-7658-968X","authenticated-orcid":false,"given":"Bate","family":"Li","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1499-4811","authenticated-orcid":false,"given":"Houqiang","family":"Zhong","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5364-7427","authenticated-orcid":false,"given":"Zhengxue","family":"Cheng","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4645-9776","authenticated-orcid":false,"given":"Qiang","family":"Hu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5360-9997","authenticated-orcid":false,"given":"Qiang","family":"Wang","sequence":"additional","affiliation":[{"name":"Visionstar Information Technology (Shanghai) Co., Ltd, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7124-5182","authenticated-orcid":false,"given":"Li","family":"Song","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8799-1182","authenticated-orcid":false,"given":"Wenjun","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,10,27]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"2018 International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C). IEEE, 29-36","author":"Babu Athira Chandra","year":"2018","unstructured":"Athira Chandra Babu, Ravi Kumar Karri, et al., 2018. Sensor data fusion using Kalman filter. In 2018 International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C). IEEE, 29-36."},{"key":"e_1_3_2_1_2_1","volume-title":"Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions. In The IEEE International Conference on Computer Vision (ICCV).","author":"Bambach Sven","year":"2015","unstructured":"Sven Bambach, Stefan Lee, David J. Crandall, and Chen Yu. 2015. Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions. In The IEEE International Conference on Computer Vision (ICCV)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00036"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392485"},{"key":"e_1_3_2_1_5_1","volume-title":"HexPlane: A Fast Representation for Dynamic Scenes. CVPR","author":"Cao Ang","year":"2023","unstructured":"Ang Cao and Justin Johnson. 2023. HexPlane: A Fast Representation for Dynamic Scenes. CVPR (2023)."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-021-01531-2"},{"key":"e_1_3_2_1_7_1","volume-title":"Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. In European Conference on Computer Vision (ECCV).","author":"Damen Dima","year":"2018","unstructured":"Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray. 2018. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. In European Conference on Computer Vision (ECCV)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2991965"},{"key":"e_1_3_2_1_9_1","unstructured":"Fernando De la Torre Jessica Hodgins Adam Bargteil Xavier Martin Justin Macey Alex Collado and Pep Beltran. 2009. Guide to the carnegie mellon university multimodal activity (cmu-mmac) database. (2009)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550469.3555383"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6247805"},{"key":"e_1_3_2_1_12_1","volume-title":"pySLAM: An open-source, modular, and extensible framework for SLAM. arXiv preprint arXiv:2502.11955","author":"Freda Luigi","year":"2025","unstructured":"Luigi Freda. 2025. pySLAM: An open-source, modular, and extensible framework for SLAM. arXiv preprint arXiv:2502.11955 (2025)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari Rohit Girdhar Jackson Hamburger Hao Jiang Miao Liu Xingyu Liu Miguel Martin Tushar Nagarajan Ilija Radosavovic Santhosh Kumar Ramakrishnan Fiona Ryan Jayant Sharma Michael Wray Mengmeng Xu Eric Zhongcong Xu Chen Zhao Siddhant Bansal Dhruv Batra Vincent Cartillier Sean Crane Tien Do Morrie Doulaty Akshay Erapalli Christoph Feichtenhofer Adriano Fragomeni Qichen Fu Christian Fuegen Abrham Gebreselasie Cristina Gonzalez James Hillis Xuhua Huang Yifei Huang Wenqi Jia Weslie Khoo Jachym Kolar Satwik Kottur Anurag Kumar Federico Landini Chao Li Yanghao Li Zhenqiang Li Karttikeya Mangalam Raghava Modhugu Jonathan Munro Tullie Murrell Takumi Nishiyasu Will Price Paola Ruiz Puentes Merey Ramazanova Leda Sari Kiran Somasundaram Audrey Southerland Yusuke Sugano Ruijie Tao Minh Vo Yuchen Wang Xindi Wu Takuma Yagi Yunyi Zhu Pablo Arbelaez David Crandall Dima Damen Giovanni Maria Farinella Bernard Ghanem Vamsi Krishna Ithapu C. V. Jawahar Hanbyul Joo Kris Kitani Haizhou Li Richard Newcombe Aude Oliva Hyun Soo Park James M. Rehg Yoichi Sato Jianbo Shi Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan and Jitendra Malik. 2022. Ego4D: Around the World in 3 000 Hours of Egocentric Video. In IEEE\/CVF Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52688.2022.01842"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01834"},{"key":"e_1_3_2_1_15_1","volume-title":"EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video. arXiv preprint arXiv:2505.11709","author":"Hoque Ryan","year":"2025","unstructured":"Ryan Hoque, Peide Huang, David J Yoon, Mouli Sivapurapu, and Jian Zhang. 2025. EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video. arXiv preprint arXiv:2505.11709 (2025)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSAC.2025.3559140"},{"key":"e_1_3_2_1_17_1","unstructured":"Qiang Hu Zihan Zheng Houqiang Zhong Sihua Fu Li Song Guangtao Zhai Yanfeng Wang et al. 2025b. 4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video. arXiv preprint arXiv:2503.18421 (2025)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i4.32370"},{"key":"e_1_3_2_1_19_1","first-page":"13","article-title":"Spherical linear interpolation and B\u00e9zier curves","volume":"2","author":"Jafari Mehdi","year":"2014","unstructured":"Mehdi Jafari and Habib Molaei. 2014. Spherical linear interpolation and B\u00e9zier curves. General Scientific Researches, Vol. 2, 1 (2014), 13-17.","journal-title":"General Scientific Researches"},{"key":"e_1_3_2_1_20_1","volume-title":"Panoptic Studio: A Massively Multiview System for Social Motion Capture. In The IEEE International Conference on Computer Vision (ICCV).","author":"Joo Hanbyul","year":"2015","unstructured":"Hanbyul Joo, Hao Liu, Lei Tan, Lin Gui, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, and Yaser Sheikh. 2015. Panoptic Studio: A Massively Multiview System for Social Motion Capture. In The IEEE International Conference on Computer Vision (ICCV)."},{"key":"e_1_3_2_1_21_1","volume-title":"Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, and Yaser Sheikh.","author":"Joo Hanbyul","year":"2017","unstructured":"Hanbyul Joo, Tomas Simon, Xulong Li, Hao Liu, Lei Tan, Lin Gui, Sean Banerjee, Timothy Scott Godisart, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, and Yaser Sheikh. 2017. Panoptic Studio: A Massively Multiview System for Social Interaction Capture. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592433"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6247820"},{"key":"e_1_3_2_1_24_1","first-page":"13485","article-title":"Streaming radiance fields for 3d video synthesis","volume":"35","author":"Li Lingzhi","year":"2022","unstructured":"Lingzhi Li, Zhen Shen, Zhongshu Wang, Li Shen, and Ping Tan. 2022a. Streaming radiance fields for 3d video synthesis. Advances in Neural Information Processing Systems, Vol. 35 (2022), 13485-13498.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00544"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01228-1_38"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298625"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00813"},{"key":"e_1_3_2_1_29_1","volume-title":"Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos. arXiv preprint","author":"Li Zhengqi","year":"2024","unstructured":"Zhengqi Li, Richard Tucker, Forrester Cole, Qianqian Wang, Linyi Jin, Vickie Ye, Angjoo Kanazawa, Aleksander Holynski, and Noah Snavely. 2024b. MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos. arXiv preprint (2024)."},{"key":"e_1_3_2_1_30_1","volume-title":"Yuhan Huang, Sifan Liu, Mingyu Chen, Rushikesh Zawar, Xue Bai, Yilun Du, Chuang Gan, and Deva Ramanan.","author":"Lin Zhiqiu","year":"2025","unstructured":"Zhiqiu Lin, Siyuan Cen, Daniel Jiang, Jay Karhade, Hewei Wang, Chancharik Mitra, Yu Tong Tiffany Ling, Yuhan Huang, Sifan Liu, Mingyu Chen, Rushikesh Zawar, Xue Bai, Yilun Du, Chuang Gan, and Deva Ramanan. 2025. Towards Understanding Camera Motions in Any Video. (2025)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.02034"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.350"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Jonathon Luiten Georgios Kopanas Bastian Leibe and Deva Ramanan. 2024. Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis. In 3DV.","DOI":"10.1109\/3DV62453.2024.00044"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00991"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3025105"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478513.3480487"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248010"},{"key":"e_1_3_2_1_38_1","unstructured":"Adobe Premiere Pro. 2018. Adobe Premiere Pro."},{"key":"e_1_3_2_1_39_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition.","author":"Pumarola Albert","year":"2020","unstructured":"Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. 2020. D-NeRF: Neural Radiance Fields for Dynamic Scenes. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.221"},{"key":"e_1_3_2_1_41_1","volume-title":"Structure-from-Motion Revisited. In Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Sch\u00f6nberger Johannes Lutz","year":"2016","unstructured":"Johannes Lutz Sch\u00f6nberger and Jan-Michael Frahm. 2016. Structure-from-Motion Revisited. In Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_2_1_42_1","volume-title":"Charades-ego: A large-scale dataset of paired third and first person videos. arXiv preprint arXiv:1804.09626","author":"Sigurdsson Gunnar A","year":"2018","unstructured":"Gunnar A Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, and Karteek Alahari. 2018. Charades-ego: A large-scale dataset of paired third and first person videos. arXiv preprint arXiv:1804.09626 (2018)."},{"key":"e_1_3_2_1_43_1","volume-title":"Hand Keypoint Detection in Single Images using Multiview Bootstrapping. CVPR","author":"Simon Tomas","year":"2017","unstructured":"Tomas Simon, Hanbyul Joo, and Yaser Sheikh. 2017. Hand Keypoint Detection in Single Images using Multiview Bootstrapping. CVPR (2017)."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01954"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2875441"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00484"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"crossref","unstructured":"Qianqian Wang* Yifei Zhang* Aleksander Holynski Alexei A. Efros and Angjoo Kanazawa. 2025. Continuous 3D Perception Model with Persistent State. In CVPR.","DOI":"10.1109\/CVPR52734.2025.00983"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01558"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01920"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01922"},{"key":"e_1_3_2_1_51_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Yang Zeyu","year":"2024","unstructured":"Zeyu Yang, Hongye Yang, Zijie Pan, and Li Zhang. 2024b. Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting. International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_1_52_1","volume-title":"Hyun Soo Park, and Jan Kautz","author":"Yoon Jae Shin","year":"2020","unstructured":"Jae Shin Yoon, Kihwan Kim, Orazio Gallo, Hyun Soo Park, and Jan Kautz. 2020. Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths from a Monocular Camera. (June 2020)."},{"key":"e_1_3_2_1_53_1","volume-title":"Luc Van Gool, and Xi Wang","author":"Zhang Daiwei","year":"2024","unstructured":"Daiwei Zhang, Gengyan Li, Jiajie Li, Micka\u00ebl Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, and Xi Wang. 2024b. Egogaussian: Dynamic scene understanding from egocentric video with 3d gaussian splatting. arXiv preprint arXiv:2406.19811 (2024)."},{"key":"e_1_3_2_1_54_1","volume-title":"MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion. arXiv preprint arxiv:2410.03825","author":"Zhang Junyi","year":"2024","unstructured":"Junyi Zhang, Charles Herrmann, Junhwa Hur, Varun Jampani, Trevor Darrell, Forrester Cole, Deqing Sun, and Ming-Hsuan Yang. 2024a. MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion. arXiv preprint arxiv:2410.03825 (2024)."},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3681107"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP51287.2024.10647336"}],"event":{"name":"MM '25: The 33rd ACM International Conference on Multimedia","location":"Dublin Ireland","acronym":"MM '25","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 33rd ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3746027.3758232","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T05:00:21Z","timestamp":1765342821000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3746027.3758232"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,27]]},"references-count":56,"alternative-id":["10.1145\/3746027.3758232","10.1145\/3746027"],"URL":"https:\/\/doi.org\/10.1145\/3746027.3758232","relation":{},"subject":[],"published":{"date-parts":[[2025,10,27]]},"assertion":[{"value":"2025-10-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}