{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T07:55:24Z","timestamp":1776930924966,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":100,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,4,13]]},"DOI":"10.1145\/3772318.3790590","type":"proceedings-article","created":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T06:44:11Z","timestamp":1776062651000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["RealTwin: Concept Graph Representation and Grounding Framework for Reality-Preserving Digital Twin Reconstruction"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8825-0191","authenticated-orcid":false,"given":"Zisu","family":"Li","sequence":"first","affiliation":[{"name":"The Hong Kong University of Science and Technology, Hong Kong SAR, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-3833-7878","authenticated-orcid":false,"given":"Ruohao","family":"Li","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-6593-2958","authenticated-orcid":false,"given":"Jiawei","family":"Li","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9912-4729","authenticated-orcid":false,"given":"Chao","family":"Liu","sequence":"additional","affiliation":[{"name":"Mechanical Engineering, The University of British Columbia, Vancouver, British Columbia, 
Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6166-6138","authenticated-orcid":false,"given":"Junyi","family":"Zhu","sequence":"additional","affiliation":[{"name":"EECS, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5473-3566","authenticated-orcid":false,"given":"Daniela","family":"Rus","sequence":"additional","affiliation":[{"name":"Distributed Robotics Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0579-2716","authenticated-orcid":false,"given":"Chen","family":"Liang","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0356-4712","authenticated-orcid":false,"given":"Mingming","family":"Fan","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China and The Hong Kong University of Science and Technology, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,4,13]]},"reference":[{"key":"e_1_3_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445711"},{"key":"e_1_3_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Moayad Aloqaily Ouns Bouachir Fakhri Karray Ismaeel Al\u00a0Ridhawi and Abdulmotaleb El\u00a0Saddik. 2022. Integrating digital twin and advanced intelligent technologies to realize the metaverse. 
IEEE Consumer Electronics Magazine 12 6 (2022) 47\u201355.","DOI":"10.1109\/MCE.2022.3212570"},{"key":"e_1_3_3_2_4_2","unstructured":"Alisson Azzolini Hannah Brandon Prithvijit Chattopadhyay Huayu Chen Jinju Chu Yin Cui Jenna Diamond Yifan Ding Francesco Ferroni Rama Govindaraju et\u00a0al. 2025. Cosmos-reason1: From physical common sense to embodied reasoning. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.15558 (2025)."},{"key":"e_1_3_3_2_5_2","doi-asserted-by":"publisher","unstructured":"Ayush Bhardwaj Ashish Pratap Edilberto\u00a0F. Carrizales Dongbeom Ko Sungjoo Kang and Jin\u00a0Ryong Kim. 2025. MetaTwin: A Collaborative XR Platform for Seamless Physical-Virtual Synchronization. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 9 3 Article 70 (Sept. 2025) 32\u00a0pages. 10.1145\/3749533","DOI":"10.1145\/3749533"},{"key":"e_1_3_3_2_6_2","doi-asserted-by":"crossref","unstructured":"Frank Biocca. 1997. The cyborg\u2019s dilemma: Progressive embodiment in virtual environments. Journal of computer-mediated communication 3 2 (1997) JCMC324.","DOI":"10.1111\/j.1083-6101.1997.tb00070.x"},{"key":"e_1_3_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01514"},{"key":"e_1_3_3_2_8_2","first-page":"287","volume-title":"Conference on robot learning","author":"Brohan Anthony","year":"2023","unstructured":"Anthony Brohan, Yevgen Chebotar, Chelsea Finn, Karol Hausman, Alexander Herzog, Daniel Ho, Julian Ibarz, Alex Irpan, Eric Jang, Ryan Julian, et\u00a0al. 2023. Do as i can, not as i say: Grounding language in robotic affordances. In Conference on robot learning. PMLR, 287\u2013318."},{"key":"e_1_3_3_2_9_2","unstructured":"Zhejia Cai Puhua Jiang Shiwei Mao Hongkun Cao and Ruqi Huang. 2025. Improving Multi-View Reconstruction via Texture-Guided Gaussian-Mesh Joint Optimization. 
arXiv:2511.03950\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2511.03950"},{"key":"e_1_3_3_2_10_2","unstructured":"Ziang Cao Zhaoxi Chen Liang Pan and Ziwei Liu. 2025. PhysX-3D: Physical-Grounded 3D Asset Generation. arXiv:2507.12465\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2507.12465"},{"key":"e_1_3_3_2_11_2","unstructured":"Ziang Cao Fangzhou Hong Zhaoxi Chen Liang Pan and Ziwei Liu. 2025. PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image. arXiv:2511.13648\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2511.13648"},{"key":"e_1_3_3_2_12_2","unstructured":"Stuart\u00a0K Card Thomas\u00a0P Moran and Allen Newell. 1986. The model human processor: An engineering model of human performance. Handbook of perception and human performance. 2 45\u20131 (1986) 1\u201335."},{"key":"e_1_3_3_2_13_2","doi-asserted-by":"crossref","unstructured":"Georgia Chalvatzaki Ali Younes Daljeet Nandha An\u00a0Thai Le Leonardo\u00a0FR Ribeiro and Iryna Gurevych. 2023. Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning. 
Frontiers in Robotics and AI 10 (2023) 1221739.","DOI":"10.3389\/frobt.2023.1221739"},{"key":"e_1_3_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706599.3720269"},{"key":"e_1_3_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10161534"},{"key":"e_1_3_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01370"},{"key":"e_1_3_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01701"},{"key":"e_1_3_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.52202\/068431-2242"},{"key":"e_1_3_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02029"},{"key":"e_1_3_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00874"},{"key":"e_1_3_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/1056808.1056894"},{"key":"e_1_3_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV61041.2025.00550"},{"key":"e_1_3_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642579"},{"key":"e_1_3_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3654777.3676379"},{"key":"e_1_3_3_2_25_2","unstructured":"Xianzhe Dong Tongxuan Liu Yuting Zeng Liangyu Liu Yang Liu Siyu Wu Yu Wu Hailong Yang Ke Zhang and Jing Li. 2025. HydraInfer: Hybrid Disaggregated Scheduling for Multimodal Large Language Model Serving. arxiv:https:\/\/arXiv.org\/abs\/2505.12658\u00a0[cs.DC] https:\/\/arxiv.org\/abs\/2505.12658"},{"key":"e_1_3_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613905.3650752"},{"key":"e_1_3_3_2_27_2","unstructured":"Abdelrahman Elskhawy Mengze Li Nassir Navab and Benjamin Busam. 2025. PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks. 
arXiv preprint arXiv:2504.00844 (2025)."},{"key":"e_1_3_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3089269.3089281"},{"key":"e_1_3_3_2_29_2","doi-asserted-by":"publisher","unstructured":"Shaojing Fan Tian-Tsong Ng Bryan\u00a0Lee Koenig Jonathan\u00a0Samuel Herberg Ming Jiang Zhiqi Shen and Qi Zhao. 2018. Image Visual Realism: From Human Perception to Machine Computation. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 9 (2018) 2180\u20132193. 10.1109\/TPAMI.2017.2747150","DOI":"10.1109\/TPAMI.2017.2747150"},{"key":"e_1_3_3_2_30_2","doi-asserted-by":"publisher","unstructured":"Shaojing Fan Rangding Wang Tian-Tsong Ng Cheston Y.-C. Tan Jonathan\u00a0S. Herberg and Bryan\u00a0L. Koenig. 2014. Human Perception of Visual Realism for Photo and Computer-Generated Face Images. ACM Trans. Appl. Percept. 11 2 Article 7 (July 2014) 21\u00a0pages. 10.1145\/2620030","DOI":"10.1145\/2620030"},{"key":"e_1_3_3_2_31_2","doi-asserted-by":"publisher","unstructured":"Cathy\u00a0Mengying Fang Patrick Chwalek Quincy Kuang and Pattie Maes. 2024. WatchThis: A Wearable Point-and-Ask Interface powered by Vision-Language Models for Contextual Queries (UIST Adjunct \u201924). Association for Computing Machinery New York NY USA Article 54 4\u00a0pages. 10.1145\/3672539.3686776","DOI":"10.1145\/3672539.3686776"},{"key":"e_1_3_3_2_32_2","first-page":"14061","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Fang Shuangkang","year":"2025","unstructured":"Shuangkang Fang, I Shen, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Shuchang Zhou, Wenrui Ding, Takeo Igarashi, Ming-Hsuan Yang, et\u00a0al. 2025. MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 
14061\u201314072."},{"key":"e_1_3_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472306.3478338"},{"key":"e_1_3_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/642611.642703"},{"key":"e_1_3_3_2_35_2","unstructured":"James\u00a0J Gibson. 1979. The ecological approach to visual perception. (1979)."},{"key":"e_1_3_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.4324\/9781315816852-11"},{"key":"e_1_3_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10610243"},{"key":"e_1_3_3_2_38_2","doi-asserted-by":"crossref","unstructured":"Junfu Guo Yu Xin Gaoyi Liu Kai Xu Ligang Liu and Ruizhen Hu. 2025. ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.08135 (2025).","DOI":"10.1109\/CVPR52734.2025.02528"},{"key":"e_1_3_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3205326.3205345"},{"key":"e_1_3_3_2_40_2","unstructured":"Haoyu Han Yaochen Xie Hui Liu Xianfeng Tang Sreyashi Nag William Headden Hui Liu Yang Li Chen Luo Shuiwang Ji Qi He and Jiliang Tang. 2025. Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning. arxiv:https:\/\/arXiv.org\/abs\/2501.07845\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2501.07845"},{"key":"e_1_3_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.02099"},{"key":"e_1_3_3_2_42_2","doi-asserted-by":"crossref","unstructured":"Wenlong Huang Fei Xia Dhruv Shah Danny Driess Andy Zeng Yao Lu Pete Florence Igor Mordatch Sergey Levine Karol Hausman et\u00a0al. 2023. Grounded decoding: Guiding text generation with grounded models for embodied agents. 
Advances in Neural Information Processing Systems 36 (2023) 59636\u201359661.","DOI":"10.52202\/075280-2606"},{"key":"e_1_3_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3654777.3676377"},{"key":"e_1_3_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/1357054.1357089"},{"key":"e_1_3_3_2_45_2","doi-asserted-by":"crossref","unstructured":"Senthil\u00a0Kumar Jagatheesaperumal Zhaohui Yang Qianqian Yang Chongwen Huang Wei Xu Mohammad Shikh-Bahaei and Zhaoyang Zhang. 2023. Semantic-aware digital twin for metaverse: A comprehensive review. IEEE Wireless Communications 30 4 (2023) 38\u201346.","DOI":"10.1109\/MWC.003.2200616"},{"key":"e_1_3_3_2_46_2","doi-asserted-by":"crossref","unstructured":"Krishna\u00a0Murthy Jatavallabhula Alihusein Kuwajerwala Qiao Gu Mohd Omama Tao Chen Alaa Maalouf Shuang Li Ganesh Iyer Soroush Saryazdi Nikhil Keetha et\u00a0al. 2023. Conceptfusion: Open-set multimodal 3d mapping. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2302.07241 (2023).","DOI":"10.15607\/RSS.2023.XIX.066"},{"key":"e_1_3_3_2_47_2","unstructured":"Hanxiao Jiang Hao-Yu Hsu Kaifeng Zhang Hsin-Ni Yu Shenlong Wang and Yunzhu Li. 2025. PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos. ICCV (2025)."},{"key":"e_1_3_3_2_48_2","doi-asserted-by":"publisher","unstructured":"Haiyan Jiang Dongdong Weng Xiaonuo Dongye Le Luo and Zhenliang Zhang. 2023. Commonsense Knowledge-Driven Joint Reasoning Approach for Object Retrieval in Virtual Reality. ACM Trans. Graph. 42 6 Article 198 (Dec. 2023) 18\u00a0pages. 10.1145\/3618320","DOI":"10.1145\/3618320"},{"key":"e_1_3_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/VRW58643.2023.00240"},{"key":"e_1_3_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657448"},{"key":"e_1_3_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581448"},{"key":"e_1_3_3_2_52_2","doi-asserted-by":"crossref","unstructured":"Philip\u00a0N Johnson-Laird. 2001. 
Mental models and deduction. Trends in cognitive sciences 5 10 (2001) 434\u2013442.","DOI":"10.1016\/S1364-6613(00)01751-4"},{"key":"e_1_3_3_2_53_2","volume-title":"The Eleventh International Conference on Learning Representations","author":"Kuo Weicheng","year":"2023","unstructured":"Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, and Anelia Angelova. 2023. Open-Vocabulary Object Detection upon Frozen Vision and Language Models. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=MIMwy4kh9lf"},{"key":"e_1_3_3_2_54_2","unstructured":"Yushi Lan Yihang Luo Fangzhou Hong Shangchen Zhou Honghua Chen Zhaoyang Lyu Shuai Yang Bo Dai Chen\u00a0Change Loy and Xingang Pan. 2025. STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer. arxiv:https:\/\/arXiv.org\/abs\/2508.10893\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2508.10893"},{"key":"e_1_3_3_2_55_2","unstructured":"Long Le Jason Xie William Liang Hung-Ju Wang Yue Yang Yecheng\u00a0Jason Ma Kyle Vedder Arjun Krishna Dinesh Jayaraman and Eric Eaton. 2025. Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model. arxiv:https:\/\/arXiv.org\/abs\/2410.13882\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2410.13882"},{"key":"e_1_3_3_2_56_2","unstructured":"Guanghao Li Kerui Ren Linning Xu Zhewen Zheng Changjian Jiang Xin Gao Bo Dai Jian Pu Mulin Yu and Jiangmiao Pang. 2025. ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation. arxiv:https:\/\/arXiv.org\/abs\/2510.08551\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2510.08551"},{"key":"e_1_3_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3680528.3687607"},{"key":"e_1_3_3_2_58_2","unstructured":"Yuqi Li Chuanguang Yang Junhao Dong Zhengtao Yao Haoyan Xu Zeyu Dong Hansheng Zeng Zhulin An and Yingli Tian. 2025. AMMKD: Adaptive Multimodal Multi-teacher Distillation for Lightweight Vision-Language Models. 
arxiv:https:\/\/arXiv.org\/abs\/2509.00039\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2509.00039"},{"key":"e_1_3_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01350"},{"key":"e_1_3_3_2_60_2","unstructured":"Zhe Li Xiang Bai Jieyu Zhang Zhuangzhe Wu Che Xu Ying Li Chengkai Hou and Shanghang Zhang. 2025. URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model. arxiv:https:\/\/arXiv.org\/abs\/2511.00940\u00a0[cs.RO] https:\/\/arxiv.org\/abs\/2511.00940"},{"key":"e_1_3_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3713882"},{"key":"e_1_3_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00039"},{"key":"e_1_3_3_2_63_2","first-page":"184","volume-title":"European Conference on Computer Vision","author":"Liu Yufei","year":"2024","unstructured":"Yufei Liu, Junwei Zhu, Junshu Tang, Shijie Zhang, Jiangning Zhang, Weijian Cao, Chengjie Wang, Yunsheng Wu, and Dongjin Huang. 2024. Texdreamer: Towards zero-shot high-fidelity 3d human texture generation. In European Conference on Computer Vision. Springer, 184\u2013202."},{"key":"e_1_3_3_2_64_2","doi-asserted-by":"crossref","unstructured":"Chuofan Ma Yi Jiang Xin Wen Zehuan Yuan and Xiaojuan Qi. 2023. Codet: Co-occurrence guided region-word alignment for open-vocabulary object detection. Advances in neural information processing systems 36 (2023) 71078\u201371094.","DOI":"10.52202\/075280-3113"},{"key":"e_1_3_3_2_65_2","doi-asserted-by":"crossref","unstructured":"Marius Matulis and Carlo Harvey. 2021. A robot arm digital twin utilising reinforcement learning. Computers & Graphics 95 (2021) 106\u2013114.","DOI":"10.1016\/j.cag.2021.01.011"},{"key":"e_1_3_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3623263.3623357"},{"key":"e_1_3_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01313"},{"key":"e_1_3_3_2_68_2","doi-asserted-by":"crossref","unstructured":"Masahiro Mori Karl\u00a0F MacDorman and Norri Kageki. 2012. 
The uncanny valley [from the field]. IEEE Robotics & automation magazine 19 2 (2012) 98\u2013100.","DOI":"10.1109\/MRA.2012.2192811"},{"key":"e_1_3_3_2_69_2","volume-title":"The design of everyday things: Revised and expanded edition","author":"Norman Don","year":"2013","unstructured":"Don Norman. 2013. The design of everyday things: Revised and expanded edition. Basic books."},{"key":"e_1_3_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/2807442.2807497"},{"key":"e_1_3_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376562"},{"key":"e_1_3_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544549.3585835"},{"key":"e_1_3_3_2_73_2","volume-title":"Vision science: Photons to phenomenology","author":"Palmer Stephen\u00a0E","year":"1999","unstructured":"Stephen\u00a0E Palmer. 1999. Vision science: Photons to phenomenology. MIT press."},{"key":"e_1_3_3_2_74_2","unstructured":"Weikun Peng Jun Lv Cewu Lu and Manolis Savva. 2025. Generalizable Articulated Object Reconstruction from Casually Captured RGBD Videos. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2506.08334 (2025)."},{"key":"e_1_3_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW63382.2024.00754"},{"key":"e_1_3_3_2_76_2","doi-asserted-by":"crossref","unstructured":"Luca Randazzo Inaki Iturrate Serafeim Perdikis and J\u00a0d\u00a0R Mill\u00e1n. 2017. mano: A wearable hand exoskeleton for activities of daily living and neurorehabilitation. IEEE Robotics and Automation Letters 3 1 (2017) 500\u2013507.","DOI":"10.1109\/LRA.2017.2771329"},{"key":"e_1_3_3_2_77_2","unstructured":"Tianhe Ren Shilong Liu Ailing Zeng Jing Lin Kunchang Li He Cao Jiayu Chen Xinyu Huang Yukang Chen Feng Yan Zhaoyang Zeng Hao Zhang Feng Li Jie Yang Hongyang Li Qing Jiang and Lei Zhang. 2024. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks. 
arxiv:https:\/\/arXiv.org\/abs\/2401.14159\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2401.14159"},{"key":"e_1_3_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01661"},{"key":"e_1_3_3_2_79_2","doi-asserted-by":"crossref","unstructured":"Daniel\u00a0M Shafer Corey\u00a0P Carbonara and Michael\u00a0F Korpi. 2019. Factors affecting enjoyment of virtual reality games: a comparison involving consumer-grade virtual reality technology. Games for health journal 8 1 (2019) 15\u201323.","DOI":"10.1089\/g4h.2017.0190"},{"key":"e_1_3_3_2_80_2","doi-asserted-by":"crossref","unstructured":"Thomas\u00a0B Sheridan et\u00a0al. 1992. Musings on telepresence and virtual presence. Presence Teleoperators Virtual Environ. 1 1 (1992) 120\u2013125.","DOI":"10.1162\/pres.1992.1.1.120"},{"key":"e_1_3_3_2_81_2","doi-asserted-by":"crossref","unstructured":"Mel Slater Sylvia Wilbur et\u00a0al. 1997. A framework for immersive virtual environments (FIVE): Speculations on the role of presence in virtual environments. Presence: Teleoperators and virtual environments 6 6 (1997) 603\u2013616.","DOI":"10.1162\/pres.1997.6.6.603"},{"key":"e_1_3_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642552"},{"key":"e_1_3_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1145\/3332165.3347872"},{"key":"e_1_3_3_2_84_2","unstructured":"Zhengyi Wang Jonathan Lorraine Yikai Wang Hang Su Jun Zhu Sanja Fidler and Xiaohui Zeng. 2024. LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models. arxiv:https:\/\/arXiv.org\/abs\/2411.09595\u00a0[cs.LG] https:\/\/arxiv.org\/abs\/2411.09595"},{"key":"e_1_3_3_2_85_2","doi-asserted-by":"crossref","unstructured":"Jing Wei Sungdong Kim Hyunhoon Jung and Young-Ho Kim. 2024. Leveraging large language models to power chatbots for collecting user self-reported data. 
Proceedings of the ACM on Human-Computer Interaction 8 CSCW1 (2024) 1\u201335.","DOI":"10.1145\/3637364"},{"key":"e_1_3_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00084"},{"key":"e_1_3_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.02000"},{"key":"e_1_3_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00420"},{"key":"e_1_3_3_2_89_2","volume-title":"CoRL","author":"Xu Runsen","year":"2024","unstructured":"Runsen Xu, Zhiwei Huang, Tai Wang, Yilun Chen, Jiangmiao Pang, and Dahua Lin. 2024. VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding. In CoRL."},{"key":"e_1_3_3_2_90_2","doi-asserted-by":"crossref","unstructured":"Ran Xu Yan Shen Xiaoqi Li Ruihai Wu and Hao Dong. 2024. Naturalvlm: Leveraging fine-grained natural language for affordance-guided visual manipulation. IEEE Robotics and Automation Letters (2024).","DOI":"10.1109\/LRA.2024.3477095"},{"key":"e_1_3_3_2_91_2","doi-asserted-by":"crossref","unstructured":"Zhan Xu Yang Zhou Evangelos Kalogerakis Chris Landreth and Karan Singh. 2020. RigNet: Neural Rigging for Articulated Characters. ACM Trans. on Graphics 39 (2020).","DOI":"10.1145\/3386569.3392379"},{"key":"e_1_3_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.5954\/ICAROB.2024.OS15-4"},{"key":"e_1_3_3_2_93_2","first-page":"162","volume-title":"European Conference on Computer Vision","author":"Ye Mingqiao","year":"2024","unstructured":"Mingqiao Ye, Martin Danelljan, Fisher Yu, and Lei Ke. 2024. Gaussian grouping: Segment and edit anything in 3d scenes. In European Conference on Computer Vision. Springer, 162\u2013179."},{"key":"e_1_3_3_2_94_2","doi-asserted-by":"publisher","DOI":"10.1145\/1240624.1240626"},{"key":"e_1_3_3_2_95_2","doi-asserted-by":"crossref","unstructured":"Andre Zenner and Antonio Kr\u00fcger. 2017. Shifty: A weight-shifting dynamic passive haptic proxy to enhance object perception in virtual reality. 
IEEE transactions on visualization and computer graphics 23 4 (2017) 1285\u20131294.","DOI":"10.1109\/TVCG.2017.2656978"},{"key":"e_1_3_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.findings-naacl.306"},{"key":"e_1_3_3_2_97_2","first-page":"388","volume-title":"European Conference on Computer Vision","author":"Zhang Tianyuan","year":"2024","unstructured":"Tianyuan Zhang, Hong-Xing Yu, Rundi Wu, Brandon\u00a0Y Feng, Changxi Zheng, Noah Snavely, Jiajun Wu, and William\u00a0T Freeman. 2024. Physdreamer: Physics-based interaction with 3d objects via video generation. In European Conference on Computer Vision. Springer, 388\u2013406."},{"key":"e_1_3_3_2_98_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3714169"},{"key":"e_1_3_3_2_99_2","doi-asserted-by":"crossref","unstructured":"Yuqing Zhang Yuan Liu Zhiyu Xie Lei Yang Zhongyuan Liu Mengzhou Yang Runze Zhang Qilong Kou Cheng Lin Wenping Wang et\u00a0al. 2024. Dreammat: High-quality pbr material generation with geometry-and light-aware diffusion models. 
ACM Transactions on Graphics (TOG) 43 4 (2024) 1\u201318.","DOI":"10.1145\/3658170"},{"key":"e_1_3_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i7.28597"},{"key":"e_1_3_3_2_101_2","doi-asserted-by":"publisher","DOI":"10.1145\/3526113.3545668"}],"event":{"name":"CHI 2026: CHI Conference on Human Factors in Computing Systems","location":"Barcelona Spain","acronym":"CHI '26","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3772318.3790590","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T06:45:20Z","timestamp":1776062720000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3772318.3790590"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,13]]},"references-count":100,"alternative-id":["10.1145\/3772318.3790590","10.1145\/3772318"],"URL":"https:\/\/doi.org\/10.1145\/3772318.3790590","relation":{},"subject":[],"published":{"date-parts":[[2026,4,13]]},"assertion":[{"value":"2026-04-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}