{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T15:59:35Z","timestamp":1778947175821,"version":"3.51.4"},"reference-count":119,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T00:00:00Z","timestamp":1731974400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2024,12,19]]},"abstract":"<jats:p>\n            MVImgNet is a large-scale dataset that contains multi-view images of ~220k real-world objects in 238 classes. As a counterpart of ImageNet, it introduces 3D visual signals via multi-view shooting, making a soft bridge between 2D and 3D vision. This paper constructs the MVImgNet2.0 dataset that expands MVImgNet into a total of ~520k objects and 515 categories, which derives a 3D dataset with a larger scale that is more comparable to ones in the 2D domain. In addition to the expanded dataset scale and category range, MVImgNet2.0 is of a higher quality than MVImgNet owing to four new features: (i) most shoots capture 360\u00b0 views of the objects, which can support the learning of object reconstruction with completeness; (ii) the segmentation manner is advanced to produce foreground object masks of higher accuracy; (iii) a more powerful structure-from-motion method is adopted to derive the camera pose for each frame of a lower estimation error; (iv) higher-quality dense point clouds are reconstructed via advanced methods for objects captured in 360\n            <jats:sup>\u00b0<\/jats:sup>\n            views, which can serve for downstream applications. 
Extensive experiments confirm the value of the proposed MVImgNet2.0 in boosting the performance of large 3D reconstruction models. MVImgNet2.0 will be public at\n            <jats:italic>luyues.github.io\/mvimgnet2<\/jats:italic>\n            , including multi-view images of all 520k objects, the reconstructed high-quality point clouds, and data annotation codes, hoping to inspire the broader vision community.\n          <\/jats:p>","DOI":"10.1145\/3687973","type":"journal-article","created":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T15:46:04Z","timestamp":1732031164000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["MVImgNet2.0: A Larger-scale Dataset of Multi-view Images"],"prefix":"10.1145","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-9725-0606","authenticated-orcid":false,"given":"Yushuang","family":"Wu","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-2012-5014","authenticated-orcid":false,"given":"Luyue","family":"Shi","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-4962-9217","authenticated-orcid":false,"given":"Haolin","family":"Liu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-1992-7089","authenticated-orcid":false,"given":"Hongjie","family":"Liao","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3250-0486","authenticated-orcid":false,"given":"Lingteng","family":"Qiu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, China"},{"name":"Alibaba, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1362-3747","authenticated-orcid":false,"given":"Weihao","family":"Yuan","sequence":"additional","affiliation":[{"name":"Alibaba, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2623-7973","authenticated-orcid":false,"given":"Xiaodong","family":"Gu","sequence":"additional","affiliation":[{"name":"Alibaba, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6833-9102","authenticated-orcid":false,"given":"Zilong","family":"Dong","sequence":"additional","affiliation":[{"name":"Alibaba, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2608-775X","authenticated-orcid":false,"given":"Shuguang","family":"Cui","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0162-3296","authenticated-orcid":false,"given":"Xiaoguang","family":"Han","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,11,19]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al.","author":"Achiam Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, 
Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2001269.2001293"},{"key":"e_1_2_1_3_1","volume-title":"Denoising diffusion via image-based rendering. arXiv preprint arXiv:2402.03445","author":"Anciukevicius Titas","year":"2024","unstructured":"Titas Anciukevicius, Fabian Manhardt, Federico Tombari, and Paul Henderson. 2024. Denoising diffusion via image-based rendering. arXiv preprint arXiv:2402.03445 (2024)."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 12608--12618","author":"Anciukevi\u010dius Titas","year":"2023","unstructured":"Titas Anciukevi\u010dius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J Mitra, and Paul Guerrero. 2023. Renderdiffusion: Image diffusion for 3d reconstruction, inpainting and generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 12608--12618."},{"key":"e_1_2_1_5_1","volume-title":"Self-supervised visual learning from interactions with objects. arXiv preprint arXiv:2407.06704","author":"Aubret Arthur","year":"2024","unstructured":"Arthur Aubret, C\u00e9line Teuli\u00e8re, and Jochen Triesch. 2024. Self-supervised visual learning from interactions with objects. arXiv preprint arXiv:2407.06704 (2024)."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings, Part V 14","author":"Bogo Federica","year":"2016","unstructured":"Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, and Michael J Black. 2016. Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016, Proceedings, Part V 14. Springer, 561--578."},{"key":"e_1_2_1_7_1","volume-title":"Shapenet: An information-rich 3d model repository. 
arXiv preprint arXiv:1512.03012","author":"Chang Angel X","year":"2015","unstructured":"Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. 2015. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015)."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00630"},{"key":"e_1_2_1_9_1","volume-title":"V3d: Video diffusion models are effective 3d generators. arXiv preprint arXiv:2403.06738","author":"Chen Zilong","year":"2024","unstructured":"Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, and Huaping Liu. 2024b. V3d: Video diffusion models are effective 3d generators. arXiv preprint arXiv:2403.06738 (2024)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00609"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00260"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings, Part VIII 14","author":"Choy Christopher B","year":"2016","unstructured":"Christopher B Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, and Silvio Savarese. 2016. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016, Proceedings, Part VIII 14. Springer, 628--644."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 21126--21136","author":"Collins Jasmine","year":"2022","unstructured":"Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F Yago Vicente, Thomas Dideriksen, Himanshu Arora, et al. 2022. Abo: Dataset and benchmarks for real-world 3d object understanding. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 
21126--21136."},{"key":"e_1_2_1_14_1","volume-title":"Eli Vander-Bilt, Aniruddha Kembhavi","author":"Deitke Matt","year":"2023","unstructured":"Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli Vander-Bilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, and Ali Farhadi. 2023. Objaverse-XL: A Universe of 10M+ 3D Objects. arXiv preprint arXiv:2307.05663 (2023)."},{"key":"e_1_2_1_15_1","volume-title":"Objaverse: A Universe of Annotated 3D Objects. arXiv preprint arXiv:2212.08051","author":"Deitke Matt","year":"2022","unstructured":"Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. 2022. Objaverse: A Universe of Annotated 3D Objects. arXiv preprint arXiv:2212.08051 (2022)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01977"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_18_1","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA46639.2022.9811809"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.264"},{"key":"e_1_2_1_21_1","volume-title":"dense, and robust multiview stereopsis","author":"Furukawa Yasutaka","year":"2009","unstructured":"Yasutaka Furukawa and Jean Ponce. 2009. Accurate, dense, and robust multiview stereopsis. 
IEEE transactions on pattern analysis and machine intelligence 32, 8 (2009), 1362--1376."},{"key":"e_1_2_1_22_1","volume-title":"Cat3d: Create anything in 3d with multi-view diffusion models. arXiv preprint arXiv:2405.10314","author":"Gao Ruiqi","year":"2024","unstructured":"Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T Barron, and Ben Poole. 2024. Cat3d: Create anything in 3d with multi-view diffusion models. arXiv preprint arXiv:2405.10314 (2024)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00988"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings, Part XV 16","author":"Goel Shubham","year":"2020","unstructured":"Shubham Goel, Angjoo Kanazawa, and Jitendra Malik. 2020. Shape and viewpoint without keypoints. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XV 16. Springer, 88--104."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00257"},{"key":"e_1_2_1_26_1","volume-title":"Vfusion3d: Learning scalable 3d generative models from video diffusion models. arXiv preprint arXiv:2403.12034","author":"Han Junlin","year":"2024","unstructured":"Junlin Han, Filippos Kokkinos, and Philip Torr. 2024. Vfusion3d: Learning scalable 3d generative models from video diffusion models. arXiv preprint arXiv:2403.12034 (2024)."},{"key":"e_1_2_1_27_1","volume-title":"Cameractrl: Enabling camera control for text-to-video generation. arXiv preprint arXiv:2404.02101","author":"He Hao","year":"2024","unstructured":"Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, and Ceyuan Yang. 2024. Cameractrl: Enabling camera control for text-to-video generation. arXiv preprint arXiv:2404.02101 (2024)."},{"key":"e_1_2_1_28_1","unstructured":"Zexin He and Tengfei Wang. 2023. OpenLRM: Open-Source Large Reconstruction Models. 
https:\/\/github.com\/3DTopia\/OpenLRM."},{"key":"e_1_2_1_29_1","volume-title":"Lrm: Large reconstruction model for single image to 3d. arXiv preprint arXiv:2311.04400","author":"Hong Yicong","year":"2023","unstructured":"Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. 2023. Lrm: Large reconstruction model for single image to 3d. arXiv preprint arXiv:2311.04400 (2023)."},{"key":"e_1_2_1_30_1","volume-title":"2d gaussian splatting for geometrically accurate radiance fields. arXiv preprint arXiv:2403.17888","author":"Huang Binbin","year":"2024","unstructured":"Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2024. 2d gaussian splatting for geometrically accurate radiance fields. arXiv preprint arXiv:2403.17888 (2024)."},{"key":"e_1_2_1_31_1","first-page":"76061","article-title":"Navi: Category-agnostic image collections with high-quality 3d shape and pose annotations","volume":"36","author":"Jampani Varun","year":"2023","unstructured":"Varun Jampani, Kevis-Kokitsi Maninis, Andreas Engelhardt, Arjun Karpur, Karen Truong, Kyle Sargent, Stefan Popov, Andr\u00e9 Araujo, Ricardo Martin Brualla, Kaushal Patel, et al. 2023. Navi: Category-agnostic image collections with high-quality 3d shape and pose annotations. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) 36 (2023), 76061--76084.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS)"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01271"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 10181--10193","author":"Jang Wonbong","year":"2024","unstructured":"Wonbong Jang and Lourdes Agapito. 2024. NViST: In the Wild New View Synthesis from a Single Image with Transformers. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 10181--10193."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the ACM International Conference on Machine Learning (ICML). PMLR, 4904--4916","author":"Jia Chao","year":"2021","unstructured":"Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the ACM International Conference on Machine Learning (ICML). PMLR, 4904--4916."},{"key":"e_1_2_1_35_1","volume-title":"Real3D: Scaling Up Large Reconstruction Models with Real-World Images. arXiv preprint arXiv:2406.08479","author":"Jiang Hanwen","year":"2024","unstructured":"Hanwen Jiang, Qixing Huang, and Georgios Pavlakos. 2024. Real3D: Scaling Up Large Reconstruction Models with Real-World Images. arXiv preprint arXiv:2406.08479 (2024)."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01267-0_23"},{"key":"e_1_2_1_37_1","unstructured":"Will Kay Joao Carreira Karen Simonyan Brian Zhang Chloe Hillier Sudheendra Vijayanarasimhan Fabio Viola Tim Green Trevor Back Paul Natsev et al. 2017. The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)."},{"key":"e_1_2_1_38_1","unstructured":"Junlong Ke Zichen Wen Yechenhao Yang Chenhang Cui Yazhou Ren Xiaorong Pu and Lifang He. 2024a. Integrating Vision-Language Semantic Graphs in Multi-View Clustering. IJCAI."},{"key":"e_1_2_1_39_1","unstructured":"Lei Ke Mingqiao Ye Martin Danelljan Yu-Wing Tai Chi-Keung Tang Fisher Yu et al. 2024b. Segment anything in high quality. 
Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592433"},{"key":"e_1_2_1_41_1","volume-title":"arXiv:2304.02643","author":"Kirillov Alexander","year":"2023","unstructured":"Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Doll\u00e1r, and Ross Girshick. 2023. Segment Anything. arXiv:2304.02643 (2023)."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 452--461","author":"Kulkarni Nilesh","year":"2020","unstructured":"Nilesh Kulkarni, Abhinav Gupta, David F Fouhey, and Shubham Tulsiani. 2020. Articulation-aware canonical surface mapping. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 452--461."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-020-01316-z"},{"key":"e_1_2_1_45_1","volume-title":"Duoduo CLIP: Efficient 3D Understanding with Multi-View Images. arXiv preprint arXiv:2406.11579","author":"Lee Han-Hung","year":"2024","unstructured":"Han-Hung Lee, Yiming Zhang, and Angel X Chang. 2024. Duoduo CLIP: Efficient 3D Understanding with Multi-View Images. arXiv preprint arXiv:2406.11579 (2024)."},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the ACM International Conference on Machine Learning (ICML). PMLR","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023a. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In Proceedings of the ACM International Conference on Machine Learning (ICML). PMLR, 19730--19742."},{"key":"e_1_2_1_47_1","volume-title":"International conference on machine learning. 
PMLR, 12888--12900","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International conference on machine learning. PMLR, 12888--12900."},{"key":"e_1_2_1_48_1","volume-title":"Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv preprint arXiv:2311.06214","author":"Li Jiahao","year":"2023","unstructured":"Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. 2023c. Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv preprint arXiv:2311.06214 (2023)."},{"key":"e_1_2_1_49_1","volume-title":"Proceedings, Part XIV 16","author":"Li Xueting","year":"2020","unstructured":"Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, and Jan Kautz. 2020. Self-supervised single-view 3d reconstruction via semantic consistency. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XIV 16. Springer, 677--693."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8456--8465","author":"Li Zhaoshuo","year":"2023","unstructured":"Zhaoshuo Li, Thomas M\u00fcller, Alex Evans, Russell H Taylor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin. 2023b. Neuralangelo: High-fidelity neural surface reconstruction. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8456--8465."},{"key":"e_1_2_1_51_1","volume-title":"VILA: On Pre-training for Visual Language Models. arXiv:2312.07533 [cs.CV]","author":"Lin Ji","year":"2023","unstructured":"Ji Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, and Song Han. 2023. 
VILA: On Pre-training for Visual Language Models. arXiv:2312.07533 [cs.CV]"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision. 5987--5997","author":"Lindenberger Philipp","year":"2021","unstructured":"Philipp Lindenberger, Paul-Edouard Sarlin, Viktor Larsson, and Marc Pollefeys. 2021. Pixel-perfect structure-from-motion with featuremetric refinement. In Proceedings of the IEEE\/CVF international conference on computer vision. 5987--5997."},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) 36","author":"Liu Haotian","year":"2024","unstructured":"Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2024a. Visual instruction tuning. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) 36 (2024)."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00853"},{"key":"e_1_2_1_56_1","unstructured":"Shilong Liu Zhaoyang Zeng Tianhe Ren Feng Li Hao Zhang Jie Yang Chunyuan Li Jianwei Yang Hang Su Jun Zhu et al. 2023b. Grounding dino: Marrying dino with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499 (2023)."},{"key":"e_1_2_1_57_1","volume-title":"Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models. arXiv:2402.17177 [cs.CV]","author":"Liu Yixin","year":"2024","unstructured":"Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, and Lichao Sun. 2024b. Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models. 
arXiv:2402.17177 [cs.CV]"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00459"},{"key":"e_1_2_1_59_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 2630--2640","author":"Miech Antoine","year":"2019","unstructured":"Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, and Josef Sivic. 2019. Howto100m: Learning a text-video embedding by watching hundred million narrated video clips. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 2630--2640."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_24"},{"key":"e_1_2_1_61_1","volume-title":"WordNet: A Lexical Database for English. In Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8--11","author":"Miller George A.","year":"1994","unstructured":"George A. Miller. 1994. WordNet: A Lexical Database for English. In Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8--11, 1994."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00040"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00394"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530127"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00356"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00778"},{"key":"e_1_2_1_67_1","first-page":"67021","article-title":"Autodecoding latent 3d diffusion models","volume":"36","author":"Ntavelis Evangelos","year":"2023","unstructured":"Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc V Gool, and Sergey Tulyakov. 2023. Autodecoding latent 3d diffusion models. 
Advances in Neural Information Processing Systems 36 (2023), 67021--67047.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00025"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-007-0086-4"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000025798.50602.3a"},{"key":"e_1_2_1_71_1","volume-title":"The 2017 davis challenge on video object segmentation. arXiv preprint arXiv:1704.00675","author":"Pont-Tuset Jordi","year":"2017","unstructured":"Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbel\u00e1ez, Alex Sorkine-Hornung, and Luc Van Gool. 2017. The 2017 davis challenge on video object segmentation. arXiv preprint arXiv:1704.00675 (2017)."},{"key":"e_1_2_1_72_1","volume-title":"Proceedings of the ACM International Conference on Machine Learning (ICML). PMLR, 8748--8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the ACM International Conference on Machine Learning (ICML). PMLR, 8748--8763."},{"key":"e_1_2_1_73_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV).","author":"Reizenstein Jeremy","year":"2021","unstructured":"Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. 2021. Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction. 
In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)."},{"key":"e_1_2_1_74_1","unstructured":"Tianhe Ren Shilong Liu Ailing Zeng Jing Lin Kunchang Li He Cao Jiayu Chen Xinyu Huang Yukang Chen Feng Yan Zhaoyang Zeng Hao Zhang Feng Li Jie Yang Hongyang Li Qing Jiang and Lei Zhang. 2024. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks. arXiv:2401.14159 [cs.CV]"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.455"},{"key":"e_1_2_1_77_1","volume-title":"Burcu Karagol Ayan, Tim Salimans, et al.","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. Advances in neural information processing systems 35 (2022), 36479--36494."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.445"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.445"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46487-9_31"},{"key":"e_1_2_1_81_1","first-page":"25278","article-title":"Laion-5b: An open large-scale dataset for training next generation image-text models","volume":"35","author":"Schuhmann Christoph","year":"2022","unstructured":"Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. 
Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) 35 (2022), 25278--25294.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS)"},{"key":"e_1_2_1_82_1","unstructured":"Nikita Selin. 2019--2024. CarveKit: Image Background Remove Tool. https:\/\/github.com\/OPHoperHPO\/image-background-remove-tool."},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1238"},{"key":"e_1_2_1_84_1","volume-title":"Anything-3d: Towards singleview anything reconstruction in the wild. arXiv preprint arXiv:2304.10261","author":"Shen Qiuhong","year":"2023","unstructured":"Qiuhong Shen, Xingyi Yang, and Xinchao Wang. 2023. Anything-3d: Towards singleview anything reconstruction in the wild. arXiv preprint arXiv:2304.10261 (2023)."},{"key":"e_1_2_1_85_1","volume-title":"SuperGaussian: Repurposing Video Models for 3D Super Resolution. arXiv preprint arXiv:2406.00609","author":"Shen Yuan","year":"2024","unstructured":"Yuan Shen, Duygu Ceylan, Paul Guerrero, Zexiang Xu, Niloy J Mitra, Shenlong Wang, and Anna Fr\u00fcst\u00fcck. 2024. SuperGaussian: Repurposing Video Models for 3D Super Resolution. arXiv preprint arXiv:2406.00609 (2024)."},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01369"},{"key":"e_1_2_1_87_1","volume-title":"Hierarchical image saliency detection on extended CSSD","author":"Shi Jianping","year":"2015","unstructured":"Jianping Shi, Qiong Yan, Li Xu, and Jiaya Jia. 2015. Hierarchical image saliency detection on extended CSSD. IEEE transactions on pattern analysis and machine intelligence 38, 4 (2015), 717--729."},{"key":"e_1_2_1_88_1","doi-asserted-by":"crossref","unstructured":"Noah Snavely Steven M Seitz and Richard Szeliski. 2006. Photo tourism: exploring photo collections in 3D. In ACM siggraph 2006 papers. 
835--846.","DOI":"10.1145\/1141911.1141964"},{"key":"e_1_2_1_89_1","volume-title":"LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. arXiv preprint arXiv:2402.05054","author":"Tang Jiaxiang","year":"2024","unstructured":"Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. 2024. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. arXiv preprint arXiv:2402.05054 (2024)."},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02086"},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1145\/2812802"},{"key":"e_1_2_1_92_1","volume-title":"Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al.","author":"Thoppilan Romal","year":"2022","unstructured":"Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al. 2022. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239 (2022)."},{"key":"e_1_2_1_93_1","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023)."},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.30"},{"key":"e_1_2_1_95_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 1588--1597","author":"Uy Mikaela Angelina","year":"2019","unstructured":"Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Thanh Nguyen, and Sai-Kit Yeung. 2019. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 
1588--1597."},{"key":"e_1_2_1_96_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01252-6_4"},{"key":"e_1_2_1_97_1","volume-title":"PF-LRM: Pose-free large reconstruction model for joint pose and shape prediction. arXiv preprint arXiv:2311.12024","author":"Wang Peng","year":"2023","unstructured":"Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, and Kai Zhang. 2023. PF-LRM: Pose-free large reconstruction model for joint pose and shape prediction. arXiv preprint arXiv:2311.12024 (2023)."},{"key":"e_1_2_1_98_1","unstructured":"Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P Srinivasan, Dor Verbin, Jonathan T Barron, Ben Poole, et al. 2023a. ReconFusion: 3D reconstruction with diffusion priors. arXiv preprint arXiv:2312.02981 (2023)."},{"key":"e_1_2_1_99_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 21551--21561","author":"Wu Rundi","year":"2024","unstructured":"Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P Srinivasan, Dor Verbin, Jonathan T Barron, Ben Poole, et al. 2024. ReconFusion: 3D reconstruction with diffusion priors. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 21551--21561."},{"key":"e_1_2_1_100_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00091"},{"key":"e_1_2_1_101_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 803--814","author":"Wu Tong","year":"2023","unstructured":"Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Jiawei Ren, Liang Pan, Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, et al. 2023b. OmniObject3D: Large-vocabulary 3D object dataset for realistic perception, reconstruction and generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 
803--814."},{"key":"e_1_2_1_102_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 1912--1920","author":"Wu Zhirong","year":"2015","unstructured":"Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 1912--1920."},{"key":"e_1_2_1_103_1","volume-title":"DreamInpainter: Text-guided subject-driven image inpainting with diffusion models. arXiv preprint arXiv:2312.03771","author":"Xie Shaoan","year":"2023","unstructured":"Shaoan Xie, Yang Zhao, Zhisheng Xiao, Kelvin CK Chan, Yandong Li, Yanwu Xu, Kun Zhang, and Tingbo Hou. 2023. DreamInpainter: Text-guided subject-driven image inpainting with diffusion models. arXiv preprint arXiv:2312.03771 (2023)."},{"key":"e_1_2_1_104_1","volume-title":"DISN: Deep implicit surface network for high-quality single-view 3D reconstruction. Advances in Neural Information Processing Systems 32","author":"Xu Qiangeng","year":"2019","unstructured":"Qiangeng Xu, Weiyue Wang, Duygu Ceylan, Radomir Mech, and Ulrich Neumann. 2019. DISN: Deep implicit surface network for high-quality single-view 3D reconstruction. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_2_1_105_1","volume-title":"GRM: Large Gaussian reconstruction model for efficient 3D reconstruction and generation. arXiv preprint arXiv:2403.14621","author":"Xu Yinghao","year":"2024","unstructured":"Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, and Gordon Wetzstein. 2024. GRM: Large Gaussian reconstruction model for efficient 3D reconstruction and generation. arXiv preprint arXiv:2403.14621 (2024)."},{"key":"e_1_2_1_106_1","unstructured":"Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, et al. 2023. 
DMV3D: Denoising multi-view diffusion using 3D large reconstruction model. arXiv preprint arXiv:2311.09217 (2023)."},{"key":"e_1_2_1_107_1","volume-title":"Perspective transformer nets: Learning single-view 3D object reconstruction without 3D supervision. Advances in Neural Information Processing Systems 29","author":"Yan Xinchen","year":"2016","unstructured":"Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, and Honglak Lee. 2016. Perspective transformer nets: Learning single-view 3D object reconstruction without 3D supervision. Advances in Neural Information Processing Systems 29 (2016)."},{"key":"e_1_2_1_108_1","first-page":"36324","article-title":"Decoupling features in hierarchical propagation for video object segmentation","volume":"35","author":"Yang Zongxin","year":"2022","unstructured":"Zongxin Yang and Yi Yang. 2022. Decoupling features in hierarchical propagation for video object segmentation. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) 35 (2022), 36324--36336.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS)"},{"key":"e_1_2_1_109_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01237-3_47"},{"key":"e_1_2_1_110_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00567"},{"key":"e_1_2_1_111_1","volume-title":"Instant-angelo: Build high-fidelity Digital Twin within 20 Minutes. https:\/\/github.com\/hugoycj\/Instant-angelo.","author":"Ye Chongjie","year":"2023","unstructured":"Chongjie Ye. 2023. Instant-angelo: Build high-fidelity Digital Twin within 20 Minutes. https:\/\/github.com\/hugoycj\/Instant-angelo."},{"key":"e_1_2_1_112_1","volume-title":"GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond. arXiv preprint arXiv:2403.19632","author":"Ye Chongjie","year":"2024","unstructured":"Chongjie Ye, Yinyu Nie, Jiahao Chang, Yuantao Chen, Yihao Zhi, and Xiaoguang Han. 2024. GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond. 
arXiv preprint arXiv:2403.19632 (2024)."},{"key":"e_1_2_1_113_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00455"},{"key":"e_1_2_1_114_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Yu Xianggang","year":"2023","unstructured":"Xianggang Yu, Mutian Xu, Yidan Zhang, Haolin Liu, Chongjie Ye, Yushuang Wu, Zizheng Yan, Tianyou Liang, Guanying Chen, Shuguang Cui, and Xiaoguang Han. 2023. MVImgNet: A Large-scale Dataset of Multi-view Images. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_1_115_1","volume-title":"GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting. arXiv preprint arXiv:2404.19702","author":"Zhang Kai","year":"2024","unstructured":"Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. 2024. GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting. arXiv preprint arXiv:2404.19702 (2024)."},{"key":"e_1_2_1_116_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_2_1_117_1","volume-title":"CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering. arXiv preprint arXiv:2311.15510","author":"Zhu Haidong","year":"2023","unstructured":"Haidong Zhu, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Ram Nevatia, and Luming Liang. 2023. CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering. arXiv preprint arXiv:2311.15510 (2023)."},{"key":"e_1_2_1_118_1","volume-title":"Triplane meets Gaussian splatting: Fast and generalizable single-view 3D reconstruction with transformers. arXiv preprint arXiv:2312.09147","author":"Zou Zi-Xin","year":"2023","unstructured":"Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, and Song-Hai Zhang. 2023. Triplane meets Gaussian splatting: Fast and generalizable single-view 3D reconstruction with transformers. 
arXiv preprint arXiv:2312.09147 (2023)."},{"key":"e_1_2_1_119_1","volume-title":"VideoMV: Consistent multi-view generation based on large video generative model. arXiv preprint arXiv:2403.12010","author":"Zuo Qi","year":"2024","unstructured":"Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, et al. 2024. VideoMV: Consistent multi-view generation based on large video generative model. arXiv preprint arXiv:2403.12010 (2024)."}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687973","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3687973","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:09:58Z","timestamp":1750295398000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687973"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,19]]},"references-count":119,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,19]]}},"alternative-id":["10.1145\/3687973"],"URL":"https:\/\/doi.org\/10.1145\/3687973","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,19]]},"assertion":[{"value":"2024-11-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}