{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,3]],"date-time":"2026-05-03T02:25:47Z","timestamp":1777775147546,"version":"3.51.4"},"reference-count":60,"publisher":"American Association for the Advancement of Science (AAAS)","issue":"113","funder":[{"DOI":"10.13039\/100022832","name":"Hong Kong Centre for Logistics Robotics","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100022832","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62322318"],"award-info":[{"award-number":["62322318"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002920","name":"Research Grants Council, University Grants Committee, Hong Kong","doi-asserted-by":"publisher","award":["14208424"],"award-info":[{"award-number":["14208424"]}],"id":[{"id":"10.13039\/501100002920","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"Joint Funds of the National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U24A20128"],"award-info":[{"award-number":["U24A20128"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"Joint Funds of the National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U25A6013"],"award-info":[{"award-number":["U25A6013"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["www.science.org"],"crossmark-restriction":true},"short-container-title":["Sci. Robot."],"published-print":{"date-parts":[[2026,4,29]]},"abstract":"<jats:p>Connecting the semantic reasoning of vision-language models (VLMs) to the precise geometric demands of robotic manipulation remains a fundamental challenge. 
Although VLMs can interpret high-level commands, they lack the intrinsic spatial intelligence required for tasks demanding precise object placement, orientation, and physical reasoning. Here, we introduce Retrieval-Augmented Manipulation (RAM), an object-centric framework that endows general-purpose vision foundation models with the spatial reasoning necessary for robust manipulation. RAM bridges the semantic-to-geometric gap by grounding abstract concepts into an explicit, object-centric three-dimensional (3D) representation. This grounded information is then provided as augmented context to the VLM, empowering it to decompose complex instructions into a sequence of spatially precise and physically plausible subgoals. We demonstrate that RAM, in a zero-shot setting on a real-world robot, can execute these subgoals to fulfill complex spatial language instructions, complete spatially aware manipulation under the guidance of a single 2D image, and adaptively replan tasks by reasoning about physical constraints like object size and collisions. Quantitative evaluations on the Common Objects in 3D (CO3D) dataset also validated that RAM\u2019s core vision module generalizes to previously unseen object categories and is robust to variations in shape and occlusions. 
By providing a structured bridge between semantic intent and geometric execution, RAM represents a critical step toward developing more physically intelligent and general-purpose robotic systems.<\/jats:p>","DOI":"10.1126\/scirobotics.aea2092","type":"journal-article","created":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T18:01:01Z","timestamp":1777485661000},"update-policy":"https:\/\/doi.org\/10.34133\/aaas_crossmark","source":"Crossref","is-referenced-by-count":0,"title":["A retrieval-augmented framework enabling VLM spatial awareness for object-centric robot manipulation"],"prefix":"10.1126","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-1044-2456","authenticated-orcid":true,"given":"Kai","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Chinese University of Hong Kong, HKSAR, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chengkun","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Chinese University of Hong Kong, HKSAR, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chang","family":"Tu","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Chinese University of Hong Kong, HKSAR, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiahui","family":"Pan","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Chinese University of Hong Kong, HKSAR, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yiyao","family":"Ma","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Chinese University of Hong Kong, HKSAR, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Mechanical and Automation Engineering, Chinese 
University of Hong Kong, HKSAR, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5430-4419","authenticated-orcid":true,"given":"Zhongxiang","family":"Zhou","sequence":"additional","affiliation":[{"name":"Zhejiang Humanoid Robot Innovation Center Co. Ltd., Ningbo, Zhejiang, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0762-6714","authenticated-orcid":true,"given":"Xuecheng","family":"Xu","sequence":"additional","affiliation":[{"name":"Zhejiang Humanoid Robot Innovation Center Co. Ltd., Ningbo, Zhejiang, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephen","family":"James","sequence":"additional","affiliation":[{"name":"Imperial College London, London, UK."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chi-Wing","family":"Fu","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Chinese University of Hong Kong, HKSAR, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9318-9014","authenticated-orcid":true,"given":"Rong","family":"Xiong","sequence":"additional","affiliation":[{"name":"Zhejiang Humanoid Robot Innovation Center Co. 
Ltd., Ningbo, Zhejiang, China."},{"name":"College of Control Science and Engineering, Zhejiang University, Hangzhou, Zhejiang, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0628-9932","authenticated-orcid":true,"given":"Pieter","family":"Abbeel","sequence":"additional","affiliation":[{"name":"University of California, Berkeley, CA, USA."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3625-6679","authenticated-orcid":true,"given":"Yun-Hui","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Mechanical and Automation Engineering, Chinese University of Hong Kong, HKSAR, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3416-9950","authenticated-orcid":true,"given":"Qi","family":"Dou","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Chinese University of Hong Kong, HKSAR, China."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"221","reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-25874-z"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.abd9461"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1177\/02783649211046285"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-025-01005-x"},{"key":"e_1_3_2_6_2","unstructured":"J. Zhang J. Zhang K. Pertsch Z. Liu X. Ren M. Chang S.-H. Sun J. J. Lim \u201cBootstrap your own skills: Learning to solve new tasks with large language model guidance\u201d in Proceedings of the 7th Conference on Robot Learning J. Tan M. Toussaint K. Darvish Eds. vol. 229 of Proceedings of Machine Learning Research (PMLR 2023) pp. 302\u2013325."},{"key":"e_1_3_2_7_2","unstructured":"Y. Hu F. Lin T. Zhang L. Yi Y. 
Gao \u201cLook before you leap: Unveiling the power of GPT-4V in robotic vision-language planning\u201d in Proceedings of the First Workshop on Vision-Language Models for Navigation and Manipulation at ICRA 2024 (Open Review 2024); https:\/\/openreview.net\/forum?id=n82dpqpa7J."},{"key":"e_1_3_2_8_2","unstructured":"A. O\u2019Neill A. Rehman A. Maddukuri A. Gupta A. Padalkar A. Lee \u201cOpen X-embodiment: Robotic learning datasets and RT-X models: Open X-embodiment collaboration\u201d in Proceedings of the IEEE International Conference on Robotics and Automation (IEEE 2024) pp. 6892\u20136903."},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","unstructured":"A. Khazatsky K. Pertsch S. Nair A. Balakrishna S. Dasari S. Karamcheti S. Nasiriany M. K. Srirama L. Y. Chen K. Ellis P. D. Fagan J. Hejna M. Itkina M. Lepert Y. J. Ma P. T. Miller J. Wu S. Belkhale S. Dass H. Ha A. Jain A. Lee Y. Lee M. Memmel S. Park I. Radosavovic K. Wang A. Zhan K. Black C. Chi K. B. Hatch S. Lin J. Lu J. Mercat A. Rehman P. R. Sanketi A. Sharma C. Simpson Q. Vuong H. R. Walke B. Wulfe T. Xiao J. H. Yang A. Yavary T. Z. Zhao C. Agia R. Baijal M. G. Castro D. Chen Q. Chen T. Chung J. Drake E. P. Foster J. Gao D. A. Herrera M. Heo K. Hsu J. Hu D. Jackson C. Le Y. Li R. Lin Z. Ma A. Maddukuri S. Mirchandani D. Morton T. Nguyen A. O'Neill R. Scalise D. Seale V. Son S. Tian E. Tran A. E. Wang Y Wu A. Xie J. Yang P. Yin Y. Zhang O. Bastani G. Berseth J. Bohg K. Goldberg A. Gupta A. Gupta D. Jayaraman J. J. Lim J. Malik R. Mart\u00edn-Mart\u00edn S. Ramamoorthy D. Sadigh S. Song J. Wu M. C. Yip Y. Zhu T. Kollar S. Levine C. Finn \u201cDROID: A large-scale in-the-wild robot manipulation dataset\u201d in Proceedings of Robotics: Science and Systems (RSS Foundation 2024).","DOI":"10.15607\/RSS.2024.XX.120"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2025.3567609"},{"key":"e_1_3_2_11_2","unstructured":"Y. Feng J. Han Z. Yang X. Yue S. Levine J. 
Luo \u201cReflective planning: Vision-language models for multi-stage long-horizon robotic manipulation\u201d in Proceedings of the 9th Conference on Robot Learning J. Lim S. Song H.-W. Park Eds. vol. 305 of Proceedings of Machine Learning Research (PMLR 2025) pp. 2038\u20132062."},{"key":"e_1_3_2_12_2","unstructured":"K. Rana J. Haviland S. Garg J. Abou-Chakra I. Reid N. Suenderhauf \u201cSayPlan: Grounding large language models using 3D scene graphs for scalable robot task planning\u201d in Proceedings of the 7th Conference on Robot Learning J. Tan M. Toussaint K. Darvish Eds. vol. 229 of Proceedings of Machine Learning Research (PMLR 2023) pp. 23\u201372."},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","unstructured":"H. Huang X. Chen Y. Chen H. Li X. Han Z. Wang T. Wang J. Pang Z. Zhao \u201cRoboGround: Robotic manipulation with grounded vision-language priors\u201d in 2025 Proceedings of the Computer Vision and Pattern Recognition Conference (IEEE 2025) pp. 22540\u201322550.","DOI":"10.1109\/CVPR52734.2025.02099"},{"key":"e_1_3_2_14_2","doi-asserted-by":"crossref","unstructured":"J. Gao B. Sarkar F. Xia T. Xiao J. Wu B. Ichter A. Majumdar D. Sadigh \u201cPhysically grounded vision-language models for robotic manipulation\u201d in Proceedings of the IEEE International Conference on Robotics and Automation (IEEE 2024) pp. 12462\u201312469.","DOI":"10.1109\/ICRA57147.2024.10610090"},{"key":"e_1_3_2_15_2","unstructured":"R. Wang J. Mao J. Hsu H. Zhao J. Wu Y. Gao \u201cProgrammatically grounded compositionally generalizable robotic manipulation\u201d in The Eleventh International Conference on Learning Representations (Open Review 2023); https:\/\/openreview.net\/forum?id=rZ-wylY5VI."},{"key":"e_1_3_2_16_2","unstructured":"V. Bhat P. Krishnamurthy R. Karri F. Khorrami HiFi-CS: Towards open vocabulary visual grounding for robotic grasping using vision-language models. 
arXiv:2409.10419 [cs.RO] (2024)."},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","unstructured":"S. Huang I. Ponomarenko Z. Jiang X. Li X. Hu P. Gao H. Li H. Dong \u201cManipVQA: Injecting robotic affordance and physically grounded information into multi-modal large language models\u201d in Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IEEE 2024) pp. 7580\u20137587.","DOI":"10.1109\/IROS58592.2024.10801993"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","unstructured":"W. Cai I. Ponomarenko J. Yuan X. Li W. Yang H. Dong B. Zhao \u201cSpatialBot: Precise spatial understanding with vision language models\u201d in Proceedings of the IEEE International Conference on Robotics and Automation (IEEE 2025) pp. 9490\u20139498.","DOI":"10.1109\/ICRA55743.2025.11128671"},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"Y. Ding H. Geng C. Xu X. Fang J. Zhang S. Wei Q. Dai Z. Zhang H. Wang \u201cOpen6DOR: Benchmarking open-instruction 6-DoF object rearrangement and a VLM-based approach\u201d in Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IEEE 2024) pp. 7359\u20137366.","DOI":"10.1109\/IROS58592.2024.10802733"},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","unstructured":"M. Dorkenwald N. Barazani C. G. Snoek Y. M. Asano \u201cPin: Positional insert unlocks object localisation abilities in VLMs\u201d in Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (IEEE 2024) pp. 13548\u201313558.","DOI":"10.1109\/CVPR52733.2024.01286"},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","unstructured":"R. Li S. Li L. Kong X. Yang J. Liang \u201cSeeground: See and ground for zero-shot open-vocabulary 3D visual grounding\u201d in Proceedings of the Computer Vision and Pattern Recognition Conference (IEEE 2025) pp. 3707\u20133717.","DOI":"10.1109\/CVPR52734.2025.00351"},{"key":"e_1_3_2_22_2","doi-asserted-by":"crossref","unstructured":"M. 
Li S. Zhao Q. Wang K. Wang Y. Zhou S. Srivastava C. Gokmen T. Lee E. L. Li R. Zhang W. Liu P. Liang L. Fei-Fei J. Mao J. Wu \u201cEmbodied agent interface: Benchmarking LLMs for embodied decision making\u201d in Advances in Neural Information Processing Systems A. Globerson L. Mackey D. Belgrave A. Fan U. Paquet J. Tomczak C. Zhang Eds. (Curran Associates 2024) vol. 37 pp. 100428\u2013100534.","DOI":"10.52202\/079017-3188"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364919872545"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2025.3542418"},{"key":"e_1_3_2_25_2","doi-asserted-by":"crossref","unstructured":"X. Li M. Zhang Y. Geng H. Geng Y. Long Y. Shen R. Zhang J. Liu H. Dong \u201cManipLLM: Embodied multimodal large language model for object-centric robotic manipulation\u201d in Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (IEEE 2024) pp. 18061\u201318070.","DOI":"10.1109\/CVPR52733.2024.01710"},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","unstructured":"R. Wu K. Cheng Y. Zhao C. Ning G. Zhan H. Dong \u201cLearning environment-aware affordance for 3D articulated object manipulation under occlusions\u201d in Advances in Neural Information Processing Systems A. Oh T. Naumann A. Globerson K. Saenko M. Hardt S. Levine Eds. (Curran Associates 2023) vol. 36 pp. 60966\u201360983.","DOI":"10.52202\/075280-2664"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2024.3477095"},{"key":"e_1_3_2_28_2","unstructured":"W. Huang C. Wang R. Zhang Y. Li J. Wu F.-F. Li \u201cVoxPoser: Composable 3D value maps for robotic manipulation with language models\u201d in Proceedings of the 7th Conference on Robot Learning J. Tan M. Toussaint K. Darvish Eds. vol. 229 of Proceedings of Machine Learning Research (PMLR 2023) pp. 540\u2013562."},{"key":"e_1_3_2_29_2","unstructured":"W. Huang C. Wang Y. Li R. Zhang F.-F. 
Li \u201cReKep: Spatio-temporal reasoning of relational keypoint constraints for robotic manipulation\u201d in Proceedings of the 8th Conference on Robot Learning P. Agrawal O. Kroemer W. Burgard Eds. vol. 270 of Proceedings of Machine Learning Research (PMLR 2024) pp. 4573\u20134602."},{"key":"e_1_3_2_30_2","doi-asserted-by":"crossref","unstructured":"H. Huang F. Lin Y. Hu S. Wang Y. Gao \u201cCoPa: General robotic manipulation through spatial constraints of parts with foundation models\u201d in Proceedings of IEEE\/RSJ International Conference on Intelligent Robots and Systems (IEEE 2024) pp. 9488\u20139495.","DOI":"10.1109\/IROS58592.2024.10801352"},{"key":"e_1_3_2_31_2","doi-asserted-by":"crossref","unstructured":"M. Pan J. Zhang T. Wu Y. Zhao W. Gao H. Dong \u201cOmniManip: Towards general robotic manipulation via object-centric interaction primitives as spatial constraints\u201d in Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (IEEE 2025) pp. 17359\u201317369.","DOI":"10.1109\/CVPR52734.2025.01618"},{"key":"e_1_3_2_32_2","doi-asserted-by":"crossref","unstructured":"B. Chen Z. Xu S. Kirmani B. Ichter D. Sadigh L. Guibas F. Xia \u201cSpatialVLM: Endowing vision-language models with spatial reasoning capabilities\u201d in Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (IEEE 2024) pp. 14455\u201314465.","DOI":"10.1109\/CVPR52733.2024.01370"},{"key":"e_1_3_2_33_2","doi-asserted-by":"crossref","unstructured":"A.-C. Cheng H. Yin Y. Fu Q. Guo R.-H. Yang J. Kautz X. Wang S. Liu \u201cSpatialRGPT: Grounded spatial reasoning in vision-language models\u201d in Proceedings of Advances in Neural Information Processing Systems A. Globerson L. Mackey D. Belgrave A. Fan U. Paquet J. Tomczak C. Zhang Eds. (Curran Associates 2024) vol. 37 pp. 135062\u2013135093.","DOI":"10.52202\/079017-4293"},{"key":"e_1_3_2_34_2","unstructured":"W. Yuan J. Duan V. Blukis W. Pumacay R. Krishna A. Murali A. Mousavian D. 
Fox \u201cRoboPoint: A vision-language model for spatial affordance prediction for robotics\u201d in Proceedings of the 8th Conference on Robot Learning P. Agrawal O. Kroemer W. Burgard Eds. vol. 270 of Proceedings of Machine Learning Research (PMLR 2024) pp. 4005\u20134020."},{"key":"e_1_3_2_35_2","unstructured":"P. Lewis E. Perez A. Piktus F. Petroni V. Karpukhin N. Goyal H. K\u00fcttler M. Lewis W.-T. Yih T. Rockt\u00e4schel S. Riedel D. Kiela \u201cRetrieval-augmented generation for knowledge-intensive NLP tasks\u201d in Proceedings of Advances in Neural Information Processing System H. Larochelle M. Ranzato R. Hadsell M. F. Balcan H. Lin Eds. (Curran Associates 2020) vol. 33 pp. 9459\u20139474."},{"key":"e_1_3_2_36_2","unstructured":"Y. Gao Y. Xiong X. Gao K. Jia J. Pan Y. Bi Y. Dai J. Sun H. Wang H. Wang Retrieval-augmented generation for large language models: A survey. arXiv:2312.10997 [cs.CL] (2023)."},{"key":"e_1_3_2_37_2","doi-asserted-by":"crossref","unstructured":"Z. Jiang F. F. Xu L. Gao Z. Sun Q. Liu J. Dwivedi-Yu Y. Yang J. Callan G. Neubig \u201cActive retrieval augmented generation\u201d in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing H. Bouamor J. Pino K. Bali Eds. (Association for Computational Linguistics 2023) pp. 7969\u20137992.","DOI":"10.18653\/v1\/2023.emnlp-main.495"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","unstructured":"J. Chen H. Lin X. Han L. Sun \u201cBenchmarking large language models in retrieval-augmented generation\u201d in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2024) vol. 38 pp. 17754\u201317762.","DOI":"10.1609\/aaai.v38i16.29728"},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","unstructured":"W. Fan Y. Ding L. Ning S. Wang H. Li D. Yin T.-S. Chua Q. 
Li \u201cA survey on rag meeting LLMs: Towards retrieval-augmented large language models\u201d in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery 2024) pp. 6491\u20136501.","DOI":"10.1145\/3637528.3671470"},{"key":"e_1_3_2_40_2","unstructured":"Gemini Team R. Anil S. Borgeaud J.-B. Alayrac J. Yu R. Soricut J. Schalkwyk A. M. Dai A. Hauth K. Millican D. Silver M. Johnson I. Antonoglou J. Schrittwieser A. Glaese J. Chen E. Pitler T. Lillicrap A. Lazaridou O. Firat J. Molloy M. Isard P. R. Barham T. Hennigan B. Lee F. Viola M. Reynolds Y. Xu R. Doherty E. Collins C. Meyer E. Rutherford E. Moreira K. Ayoub M. Goel J. Krawczyk C. Du E. Chi H.-T. Cheng E. Ni P. Shah P. Kane B. Chan M. Faruqui A. Severyn H. Lin Y. Li Y. Cheng A. Ittycheriah M. Mahdieh M. Chen P. Sun D. Tran S. Bagri B. Lakshminarayanan J. Liu A. Orban F. G\u00fcra H. Zhou X. Song A. Boffy H. Ganapathy S. Zheng H. Choe \u00c1. Weisz T. Zhu Y. Lu S. Gopal J. Kahn M. Kula J. Pitman R. Shah E. Taropa M. Al Merey M. Baeuml Z. Chen L. El Shafey Y. Zhang O. Sercinoglu G. Tucker E. Piqueras M. Krikun I. Barr N. Savinov I. Danihelka B. Roelofs A. White A. Andreassen T. von Glehn L. Yagati M. Kazemi L. Gonzalez M. Khalman J. Sygnowski A. Frechette C. Smith L. Culp L. Proleev Y. Luan X. Chen J. Lottes N. Schucher F. Lebron A. Rrustemi N. Clay P. Crone T. Kocisky J. Zhao B. Perz D. Yu H. Howard A. Bloniarz J. W. Rae H. Lu L. Sifre M. Maggioni F. Alcober D. Garrette M. Barnes S. Thakoor J. Austin G. Barth-Maron W. Wong R. Joshi R. Chaabouni D. Fatiha A. Ahuja G. S. Tomar E. Senter M. Chadwick I. Kornakov N. Attaluri I. Iturrate R. Liu Y. Li S. Cogan J. Chen C. Jia C. Gu Q. Zhang J. Grimstad A. J. Hartman X. Garcia T. S. Pillai J. Devlin M. Laskin D. de Las Casas D. Valter C. Tao L. Blanco A. P. Badia D. Reitter M. Chen J. Brennan C. Rivera S. Brin S. Iqbal G. Surita J. Labanowski A. Rao S. Winkler E. Parisotto Y. Gu K. Olszewska R. Addanki A. 
Miech A. Louis D. Teplyashin G. Brown E. Catt J. Balaguer J. Xiang P. Wang Z. Ashwood A. Briukhov A. Webson S. Ganapathy S. Sanghavi A. Kannan M.-W. Chang A. Stjerngren J. Djolonga Y. Sun A. Bapna M. Aitchison P. Pejman H. Michalewski T. Yu C. Wang J. Love J. Ahn D. Bloxwich K. Han P. Humphreys T. Sellam J. Bradbury V. Godbole S. Samangooei B. Damoc A. Kaskasoli S. M. R. Arnold V. Vasudevan S. Agrawal J. Riesa D. Lepikhin R. Tanburn S. Srinivasan H. Lim S. Hodkinson P. Shyam J. Ferret S. Hand A. Garg T. Le Paine J. Li Y. Li M. Giang A. Neitz Z. Abbas S. York M. Reid E. Cole A. Chowdhery D. Das D. Rogozi\u0144ska V. Nikolaev P. Sprechmann Z. Nado L. Zilka F. Prost L. He M. Monteiro G. Mishra C. Welty J. Newlan D. Jia M. Allamanis C. H. Hu R. de Liedekerke J. Gilmer C. Saroufim S. Rijhwani S. Hou D. Shrivastava A. Baddepudi A. Goldin A. Ozturel A. Cassirer Y. Xu D. Sohn D. Sachan R. K. Amplayo C. Swanson D. Petrova S. Narayan A. Guez S. Brahma J. Landon M. Patel R. Zhao K. Villela L. Wang W. Jia M. Rahtz M. Gim\u00e9nez L. Yeung J. Keeling P. Georgiev D. Mincu B. Wu S. Haykal R. Saputro K. Vodrahalli J. Qin Z. Cankara A. Sharma N. Fernando W. Hawkins B. Neyshabur S. Kim A. Hutter P. Agrawal A. Castro-Ros G. van den Driessche T. Wang F. Yang S.-y. Chang P. Komarek R. McIlroy M. Lu\u010di\u0107 G. Zhang W. Farhan M. Sharman P. Natsev P. Michel Y. Bansal S. Qiao K. Cao S. Shakeri C. Butterfield J. Chung P. K. Rubenstein S. Agrawal A. Mensch K. Soparkar K. Lenc T. Chung A. Pope L. Maggiore J. Kay P. Jhakra S. Wang J. Maynez M. Phuong T. Tobin A. Tacchetti M. Trebacz K. Robinson Y. Katariya S. Riedel P. Bailey K. Xiao N. Ghelani L. Aroyo A. Slone N. Houlsby X. Xiong Z. Yang E. Gribovskaya J. Adler M. Wirth L. Lee M. Li T. Kagohara J. Pavagadhi S. Bridgers A. Bortsova S. Ghemawat Z. Ahmed T. Liu R. Powell V. Bolina M. Iinuma P. Zablotskaia J. Besley D.-W. Chung T. Dozat R. Comanescu X. Si J. Greer G. Su M. Polacek R. L. Kaufman S. Tokumine H. Hu E. Buchatskaya Y. Miao M. 
Elhawaty A. Siddhant N. Tomasev J. Xing C. Greer H. Miller S. Ashraf A. Roy Z. Zhang A. Ma A. Filos M. Besta R. Blevins T. Klimenko C.-K. Yeh S. Changpinyo J. Mu O. Chang M. Pajarskas C. Muir V. Cohen C. Le Lan K. Haridasan A. Marathe S. Hansen S. Douglas R. Samuel M. Wang S. Austin C. Lan J. Jiang J. Chiu J. A. Lorenzo L. L. Sj\u00f6sund S. Cevey Z. Gleicher T. Avrahami A. Boral H. Srinivasan V. Selo R. May K. Aisopos L. Hussenot L. B. Soares K. Baumli M. B. Chang A. Recasens B. Caine A. Pritzel F. Pavetic F. Pardo A. Gergely J. Frye V. Ramasesh D. Horgan K. Badola N. Kassner S. Roy E. Dyer V. C. Campos A. Tomala Y. Tang D. El Badawy E. White B. Mustafa O. Lang A. Jindal S. Vikram Z. Gong S. Caelles R. Hemsley G. Thornton F. Feng W. Stokowiec C. Zheng P. Thacker \u00c7. \u00dcnl\u00fc Z. Zhang M. Saleh J. Svensson M. Bileschi P. Patil A. Anand R. Ring K. Tsihlas A. Vezer M. Selvi T. Shevlane M. Rodriguez T. Kwiatkowski S. Daruki K. Rong A. Dafe N. FitzGerald K. Gu-Lemberg M. Khan L. A. Hendricks M. Pellat V. Feinberg J. Cobon-Kerr T. Sainath M. Rauh S. H. Hashemi R. Ives Y. Hasson E. Noland Y. Cao N. Byrd L. Hou Q. Wang T. Sottiaux M. Paganini J.-B. Lespiau A. Moufarek S. Hassan K. Shivakumar J. van Amersfoort A. Mandhane P. Joshi A. Goyal M. Tung A. Brock H. Sheahan V. Misra C. Li N. Raki\u0107evi\u0107 M. Dehghani F. Liu S. Mittal J. Oh S. Noury E. Sezener F. Huot M. Lamm N. De Cao C. Chen S. Mudgal R. Stella K. Brooks G. Vasudevan C. Liu M. Chain N. Melinkeri A. Cohen V. Wang K. Seymore S. Zubkov R. Goel S. Yue S. Krishnakumaran B. Albert N. Hurley M. Sano A. Mohananey J. Joughin E. Filonov T. K\u0119pa Y. Eldawy J. Lim R. Rishi S. Badiezadegan T. Bos J. Chang S. Jain S. G. S. Padmanabhan S. Puttagunta K. Krishna L. Baker N. Kalb V. Bedapudi A. Kurzrok S. Lei A. Yu O. Litvin X. Zhou Z. Wu S. Sobell A. Siciliano A. Papir R. Neale J. Bragagnolo T. Toor T. Chen V. Anklin F. Wang R. Feng M. Gholami K. Ling L. Liu J. Walter H. Moghaddam A. Kishore J. Adamek T. 
Mercado J. Mallinson S. Wandekar S. Cagle E. Ofek G. Garrido C. Lombriser M. Mukha B. Sun H. R. Mohammad J. Matak Y. Qian V. Peswani P. Janus Q. Yuan L. Schelin O. David A. Garg Y. He O. Duzhyi A. \u00c4lgmyr T. Lottaz Q. Li V. Yadav L. Xu A. Chinien R. Shivanna A. Chuklin J. Li C. Spadine T. Wolfe K. Mohamed S. Das Z. Dai K. He D. von Dincklage S. Upadhyay A. Maurya L. Chi S. Krause K. Salama P. G. Rabinovitch M. P. K. Reddy A. Selvan M. Dektiarev G. Ghiasi E. Guven H. Gupta B. Liu D. Sharma I. H. Shtacher S. Paul O. Akerlund F.-X. Aubet T. Huang C. Zhu E. Zhu E. Teixeira M. Fritze F. Bertolini L.-E. Marinescu M. B\u00f6lle D. Paulus K. Gupta T. Latkar M. Chang J. Sanders R. Wilson X. Wu Y.-X. Tan L. N. Thiet T. Doshi S. Lall S. Mishra W. Chen T. Luong S. Benjamin J. Lee E. Andrejczuk D. Rabiej V. Ranjan K. Styrc P. Yin J. Simon M. R. Harriott M. Bansal A. Robsky G. Bacon D. Greene D. Mirylenka C. Zhou O. Sarvana A. Goyal S. Andermatt P. Siegler B. Horn A. Israel F. Pongetti C.-W. \u201cL.\u201d Chen M. Selvatici P. Silva K. Wang J. Tolins K. Guu R. Yogev X. Cai A. Agostini M. Shah H. Nguyen N. \u00d3 Donnaile S. Pereira L. Friso A. Stambler A. Kurzrok C. Kuang Y. Romanikhin M. Geller Z. J. Yan K. Jang C.-C. Lee W. Fica E. Malmi Q. Tan D. Banica D. Balle R. Pham Y. Huang D. Avram H. Shi J. Singh C. Hidey N. Ahuja P. Saxena D. Dooley S. P. Potharaju E. O'Neill A. Gokulchandran R. Foley K. Zhao M. Dusenberry Y. Liu P. Mehta R. Kotikalapudi C. Safranek-Shrader A. Goodman J. Kessinger E. Globen P. Kolhar C. Gorgolewski A. Ibrahim Y. Song A. Eichenbaum T. Brovelli S. Potluri P. Lahoti C. Baetu A. Ghorbani C. Chen A. Crawford S. Pal M. Sridhar P. Gurita A. Mujika I. Petrovski P.-L. Cedoz C. Li S. Chen N. D. Santo S. Goyal J. Punjabi K. Kappaganthu C. Kwak Pallavi L. V. S. Velury H. Choudhury J. Hall P. Shah R. Figueira M. Thomas M. Lu T. Zhou C. Kumar T. Jurdi S. Chikkerur Y. Ma A. Yu S. Kwak V. \u00c4hdel S. Rajayogam T. Choma F. Liu A. Barua C. Ji J. H. Park V. 
Hellendoorn A. Bailey T. Bilal H. Zhou M. Khatir C. Sutton W. Rzadkowski F. Macintosh R. Vij K. Shagin P. Medina C. Liang J. Zhou P. Shah Y. Bi A. Dankovics S. Banga S. Lehmann M. Bredesen Z. Lin J. E. Hoffmann J. Lai R. Chung K. Yang N. Balani A. Bra\u017einskas A. Sozanschi M. Hayes H. F. Alcalde P. Makarov W. Chen A. Stella L. Snijders M. Mandl A. K\u00e4rrman P. Nowak X. Wu A. Dyck K. Vaidyanathan R. Raghavender J. Mallet M. Rudominer E. Johnston S. Mittal A. Udathu J. Christensen V. Verma Z. Irving A. Santucci G. Elsayed E. Davoodi M. Georgiev I. Tenney N. Hua G. Cideron E. Leurent M. Alnahlawi I. Georgescu N. Wei I. Zheng D. Scandinaro H. Jiang J. Snoek M. Sundararajan X. Wang Z. Ontiveros I. Karo J. Cole V. Rajashekhar L. Tumeh E. Ben-David R. Jain J. Uesato R. Datta O. Bunyan S. Wu J. Zhang P. Stanczyk Y. Zhang D. Steiner S. Naskar M. Azzam M. Johnson A. Paszke C.-C. Chiu J. S. Elias A. Mohiuddin F. Muhammad J. Miao A. Lee N. Vieillard J. Park J. Zhang J. Stanway D. Garmon A. Karmarkar Z. Dong J. Lee A. Kumar L. Zhou J. Evens W. Isaac G. Irving E. Loper M. Fink I. Arkatkar N. Chen I. Shafran I. Petrychenko Z. Chen J. Jia A. Levskaya Z. Zhu P. Grabowski Y. Mao A. Magni K. Yao J. Snaider N. Casagrande E. Palmer P. Suganthan A. Casta\u00f1o I. Giannoumis W. Kim M. Rybi\u0144ski A. Sreevatsa J. Prendki D. Soergel A. Goedeckemeyer W. Gierke M. Jafari M. Gaba J. Wiesner D. G. Wright Y. Wei H. Vashisht Y. Kulizhskaya J. Hoover M. Le L. Li C. Iwuanyanwu L Liu K. Ramirez A. Khorlin A. Cui T. Lin M. Wu R. Aguilar K. Pallo A. Chakladar G. Perng E. A. Abellan M. Zhang I. Dasgupta N. Kushman I. Penchev A. Repina X. Wu T. van der Weide P. Ponnapalli C. Kaplan J. Simsa S. Li O. Dousse F. Yang J. Piper N. Ie R. Pasumarthi N. Lintz A. Vijayakumar D. Andor P. Valenzuela M. Lui C. Paduraru D. Peng K. Lee S. Zhang S. Greene D. D. Nguyen P. Kurylowicz C. Hardin L. Dixon L. Janzer K. Choo Z. Feng B. Zhang A. Singhal D. Du D. McKinnon N. Antropova T. Bolukbasi O. Keller D. 
Reid D. Finchelstein M. A. Raad R. Crocker P. Hawkins R. Dadashi C. Gaffney K. Franko A. Bulanova R. Leblond S. Chung H. Askham L. C. Cobo K. Xu F. Fischer J. Xu C. Sorokin C. Alberti C.-C. Lin C. Evans A. Dimitriev H. Forbes D. Banarse Z. Tung M. Omernick C. Bishop R. Sterneck R. Jain J. Xia E. Amid F. Piccinno X. Wang P. Banzal D. J. Mankowitz A. Polozov V. Krakovna S. Brown M. H. Bateni D. Duan V. Firoiu M. Thotakuri T. Natan M. Geist S. tan Girgin H. Li J. Ye O. Roval R. Tojo M. Kwong J. Lee-Thorp C. Yew D. Sinopalnikov S. Ramos J. Mellor A. Sharma K. Wu D. Miller N. Sonnerat D. Vnukov R. Greig J. Beattie E. Caveness L. Bai J. Eisenschlos A. Korchemniy T. Tsai M. Jasarevic W. Kong P. Dao Z. Zheng F. Liu F. Yang R. Zhu T. Huey Teh J. Sanmiya E. Gladchenko N. Trdin D. Toyama E. Rosen S. Tavakkol L. Xue C. Elkind O. Woodman J. Carpenter G. Papamakarios R. Kemp S. Kafle T. Grunina R. Sinha A. Talbert D. Wu D. Owusu-Afriyie C. Du C. Thornton J. Pont-Tuset P. Narayana J. Li S. Fatehi J. Wieting O. Ajmeri B. Uria Y. Ko L. Knight A. H\u00e9liou N. Niu S. Gu C. Pang Y. Li N. Levine A. Stolovich R. Santamaria-Fernandez S. Goenka W. Yustalim R. Strudel A. Elqursh C. Deck H. Lee Z. Li K. Levin R. Hoffmann D. Holtmann-Rice O. Bachem S. Arora C. Koh S. H. Yeganeh S. P\u00f5der M. Tariq Y. Sun L. Ionita M. Seyedhosseini P. Tafti Z. Liu A. Gulati J. Liu X. Ye B. Chrzaszcz L. Wang N. Sethi T. Li B. Brown S. Singh W. Fan A. Parisi J. Stanton V. Koverkathu C. A. Choquette-Choo Y. Li T. J. Lu A. Ittycheriah P. Shroff M. Varadarajan S. Bahargam R. Willoughby D. Gaddy G. Desjardins M. Cornero B. Robenek B. Mittal B. Albrecht A. Shenoy F. Moiseev H. Jacobsson A. Ghaffarkhah M. Rivi\u00e8re A. Walton C. Crepy A. Parrish Z. Zhou C. Farabet C. Radebaugh P. Srinivasan C. van der Salm A. Fidjeland S. Scellato E. Latorre-Chimoto H. Klimczak-Pluci\u0144ska D. Bridson D. de Cesare T. Hudson P. Mendolicchio L. Walker A. Morris M. Mauger A. Guseynov A. Reid S. Odoom L. Loher V. Cotruta M. 
Yenugula D. Grewe A. Petrushkina T. Duerig A. Sanchez S. Yadlowsky A. Shen A. Globerson L. Webb S. Dua D. Li S. Bhupatiraju D. Hurt H. Qureshi A. Agarwal T. Shani M. Eyal A. Khare S. R. Belle L. Wang C. Tekur M. S. Kale J. Wei R. Sang B. Saeta T. Liechty Y. Sun Y. Zhao S. Lee P. Nayak D. Fritz M. R. Vuyyuru J. Aslanides N. Vyas M. Wicke X. Ma E. Eltyshev N. Martin H. Cate J. Manyika K. Amiri Y. Kim X. Xiong K. Kang F. Luisier N. Tripuraneni D. Madras M. Guo A. Waters O. Wang J. Ainslie J. Baldridge H Zhang G. Pruthi J. Bauer F. Yang R. Mansour J. Gelman Y. Xu G. Polovets J. Liu H. Cai W. Chen X. Sheng E. Xue S. Ozair C. Angermueller X. Li A. Sinha W. Wang J. Wiesinger E. Koukoumidis Y. Tian A. Iyer M. Gurumurthy M. Goldenson P. Shah M. K. Blake H. Yu A. Urbanowicz J. Palomaki C. Fernando K. Durden H. Mehta N. Momchev E. Rahimtoroghi M. Georgaki A. Raul S. Ruder M. Redshaw J. Lee D. Zhou K. Jalan D. Li B. Hechtman P. Schuh M. Nasr K. Milan V. Mikulik J. Franco T. Green N. Nguyen J. Kelley A. Mahendru A. Hu J. Howland B. Vargas J. Hui K. Bansal V. Rao R. Ghiya E. Wang K. Ye J. M. Sarr M. M. Preston M. Elish S. Li A. Kaku J. Gupta I. Pasupat D.-C. Juan M. Someswar Tejvi M. X. Chen A. Amini A. Fabrikant E. Chu X. Dong A. Muthal S. Buthpitiya S. Jauhari N. Hua U. Khandelwal A. Hitron J. Ren L. Rinaldi S. Drath A. Dabush N.-J. Jiang H. Godhia U. Sachs A. Chen Y. Fan H. Taitelbaum H. Noga Z. Dai J. Wang C. Liang J. Hamer C.-S. Ferng C. Elkind A. Atias P. Lee V. List\u00edk M. Carlen J. van de Kerkhof M. Pikus K. Zaher P. M\u00fcller S. Zykova R. Stefanec V. Gatsko C. Hirnschall A. Sethi X. F. Xu C. Ahuja B. Tsai A. Stefanoiu B. Feng K. Dhandhania M. Katyal A. Gupta A. Parulekar D. Pitta J. Zhao V. Bhatia Y. Bhavnani O. Alhadlaq X. Li P. Danenberg D. Tu A. Pine V. Filippova A. Ghosh B. Limonchik B. Urala C. K. Lanka D. Clive Y. Sun E. Li H. Wu K. Hongtongsak I. Li K. Thakkar K. Omarov K. Majmundar M. Alverson M. Kucharski M. Patel M. Jain M. Zabelin P. Pelagatti R. 
Kohli S. Kumar J. Kim S. Sankar V. Shah L. Ramachandruni X. Zeng B. Bariach L. Weidinger T. Vu A. Andreev A. He K. Hui S. Kashem A. Subramanya S. Hsiao D. Hassabis K. Kavukcuoglu A. Sadovsky Q. Le T. Strohman Y. Wu S. Petrov J. Dean O. Vinyals Gemini: A family of highly capable multimodal models. arXiv:2312.11805 [cs.CL] (2023)."},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","unstructured":"S. Liu Z. Zeng T. Ren F. Li H. Zhang J. Yang Q. Jiang C. Li J. Yang H. Su L. Zhang \u201cGrounding DINO: Marrying DINO with grounded pre-training for open-set object detection\u201d in Computer Vision\u2014ECCV 2024: 18th European Conference Milan Italy September 29\u2013October 4 2024 Proceedings Part XLVII A. Leonardis E. Ricci S. Roth O. Russakovsky T. Sattler G. Varol Eds. vol. 15105 of Lecture Notes in Computer Science (Springer 2024) pp. 38\u201355.","DOI":"10.1007\/978-3-031-72970-6_3"},{"key":"e_1_3_2_42_2","unstructured":"M. Oquab T. Darcet T. Moutakanni H. Vo M. Szafraniec V. Khalidov P. Fernandez D. Haziza F. Massa A. El-Nouby M. Assran N. Ballas W. Galuba R. Howes P.-Y. Huang S.-W. Li I. Misra M. Rabbat V. Sharma G. Synnaeve H. Xu H. Jegou J. Mairal P. Labatut A. Joulin P. Bojanowski DINOv2: Learning robust visual features without supervision. arXiv:2304.07193 [cs.CV] (2023)."},{"key":"e_1_3_2_43_2","unstructured":"OpenAI A. Hurst A. Lerer A. P Goucher A. Perelman A. Ramesh A. Clark A. J. Ostrow A. Welihinda A. Hayes A. Radford A. M\u0105dry A. Baker-Whitcomb A. Beutel A. Borzunov A. Carney A. Chow A. Kirillov A. Nichol A. Paino A. Renzin A. T. Passos A. Kirillov A. Christakis A. Conneau A. Kamali A. Jabri A. Moyer A. Tam A. Crookes A. Tootoochian A. Tootoonchian A. Kumar A. Vallone A. Karpathy A. Braunstein A. Cann A. Codispoti A. Galu A. Kondrich A. Tulloch A. Mishchenko A. Baek A. Jiang A. Pelisse A. Woodford A. Gosalia A. Dhar A. Pantuliano A. Nayak A. Oliver B. Zoph B. Ghorbani B. Leimberger B. Rossen B. Sokolowsky B. Wang B. Zweig B. Hoover B. Samic B. 
McGrew B. Spero B. Giertler B. Cheng B. Lightcap B. Walkin B. Quinn B. Guarraci B. Hsu B. Kellogg B. Eastman C. Lugaresi C. Wainwright C. Bassin C. Hudson C. Chu C. Nelson C. Li C. J. Shern C. Conger C. Barette C. Voss C. Ding C. Lu C. Zhang C. Beaumont C. Hallacy C. Koch C. Gibson C. Kim C. Choi C. McLeavey C. Hesse C. Fischer C. Winter C. Czarnecki C. Jarvis C. Wei C. Koumouzelis D. Sherburn D. Kappler D. Levin D. Levy D. Carr D. Farhi D. Mely D. Robinson D. Sasaki D. Jin D. Valladares D. Tsipras D. Li D. P. Nguyen D. Findlay E. Oiwoh E. Wong E. Asdar E. Proehl E. Yang E. Antonow E. Kramer E. Peterson E. Sigler E. Wallace E. Brevdo E. Mays F. Khorasani F. P. Such F. Raso F. Zhang F. von Lohmann F. Sulit G. Goh G. Oden G. Salmon G. Starace G. Brockman H. Salman H. Bao H. Hu H. Wong H. Wang H. Schmidt H. Whitney H. Jun H. Kirchner H. Ponde de Oliveira Pinto H. Ren H. Chang H. W. Chung I. Kivlichan I. O\u2019Connell I. O\u2019Connell I. Osband I. Silber I. Sohl I. Okuyucu I. Lan I. Kostrikov I. Sutskever I. Kanitscheider I. Gulrajani J. Coxon J. Menick J. Pachocki J. Aung J. Betker J. Crooks J. Lennon J. Kiros J. Leike J. Park J. Kwon J. Phang J. Teplitz J. Wei J. Wolfe J. Chen J. Harris J. Varavva J. G. Lee J. Shieh J. Lin J. Yu J. Weng J. Tang J. Yu J. Jang J. Q. Candela J. Beutler J. Landers J. Parish J. Heidecke J. Schulman J. Lachman J. McKay J. Uesato J. Ward J. W. Kim J. Huizinga J. Sitkin J. Kraaijeveld J. Gross J. Kaplan J. Snyder J. Achiam J. Jiao J. Lee J. Zhuang J. Harriman K. Fricke K. Hayashi K. Singhal K. Shi K. Karthik K. Wood K. Rimbach K. Hsu K. Nguyen K. Gu-Lemberg K. Button K. Liu K. Howe K. Muthukumar K. Luther L. Ahmad L. Kai L. Itow L. Workman L. Pathak L. Chen L. Jing L. Guy L. Fedus L. Zhou L. Mamitsuka L. Weng L. McCallum L. Held L. Ouyang L. Feuvrier L. Zhang L. Kondraciuk L. Kaiser L. Hewitt L. Metz L. Doshi M. Aflak M. Simens M. Boyd M. Thompson M. Dukhan M. Chen M. Gray M. Hudnall M. Zhang M. Aljubeh M. Litwin M. Zeng M. Johnson M. 
Shetty M. Gupta M. Shah M. Yatbaz M. J. Yang M. Zhong M. Glaese M. Chen M. Janner M. Lampe M. Petrov M. Wu M. Wang M. Fradin M. Pokrass M. Castro M. O. Temudo de Castro M. Pavlov M. Brundage M. Wang M. Khan M. Murati M. Bavarian M. Lin M. Yesildal N. Soto N. Gimelshein N. Cone N. Staudacher N. Summers N. LaFontaine N. Chowdhury N. Ryder N. Stathas N. Turley N. Tezak N. Felix N. Kudige N. Keskar N. Deutsch N. Bundick N. Puckett O. Nachum O. Okelola O. Boiko O. Murk O. Jaffe O. Watkins O. Godement O. Campbell-Moore P. Chao P. McMillan P. Belov P. Su P. Bak P. Bakkum P. Deng P. Dolan P. Hoeschele P. Welinder P. Tillet P. Pronin P. Tillet P. Dhariwal Q. Yuan R. Dias R. Lim R. Arora R. Troll R. Lin R. G. Lopes R. Puri R. Miyara R. Leike R. Gaubert R. Zamani R. Wang R. Donnelly R. Honsby R. Smith R. Sahai R. Ramchandani R. Huet R. Carmichael R. Zellers R. Chen R. Chen R. Nigmatullin R. Cheu S. Jain S. Altman S. Schoenholz S. Toizer S. Miserendino S. Agarwal S. Culver S. Ethersmith S. Gray S. Grove S. Metzger S. Hermani S. Jain S. Zhao S. Wu S. Jomoto S. Wu S. (T.) Xia S. Phene S. Papay S. Narayanan S. Coffey S. Lee S. Hall S. Balaji T. Broda T. Stramer T. Xu T. Gogineni T. Christianson T. Sanders T. Patwardhan T. Cunninghman T. Degry T. Dimson T. Raoux T. Shadwell T. Zheng T. Underwood T. Markov T. Sherbakov T. Rubin T. Stasi T. Kaftan T. Heywood T. Peterson T. Walters T. Eloundou V. Qi V. Moeller V. Monaco V. Kuo V. Fomenko W. Chang W. Zheng W. Zhou W. Manassra W. Sheu W. Zaremba Y. Patil Y. Qian Y. Kim Y. Cheng Y. Zhang Y. He Y. Zhang Y. Jin Y. Dai Y. Malkov GPT-4o system card. arXiv:2410.21276 (2024)."},{"key":"e_1_3_2_44_2","unstructured":"Anthropic AI Model card and evaluations for Claude models (2023); https:\/\/www-cdn.anthropic.com\/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf."},{"key":"e_1_3_2_45_2","unstructured":"S. Bai K. Chen X. Liu J. Wang W. Ge S. Song K. Dang P. Wang S. Wang J. Tang H. Zhong Y. Zhu M. Yang Z. Li J. Wan P. Wang W. Ding Z. Fu Y. Xu J. 
Ye X. Zhang T. Xie Z. Cheng H. Zhang Z. Yang H. Xu J. Lin Qwen2.5-VL technical report. arXiv:2502.13923 [cs.CV] (2025)."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2024.3438036"},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","unstructured":"G. Zhai X. Cai D. Huang Y. Di F. Manhardt F. Tombari N. Navab B. Busam \u201cSG-Bot: Object rearrangement via coarse-to-fine robotic imagination on scene graphs\u201d in Proceedings of IEEE International Conference on Robotics and Automation (IEEE 2024) pp. 4303\u20134310.","DOI":"10.1109\/ICRA57147.2024.10610792"},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","unstructured":"J. Reizenstein R. Shapovalov P. Henzler L. Sbordone P. Labatut D. Novotny \u201cCommon objects in 3D: Large-scale learning and evaluation of real-life 3D category reconstruction\u201d in Proceedings of the IEEE\/CVF International Conference on Computer Vision (IEEE 2021) pp. 10901\u201310911.","DOI":"10.1109\/ICCV48922.2021.01072"},{"key":"e_1_3_2_49_2","doi-asserted-by":"crossref","unstructured":"W. Goodwin S. Vaze I. Havoutis I. Posner \u201cZero-shot category-level object pose estimation\u201d in Computer Vision\u2014ECCV 2022: 17th European Conference Tel Aviv Israel October 23\u201327 2022 Proceedings Part XXXIX S. Avidan G. Brostow M. Ciss\u00e9 G. M. Farinella T. Hassner Eds. vol. 13699 of Lecture Notes in Computer Science (Springer 2022) pp. 516\u2013532.","DOI":"10.1007\/978-3-031-19842-7_30"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10851-009-0161-2"},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","unstructured":"E. P. \u00d6rnek Y. Labb\u00e9 B. Tekin L. Ma C. Keskin C. Forster T. Hodan \u201cFoundPose: Unseen object pose estimation with foundation features\u201d in Computer Vision\u2014ECCV 2024: 18th European Conference Milan Italy September 29\u2013October 4 2024 Proceedings Part XLVII A. Leonardis E. Ricci S. Roth O. Russakovsky T. Sattler G. Varol Eds. vol. 
15105 of Lecture Notes in Computer Science (Springer 2024) pp. 163\u2013182.","DOI":"10.1007\/978-3-031-73347-5_10"},{"key":"e_1_3_2_52_2","doi-asserted-by":"crossref","unstructured":"Y. Wu X. Wang X. Yang M. Liu D. Zeng H. Ye S. Li \u201cLearning occlusion-robust vision transformers for real-time UAV tracking\u201d in Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (IEEE 2025) pp. 17103\u201317113.","DOI":"10.1109\/CVPR52734.2025.01594"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.adl0628"},{"key":"e_1_3_2_54_2","doi-asserted-by":"crossref","unstructured":"K. Chen Y. Ma X. Lin S. James J. Zhou Y.-H. Liu P. Abbeel Q. Dou \u201cVision foundation model enables generalizable object pose estimation\u201d in Proceedings of Advances in Neural Information Processing Systems A. Globerson L. Mackey D. Belgrave A. Fan U. Paquet J. Tomczak C. Zhang Eds. (Curran Associates 2024) vol. 37 pp. 19975\u201320002.","DOI":"10.52202\/079017-0630"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.6778"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/358669.358692"},{"key":"e_1_3_2_57_2","unstructured":"A. X. Chang T. Funkhouser L. Guibas P. Hanrahan Q. Huang Z. Li S. Savarese M. Savva S. Song H. Su J. Xiao L. Yi F. Yu ShapeNet: An information-rich 3D model repository. arXiv:1512.03012 [cs.GR] (2015)."},{"key":"e_1_3_2_58_2","doi-asserted-by":"crossref","unstructured":"Y. Xiao Y. Du R. Marlet \u201cPoseContrast: Class-agnostic object viewpoint estimation in the wild with pose-aware contrastive learning\u201d in Proceedings of International Conference on 3D Vision (IEEE 2021) pp. 74\u201384.","DOI":"10.1109\/3DV53792.2021.00018"},{"key":"e_1_3_2_59_2","doi-asserted-by":"crossref","unstructured":"G. Pitteri M. Ramamonjisoa S. Ilic V. 
Lepetit \u201cOn object symmetries and 6D pose estimation from images\u201d in Proceedings of International Conference on 3D Vision (IEEE 2019) pp. 614\u2013622.","DOI":"10.1109\/3DV.2019.00073"},{"key":"e_1_3_2_60_2","unstructured":"T. Ren S. Liu A. Zeng J. Lin K. Li H. Cao J. Chen X. Huang Y. Chen F. Yan Z. Zeng H. Zhang F. Li J. Yang H. Li Q. Jiang L. Zhang Grounded SAM: Assembling open-world models for diverse visual tasks. arXiv: 2401.14159 [cs.CV] (2024)."},{"key":"e_1_3_2_61_2","doi-asserted-by":"crossref","unstructured":"J. Wang M. Chen N. Karaev A. Vedaldi C. Rupprecht D. Novotny \u201cVGGT: Visual geometry grounded transformer\u201d in Proceedings of the Computer Vision and Pattern Recognition Conference (IEEE 2025) pp. 5294\u20135306.","DOI":"10.1109\/CVPR52734.2025.00499"}],"container-title":["Science Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.science.org\/doi\/pdf\/10.1126\/scirobotics.aea2092","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T18:01:55Z","timestamp":1777485715000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.science.org\/doi\/10.1126\/scirobotics.aea2092"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,29]]},"references-count":60,"journal-issue":{"issue":"113","published-print":{"date-parts":[[2026,4,29]]}},"alternative-id":["10.1126\/scirobotics.aea2092"],"URL":"https:\/\/doi.org\/10.1126\/scirobotics.aea2092","relation":{},"ISSN":["2470-9476"],"issn-type":[{"value":"2470-9476","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,29]]},"assertion":[{"value":"2025-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-03-31","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2026-04-29","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"eaea2092"}}