{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T01:22:56Z","timestamp":1772068976626,"version":"3.50.1"},"reference-count":66,"publisher":"American Association for the Advancement of Science (AAAS)","issue":"104","content-domain":{"domain":["www.science.org"],"crossmark-restriction":true},"short-container-title":["Sci. Robot."],"published-print":{"date-parts":[[2025,7,9]]},"abstract":"<jats:p>Research on autonomous surgery has largely focused on simple task automation in controlled environments. However, real-world surgical applications demand dexterous manipulation over extended durations and robust generalization to the inherent variability of human tissue. These challenges remain difficult to address using existing logic-based or conventional end-to-end learning strategies. To address this gap, we propose a hierarchical framework for performing dexterous, long-horizon surgical steps. Our approach uses a high-level policy for task planning and a low-level policy for generating low-level trajectories. The high-level planner plans in language space, generating task-level or corrective instructions that guide the robot through the long-horizon steps and help recover from errors made by the low-level policy. We validated our framework through ex vivo experiments on cholecystectomy, a commonly practiced minimally invasive procedure, and conducted ablation studies to evaluate key components of the system. Our method achieves a 100% success rate across eight different ex vivo gallbladders, operating fully autonomously without human intervention. The hierarchical approach improved the policy\u2019s ability to recover from suboptimal states that are inevitable in the highly dynamic environment of realistic surgical applications. 
This work demonstrates step-level autonomy in a surgical procedure, marking a milestone toward clinical deployment of autonomous surgical systems.<\/jats:p>","DOI":"10.1126\/scirobotics.adt5254","type":"journal-article","created":{"date-parts":[[2025,7,9]],"date-time":"2025-07-09T17:58:18Z","timestamp":1752083898000},"update-policy":"https:\/\/doi.org\/10.34133\/aaas_crossmark","source":"Crossref","is-referenced-by-count":44,"title":["SRT-H: A hierarchical framework for autonomous surgery via language-conditioned imitation learning"],"prefix":"10.1126","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8669-205X","authenticated-orcid":true,"given":"Ji Woong (Brian)","family":"Kim","sequence":"first","affiliation":[{"name":"Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD 21218, USA."}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5118-6610","authenticated-orcid":true,"given":"Juo-Tung","family":"Chen","sequence":"additional","affiliation":[{"name":"Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD 21218, USA."}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4833-4184","authenticated-orcid":true,"given":"Pascal","family":"Hansen","sequence":"additional","affiliation":[{"name":"Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD 21218, USA."}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9323-9844","authenticated-orcid":true,"given":"Lucy Xiaoyang","family":"Shi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University, Stanford, CA 94305, USA."}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7744-401X","authenticated-orcid":true,"given":"Antony","family":"Goldenberg","sequence":"additional","affiliation":[{"name":"Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD 21218, 
USA."}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8192-9337","authenticated-orcid":true,"given":"Samuel","family":"Schmidgall","sequence":"additional","affiliation":[{"name":"Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD 21218, USA."}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8079-1425","authenticated-orcid":true,"given":"Paul Maria","family":"Scheikl","sequence":"additional","affiliation":[{"name":"Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD 21218, USA."}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3856-2095","authenticated-orcid":true,"given":"Anton","family":"Deguet","sequence":"additional","affiliation":[{"name":"Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD 21218, USA."}]},{"given":"Brandon M.","family":"White","sequence":"additional","affiliation":[{"name":"Department of Surgery, Johns Hopkins University, Baltimore, MD 21218, USA."}]},{"given":"De Ru","family":"Tsai","sequence":"additional","affiliation":[{"name":"Optosurgical, Columbia, MD 21046, USA."}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2169-1464","authenticated-orcid":true,"given":"Richard Jaepyeong","family":"Cha","sequence":"additional","affiliation":[{"name":"Optosurgical, Columbia, MD 21046, USA."},{"name":"Sheikh Zayed Institute for Pediatric Surgical Innovation, Children\u2019s National Hospital, Washington, DC 20010, USA."},{"name":"Department of Pediatrics, George Washington University School of Medicine and Health Sciences, Washington, DC 20052, USA."}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0694-1704","authenticated-orcid":true,"given":"Jeffrey","family":"Jopling","sequence":"additional","affiliation":[{"name":"Department of Surgery, Johns Hopkins University, Baltimore, MD 21218, USA."}]},{"given":"Chelsea","family":"Finn","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University, Stanford, CA 94305, 
USA."}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8169-075X","authenticated-orcid":true,"given":"Axel","family":"Krieger","sequence":"additional","affiliation":[{"name":"Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD 21218, USA."}]}],"member":"221","reference":[{"key":"e_1_3_2_2_2","first-page":"1","article-title":"Lapgym - An open source framework for reinforcement learning in robot-assisted laparoscopic surgery","volume":"24","author":"Scheikl P. M.","year":"2023","unstructured":"P. M. Scheikl, B. Gyenes, R. Younis, C. Haas, G. Neumann, F. Mathis-Ullrich, M. Wagner, Lapgym - An open source framework for reinforcement learning in robot-assisted laparoscopic surgery. J. Mach. Learn. Res. 24, 1\u201342 (2023).","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Q. Yu M. Moghani K. Dharmarajan V. Schorp William Chung-Ho Panitch J. Liu K. Hari H. Huang M. Mittal K. Goldberg A. Garg Orbitsurgical: An open-simulation framework for learning surgical augmented dexterity. arXiv:2404.16027 [cs.RO] (2024).","DOI":"10.1109\/ICRA57147.2024.10611637"},{"key":"e_1_3_2_4_2","doi-asserted-by":"crossref","unstructured":"J. Xu B. Li B. Lu Y.-H. Liu Q. Dou P.-A. Heng \u201cSurrol: An open-source reinforcement learning centered and dVRK compatible platform for surgical robot learning\u201d in 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE 2021) pp. 1821\u20131828.","DOI":"10.1109\/IROS51168.2021.9635867"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.abj2908"},{"key":"e_1_3_2_6_2","unstructured":"X. Liang C.-P. Wang N. U. Shinde F. Liu F. Richter M. Yip Medic: Autonomous surgical robotic assistance to maximizing exposure for dissection and cautery. 
arXiv:2409.14287 [cs.RO] (2024)."},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.aaw1977"},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","unstructured":"B. Thananjeyan A. Garg S. Krishnan C. Chen L. Miller K. Goldberg \u201cMultilateral surgical pattern cutting in 2D orthotropic gauze with deep reinforcement learning policies for tensioning\u201d in 2017 IEEE International Conference on Robotics and Automation (ICRA) (IEEE 2017) pp. 2371\u20132378.","DOI":"10.1109\/ICRA.2017.7989275"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.adf7614"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2022.3171795"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMRB.2022.3214439"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2023.3335693"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2023.3254860"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3227873"},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","unstructured":"Z.-Y. Chiu F. Richter E. K. Funk R. K. Orosco M. C. Yip \u201cBimanual regrasping for suture needles using reinforcement learning for rapid motion planning\u201d in 2021 IEEE International Conference on Robotics and Automation (ICRA) (IEEE 2021) pp. 7737\u20137743.","DOI":"10.1109\/ICRA48506.2021.9561673"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2024.3410297"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2024.3382529"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","unstructured":"C. Shin P. W. Ferguson S. A. Pedram J. Ma E. P. Dutson J. Rosen \u201cAutonomous tissue manipulation via surgical robot using learning based model predictive control\u201d in 2019 International Conference on Robotics and Automation (ICRA) (IEEE 2019) pp. 
3875\u20133881.","DOI":"10.1109\/ICRA.2019.8794159"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1177\/02783649211032721"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2020.3045655"},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","unstructured":"A. Pore E. Tagliabue M. Piccinelli D. Dall\u2019Alba A. Casals P. Fiorini \u201cLearning from demonstrations for autonomous soft-tissue retraction\u201d in 2021 International Symposium on Medical Robotics (ISMR) (IEEE 2021) pp. 1\u20137.","DOI":"10.1109\/ISMR48346.2021.9661514"},{"key":"e_1_3_2_22_2","doi-asserted-by":"crossref","unstructured":"S. Schmidgall J. W. Kim A. Kuntz A. E. Ghazi A. Krieger General-purpose foundation models for increased autonomy in robot-assisted surgery. arXiv:2401.00678 [cs.RO] (2024).","DOI":"10.1038\/s42256-024-00917-4"},{"key":"e_1_3_2_23_2","unstructured":"J. W. Kim T. Z. Zhao S. Schmidgall A. Deguet M. Kobilarov C. Finn A. Krieger \u201cSurgical robot transformer (SRT): Imitation learning for surgical subtasks\u201d in Proceedings of the 8th Annual Conference on Robot Learning (CoRL 2024) vol. 270 of Proceedings of Machine Learning Research P. Agrawal O. Kroemer W. Burgard Eds. (MLResearchPress 2024) pp. 130\u2013144."},{"key":"e_1_3_2_24_2","unstructured":"S. Ross G. Gordon D. Bagnell \u201cA reduction of imitation learning and structured prediction to no-regret online learning\u201d in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics vol. 15 of Proceedings of Machine Learning Research G. Gordon D. Dunson M. Dud\u00edk Eds. (MLResearch Press 2011) pp. 627\u2013635."},{"key":"e_1_3_2_25_2","first-page":"6","article-title":"The growing global burden of gallstone disease","volume":"17","author":"Acalovschi M.","year":"2012","unstructured":"M. Acalovschi, F. Lammert, The growing global burden of gallstone disease. World Gastroenterol. 
News 17, 6\u20139 (2012).","journal-title":"World Gastroenterol. News"},{"key":"e_1_3_2_26_2","unstructured":"K. Hsu M. J. Kim R. Rafailov J. Wu C. Finn \u201cVision-based manipulators need to also see from their hands\u201d in The Tenth International Conference on Learning Representations (ICLR 2022) pp. 1\u201330."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1097\/MS9.0000000000001079"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.21037\/ales.2020.02.06"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0039-6109(03)00169-5"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.adg6042"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1126\/scitranslmed.aad9398"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMRB.2019.2913282"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.4103\/0971-6203.62194"},{"key":"e_1_3_2_34_2","article-title":"Profile: Veebot. Making a robot that can draw blood faster and more safely than a human can","author":"Perry T. S.","year":"2013","unstructured":"T. S. Perry, Profile: Veebot. Making a robot that can draw blood faster and more safely than a human can, IEEE Spectrum, 26 July 2013; https:\/\/spectrum.ieee.org\/profile-veebot.","journal-title":"IEEE Spectrum"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.3390\/s22072501"},{"key":"e_1_3_2_36_2","unstructured":"S. Reed K. Zolna E. Parisotto S. G. Colmenarejo A. Novikov G. Barth-Maron M. Gimenez Y. Sulsky J. Kay J. T. Springenberg T. Eccles J. Bruce A. Razavi A. Edwards N. Heess Y. Chen R. Hadsell O. Vinyals M. Bordbar N. de Freitas A generalist agent. arXiv:2205.06175 [cs.AI] (2022)."},{"key":"e_1_3_2_37_2","doi-asserted-by":"crossref","unstructured":"A. Brohan N. Brown J. Carbajal Y. Chebotar J. Dabis C. Finn K. Gopalakrishnan K. Hausman A. Herzog J. Hsu J. Ibarz B. Ichter A. Irpan T. Jackson S. Jesmonth N. Joshi R. Julian D. 
Kalashnikov Y. Kuang I. Leal K.-H. Lee S. Levine Y. Lu U. Malla D. Manjunath I. Mordatch O. Nachum C. Parada J. Peralta E. Perez K. Pertsch J. Quiambao K. Rao M. S. Ryoo G. Salazar P. R. Sanketi K. Sayed J. Singh S. Sontakke A. Stone C. Tan H. Tran V. Vanhoucke S. Vega Q. H. Vuong F. Xia T. Xiao P. Xu S. Xu T. Yu B. Zitkovich \u201cRt-1: Robotics transformer for real-world control at scale\u201d in Proceedings of Robotics: Science and Systems K. Bekris K. Hauser S. Herbert J. Yu Eds. (RSS Foundation 2023) 10.15607\/RSS.2023.XIX.025.","DOI":"10.15607\/RSS.2023.XIX.025"},{"key":"e_1_3_2_38_2","unstructured":"A. Brohan N. Brown J. Carbajal Y. Chebotar X. Chen K. Choromanski T. Ding D. Driess A. Dubey C. Finn P. Florence C. Fu M. G. Arenas K. Gopalakrishnan K. Han K. Hausman A. Herzog J. Hsu B. Ichter A. Irpan N. Joshi R. Julian D. Kalashnikov Y. Kuang I. Leal L. Lee T.-W. E. Lee S. Levine Y. Lu H. Michalewski I. Mordatch K. Pertsch K. Rao K. Reymann M. Ryoo G. Salazar P. Sanketi P. Sermanet J. Singh A. Singh R. Soricut H. Tran V. Vanhoucke Q. Vuong A. Wahid S. Welker P. Wohlhart J. Wu F. Xia T. Xiao P. Xu S. Xu T. Yu B. Zitkovich \u201cRt-2: Vision-language-action models transfer web knowledge to robotic control\u201d in Proceedings of the 7th Conference on Robot Learning vol. 229 of Proceedings of Machine Learning Research J. Tan M. Toussaint K. Darvish Eds. (MLResearchPress 2023) pp. 2165\u20132183."},{"key":"e_1_3_2_39_2","unstructured":"A. O\u2019Neill A. Rehman A. Gupta A. Maddukuri A. Gupta A. Padalkar A. Lee A. Pooley A. Gupta A. Mandlekar A. Jain A. Tung A. Bewley A. Herzog A. Irpan A. Khazatsky A. Rai A. Gupta A. Wang A. Kolobov A. Singh A. Garg A. Kembhavi A. Xie A. Brohan A. Raffin A. Sharma A. Yavary A. Jain A. Balakrishna A. Wahid B. Burgess-Limerick B. Kim B. Sch\u00f6lkopf B. Wulfe B. Ichter C. Lu C. Xu C. Le C. Finn C. Wang C. Xu C. Chi C. Huang C. Chan C. Agia C. Pan C. Fu C. Devin D. Xu D. Morton D. Driess D. Chen D. Pathak D. Shah D. B\u00fcchler D. 
Jayaraman D. Kalashnikov D. Sadigh E. Johns E. Foster F. Liu F. Ceola F. Xia F. Zhao F. V. Frujeri F. Stulp G. Zhou G. S. Sukhatme G. Salhotra G. Yan G. Feng G. Schiavi G. Berseth G. Kahn G. Yang G. Wang H. Su H.-S. Fang H. Shi H. Bao H. B. Amor H. I. Christensen H. Furuta H. Bharadhwaj H. Walke H. Fang H. Ha I. Mordatch I. Radosavovic I. Leal J. Liang J. Abou-Chakra J. Kim J. Drake J. Peters J. Schneider J. Hsu J. Vakil J. Bohg J. Bingham J. Wu J. Gao J. Hu J. Wu J. Wu J. Sun J. Luo J. Gu J. Tan J. Oh J. Wu J. Lu J. Yang J. Malik J. Silv\u00e9rio J. Hejna J. Booher J. Tompson J. Yang J. Salvador J. J. Lim J. Han K. Wang K. Rao K. Pertsch K. Hausman K. Go K. Gopalakrishnan K. Goldberg K. Byrne K. Oslund K. Kawaharazuka K. Black K. Lin K. Zhang K. Ehsani K. Lekkala K. Ellis K. Rana K. Srinivasan K. Fang K. P. Singh K.-H. Zeng K. Hatch K. Hsu L. Itti L. Y. Chen L. Pinto L. Fei-Fei L. Tan L. Fan L. Ott L. Lee L. Weihs M. Chen M. Lepert M. Memmel M. Tomizuka M. Itkina M. G. Castro M. Spero M. Du M. Ahn M. C. Yip M. Zhang M. Ding M. Heo M. K. Srirama M. Sharma M. J. Kim M. Z. Irshad N. Kanazawa N. Hansen N. Heess N. J. Joshi N. Suenderhauf N. Liu N. D. Palo N. M. M. Shafiullah O. Mees O. Kroemer O. Bastani P. R. Sanketi P. Miller P. Yin P. Wohlhart P. Xu P. D. Fagan P. Mitrano P. Sermanet P. Abbeel P. Sundaresan Q. Chen Q. Vuong R. Rafailov R. Tian R. Doshi R. Mart\u00edn-Mart\u00edn R. Baijal R. Scalise R. Hendrix R. Lin R. Qian R. Zhang R. Mendonca R. Shah R. Hoque R. Julian S. Bustamante S. Kirmani S. Levine S. Lin S. Moore S. Bahl S. Dass S. Sonawani S. Tulsiani S. Song S. Xu S. Haldar S. Karamcheti S. Adebola S. Guist S. Nasiriany S. Schaal S. Welker S. Tian S. Ramamoorthy S. Dasari S. Belkhale S. Park S. Nair S. Mirchandani T. Osa T. Gupta T. Harada T. Matsushima T. Xiao T. Kollar T. Yu T. Ding T. Davchev T. Z. Zhao T. Armstrong T. Darrell T. Chung V. Jain V. Kumar V. Vanhoucke V. Guizilini W. Zhan W. Zhou W. Burgard X. Chen X. Chen X. Wang X. Zhu X. Geng X. 
Liu X. Liangwei X. Li Y. Pang Y. Lu Y. J. Ma Y. Kim Y. Chebotar Y. Zhou Y. Zhu Y. Wu Y. Xu Y. Wang Y. Bisk Y. Dou Y. Cho Y. Lee Y. Cui Y. Cao Y.-H. Wu Y. Tang Y. Zhu Y. Zhang Y. Jiang Y. Li Y. Li Y. Iwasawa Y. Matsuo Z. Ma Z. Xu Z. J. Cui Z. Zhang Z. Fu Z. Lin Open X-Embodiment: Robotic learning datasets and RT-X models (2023); https:\/\/robotics-transformer-x.github.io."},{"key":"e_1_3_2_40_2","unstructured":"Y. Hu Q. Xie V. Jain J. Francis J. Patrikar N. Keetha S. Kim Y. Xie T. Zhang H.-S. Fang S. Zhao S. Omidshafiei D.-K. Kim Ali-akbar Agha-mohammadi K. Sycara M. Johnson-Roberson D. Batra X. Wang S. Scherer C. Wang Z. Kira F. Xia Y. Bisk Toward general-purpose robots via foundation models: A survey and metaanalysis. arXiv:2312.08782 [cs.RO] (2023)."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.optlaseng.2024.108165"},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"J. Sayers N. G. Czakon P. K. Day T. P. Downes R. P. Duan J. Gao J. Glenn S. R. Golwala M. I. Hollister H. G. LeDuc B. A. Mazin P. R. Maloney O. Noroozian H. T. Nguyen J. A. Schlaerth S. Siegel J. E. Vaillancourt A. Vayonakis P. R. Wilson J. Zmuidzinas \u201cOptics for music: a new (sub) millimeter camera for the Caltech submillimeter observatory\u201d in Millimeter Submillimeter and Far-Infrared Detectors and Instrumentation for Astronomy V (SPIE 2010) vol. 7741 pp. 255\u2013266.","DOI":"10.1117\/12.857324"},{"key":"e_1_3_2_43_2","first-page":"431","article-title":"A simple solution to lens fogging during robotic and laparoscopic surgery","volume":"12","author":"Nezhat C.","year":"2008","unstructured":"C. Nezhat, V. Morozov, A simple solution to lens fogging during robotic and laparoscopic surgery. J. Soc. Laparoendosc. Surg. 12, 431 (2008).","journal-title":"J. Soc. Laparoendosc. 
Surg."},{"key":"e_1_3_2_44_2","unstructured":"ClickClean ClickClean laparoscope lens shield device; https:\/\/clickclean-medeon.com\/."},{"key":"e_1_3_2_45_2","unstructured":"ClearCam Clearcam\u2014Laparoscope lens cleaning; www.clearcam-med.com\/."},{"key":"e_1_3_2_46_2","unstructured":"Y. Chebotar Q. Vuong A. Irpan K. Hausman F. Xia Y. Lu A. Kumar T. Yu A. Herzog K. Pertsch K. Gopalakrishnan J. Ibarz O. Nachum S. Sontakke G. Salazar H. T. Tran J. Peralta C. Tan D. Manjunath J. Singht B. Zitkovich T. Jackson K. Rao C. Finn S. Levine \u201cQ-transformer: Scalable offline reinforcement learning via autoregressive q-functions\u201d in Proceedings of the 7th Conference on Robot Learning vol. 229 of Proceedings of Machine Learning Research J. Tan M. Toussaint K. Darvish Eds. (MLResearchPress 2023) pp. 3909\u20133928."},{"key":"e_1_3_2_47_2","unstructured":"A. Z. Ren A. Dixit A. Bodrova S. Singh S. Tu N. Brown P. Xu L. Takayama F. Xia J. Varley Z. Xu D. Sadigh A. Zeng A. Majumdar \u201cRobots that ask for help: Uncertainty alignment for large language model planners\u201d in Proceedings of the 7th Conference on Robot Learning J. Tan M. Toussaint K. Darvish Eds. (MLResearchPress 2023) vol. 229 pp. 661\u2013682."},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1561\/2200000101"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2020.3028766"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.3390\/computers12110228"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2024.3460408"},{"key":"e_1_3_2_52_2","doi-asserted-by":"crossref","unstructured":"M. Kelly C. Sidrane K. Driggs-Campbell M. J. Kochenderfer \u201cHgdagger: Interactive imitation learning with human experts\u201d in 2019 International Conference on Robotics and Automation (ICRA) (IEEE 2019) pp. 8077\u20138083.","DOI":"10.1109\/ICRA.2019.8793698"},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","unstructured":"L. X. Shi Z. Hu T. Z. 
Zhao A. Sharma K. Pertsch J. Luo S. Levine C. Finn \u201cYell at your robot: Improving on-the-fly from language corrections\u201d in Proceedings of Robotics: Science and Systems D. Kulic G. Venture K. Bekris E. Coronado Eds. (2024) 10.15607\/RSS.2024.XX.025.","DOI":"10.15607\/RSS.2024.XX.025"},{"key":"e_1_3_2_54_2","unstructured":"A. Vaswani N. Shazeer N. Parmar J. Uszkoreit L. Jones A. N. Gomez \u0141. Kaiser I. Polosukhin \u201cAttention is all you need\u201d in Advances in Neural Information Processing Systems I. Guyon U. Von Luxburg S. Bengio H. Wallach R. Fergus S. Vishwanathan R. Garnett Eds. (Curran Associates 2017) vol. 30 pp. 5998\u20136008."},{"key":"e_1_3_2_55_2","unstructured":"H. Liu C. Li Y. Li B. Li Y. Zhang S. Shen Y. J. Lee LLaVA-NeXT: Improved reasoning OCR and world knowledge LLaVa (2024); https:\/\/llava-vl.github.io\/blog\/2024-01-30-llava-next\/."},{"key":"e_1_3_2_56_2","doi-asserted-by":"crossref","unstructured":"Z. Liu Y. Lin Y. Cao H. Hu Y. Wei Z. Zhang St. Lin B. Guo \u201cSwin transformer: Hierarchical vision transformer using shifted windows\u201d in Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV) (IEEE 2021) pp. 10012\u201310022.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_2_57_2","doi-asserted-by":"crossref","unstructured":"J. Deng W. Dong R. Socher L.-J. Li K. Li L. Fei-Fei \u201cImageNet: A large-scale hierarchical image database\u201d in 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE 2009) pp. 248\u2013255.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_58_2","doi-asserted-by":"crossref","unstructured":"T. Z. Zhao V. Kumar S. Levine C. Finn \u201cLearning fine-grained bimanual manipulation with low-cost hardware\u201d in Proceedings of Robotics: Science and Systems K. Bekris K. Hauser S. Herbert J. Yu Eds. (RSS Foundation 2023) 10.15607\/RSS.2023.XIX.016.","DOI":"10.15607\/RSS.2023.XIX.016"},{"key":"e_1_3_2_59_2","unstructured":"I. Loshchilov F. 
Hutter \u201cDecoupled weight decay regularization\u201d in ICLR 2019: The Seventh International Conference on Learning Representations (ICLR 2019)."},{"key":"e_1_3_2_60_2","doi-asserted-by":"crossref","unstructured":"E. D. Cubuk B. Zoph J. Shlens Q. V. Le \u201cRandaugment: Practical automated data augmentation with a reduced search space\u201d in 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (IEEE 2020) pp. 3008\u20133017.","DOI":"10.1109\/CVPRW50498.2020.00359"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.3390\/info11020125"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3010746"},{"key":"e_1_3_2_63_2","unstructured":"J. Devlin M.-W. Chang K. Lee K. Toutanova \u201cBERT: Pre-training of deep bidirectional transformers for language understanding\u201d in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Volume 1 (Long and Short Papers) J. Burstein C. Doran T. Solorio Eds. (Association for Computational Linguistics 2019) pp. 4171\u20134186."},{"key":"e_1_3_2_64_2","unstructured":"M. Tan Q. Le \u201cEfficientNet: Rethinking model scaling for convolutional neural networks\u201d in Proceedings of the 36th International Conference on Machine Learning K. Chaudhuri R. Salakhutdinov Eds. vol. 97 of Proceedings of Machine Learning Research (MLResearchPress 2019) pp. 6105\u20136114."},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","unstructured":"E. Perez F. Strub H. De Vries V. Dumoulin A. Courville \u201cFiLM: Visual reasoning with a general conditioning layer\u201d in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2018) vol. 32 pp. 3942\u20133951.","DOI":"10.1609\/aaai.v32i1.11671"},{"key":"e_1_3_2_66_2","unstructured":"V. Sanh L. Debut J. Chaumond T. 
Wolf \u201cDistilBERT a distilled version of BERT: Smaller faster cheaper and lighter\u201d in 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing @ NeurIPS 2019 (NeurIPS 2019) pp. 1\u20135."},{"key":"e_1_3_2_67_2","doi-asserted-by":"crossref","unstructured":"Y. Zhou C. Barnes L. Jingwan Y. Jimei L. Hao \u201cOn the continuity of rotation representations in neural networks\u201d in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE 2019) pp. 5745\u20135753.","DOI":"10.1109\/CVPR.2019.00589"}],"container-title":["Science Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.science.org\/doi\/pdf\/10.1126\/scirobotics.adt5254","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,9]],"date-time":"2025-07-09T17:58:49Z","timestamp":1752083929000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.science.org\/doi\/10.1126\/scirobotics.adt5254"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,9]]},"references-count":66,"journal-issue":{"issue":"104","published-print":{"date-parts":[[2025,7,9]]}},"alternative-id":["10.1126\/scirobotics.adt5254"],"URL":"https:\/\/doi.org\/10.1126\/scirobotics.adt5254","relation":{},"ISSN":["2470-9476"],"issn-type":[{"value":"2470-9476","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,9]]},"assertion":[{"value":"2024-09-30","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"eadt5254"}}