{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T17:58:15Z","timestamp":1773770295250,"version":"3.50.1"},"reference-count":310,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T00:00:00Z","timestamp":1772582400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"crossref"}]},{"name":"NSERC Alliance program"},{"name":"NSERC CREATE ADVENTOR fellowship"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>Rapid advancements in foundation models, including Large Language Models, Vision-Language Models, Multimodal Large Language Models, and Vision-Language-Action models, have opened new avenues for embodied AI in mobile service robotics. By combining foundation models with the principles of embodied AI, where intelligent systems perceive, reason, and act through physical interaction, mobile service robots can achieve more flexible understanding, adaptive behavior, and robust task execution in dynamic real-world environments. Despite this progress, embodied AI for mobile service robots continues to face fundamental challenges related to the translation of natural language instructions into executable robot actions, multimodal perception in human-centered environments, uncertainty estimation for safe decision-making, and computational constraints for real-time onboard deployment. In this paper, we present the first systematic review of foundation models in mobile service robotics, following the preferred reporting items for systematic reviews and meta-analysis (PRISMA) guidelines. 
Using an OpenAlex literature search, we considered 7506 papers for the years spanning 1968\u20132025. Our detailed analysis identified four main challenges, related to the translation of natural language instructions into executable robot actions, multimodal perception in human-centered environments, uncertainty estimation for safe decision-making, and computational constraints for real-time onboard deployment, and examined how recent advances in foundation models have addressed these challenges. We further examine real-world applications in domestic assistance, healthcare, and service automation, highlighting how foundation models enable context-aware, socially responsive, and generalizable robot behaviors. Beyond technical considerations, we discuss ethical, societal, human-interaction, and physical design and ergonomic implications associated with deploying foundation-model-enabled service robots in human environments. Finally, we outline future research directions emphasizing reliability and lifelong adaptation, privacy-aware and resource-constrained deployment, as well as the governance and human-in-the-loop frameworks required for safe, scalable, and trustworthy mobile service robotics.<\/jats:p>","DOI":"10.3390\/robotics15030055","type":"journal-article","created":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T15:01:07Z","timestamp":1772636467000},"page":"55","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-1168-5247","authenticated-orcid":false,"given":"Matthew","family":"Lisondra","sequence":"first","affiliation":[{"name":"Autonomous Systems and Biomechatronics Laboratory (ASBLab), Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON M5S 3G8, 
Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5097-3965","authenticated-orcid":false,"given":"Beno","family":"Benhabib","sequence":"additional","affiliation":[{"name":"Autonomous Systems and Biomechatronics Laboratory (ASBLab), Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7080-6857","authenticated-orcid":false,"given":"Goldie","family":"Nejat","sequence":"additional","affiliation":[{"name":"Autonomous Systems and Biomechatronics Laboratory (ASBLab), Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada"},{"name":"KITE, Toronto Rehabilitation Institute, University Health Network (UHN), Toronto, ON M5G 2A2, Canada"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,4]]},"reference":[{"key":"ref_1","unstructured":"Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training, OpenAI. Available online: https:\/\/cdn.openai.com\/research-covers\/language-unsupervised\/language_understanding_paper.pdf."},{"key":"ref_2","unstructured":"Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020). Language Models Are Few-Shot Learners. Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6\u201312 December 2020, Curran Associates, Inc.. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3495724.3495883."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1930","DOI":"10.1038\/s41591-023-02448-8","article-title":"Large Language Models in Medicine","volume":"29","author":"Thirunavukarasu","year":"2023","journal-title":"Nat. 
Med."},{"key":"ref_4","unstructured":"Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., and Bhosale, S. (2023). LLaMA 2: Open Foundation and Fine-Tuned Chat Models. arXiv."},{"key":"ref_5","unstructured":"Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., Liang, P., and Hashimoto, T.B. (2023). Alpaca: A Strong, Replicable Instruction-Following Model, Stanford Center for Research on Foundation Models. Available online: https:\/\/crfm.stanford.edu\/2023\/03\/13\/alpaca.html."},{"key":"ref_6","first-page":"11324","article-title":"PaLM: Scaling Language Modeling with Pathways","volume":"24","author":"Chowdhery","year":"2023","journal-title":"J. Mach. Learn. Res."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"5625","DOI":"10.1109\/TPAMI.2024.3369699","article-title":"Vision-Language Models for Vision Tasks: A Survey","volume":"46","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., and Zhang, L. (2018). Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18\u201322 June 2018, IEEE.","DOI":"10.1109\/CVPR.2018.00636"},{"key":"ref_9","unstructured":"Li, L.H., Yatskar, M., Yin, D., Hsieh, C.-J., and Chang, K.-W. (2019). VisualBERT: A Simple and Performant Baseline for Vision and Language. arXiv."},{"key":"ref_10","unstructured":"Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., and Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv."},{"key":"ref_11","unstructured":"Bao, H., Dong, L., Piao, S., and Wei, F. (2021). BEiT: BERT Pre-Training of Image Transformers. 
arXiv."},{"key":"ref_12","unstructured":"Ichter, B., Brohan, A., Chebotar, Y., Finn, C., Hausman, K., Herzog, A., Ho, D., Ibarz, J., Irpan, A., and Jang, E. (2023). Do as I Can, Not as I Say: Grounding Language in Robotic Affordances. Proceedings of the 6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand, 14\u201318 December 2022, PMLR. Available online: https:\/\/proceedings.mlr.press\/v205\/ichter23a.html."},{"key":"ref_13","unstructured":"Driess, D., Xia, F., Sajjadi, M.S.M., Lynch, C., Chowdhery, A., Ichter, B., Wahid, A., Tompson, J., Vuong, Q., and Yu, T. (2023). PaLM-E: An Embodied Multimodal Language Model. Proceedings of the 40th International Conference on Machine Learning (ICML 2023), Honolulu, HI, USA, 23\u201329 July 2023, JMLR.org. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3618408.3618748."},{"key":"ref_14","first-page":"2165","article-title":"RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control","volume":"Volume 229","author":"Zitkovich","year":"2023","journal-title":"Proceedings of the 7th Conference on Robot Learning (CoRL 2023), Atlanta, GA, USA, 6\u20139 November 2023"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2458","DOI":"10.1109\/JIOT.2024.3471904","article-title":"AIoT Smart Home via Autonomous LLM Agents","volume":"12","author":"Rivkin","year":"2024","journal-title":"IEEE Internet Things J."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Giudici, M., Padalino, L., Paolino, G., Paratici, I., Pascu, A.I., and Garzotto, F. (2024). Designing Home Automation Routines Using An LLM-Based Chatbot. Designs, 8.","DOI":"10.3390\/designs8030043"},{"key":"ref_17","unstructured":"Li, Y., Wen, H., Wang, W., Li, X., Yuan, Y., Liu, G., Liu, J., Xu, W., Wang, X., and Sun, Y. (2024). Personal LLM Agents: Insights and Survey About the Capability, Efficiency and Security. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Pandya, A. (2023). 
ChatGPT-Enabled daVinci Surgical Robot Prototype: Advancements and Limitations. Robotics, 12.","DOI":"10.20944\/preprints202305.1992.v1"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1007\/s12369-020-00687-0","article-title":"Mini: A New Social Robot for the Elderly","volume":"12","author":"Salichs","year":"2020","journal-title":"Int. J. Soc. Robot."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1145\/2912150","article-title":"Empathy in Virtual Agents and Robots: A Survey","volume":"7","author":"Paiva","year":"2017","journal-title":"ACM Trans. Interact. Intell. Syst."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhao, X., Li, M., Weber, C., Hafez, M.B., and Wermter, S. (2023). Chat with the Environment: Interactive Multimodal Perception using Large Language Models. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS 2023), Detroit, MI, USA, 1\u20135 October 2023, IEEE.","DOI":"10.1109\/IROS55552.2023.10342363"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Xia, Y., Zhang, J., Jazdi, N., and Weyrich, M. (2024). Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility. arXiv.","DOI":"10.51202\/9783181024379-375"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wang, Z., and Qin, H. (2024). Intelligent Industrial Production Process Automatic Regulation System Based on LLM Agents. Proceedings of the 5th International Conference on Artificial Intelligence and Electromechanical Automation (AIEA 2024), 14\u201316 June 2024, IEEE.","DOI":"10.1109\/AIEA62095.2024.10692701"},{"key":"ref_24","unstructured":"Huang, W., Abbeel, P., Pathak, D., and Mordatch, I. (2022). Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Proceedings of the 39th International Conference on Machine Learning (ICML 2022), Baltimore, MD, USA, 17\u201323 July 2022, PMLR. 
Available online: https:\/\/proceedings.mlr.press\/v162\/huang22a.html."},{"key":"ref_25","first-page":"1769","article-title":"Inner Monologue: Embodied Reasoning through Planning with Language Models","volume":"Volume 205","author":"Huang","year":"2023","journal-title":"Proceedings of the 6th Conference on Robot Learning (CoRL 2022), 14\u201318 December 2022"},{"key":"ref_26","first-page":"894","article-title":"CLIPort: What and Where Pathways for Robotic Manipulation","volume":"Volume 164","author":"Shridhar","year":"2022","journal-title":"Proceedings of the 5th Conference on Robot Learning (CoRL 2021), 8\u201311 November 2021"},{"key":"ref_27","unstructured":"Narcomey, A., Tsoi, N., Desai, R., and V\u00e1zquez, M. (2024). Learning human preferences over robot behavior as soft planning constraints. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Cao, Z., Wang, Z., Xie, S., Liu, A., and Fan, L. (2024). Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), Seattle, WA, USA, 17\u201321 June 2024, IEEE.","DOI":"10.1109\/CVPR52733.2024.01713"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Xiao, A., Janaka, N., Hu, T., Gupta, A., Li, K., and Yu, C. (2025). Robi Butler: Multimodal Remote Interaction with a Household Robot Assistant. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2025), Atlanta, GA, USA, 19\u201323 May 2025, IEEE.","DOI":"10.1109\/ICRA55743.2025.11128329"},{"key":"ref_30","unstructured":"Cao, Y., Zhang, J., Yu, Z., Liu, S., Qin, Z., Zou, Q., Du, B., and Xu, K. (2025). CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV 2025), Honolulu, HI, USA, 19\u201323 October 2025, IEEE\/CVF. 
Available online: https:\/\/openaccess.thecvf.com\/content\/ICCV2025\/papers\/Cao_CogNav_Cognitive_Process_Modeling_for_Object_Goal_Navigation_with_LLMs_ICCV_2025_paper.pdf."},{"key":"ref_31","unstructured":"Puig, X., Undersander, E., Szot, A., Dallaire Cote, M., Yang, T.-Y., Partsey, R., Desai, R., Clegg, A.W., Hlavac, M., and Min, S.Y. (2023). Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots. arXiv."},{"key":"ref_32","unstructured":"Eftekhar, A., Weihs, L., Hendrix, R., Caglar, E., Salvador, J., Herrasti, A., Han, W., VanderBil, E., Kembhavi, A., and Farhadi, A. (2024). The One RING: A Robotic Indoor Navigation Generalist. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hu, J., Hendrix, R., Farhadi, A., Kembhavi, A., Mart\u00edn-Mart\u00edn, R., and Stone, P. (2025). FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2025), Atlanta, GA, USA, 19\u201323 May 2025, IEEE.","DOI":"10.1109\/ICRA55743.2025.11127934"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.1007\/s10845-023-02294-y","article-title":"Embodied Intelligence in Manufacturing: Leveraging Large Language Models for Autonomous Industrial Robotics","volume":"36","author":"Fan","year":"2025","journal-title":"J. Intell. Manuf."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Salierno, G., Leonardi, L., and Cabri, G. (2025). Generative AI and large language models in Industry 5.0: Shaping Smarter Sustainable Cities. Encyclopedia, 5.","DOI":"10.3390\/encyclopedia5010030"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, S., Wang, J., Dai, R., Ma, W., Ng, W.Y., and Hu, Y. (2025). RoboNurse-VLA: Robotic Scrub Nurse System Based on Vision-Language-Action Model. 
Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS 2025), Detroit, MI, USA, 19\u201325 October 2025, IEEE.","DOI":"10.1109\/IROS60139.2025.11246030"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1087","DOI":"10.1007\/s10514-023-10139-z","article-title":"Tidybot: Personalized Robot Assistance with Large Language Models","volume":"47","author":"Wu","year":"2023","journal-title":"Auton. Robot."},{"key":"ref_38","first-page":"358","article-title":"Audio-Visual Navigation with Anti-Backtracking","volume":"Volume 15318","author":"Antonacopoulos","year":"2025","journal-title":"Pattern Recognition"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1016\/0004-3702(71)90010-5","article-title":"STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving","volume":"2","author":"Fikes","year":"1971","journal-title":"Artif. Intell."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1613\/jair.1141","article-title":"SHOP2: An HTN Planning System","volume":"20","author":"Nau","year":"2003","journal-title":"J. Artif. Intell. Res."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1109\/MIS.2005.20","article-title":"Applications of SHOP and SHOP2","volume":"20","author":"Nau","year":"2005","journal-title":"IEEE Intell. Syst."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1613\/jair.1129","article-title":"PDDL2.1: An Extension to PDDL for Expressing Temporal Planning Domains","volume":"20","author":"Fox","year":"2003","journal-title":"J. Artif. Intell. Res."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Ghallab, M., Nau, D., and Traverso, P. (2004). Automated Planning: Theory and Practice, Elsevier. 
Morgan Kaufmann.","DOI":"10.1016\/B978-155860856-6\/50021-1"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1177\/0278364915602060","article-title":"Tell me Dave: Context-Sensitive Grounding of Natural Language to Manipulation Instructions","volume":"35","author":"Misra","year":"2016","journal-title":"Int. J. Robot. Res."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"MacGlashan, J., Babes-Vroman, M., desJardins, M., Littman, M., Muresan, S., Squire, S., Tellex, S., Arumugam, D., and Yang, L. (2015). Grounding English commands to reward functions. Proceedings of the Robotics: Science and Systems XI (RSS 2015), Rome, Italy, 13\u201317 July 2015, Robotics: Science and Systems Foundation.","DOI":"10.15607\/RSS.2015.XI.018"},{"key":"ref_46","unstructured":"Chen, D., and Mooney, R. (2011, January 7\u201311). Learning to Interpret Natural Language Navigation Instructions from Observations. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2011), San Francisco, CA, USA."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1007\/978-3-319-54181-5_14","article-title":"FuseNet: Incorporating Depth into Semantic Segmentation Via Fusion-Based CNN Architecture","volume":"Volume 10111","author":"Lai","year":"2017","journal-title":"Computer Vision\u2014ACCV 2016"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.isprsjprs.2020.06.014","article-title":"X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data","volume":"167","author":"Hong","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"2862","DOI":"10.1109\/LRA.2019.2922618","article-title":"Automatic Multi-Sensor Extrinsic Calibration for Mobile Robots","volume":"4","year":"2019","journal-title":"IEEE Robot. Autom. 
Lett."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1016\/j.aej.2024.06.025","article-title":"Enhancing 3D Object Detection Through Multi-Modal Fusion for Cooperative Perception","volume":"104","author":"Xia","year":"2024","journal-title":"Alex. Eng. J."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"332","DOI":"10.1016\/j.csl.2008.10.001","article-title":"The Ravenclaw Dialog Management Framework: Architecture and systems","volume":"23","author":"Bohus","year":"2009","journal-title":"Comput. Speech Lang."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1109\/TRO.2016.2633567","article-title":"How Behavior Trees Modularize Hybrid Control Systems and Generalize Sequential Behavior Compositions, The Subsumption Architecture, And Decision Trees","volume":"33","author":"Colledanchise","year":"2017","journal-title":"IEEE Trans. Robot."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1109\/JPROC.2012.2225812","article-title":"POMDP-Based Statistical Spoken Dialog Systems: A Review","volume":"101","author":"Young","year":"2013","journal-title":"Proc. IEEE"},{"key":"ref_54","first-page":"1050","article-title":"Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning","volume":"Volume 48","author":"Gal","year":"2016","journal-title":"Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA, 19\u201324 June 2016"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"27","DOI":"10.3233\/TAD-2010-0285","article-title":"Towards Automated Models of Activities of Daily Life","volume":"22","author":"Beetz","year":"2010","journal-title":"Technol. Disabil."},{"key":"ref_56","first-page":"1334","article-title":"End-to-End Training of Deep Visuomotor Policies","volume":"17","author":"Levine","year":"2016","journal-title":"J. Mach. Learn. 
Res."},{"key":"ref_57","unstructured":"Duan, Y., Chen, X., Houthooft, R., Schulman, J., and Abbeel, P. (2016). Benchmarking Deep Reinforcement Learning for Continuous Control. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA, 20\u201322 June 2016, JMLR.org. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3045390.3045531."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1109\/TASE.2014.2376492","article-title":"A Survey of Research on Cloud Robotics and Automation","volume":"12","author":"Kehoe","year":"2015","journal-title":"IEEE Trans. Autom. Sci. Eng."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1145\/3724420","article-title":"Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models","volume":"57","author":"Wang","year":"2025","journal-title":"ACM Comput. Surv."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Savva, M., Kadian, A., Maksymets, O., Zhao, Y., Wijmans, E., Jain, B., Straub, J., Liu, J., Koltun, V., and Malik, J. (2019). Habitat: A Platform for Embodied AI Research. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV 2019), Seoul, Republic of Korea, 27 October\u20132 November  2019, IEEE.","DOI":"10.1109\/ICCV.2019.00943"},{"key":"ref_61","unstructured":"Hu, Y., Xie, Q., Jain, V., Francis, J., Patrikar, J., Keetha, N., Kim, S., Xie, Y., Zhang, T., and Fang, H.-S. (2024). Toward General-Purpose Robots Via Foundation Models: A Survey and Meta-Analysis. arXiv."},{"key":"ref_62","unstructured":"Zhou, H., Yao, X., Mees, O., Meng, Y., Xiao, T., Bisk, Y., Oh, J., Johns, E., Shridhar, M., and Shah, D. (2024). Bridging Language and Action: A Survey of Language-Conditioned Robot Manipulation. 
arXiv."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1177\/02783649241281508","article-title":"Foundation Models in Robotics: Applications, Challenges, and the Future","volume":"44","author":"Firoozi","year":"2024","journal-title":"Int. J. Robot. Res."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"n71","DOI":"10.1136\/bmj.n71","article-title":"The PRISMA 2020 statement: An Updated Guideline for Reporting Systematic Reviews","volume":"372","author":"Page","year":"2021","journal-title":"BMJ"},{"key":"ref_65","unstructured":"Priem, J., Piwowar, H., and Orr, R. (2022). OpenAlex: A Fully-Open Index of Scholarly Works, Authors, Venues, Institutions, And Concepts. arXiv."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"G\u00f6rner, M., Haschke, R., Ritter, H., and Zhang, J. (2019). MoveIt! Task Constructor for Task-Level Motion Planning. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2019), Montreal, QC, Canada, 20\u201324 May 2019, IEEE.","DOI":"10.1109\/ICRA.2019.8793898"},{"key":"ref_67","first-page":"64","article-title":"Approaching the Symbol Grounding Problem with Probabilistic Graphical Models","volume":"32","author":"Tellex","year":"2011","journal-title":"AI Mag."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Williams, T., Cantrell, R., Briggs, G., Schermerhorn, P., and Scheutz, M. (2013, January 14\u201318). Grounding Natural Language References to Unvisited and Hypothetical Locations. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Bellevue, WA, USA.","DOI":"10.1609\/aaai.v27i1.8563"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Kollar, T., Tellex, S., Roy, D., and Roy, N. (2010). Toward Understanding Natural Language Directions. 
Proceedings of the 5th ACM\/IEEE International Conference on Human-Robot Interaction (HRI), Osaka, Japan, 2\u20135 March 2010, IEEE.","DOI":"10.1109\/HRI.2010.5453186"},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"1191","DOI":"10.55417\/fr.2022040","article-title":"Language Understanding for Field and Service Robots in A Priori Unknown Environments","volume":"2","author":"Walter","year":"2022","journal-title":"Field Robot."},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Paul, R., Arkin, J., Roy, N., and Howard, T.M. (2017). Grounding Abstract Spatial Concepts for Language Interaction with Robots. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI 2017), Melbourne, Australia, 19\u201325 August 2017, IJCAI.","DOI":"10.24963\/ijcai.2017\/696"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Kaelbling, L.P., and Lozano-P\u00e9rez, T. (2011). Hierarchical Task and Motion Planning in The Now. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2011), Shanghai, China, 9\u201313 May 2011, IEEE.","DOI":"10.1109\/ICRA.2011.5980391"},{"key":"ref_73","first-page":"193","article-title":"Planning with Independent Task Networks","volume":"Volume 10505","author":"Thimm","year":"2017","journal-title":"KI 2017: Advances in Artificial Intelligence"},{"key":"ref_74","first-page":"57","article-title":"FF: The Fast-Forward Planning System","volume":"22","author":"Hoffmann","year":"2001","journal-title":"AI Mag."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1613\/jair.1705","article-title":"The Fast Downward Planning System","volume":"26","author":"Helmert","year":"2006","journal-title":"J. Artif. Intell. Res."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Prassler, E., Z\u00f6llner, M., Bischoff, R., Burgard, W., Haschke, R., H\u00e4gele, M., Lawitzky, G., Nebel, B., Pl\u00f6ger, P., and Reiser, U. (2012). 
Semantic Attachments for Domain-Independent Planning Systems. Towards Service Robots for Everyday Environments, Springer.","DOI":"10.1007\/978-3-642-25116-0"},{"key":"ref_77","unstructured":"Mao, W., Desai, R., Iuzzolino, M.L., and Kamra, N. (2023). Action Dynamics Task Graphs for Learning Plannable Representations of Procedural Tasks. arXiv."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Bacon, P.-L., Harb, J., and Precup, D. (2017). The Option-Critic Architecture. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017), San Francisco, CA, USA, 4\u20139 February 2017, AAAI Press. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3298483.3298491.","DOI":"10.1609\/aaai.v31i1.10916"},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1177\/0278364910386986","article-title":"Motion planning under uncertainty for robotic tasks with long time horizons","volume":"30","author":"Kurniawati","year":"2011","journal-title":"Int. J. Robot. Res."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TRO.2022.3200138","article-title":"Partially Observable Markov Decision Processes in Robotics: A Survey","volume":"39","author":"Lauri","year":"2023","journal-title":"IEEE Trans. Robot."},{"key":"ref_81","unstructured":"Silver, D., and Veness, J. (2010). Monte-Carlo Planning in Large POMDPs. Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS 2010), Vancouver, BC, Canada, 6\u20139 December 2010, Curran Associates, Inc.. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/2997046.2997137."},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Beetz, M. (1999). Structured Reactive Controllers: Controlling Robots That Perform Everyday Activity. 
Proceedings of the 3rd Annual Conference on Autonomous Agents (AGENTS \u201999), Seattle, WA, USA, 1\u20135 May 1999, Association for Computing Machinery.","DOI":"10.1145\/301136.301201"},{"key":"ref_83","unstructured":"Kortenkamp, D., Bonasso, R.P., and Murphy, R. (1998). Three-Layer Architectures. Artificial Intelligence and Mobile Robots: Case Studies of Successful Robot Systems, MIT Press. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/292092.292130."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Cashmore, M., Fox, M., Long, D., Magazzeni, D., Ridder, B., Carrera, A., Palomeras, N., Hurtos, N., and Carreras, M. (2015). ROSPlan: Planning in the Robot Operating System. Proceedings of the 25th International Conference on Automated Planning and Scheduling (ICAPS 2015), Jerusalem, Israel, 7\u201311 June 2015, AAAI Press.","DOI":"10.1609\/icaps.v25i1.13699"},{"key":"ref_85","unstructured":"Stentz, A. (1994, January 8\u201313). Optimal and Efficient Path Planning for Partially-Known Environments. Proceedings of the 1994 IEEE International Conference on Robotics and Automation (ICRA), San Diego, CA, USA."},{"key":"ref_86","unstructured":"Koenig, S., and Likhachev, M. (August, January 28). D* Lite. Proceedings of the 18th National Conference on Artificial Intelligence (AAAI 2002), Edmonton, AB, Canada. Available online: https:\/\/cdn.aaai.org\/AAAI\/2002\/AAAI02-072.pdf."},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"1194","DOI":"10.1177\/0278364913484072","article-title":"Integrated Task and Motion Planning in Belief Space","volume":"32","author":"Kaelbling","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/S0004-3702(98)00023-X","article-title":"Planning and Acting in Partially Observable Stochastic Domains","volume":"101","author":"Kaelbling","year":"1998","journal-title":"Artif. 
Intell."},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1109\/MSP.2017.2738401","article-title":"Deep Multimodal Learning: A Survey on Recent Advances and Trends","volume":"34","author":"Ramachandram","year":"2017","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_90","unstructured":"Li, S., and Tang, H. (2024). Multimodal Alignment and Fusion: A Survey. arXiv."},{"key":"ref_91","doi-asserted-by":"crossref","first-page":"2300359","DOI":"10.1002\/aisy.202300359","article-title":"Multimodal Human\u2013Robot Interaction for Human-Centric Smart Manufacturing: A Survey","volume":"6","author":"Wang","year":"2024","journal-title":"Adv. Intell. Syst."},{"key":"ref_92","doi-asserted-by":"crossref","unstructured":"Mora, A., Prados, A., Mendez, A., Barber, R., and Garrido, S. (2022). Sensor Fusion for Social Navigation on a Mobile Robot Based on Fast Marching Square and Gaussian Mixture Model. Sensors, 22.","DOI":"10.3390\/s22228728"},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Zuo, X., Geneva, P., Lee, W., Liu, Y., and Huang, G. (2019). LIC-Fusion: Lidar-Inertial-Camera Odometry. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS 2019), Macau, China, 3\u20138 November 2019, IEEE.","DOI":"10.1109\/IROS40897.2019.8967746"},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Geneva, P., Eckenhoff, K., Lee, W., Yang, Y., and Huang, G. (2020). OpenVINS: A research platform for visual-inertial estimation. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2020), Paris, France, 31 May\u201331 August 2020, IEEE.","DOI":"10.1109\/ICRA40945.2020.9196524"},{"key":"ref_95","doi-asserted-by":"crossref","first-page":"383","DOI":"10.1109\/TRO.2016.2529645","article-title":"A General Approach to Spatiotemporal Calibration in Multisensor Systems","volume":"32","author":"Rehder","year":"2016","journal-title":"IEEE Trans. 
Robot."},{"key":"ref_96","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1109\/TRO.2016.2624754","article-title":"Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age","volume":"32","author":"Cadena","year":"2016","journal-title":"IEEE Trans. Robot."},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"1255","DOI":"10.1109\/TRO.2017.2705103","article-title":"ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras","volume":"33","year":"2017","journal-title":"IEEE Trans. Robot."},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Fleet, D., Pajdla, T., Schiele, B., and Tuytelaars, T. (2014). LSD-SLAM: Large-scale direct monocular SLAM. Computer Vision\u2014ECCV 2014, Springer International Publishing.","DOI":"10.1007\/978-3-319-10590-1"},{"key":"ref_99","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TRO.2016.2597321","article-title":"On-Manifold Preintegration for Real-Time Visual\u2013Inertial Odometry","volume":"33","author":"Forster","year":"2017","journal-title":"IEEE Trans. Robot."},{"key":"ref_100","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/TRO.2011.2170332","article-title":"Visual-Inertial-Aided Navigation for High-Dynamic Motion in Built Environments Without Initial Conditions","volume":"28","author":"Lupton","year":"2012","journal-title":"IEEE Trans. Robot."},{"key":"ref_101","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1007\/s10462-023-10562-9","article-title":"A Survey of Uncertainty in Deep Neural Networks","volume":"56","author":"Gawlikowski","year":"2023","journal-title":"Artif. Intell. Rev."},{"key":"ref_102","doi-asserted-by":"crossref","first-page":"2039","DOI":"10.1109\/TRO.2021.3139964","article-title":"Photometric Visual-Inertial Navigation with Uncertainty-Aware Ensembles","volume":"38","author":"Jung","year":"2022","journal-title":"IEEE Trans. 
Robot."},{"key":"ref_103","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1177\/0278364918770733","article-title":"The Limits and Potentials of Deep Learning for Robotics","volume":"37","author":"Brock","year":"2018","journal-title":"Int. J. Robot. Res."},{"key":"ref_104","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1177\/0278364916679498","article-title":"1 Year, 1000 Km: The Oxford RobotCar Dataset","volume":"36","author":"Maddern","year":"2017","journal-title":"Int. J. Robot. Res."},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Tremblay, J., Prakash, A., Acuna, D., Brophy, M., Jampani, V., Anil, C., To, T., Cameracci, E., Boochoon, S., and Birchfield, S. (2018). Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2018), Salt Lake City, UT, USA, 18\u201322 June 2018, IEEE.","DOI":"10.1109\/CVPRW.2018.00143"},{"key":"ref_106","doi-asserted-by":"crossref","first-page":"1059","DOI":"10.1002\/rob.20169","article-title":"Improving robot navigation through self-supervised online learning","volume":"23","author":"Sofman","year":"2006","journal-title":"J. Field Robot."},{"key":"ref_107","doi-asserted-by":"crossref","unstructured":"Porav, H., Maddern, W., and Newman, P. (2018). Adversarial Training for Adverse Conditions: Robust Metric Localization Using Appearance Transfer. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2018), Brisbane, QLD, Australia, 21\u201325 May 2018, IEEE.","DOI":"10.1109\/ICRA.2018.8462894"},{"key":"ref_108","unstructured":"Lakshminarayanan, B., Pritzel, A., and Blundell, C. (2017). Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles. Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4\u20139 December 2017, Curran Associates, Inc.. 
Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3295222.3295387."},{"key":"ref_109","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1016\/j.robot.2017.10.011","article-title":"A Survey of Robotic Motion Planning in Dynamic Environments","volume":"100","author":"Mohanan","year":"2018","journal-title":"Robot. Auton. Syst."},{"key":"ref_110","first-page":"1321","article-title":"On Calibration of Modern Neural Networks","volume":"Volume 70","author":"Guo","year":"2017","journal-title":"Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Sydney, NSW, Australia, 6\u201311 August 2017"},{"key":"ref_111","doi-asserted-by":"crossref","first-page":"1303","DOI":"10.1049\/iet-cta.2009.0032","article-title":"Kalman Filtering with State Constraints: A Survey of Linear and Nonlinear Algorithms","volume":"4","author":"Simon","year":"2010","journal-title":"IET Control Theory Appl."},{"key":"ref_112","doi-asserted-by":"crossref","first-page":"727","DOI":"10.1007\/s11831-022-09815-7","article-title":"A Review on Kalman Filter Models","volume":"30","author":"Khodarahmi","year":"2023","journal-title":"Arch. Comput. Methods Eng."},{"key":"ref_113","doi-asserted-by":"crossref","unstructured":"Kurniawati, H., Hsu, D., and Lee, W.S. (2008, January 25\u201328). SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces. Proceedings of the Robotics: Science and Systems IV (RSS 2008), Zurich, Switzerland.","DOI":"10.15607\/RSS.2008.IV.009"},{"key":"ref_114","doi-asserted-by":"crossref","unstructured":"Wan, E.A., and Van Der Merwe, R. (2000). The Unscented Kalman Filter for Nonlinear Estimation. 
Proceedings of the IEEE Adaptive Systems for Signal Processing, Communications, and Control Symposium, Lake Louise, AB, Canada, 1\u20134 October 2000, IEEE.","DOI":"10.1109\/ASSPCC.2000.882463"},{"key":"ref_115","first-page":"3461","article-title":"The Square-Root Unscented Kalman Filter for State and Parameter Estimation","volume":"Volume 6","author":"Wan","year":"2001","journal-title":"Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001), Salt Lake City, UT, USA, 7\u201311 May 2001"},{"key":"ref_116","doi-asserted-by":"crossref","first-page":"navi.652","DOI":"10.33012\/navi.652","article-title":"Kalman Filtering with Uncertain and Asynchronous Measurement Epochs","volume":"71","author":"Brouk","year":"2024","journal-title":"Navig. J. Inst. Navig."},{"key":"ref_117","doi-asserted-by":"crossref","first-page":"103904","DOI":"10.1016\/j.robot.2021.103904","article-title":"Application of the Unscented Kalman Filter in Position Estimation a Case Study on a Robot for Precise Positioning","volume":"147","author":"Naab","year":"2022","journal-title":"Robot. Auton. Syst."},{"key":"ref_118","doi-asserted-by":"crossref","unstructured":"Platt, R., Tedrake, R., Kaelbling, L.P., and Lozano-P\u00e9rez, T. (2010, January 27\u201330). Belief space planning assuming maximum likelihood observations. Proceedings of the Robotics: Science and Systems (RSS), Zaragoza, Spain.","DOI":"10.15607\/RSS.2010.VI.037"},{"key":"ref_119","unstructured":"Leusmann, J., Wang, C., Gienger, M., Schmidt, A., and Mayer, S. (2023). Understanding the Uncertainty Loop of Human\u2013Robot Interaction. arXiv."},{"key":"ref_120","doi-asserted-by":"crossref","unstructured":"Cumbal, R., Lopes, J., and Engwall, O. (2020). Uncertainty in Robot Assisted Second Language Conversation Practice. 
Proceedings of the Companion of the 2020 ACM\/IEEE International Conference on Human-Robot Interaction (HRI \u201920), Cambridge, UK, 23\u201326 March 2020, ACM.","DOI":"10.1145\/3371382.3378306"},{"key":"ref_121","doi-asserted-by":"crossref","unstructured":"Hough, J., and Schlangen, D. (2017). It\u2019s Not What You Do, It\u2019s How You Do It: Grounding Uncertainty for a Simple Robot. Proceedings of the 12th ACM\/IEEE International Conference on Human-Robot Interaction (HRI 2017), Vienna, Austria, 6\u20139 March 2017, ACM.","DOI":"10.1145\/2909824.3020214"},{"key":"ref_122","doi-asserted-by":"crossref","unstructured":"Trick, S., Koert, D., Peters, J., and Rothkopf, C.A. (2019). Multimodal Uncertainty Reduction for Intention Recognition in Human\u2013Robot Interaction. Proceedings of the 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3\u20138 November 2019, IEEE.","DOI":"10.1109\/IROS40897.2019.8968171"},{"key":"ref_123","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1145\/2047296.2047298","article-title":"Detecting Geographical References in the Form of Place Names and Associated Spatial Natural Language","volume":"3","author":"Leidner","year":"2011","journal-title":"SIGSPATIAL Spec."},{"key":"ref_124","doi-asserted-by":"crossref","unstructured":"Tellex, S., Knepper, R., Li, A., Rus, D., and Roy, N. (2014, January 12\u201316). Asking for Help Using Inverse Semantics. Proceedings of the Robotics: Science and Systems (RSS 2014), Berkeley, CA, USA.","DOI":"10.15607\/RSS.2014.X.024"},{"key":"ref_125","doi-asserted-by":"crossref","unstructured":"Dragan, A.D., Lee, K.C.T., and Srinivasa, S.S. (2013, January 3\u20136). Legibility and Predictability of Robot Motion. 
Proceedings of the 8th ACM\/IEEE International Conference on Human\u2013Robot Interaction (HRI 2013), Tokyo, Japan.","DOI":"10.1109\/HRI.2013.6483603"},{"key":"ref_126","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27\u201330 June 2016, IEEE.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_127","doi-asserted-by":"crossref","unstructured":"Qi, X., Liao, R., Jia, J., Fidler, S., and Urtasun, R. (2017). 3D Graph Neural Networks For RGB-D Semantic Segmentation. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22\u201329 October 2017, IEEE.","DOI":"10.1109\/ICCV.2017.556"},{"key":"ref_128","doi-asserted-by":"crossref","first-page":"7099","DOI":"10.1109\/TPAMI.2022.3225573","article-title":"A Survey on Deep Learning Techniques for Video Segmentation","volume":"45","author":"Zhou","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_129","first-page":"802","article-title":"Elastic bands: Connecting path planning and control","volume":"Volume 2","author":"Quinlan","year":"1993","journal-title":"Proceedings of the 1993 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 2\u20136 May 1993"},{"key":"ref_130","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1109\/100.580977","article-title":"The Dynamic Window Approach to Collision Avoidance","volume":"4","author":"Fox","year":"1997","journal-title":"IEEE Robot. Autom. Mag."},{"key":"ref_131","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Gordon, D., Kolve, E., Fox, D., Fei-Fei, L., Gupta, A., Mottaghi, R., and Farhadi, A. (2017, January 22\u201329). Visual Semantic Planning Using Deep Successor Representations. 
Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.60"},{"key":"ref_132","doi-asserted-by":"crossref","unstructured":"Zhang, H.-B., Zhang, Y.-X., Zhong, B., Lei, Q., Yang, L., Du, J.-X., and Chen, D.-S. (2019). A Comprehensive Survey of Vision-Based Human Action Recognition Methods. Sensors, 19.","DOI":"10.3390\/s19051005"},{"key":"ref_133","doi-asserted-by":"crossref","unstructured":"P\u00fctz, S., Sim\u00f3n, J.S., and Hertzberg, J. (2018). Move Base Flex: A Highly Flexible Navigation Framework for Mobile Robots. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1\u20135 October 2018, IEEE.","DOI":"10.1109\/IROS.2018.8593829"},{"key":"ref_134","unstructured":"Arkin, R.C. (1998). Behavior-Based Robotics, MIT Press. Available online: https:\/\/books.google.ca\/books\/about\/Behavior_Based_Robotics.html?id=mRWT6alZt9oC&redir_esc=y."},{"key":"ref_135","doi-asserted-by":"crossref","unstructured":"Noroozi, F., Daneshmand, M., and Fiorini, P. (2023). Conventional, Heuristic and Learning-Based Robot Motion Planning: Reviewing Frameworks of Current Practical Significance. Machines, 11.","DOI":"10.3390\/machines11070722"},{"key":"ref_136","first-page":"96","article-title":"Mobile Robot Navigation and Obstacle Avoidance Techniques: A Review","volume":"2","author":"Pandey","year":"2017","journal-title":"Int. Robot. Autom. J."},{"key":"ref_137","unstructured":"Lewis, F.L., and Ge, S.S. (2018). Autonomous Mobile Robots: Sensing, Control, Decision Making and Applications, CRC Press."},{"key":"ref_138","doi-asserted-by":"crossref","unstructured":"Guo, X., Lyu, M., Xia, B., Zhang, K., and Zhang, L. (2023). An Improved Visual SLAM Method with Adaptive Feature Extraction. Appl. Sci., 13.","DOI":"10.3390\/app131810038"},{"key":"ref_139","doi-asserted-by":"crossref","unstructured":"Kurz, G., Holoch, M., and Biber, P. (2021). 
Geometry-Based Graph Pruning for Lifelong SLAM. Proceedings of the 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September\u20131 October 2021, IEEE.","DOI":"10.1109\/IROS51168.2021.9636530"},{"key":"ref_140","doi-asserted-by":"crossref","first-page":"1697","DOI":"10.1177\/0278364916669237","article-title":"ElasticFusion: Real-Time Dense SLAM and Light Source Estimation","volume":"35","author":"Whelan","year":"2016","journal-title":"Int. J. Rob. Res."},{"key":"ref_141","unstructured":"Murali, A., Liu, W., Marino, K., Chernova, S., and Gupta, A. (2020, January 16\u201318). Same Object, Different Grasps: Data and Semantic Knowledge for Task-Oriented Grasping. Proceedings of the 2020 Conference on Robot Learning (CoRL), PMLR, Virtual Event. Available online: https:\/\/proceedings.mlr.press\/v155\/murali21a.html."},{"key":"ref_142","doi-asserted-by":"crossref","first-page":"5615","DOI":"10.1109\/LRA.2022.3155805","article-title":"Adaptive and Risk-Aware Target Tracking for Robot Teams with Heterogeneous Sensors","volume":"7","author":"Mayya","year":"2022","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_143","doi-asserted-by":"crossref","unstructured":"Wang, J., Lin, S., and Liu, A. (2023). Bioinspired Perception and Navigation of Service Robots in Indoor Environments: A Review. Biomimetics, 8.","DOI":"10.3390\/biomimetics8040350"},{"key":"ref_144","unstructured":"Hurst, A., Lerer, A., Goucher, A.P., Perelman, A., Ramesh, A., Clark, A., Ostrow, A.J., Welihinda, A., and Hayes, A. (2024). GPT-4o System Card. arXiv."},{"key":"ref_145","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Minneapolis, MN, USA, 2\u20137 June 2019, Association for Computational Linguistics."},{"key":"ref_146","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021). Learning Transferable Visual Models from Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning, Virtually, 1 July 2021, PMLR. Available online: https:\/\/proceedings.mlr.press\/v139\/radford21a.html."},{"key":"ref_147","doi-asserted-by":"crossref","unstructured":"Touvron, H., Cord, M., and J\u00e9gou, H. (2022). DeiT III: Revenge of the ViT. Computer Vision\u2014ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23\u201327 October 2022, Springer. Proceedings, Part XXIV.","DOI":"10.1007\/978-3-031-20053-3_30"},{"key":"ref_148","unstructured":"Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., and Altman, S. (2023). GPT-4 Technical Report. arXiv."},{"key":"ref_149","doi-asserted-by":"crossref","first-page":"633","DOI":"10.1038\/s41586-025-09422-z","article-title":"DeepSeek-R1 Incentivizes Reasoning in LLMs through Reinforcement Learning","volume":"645","author":"Guo","year":"2025","journal-title":"Nature"},{"key":"ref_150","doi-asserted-by":"crossref","unstructured":"Yang, J., Tan, R., Wu, Q., Zheng, R., Peng, B., and Liang, Y. (2025). Magma: A Foundation Model for Multimodal AI Agents. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025), Nashville, TN, USA, 10\u201317 June 2025, IEEE.","DOI":"10.1109\/CVPR52734.2025.01325"},{"key":"ref_151","first-page":"2679","article-title":"OpenVLA: An Open-Source Vision-Language-Action Model","volume":"Volume 270","author":"Kim","year":"2025","journal-title":"Proceedings of the 8th Conference on Robot Learning (CoRL 2025), Atlanta, GA, USA, 6\u20139 November 2025"},{"key":"ref_152","unstructured":"Zhu, H., Wang, Y., Zhou, J., Chang, W., Zhou, Y., Li, Z., Chen, J., Shen, C., Pang, J., and He, T. (2025, January 19\u201325). Aether: Geometric-Aware Unified World Modeling. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV 2025), Paris, France. Available online: https:\/\/openaccess.thecvf.com\/content\/ICCV2025\/papers\/Zhu_Aether_Geometric-Aware_Unified_World_Modeling_ICCV_2025_paper.pdf."},{"key":"ref_153","doi-asserted-by":"crossref","unstructured":"Black, K., Brown, N., Driess, D., Esmail, A., Equi, M.R., Finn, C., Fusai, N., Groom, L., Hausman, K., and Ichter, B. (2025, January 21\u201325). \u03c00: A Vision-Language-Action Flow Model for General Robot Control. Proceedings of the Robotics: Science and Systems (RSS 2025), Los Angeles, CA, USA.","DOI":"10.15607\/RSS.2025.XXI.010"},{"key":"ref_154","first-page":"17","article-title":"\u03c00.5: A Vision\u2013Language\u2013Action Model with Open-World Generalization","volume":"Volume 305","author":"Lim","year":"2025","journal-title":"Proceedings of the 9th Conference on Robot Learning (CoRL 2025), Seoul, Republic of Korea, 27\u201330 September 2025"},{"key":"ref_155","unstructured":"Szot, A., Clegg, A., Undersander, E., Wijmans, E., Zhao, Y., Turner, J., Maestre, N., Mukadam, M., Chaplot, D.S., and Maksymets, O. (2021). Habitat 2.0: Training Home Assistants to Rearrange Their Habitat. 
Proceedings of the 35th International Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6\u201314 December 2021, Curran Associates, Inc.. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3540261.3540281."},{"key":"ref_156","unstructured":"Christodoulopoulos, C., Chakraborty, T., Rose, C., and Peng, V. (2025). Large Vision\u2013Language Model Alignment and Misalignment: A Survey through the Lens of Explainability. Findings of the Association for Computational Linguistics: EMNLP 2025, Association for Computational Linguistics."},{"key":"ref_157","doi-asserted-by":"crossref","first-page":"220101","DOI":"10.1007\/s11432-024-4231-5","article-title":"How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites","volume":"67","author":"Chen","year":"2024","journal-title":"Sci. China Inf. Sci."},{"key":"ref_158","unstructured":"Agarwal, N., Ali, A., Bala, M., Balaji, Y., Barker, E., Cai, T., Chattopadhyay, P., Chen, Y., and Cui, Y. (2025). Cosmos World Foundation Model Platform for Physical AI. arXiv."},{"key":"ref_159","unstructured":"Ye, Y., Huang, Z., Xiao, Y., Chern, E., Xia, S., and Liu, P. (2025). LIMO: Less Is More for Reasoning. arXiv."},{"key":"ref_160","unstructured":"Dong, X., Zhang, P., Zang, Y., Cao, Y., Wang, B., Ouyang, L., Zhang, S., Duan, H., Zhang, W., and Li, Y. (2024). InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD. Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 10\u201315 December 2024, Curran Associates, Inc.. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3737916.3739264."},{"key":"ref_161","doi-asserted-by":"crossref","unstructured":"Gu, Q., Kuwajerwala, A., Morin, S., Jatavallabhula, K.M., Sen, B., Agarwal, A., Rivera, C., Paul, W., Ellis, K., and Chellappa, R. (2024). 
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning. Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13\u201317 May 2024, IEEE.","DOI":"10.1109\/ICRA57147.2024.10610243"},{"key":"ref_162","first-page":"1","article-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_163","unstructured":"Yang, Z., Li, L., Lin, K., Wang, J., Lin, C.-C., Liu, Z., and Wang, L. (2023). The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision). arXiv."},{"key":"ref_164","doi-asserted-by":"crossref","unstructured":"Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., and Lo, W.-Y. (2023). Segment Anything. Proceedings of the 2023 IEEE\/CVF International Conference on Computer Vision (ICCV), Paris, France, 4\u20136 October 2023, IEEE.","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"ref_165","unstructured":"D\u00e9fossez, A., Mazar\u00e9, L., Orsini, M., Royer, A., P\u00e9rez, P., J\u00e9gou, H., Grave, E., and Zeghidour, N. (2024). Moshi: A Speech-Text Foundation Model for Real-Time Dialogue. arXiv."},{"key":"ref_166","doi-asserted-by":"crossref","unstructured":"Zuo, G., Tong, J., Liu, H., Chen, W., and Li, J. (2021). Graph-Based Visual Manipulation Relationship Reasoning in Object-Stacking Scenes. Proceedings of the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18\u201322 July 2021, IEEE.","DOI":"10.1109\/IJCNN52387.2021.9534389"},{"key":"ref_167","doi-asserted-by":"crossref","unstructured":"Chen, X., Djolonga, J., Padlewski, P., Mustafa, B., Changpinyo, S., and Wu, J. (2024, January 16\u201322). On Scaling Up a Multilingual Vision and Language Model. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01368"},{"key":"ref_168","doi-asserted-by":"crossref","unstructured":"Liang, J., Huang, W., Xia, F., Xu, P., Hausman, K., Ichter, B., Florence, P., and Zeng, A. (2023). Code as Policies: Language Model Programs for Embodied Control. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May\u20132 June 2023, IEEE.","DOI":"10.1109\/ICRA48891.2023.10160591"},{"key":"ref_169","doi-asserted-by":"crossref","unstructured":"Song, C.H., Sadler, B.M., Wu, J., Chao, W.-L., Washington, C., and Su, Y. (2023). LLM-Planner: Few-shot grounded planning for embodied agents with large language models. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Paris, France, 4\u20136 October 2023, IEEE.","DOI":"10.1109\/ICCV51070.2023.00280"},{"key":"ref_170","doi-asserted-by":"crossref","unstructured":"Muennighoff, N., Yang, Z., Shi, W., Li, X.L., Li, F.-F., Hajishirzi, H., Zettlemoyer, L., Liang, P., Candes, E., and Hashimoto, T. (2025, January 4\u20139). s1: Simple Test-Time Scaling. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), Suzhou, China.","DOI":"10.18653\/v1\/2025.emnlp-main.1025"},{"key":"ref_171","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1177\/0278364908097884","article-title":"A Hybrid Approach to Intricate Motion, Manipulation and Task Planning","volume":"28","author":"Cambon","year":"2009","journal-title":"Int. J. Robot. Res."},{"key":"ref_172","unstructured":"Zhou, X., Gandhi, S., Fan, L., Lin, Z., Du, Y., Abbeel, P., Wu, J., and Xia, F. (2025, December 05). GENESIS: A Generative and Universal Physics Engine for Robotics and Beyond. 
Available online: https:\/\/genesis-embodied-ai.github.io\/."},{"key":"ref_173","unstructured":"Ye, J., Chen, X., Xu, N., Zu, C., Shao, Z., Liu, S., Cui, Y., Zhou, Z., Gong, C., and Shen, Y. (2023). A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models. arXiv."},{"key":"ref_174","unstructured":"Yue, Y., Garg, A., Peng, N., Sha, F., and Yu, R. (2025). SAM 2: Segment Anything in Images and Videos. Proceedings of the International Conference on Learning Representations (ICLR 2025), Singapore, 24\u201328 April 2025, ICLR. Available online: https:\/\/proceedings.iclr.cc\/paper_files\/paper\/2025\/file\/45c1f6a8cbf2da59ebf2c802b4f742cd-Paper-Conference.pdf."},{"key":"ref_175","unstructured":"Jaegle, A., Borgeaud, S., Alayrac, J.-B., Doersch, C., Ionescu, C., Ding, D., Koppula, S., Zoran, D., Brock, A., and Shelhamer, E. (2021). Perceiver IO: A General Architecture for Structured Inputs and Outputs. arXiv."},{"key":"ref_176","first-page":"785","article-title":"Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation","volume":"Volume 205","author":"Shridhar","year":"2023","journal-title":"Proceedings of the 6th Conference on Robot Learning (CoRL 2023), Atlanta, GA, USA, 6\u20139 November 2023"},{"key":"ref_177","unstructured":"Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., and Vaughan, A. (2024). The LLaMA 3 herd of models. arXiv."},{"key":"ref_178","doi-asserted-by":"crossref","unstructured":"Song, X., Chen, W., Liu, Y., Chen, W., Li, G., and Lin, L. (2025, January 10\u201317). Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025), Seattle, WA, USA.","DOI":"10.1109\/CVPR52734.2025.01128"},{"key":"ref_179","unstructured":"Abdin, M., Jacobs, S.A., Awan, A.A., Aneja, J., Awadallah, A., Awadalla, H., Bach, N., Bahree, A., Bakhtiari, A., and Behl, H. (2024). 
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. arXiv."},{"key":"ref_180","doi-asserted-by":"crossref","unstructured":"Dai, Y., Peng, R., Li, S., and Chai, J. (2024). Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation. Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024), Yokohama, Japan, 13\u201317 May 2024, IEEE.","DOI":"10.1109\/ICRA57147.2024.10610178"},{"key":"ref_181","doi-asserted-by":"crossref","unstructured":"Joublin, F., Ceravola, A., Smirnov, P., Ocker, F., Deigmoeller, J., Belardinelli, A., Wang, C., Hasler, S., Tanneberg, D., and Gienger, M. (2024). CoPAL: Corrective Planning of Robot Actions with Large Language Models. Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024), Yokohama, Japan, 13\u201317 May 2024, IEEE.","DOI":"10.1109\/ICRA57147.2024.10610434"},{"key":"ref_182","doi-asserted-by":"crossref","first-page":"10201","DOI":"10.1109\/LRA.2024.3471457","article-title":"ReplanVLM: Replanning Robotic Tasks with Visual Language Models","volume":"9","author":"Mei","year":"2024","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_183","first-page":"1","article-title":"PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects","volume":"Volume 15133","author":"Leonardis","year":"2024","journal-title":"Computer Vision\u2014ECCV 2024"},{"key":"ref_184","unstructured":"Diao, H., Cui, Y., Li, X., Wang, Y., Lu, H., and Wang, X. (2024). Unveiling Encoder-Free Vision-Language Models. Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 10\u201315 December 2024, Curran Associates, Inc.. 
Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3737916.3739581."},{"key":"ref_185","first-page":"7595","article-title":"VideoJAM: Joint Appearance\u2013Motion Representations for Enhanced Motion Generation in Video Models","volume":"Volume 267","author":"Chefer","year":"2025","journal-title":"Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), Vancouver, BC, Canada, 13\u201319 July 2025"},{"key":"ref_186","doi-asserted-by":"crossref","unstructured":"Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., and Varol, G. (2024). InternVideo2: Scaling Foundation Models for Multimodal Video Understanding. Computer Vision\u2014ECCV 2024, Springer. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-031-72980-5"},{"key":"ref_187","unstructured":"Chu, X., Qiao, L., Zhang, X., Xu, S., Wei, F., Yang, Y., Sun, X., Hu, Y., Lin, X., and Zhang, B. (2024). MobileVLM V2: Faster and Stronger Baseline for Vision-Language Models. arXiv."},{"key":"ref_188","unstructured":"Cho, M., Cao, Y., Sun, J., Zhang, Q., Pavone, M., Park, J.J., Yang, H., and Mao, Z.M. (2024). Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion. arXiv."},{"key":"ref_189","unstructured":"Ren, T., Chen, Y., Jiang, Q., Zeng, Z., Xiong, Y., Liu, W., Ma, Z., Shen, J., Gao, Y., and Jiang, X. (2024). DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding. arXiv."},{"key":"ref_190","unstructured":"Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., and El-Nouby, A. (2023). DINOv2: Learning Robust Visual Features without Supervision. arXiv."},{"key":"ref_191","doi-asserted-by":"crossref","unstructured":"Liu, H., Li, C., Li, Y., and Lee, Y.J. (2024). Improved Baselines with Visual Instruction Tuning. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16\u201322 June 2024, IEEE.","DOI":"10.1109\/CVPR52733.2024.02484"},{"key":"ref_192","unstructured":"Tong, S., Brown, E.L., Wu, P., Woo, S., Iyer, A.J., Akula, S.C., Yang, S., Yang, J., Middepogu, M., and Wang, Z. (2024, January 10\u201315). Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs. Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3737916.3740687."},{"key":"ref_193","doi-asserted-by":"crossref","unstructured":"Li, H., Zhu, J., Jiang, X., Zhu, X., Li, H., Yuan, C., Wang, X., Qiao, Y., Wang, X., and Wang, W. (2023). Uni-Perceiver V2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18\u201322 June 2023, IEEE.","DOI":"10.1109\/CVPR52729.2023.00264"},{"key":"ref_194","unstructured":"Awadalla, A., Gao, I., Gardner, J., Hessel, J., Hanafy, Y., Zhu, W., Marathe, K., Bitton, Y., Gadre, S., and Sagawa, S. (2023). OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models. arXiv."},{"key":"ref_195","unstructured":"Hafner, D., Lillicrap, T., Ba, J., and Norouzi, M. (2019). Dream to Control: Learning Behaviors by Latent Imagination. arXiv."},{"key":"ref_196","unstructured":"Hafner, D., Lillicrap, T., Norouzi, M., and Ba, J. (2021). Mastering Atari with Discrete World Models. Proceedings of the 9th International Conference on Learning Representations (ICLR 2021), Virtual Event, Austria, 3\u20137 May 2021, OpenReview Foundation. Available online: https:\/\/openreview.net\/forum?id=0oabwyZbOu."},{"key":"ref_197","doi-asserted-by":"crossref","unstructured":"Li, Q., Lin, Y., Luo, Q., and Yu, L. (2025). 
DreamerV3 for Traffic Signal Control: Hyperparameter Tuning and Performance. arXiv.","DOI":"10.3233\/ATDE250554"},{"key":"ref_198","unstructured":"Feichtenhofer, C., Fan, H., Li, Y., and He, K. (2022). Masked Autoencoders as Spatiotemporal Learners. Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA, 28 November\u20139 December 2022, Curran Associates, Inc.. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3600270.3602875."},{"key":"ref_199","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10\u201317 October 2021, IEEE.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_200","doi-asserted-by":"crossref","unstructured":"Bar-Tal, O., Chefer, H., Tov, O., Herrmann, C., Paiss, R., Zada, S., Ephrat, A., Hur, J., Liu, G., and Raj, A. (2024). Lumiere: A space-time diffusion model for video generation. SIGGRAPH Asia 2024 Conference Papers, Association for Computing Machinery.","DOI":"10.1145\/3680528.3687614"},{"key":"ref_201","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1109\/LRA.2024.3511409","article-title":"VLM-Social-Nav: Socially Aware Robot Navigation Through Scoring Using Vision-Language Models","volume":"10","author":"Song","year":"2025","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_202","doi-asserted-by":"crossref","unstructured":"Narasimhan, S., Tan, A.H., Choi, D., and Nejat, G. (2025). OLiVia-Nav: An Online Lifelong Vision-Language Approach for Mobile Robot Social Navigation. 
Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA 2025), Atlanta, GA, USA, 19\u201323 May 2025, IEEE.","DOI":"10.1109\/ICRA55743.2025.11128004"},{"key":"ref_203","unstructured":"Huang, Y., Zhang, Q., Yu, P.S., and Sun, L. (2023). TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models. arXiv."},{"key":"ref_204","doi-asserted-by":"crossref","unstructured":"Phan, L., Gatti, A., Han, Z., Li, N., Hu, J., Zhang, H., Zhang, C.B.C., Shaaban, M., Ling, J., and Shi, S. (2025). Humanity\u2019s Last Exam. arXiv.","DOI":"10.70777\/si.v2i1.13973"},{"key":"ref_205","doi-asserted-by":"crossref","unstructured":"Tjomsland, J., Kalkan, S., and Gunes, H. (2022). Mind Your Manners! A Dataset and a Continual Learning Approach for Assessing Social Appropriateness of Robot Actions. Front. Robot. AI, 9.","DOI":"10.3389\/frobt.2022.669420"},{"key":"ref_206","first-page":"10347","article-title":"Training Data-Efficient Image Transformers and Distillation through Attention","volume":"Volume 139","author":"Touvron","year":"2021","journal-title":"Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Virtual, 18\u201324 July 2021"},{"key":"ref_207","first-page":"1","article-title":"VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks","volume":"Volume 15148","author":"Leonardis","year":"2024","journal-title":"Computer Vision\u2014ECCV 2024"},{"key":"ref_208","unstructured":"Li, A., Gong, B., Yang, B., Shan, B., Liu, C., Zhu, C., Zhang, C., Guo, C., and Chen, D. (2025). Minimax-01: Scaling Foundation Models with Lightning Attention. arXiv."},{"key":"ref_209","first-page":"120","article-title":"Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity","volume":"23","author":"Fedus","year":"2022","journal-title":"J. Mach. Learn. Res."},{"key":"ref_210","unstructured":"Lepikhin, D., Lee, H., Xu, Y., Chen, D., Firat, O., Huang, Y., Krikun, M., Shazeer, N., and Chen, Z. (2020). 
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. arXiv."},{"key":"ref_211","unstructured":"Chen, W., Huang, W., Du, X., Song, X., Wang, Z., and Zhou, D. (2022). Auto-Scaling Vision Transformers without Training. arXiv."},{"key":"ref_212","doi-asserted-by":"crossref","first-page":"3820","DOI":"10.1109\/COMST.2025.3527641","article-title":"Mobile Edge Intelligence for Large Language Models: A Contemporary Survey","volume":"27","author":"Qu","year":"2025","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_213","doi-asserted-by":"crossref","unstructured":"Ge, T., Chen, S.-Q., and Wei, F. (2022). EdgeFormer: A parameter-efficient transformer for on-device seq2seq generation. arXiv.","DOI":"10.18653\/v1\/2022.emnlp-main.741"},{"key":"ref_214","unstructured":"(2025, December 05). Service Robotics Market Size, Share and Trends. Available online: https:\/\/www.marketsandmarkets.com\/Market-Reports\/service-robotics-market-681.html."},{"key":"ref_215","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1007\/s11628-020-00423-8","article-title":"Impacts of service robots on service quality","volume":"14","author":"Chiang","year":"2020","journal-title":"Serv. Bus."},{"key":"ref_216","doi-asserted-by":"crossref","unstructured":"Liu, P., Orru, Y., Vakil, J., Paxton, C., Shafiullah, N.M.M., and Pinto, L. (2024, January 15\u201319). Demonstrating OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics. Proceedings of the Robotics: Science and Systems (RSS 2024), Delft, The Netherlands.","DOI":"10.15607\/RSS.2024.XX.091"},{"key":"ref_217","doi-asserted-by":"crossref","unstructured":"Ghosh, D., Walke, H.R., Pertsch, K., Black, K., Mees, O., Dasari, S., Hejna, J., Kreiman, T., Xu, C., and Luo, J. (2024, January 15\u201319). Octo: An Open-Source Generalist Robot Policy. 
Proceedings of the Robotics: Science and Systems (RSS 2024), Delft, The Netherlands.","DOI":"10.15607\/RSS.2024.XX.090"},{"key":"ref_218","unstructured":"Chang, M., Chhablani, G., Clegg, A., Dallaire Cote, M., Desai, R., Hlavac, M., Karashchuk, V., Krantz, J., Mottaghi, R., and Parashar, P. (2024). PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-Agent Tasks. arXiv."},{"key":"ref_219","unstructured":"Gu, Q., Ju, Y., Sun, S., Gilitschenski, I., Nishimura, H., Itkina, M., and Shkurti, F. (2025). SAFE: Multitask Failure Detection for Vision-Language-Action Models. arXiv."},{"key":"ref_220","doi-asserted-by":"crossref","unstructured":"Chen, B., Xia, F., Ichter, B., Rao, K., Gopalakrishnan, K., and Ryoo, M.S. (2023). Open-Vocabulary Queryable Scene Representations for Real World Planning. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2023), London, UK, 29 May\u20132 June 2023, IEEE.","DOI":"10.1109\/ICRA48891.2023.10161534"},{"key":"ref_221","doi-asserted-by":"crossref","unstructured":"Shin, J., Han, J., Kim, S., Oh, Y., and Kim, E. (2024). Task Planning for Long-Horizon Cooking Tasks Based on Large Language Models. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 14\u201318 October 2024, IEEE.","DOI":"10.1109\/IROS58592.2024.10801687"},{"key":"ref_222","doi-asserted-by":"crossref","unstructured":"Takebayashi, R., Isume, V.H., Kiyokawa, T., Wan, W., and Harada, K. (2025). Cooking Task Planning Using LLM and Verified by Graph Network. Proceedings of the 21st IEEE International Conference on Automation Science and Engineering (CASE 2025), Los Angeles, CA, USA, 17\u201321 August 2025, IEEE.","DOI":"10.1109\/CASE58245.2025.11164158"},{"key":"ref_223","doi-asserted-by":"crossref","unstructured":"Singh, I., Blukis, V., Mousavian, A., Goyal, A., Xu, D., Tremblay, J., Fox, D., Thomason, J., and Garg, A. (2023). 
ProgPrompt: Generating Situated Robot Task Plans Using Large Language Models. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May\u20132 June 2023, IEEE.","DOI":"10.1109\/ICRA48891.2023.10161317"},{"key":"ref_224","doi-asserted-by":"crossref","unstructured":"Yang, J., Chen, X., Qian, S., Madaan, N., Iyengar, M., Fouhey, D.F., and Chai, J. (2024). LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13\u201317 May 2024, IEEE.","DOI":"10.1109\/ICRA57147.2024.10610443"},{"key":"ref_225","doi-asserted-by":"crossref","first-page":"9646","DOI":"10.1109\/LRA.2025.3595038","article-title":"GSON: A Group-Based Social Navigation Framework with Large Multimodal Model","volume":"10","author":"Luo","year":"2025","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_226","unstructured":"Ahn, M., Dwibedi, D., Finn, C., Arenas, M.G., Gopalakrishnan, K., Hausman, K., Ichter, B., Irpan, A., Joshi, N., and Julian, R. (2024). AutoRT: Embodied Foundation Models for Large-Scale Orchestration of Robotic Agents. arXiv."},{"key":"ref_227","first-page":"23","article-title":"SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning","volume":"Volume 229","author":"Rana","year":"2023","journal-title":"Proceedings of the 7th Conference on Robot Learning (CoRL 2023), Atlanta, GA, USA, 6\u20139 November 2023"},{"key":"ref_228","first-page":"1950","article-title":"Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs","volume":"Volume 229","author":"Chang","year":"2023","journal-title":"Proceedings of the 7th Conference on Robot Learning (CoRL 2023), Atlanta, GA, USA, 6\u20139 November 2023"},{"key":"ref_229","unstructured":"Shah, D., Osi\u0144ski, B., Ichter, B., and Levine, S. (2023). 
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action. Proceedings of the 6th Conference on Robot Learning (CoRL), Auckland, New Zealand, 14\u201318 December 2022, PMLR. Available online: https:\/\/proceedings.mlr.press\/v205\/shah23b.html."},{"key":"ref_230","doi-asserted-by":"crossref","unstructured":"Shah, D., Eysenbach, B., Kahn, G., Rhinehart, N., and Levine, S. (2021). ViNG: Learning Open-World Navigation with Visual Goals. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China, 30 May\u20135 June 2021, IEEE.","DOI":"10.1109\/ICRA48506.2021.9561936"},{"key":"ref_231","unstructured":"Shi, L.X., Ichter, B., Equi, M., Ke, L., Pertsch, K., Vuong, Q., Tanner, J., Walling, A., Wang, H., and Fusai, N. (2025). Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models. arXiv."},{"key":"ref_232","doi-asserted-by":"crossref","unstructured":"Gao, J., Sarkar, B., Xia, F., Xiao, T., Wu, J., Ichter, B., Majumdar, A., and Sadigh, D. (2024). Physically Grounded Vision-Language Models for Robotic Manipulation. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13\u201317 May 2024, IEEE.","DOI":"10.1109\/ICRA57147.2024.10610090"},{"key":"ref_233","doi-asserted-by":"crossref","unstructured":"Uysal, M., and Sirgy, M.J. (2023). Robots, Artificial Intelligence and Service Automation in Tourism and Quality of Life. Handbook of Tourism and Quality-of-Life Research II, Springer.","DOI":"10.1007\/978-3-031-31513-8"},{"key":"ref_234","first-page":"613","article-title":"Service Robots in Hotels: Understanding the Service Quality Perceptions of Human\u2013Robot Interaction","volume":"29","author":"Choi","year":"2020","journal-title":"J. Hosp. Mark. 
Manag."},{"key":"ref_235","first-page":"4005","article-title":"RoboPoint: A Vision-Language Model for Spatial Affordance Prediction in Robotics","volume":"Volume 270","author":"Agrawal","year":"2025","journal-title":"Proceedings of the 8th Conference on Robot Learning (CoRL 2025), Atlanta, GA, USA, 6\u20139 November 2025"},{"key":"ref_236","unstructured":"Zheng, R., Liang, Y., Huang, S., Gao, J., Daum\u00e9, H., Kolobov, A., Huang, F., and Yang, J. (2024). TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies. arXiv."},{"key":"ref_237","unstructured":"Jiang, Y., Gupta, A., Zhang, Z., Wang, G., Dou, Y., Chen, Y., Li, F.-F., Anandkumar, A., Zhu, Y., and Fan, L. (2023, January 23\u201329). VIMA: Robot Manipulation with Multimodal Prompts. Proceedings of the 40th International Conference on Machine Learning (ICML 2023), Honolulu, HI, USA. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3618408.3619019."},{"key":"ref_238","doi-asserted-by":"crossref","unstructured":"Min, S.Y., Puig, X., Chaplot, D.S., Yang, T.-Y., Rai, A., Parashar, P., Salakhutdinov, R., Bisk, Y., and Mottaghi, R. (2024). Situated Instruction Following. Computer Vision\u2014ECCV 2024, Proceedings of the 18th European Conference on Computer Vision, Milan, Italy, 29 September\u20134 October 2024, Springer.","DOI":"10.1007\/978-3-031-73030-6_12"},{"key":"ref_239","unstructured":"Dalal, M., Chiruvolu, T., Chaplot, D., and Salakhutdinov, R. (2024, January 7\u201311). Plan-Seq-Learn: Language Model Guided RL for Solving Long-Horizon Robotics Tasks. Proceedings of the International Conference on Learning Representations (ICLR 2024), Vienna, Austria. Available online: https:\/\/proceedings.iclr.cc\/paper_files\/paper\/2024\/file\/2e9f9cde1b709281a06dd14f679e4c51-Paper-Conference.pdf."},{"key":"ref_240","doi-asserted-by":"crossref","unstructured":"Zhao, T.Z., Kumar, V., Levine, S., and Finn, C. (2023, January 10\u201314). 
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware. Proceedings of the Robotics: Science and Systems (RSS 2023), Daegu, Republic of Korea.","DOI":"10.15607\/RSS.2023.XIX.016"},{"key":"ref_241","first-page":"4066","article-title":"Mobile ALOHA: Learning Bimanual Mobile Manipulation Using Low-Cost Whole-Body Teleoperation","volume":"Volume 270","author":"Fu","year":"2025","journal-title":"Proceedings of the 8th Conference on Robot Learning (CoRL 2025), Atlanta, GA, USA, 6\u20139 November 2025"},{"key":"ref_242","unstructured":"Shafiullah, N.M., Paxton, C., Pinto, L., Chintala, S., and Szlam, A. (2023, January 10\u201314). CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory. Proceedings of the Robotics: Science and Systems (RSS 2023), Daegu, Republic of Korea."},{"key":"ref_243","unstructured":"Rozi\u00e8re, B., Gehring, J., Gloeckle, F., Sootla, S., Gat, I., Tan, X.E., Adi, Y., Liu, J., Sauvestre, R., and Remez, T. (2024). Code Llama: Open Foundation Models for Code. arXiv."},{"key":"ref_244","doi-asserted-by":"crossref","first-page":"1729881419857432","DOI":"10.1177\/1729881419857432","article-title":"Control Strategies for Cleaning Robots in Domestic Applications: A Comprehensive Review","volume":"16","author":"Kim","year":"2019","journal-title":"Int. J. Adv. Robot. Syst."},{"key":"ref_245","unstructured":"Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozi\u00e8re, B., Goyal, N., Hambro, E., and Azhar, F. (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv."},{"key":"ref_246","doi-asserted-by":"crossref","unstructured":"Lym, H.J., Son, H.I., Kim, D.-Y., Kim, J., Kim, M.-G., and Chung, J.H. (2024). Child-Centered Home Service Design for a Family Robot Companion. Front. Robot. AI, 11.","DOI":"10.3389\/frobt.2024.1346257"},{"key":"ref_247","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2020). 
DistilBERT: A distilled version of BERT\u2014Smaller, faster, cheaper and lighter. arXiv."},{"key":"ref_248","doi-asserted-by":"crossref","first-page":"8298","DOI":"10.1109\/LRA.2024.3441495","article-title":"Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation","volume":"9","author":"Honerkamp","year":"2024","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_249","unstructured":"Wang, Y., Xian, Z., Chen, F., Wang, T.-H., Wang, Y., Fragkiadaki, K., Erickson, Z., Held, D., and Gan, C. (2024). RoboGen: Towards unleashing infinite data for automated robot learning via generative simulation. Proceedings of the 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria, 21\u201327 July 2024, JMLR.org. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3692070.3694197."},{"key":"ref_250","unstructured":"Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H.P., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., and Brockman, G. (2021). Evaluating Large Language Models Trained on Code. arXiv."},{"key":"ref_251","doi-asserted-by":"crossref","unstructured":"Peng, S., Genova, K., Jiang, C., Tagliasacchi, A., Pollefeys, M., and Funkhouser, T. (2023). OpenScene: 3D scene understanding with open vocabularies. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17\u201324 June 2023, IEEE.","DOI":"10.1109\/CVPR52729.2023.00085"},{"key":"ref_252","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27\u201330 June 2016, IEEE.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_253","doi-asserted-by":"crossref","first-page":"8271","DOI":"10.1109\/TIP.2025.3639996","article-title":"Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation","volume":"34","author":"Bai","year":"2025","journal-title":"IEEE Trans. Image Process."},{"key":"ref_254","first-page":"711","article-title":"ViNT: A Foundation Model for Visual Navigation","volume":"Volume 229","author":"Shah","year":"2023","journal-title":"Proceedings of the 7th Conference on Robot Learning (CoRL 2023), Atlanta, GA, USA, 6\u20139 November 2023"},{"key":"ref_255","unstructured":"Narasimhan, S., Lisondra, M., Wang, H., and Nejat, G. (2025). SplatSearch: Instance Image Goal Navigation for Mobile Robots Using 3D Gaussian Splatting and Diffusion Models. arXiv."},{"key":"ref_256","doi-asserted-by":"crossref","unstructured":"Fung, A., Tan, A.H., Wang, H., Benhabib, B., and Nejat, G. (2025). MLLM-Search: A Zero-Shot Approach to Finding People Using Multimodal Large Language Models. Robotics, 14.","DOI":"10.3390\/robotics14080102"},{"key":"ref_257","doi-asserted-by":"crossref","first-page":"7667","DOI":"10.1109\/LRA.2025.3577456","article-title":"Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach","volume":"10","author":"Tan","year":"2025","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_258","unstructured":"Beyer, L., Steiner, A., Pinto, A.S., Kolesnikov, A., Wang, X., Salz, D., Neumann, M., Alabdulmohsin, I., Tschannen, M., and Bugliarello, E. (2024). Paligemma: A Versatile 3B Vision-Language Model for Transfer. arXiv."},{"key":"ref_259","unstructured":"Dai, W., Li, J., Li, D., Tiong, A.M.H., Zhao, J., Wang, W., Li, B., Fung, P., and Hoi, S. (2023). InstructBLIP: Towards General-Purpose Vision\u2013Language Models with Instruction Tuning. 
Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, USA, 10\u201316 December 2023, Curran Associates, Inc.. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3666122.3668264."},{"key":"ref_260","doi-asserted-by":"crossref","first-page":"100211","DOI":"10.1016\/j.hcc.2024.100211","article-title":"A Survey on Large Language Model (LLM) Security and Privacy: The Good, The Bad, and The Ugly","volume":"4","author":"Yao","year":"2024","journal-title":"High-Confid. Comput."},{"key":"ref_261","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1109\/COMST.2017.2779824","article-title":"Systematic Classification of Side-Channel Attacks: A Case Study for Mobile Devices","volume":"20","author":"Spreitzer","year":"2018","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_262","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1007\/s13389-019-00212-8","article-title":"Applications of Machine Learning Techniques in Side-Channel Attacks: A Survey","volume":"10","author":"Hettwer","year":"2020","journal-title":"J. Cryptogr. Eng."},{"key":"ref_263","doi-asserted-by":"crossref","unstructured":"Pa Pa, Y.M., Tanizaki, S., Kou, T., van Eeten, M., Yoshioka, K., and Matsumoto, T. (2023). An Attacker\u2019s Dream? Exploring the Capabilities of ChatGPT for Developing Malware. Proceedings of the 16th Cyber Security Experimentation and Test Workshop (CSET 2023), Marina del Rey, CA, USA, 7\u20138 August 2023, Association for Computing Machinery.","DOI":"10.1145\/3607505.3607513"},{"key":"ref_264","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3721127","article-title":"Accountability in Code Review: The Role of Intrinsic Drivers and the Impact of LLMs","volume":"34","author":"Alami","year":"2025","journal-title":"ACM Trans. Softw. Eng. Methodol."},{"key":"ref_265","doi-asserted-by":"crossref","unstructured":"Raptis, E.K., Kapoutsis, A.C., and Kosmatopoulos, E.B. (2025). 
Agentic LLM-Based Robotic Systems for Real-World Applications: A Review on Their Agenticness and Ethics. Front. Robot. AI, 12.","DOI":"10.3389\/frobt.2025.1605405"},{"key":"ref_266","unstructured":"Sathish, V., Lin, H., Kamath, A.K., and Nyayachavadi, A. (2024). LLeMpower: Understanding Disparities in the Control and Access of Large Language Models. arXiv."},{"key":"ref_267","doi-asserted-by":"crossref","first-page":"e64226","DOI":"10.2196\/64226","article-title":"Economics and Equity of Large Language Models: Health Care Perspective","volume":"26","author":"Nagarajan","year":"2024","journal-title":"J. Med. Internet Res."},{"key":"ref_268","doi-asserted-by":"crossref","first-page":"pgaf107","DOI":"10.1093\/pnasnexus\/pgaf107","article-title":"AI Exposure Predicts Unemployment Risk: A New Approach to Technology-Driven Job Loss","volume":"4","author":"Frank","year":"2025","journal-title":"PNAS Nexus"},{"key":"ref_269","doi-asserted-by":"crossref","first-page":"1306","DOI":"10.1126\/science.adj0998","article-title":"GPTs Are GPTs: Labor Market Impact Potential of LLMs","volume":"384","author":"Eloundou","year":"2024","journal-title":"Science"},{"key":"ref_270","doi-asserted-by":"crossref","unstructured":"Adilazuarda, M.F., Mukherjee, S., Lavania, P., Singh, S.S., Aji, A.F., O\u2019Neill, J., Modi, A., and Choudhury, M. (2024). Towards Measuring and Modeling \u201cCulture\u201d in LLMs: A Survey. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), Miami, FL, USA, 12\u201316 November 2024, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2024.emnlp-main.882"},{"key":"ref_271","unstructured":"Li, C., Chen, M., Wang, J., Sitaram, S., and Xie, X. (2024). CultureLLM: Incorporating Cultural Differences into Large Language Models. Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 10\u201315 December 2024, Curran Associates Inc.. 
Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3737916.3740609."},{"key":"ref_272","doi-asserted-by":"crossref","first-page":"2227","DOI":"10.2147\/PPA.S527922","article-title":"Generative AI\/LLMs for Plain Language Medical Information for Patients, Caregivers and General Public: Opportunities, Risks and Ethics","volume":"19","author":"Pal","year":"2025","journal-title":"Patient Prefer. Adherence"},{"key":"ref_273","doi-asserted-by":"crossref","first-page":"676","DOI":"10.1080\/0144929X.2024.2333933","article-title":"The Emotional Impact of Generative AI: Negative Emotions and Perception of Threat","volume":"44","author":"Alessandro","year":"2025","journal-title":"Behav. Inf. Technol."},{"key":"ref_274","doi-asserted-by":"crossref","first-page":"100072","DOI":"10.1016\/j.chbah.2024.100072","article-title":"Exploring People\u2019s Perceptions of LLM-Generated Advice","volume":"2","author":"Wester","year":"2024","journal-title":"Comput. Hum. Behav. Artif. Hum."},{"key":"ref_275","doi-asserted-by":"crossref","unstructured":"Huang, J.-T., Lam, M.H., Li, E.J., Ren, S., Wang, W., Jiao, W., Tu, Z., and Lyu, M.R. (2024). Apathetic or Empathetic? Evaluating LLMs\u2019 Emotional Alignments with Humans. Advances in Neural Information Processing Systems, NeurIPS 2024, Curran Associates Inc.","DOI":"10.52202\/079017-3077"},{"key":"ref_276","unstructured":"Li, Y., Huang, Y., Wang, H., Cheng, Y., Zhang, X., Zou, J., and Sun, L. (2024). Evaluating Large Language Models with Psychometrics. arXiv."},{"key":"ref_277","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1093\/arbint\/aiae031","article-title":"ChatGPT as a Fourth Arbitrator? The Ethics and Risks of Using Large Language Models in Arbitration","volume":"41","author":"Moreira","year":"2025","journal-title":"Arbitr. Int."},{"key":"ref_278","doi-asserted-by":"crossref","unstructured":"Mahadevan, K., Chien, J., Brown, N., Xu, Z., Parada, C., Xia, F., Zeng, A., Takayama, L., and Sadigh, D. (2024). 
Generative expressive robot behaviors using large language models. Proceedings of the 2024 ACM\/IEEE International Conference on Human-Robot Interaction (HRI 2024), Boulder, CO, USA, 11\u201314 March 2024, Association for Computing Machinery.","DOI":"10.1145\/3610977.3634999"},{"key":"ref_279","unstructured":"Hu, Y., Huang, P., Sivapurapu, M., and Zhang, J. (2025). ELEGNT: Expressive and Functional Movement Design for Non-Anthropomorphic Robot. arXiv."},{"key":"ref_280","unstructured":"Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., and Brunskill, E. (2021). On the Opportunities and Risks of Foundation Models. arXiv."},{"key":"ref_281","doi-asserted-by":"crossref","unstructured":"Wasi, A.T., and Islam, M.R. (2024). CogErgLLM: Exploring Large Language Model Systems Design Perspective Using Cognitive Ergonomics. Proceedings of the 1st Workshop on NLP for Science (NLP4Science 2024), Miami, FL, USA, 12\u201316 November 2024, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2024.nlp4science-1.22"},{"key":"ref_282","doi-asserted-by":"crossref","unstructured":"Sang, H., Zhang, L., Chen, T., Guo, W., and Zhang, Z. (2026). Onboard Deployment of Remote Sensing Foundation Models: A Comprehensive Review of Architecture, Optimization, and Hardware. Remote Sens., 18.","DOI":"10.3390\/rs18020298"},{"key":"ref_283","doi-asserted-by":"crossref","first-page":"77418","DOI":"10.1109\/ACCESS.2025.3565918","article-title":"Large Language Models in Human-Robot Collaboration with Cognitive Validation Against Context-Induced Hallucinations","volume":"13","author":"Ranasinghe","year":"2025","journal-title":"IEEE Access"},{"key":"ref_284","doi-asserted-by":"crossref","unstructured":"Hamid, O.H. (2024). Beyond probabilities: Unveiling the Delicate Dance of Large Language Models (LLMs) and AI Hallucination. 
Proceedings of the 2024 IEEE Conference on Cognitive and Computational Aspects of Situation Management (CogSIMA), Montreal, QC, Canada, 7\u201310 May 2024, IEEE.","DOI":"10.1109\/CogSIMA61085.2024.10553755"},{"key":"ref_285","doi-asserted-by":"crossref","first-page":"11242","DOI":"10.1109\/LRA.2024.3487484","article-title":"Mitigating Catastrophic Forgetting in Robot Continual Learning: A Guided Policy Search Approach Enhanced with Memory-Aware Synapses","volume":"9","author":"Dong","year":"2024","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_286","doi-asserted-by":"crossref","unstructured":"Taie, W., ElGeneidy, K., Al-Yacoub, A., and Sun, R. (2024). Addressing Catastrophic Forgetting in Payload Parameter Identification Using Incremental Ensemble Learning. Front. Robot. AI, 11.","DOI":"10.3389\/frobt.2024.1470163"},{"key":"ref_287","unstructured":"O\u2019Neill, A., Rehman, A., Maddukuri, A., Gupta, A., Padalkar, A., Lee, A., Pooley, A., Gupta, A., Mandlekar, A., and Jain, A. (2024, January 13\u201317). Open X-Embodiment: Robotic Learning Datasets And RT-X Models. Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan."},{"key":"ref_288","first-page":"1","article-title":"Resource-Efficient Algorithms and Systems of Foundation Models: A Survey","volume":"57","author":"Xu","year":"2025","journal-title":"ACM Comput. Surv."},{"key":"ref_289","doi-asserted-by":"crossref","first-page":"109171","DOI":"10.1016\/j.rineng.2026.109171","article-title":"Energy-Aware Multi-Robot Exploration and Coverage in Fragmented Unknown Environments Using Collaborative Reinforcement Learning","volume":"29","author":"Xue","year":"2026","journal-title":"Results Eng."},{"key":"ref_290","doi-asserted-by":"crossref","unstructured":"Maresca, F., Romero, A., Delgado, C., Sciancalepore, V., Paradells, J., and Costa-P\u00e9rez, X. (2025). REACT: Multi-Robot Energy-Aware Orchestrator for Indoor Search and Rescue Critical Tasks. 
Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA 2025), Atlanta, GA, USA, 19\u201323 May 2025, IEEE.","DOI":"10.1109\/ICRA55743.2025.11127906"},{"key":"ref_291","unstructured":"Li, P., Toprak, O.S., Narayanan, A., Topcu, U., and Chinchali, S. (2024). Online Foundation Model Selection in Robotics. arXiv."},{"key":"ref_292","doi-asserted-by":"crossref","first-page":"1556","DOI":"10.1162\/tacl_a_00704","article-title":"A survey on model compression for large language models","volume":"12","author":"Zhu","year":"2024","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_293","unstructured":"Park, Y., Hyun, J., Cho, S., Sim, B., and Lee, J.W. (2024). Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs. arXiv."},{"key":"ref_294","unstructured":"Yang, C., Si, Q., Duan, Y., Zhu, Z., Zhu, C., Li, Q., Chen, M., Lin, Z., and Wang, W. (2025). Dynamic Early Exit in Reasoning Models. arXiv."},{"key":"ref_295","doi-asserted-by":"crossref","first-page":"65726","DOI":"10.1109\/ACCESS.2025.3559076","article-title":"Task Decomposition and Self-Evaluation Mechanisms for Home Healthcare Robots Using Large Language Models","volume":"13","author":"Liu","year":"2025","journal-title":"IEEE Access"},{"key":"ref_296","unstructured":"Cohen, V., Liu, J.X., Mooney, R., Tellex, S., and Watkins, D. (2024). A Survey of Robotic Language Grounding: Tradeoffs between Symbols and Embeddings. arXiv."},{"key":"ref_297","unstructured":"Khan, M.T., and Waheed, A. (2025). Foundation Model Driven Robotics: A Comprehensive Review. arXiv."},{"key":"ref_298","doi-asserted-by":"crossref","first-page":"10767","DOI":"10.1109\/LRA.2025.3604702","article-title":"A Human-in-The-Loop Approach to Robot Action Replanning Through LLM Common-Sense Reasoning","volume":"10","author":"Merlo","year":"2025","journal-title":"IEEE Robot. Autom. 
Lett."},{"key":"ref_299","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1109\/LRA.2025.3632119","article-title":"X-Nav: Learning End-to-End Cross-Embodiment Navigation for Mobile Robots","volume":"11","author":"Wang","year":"2026","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_300","doi-asserted-by":"crossref","first-page":"1232","DOI":"10.1080\/01691864.2024.2408593","article-title":"Real-World Robot Applications of Foundation Models: A Review","volume":"38","author":"Kawaharazuka","year":"2024","journal-title":"Adv. Robot."},{"key":"ref_301","unstructured":"Chen, J., Yu, C., Zhou, X., Xu, T., Mu, Y., Hu, M., Shao, W., Wang, Y., Li, G., and Shao, L. (2024). EMOS: Embodiment-Aware Heterogeneous Multi-Robot Operating System with LLM Agents. arXiv."},{"key":"ref_302","unstructured":"Wang, W., Obi, I., and Min, B.-C. (2025). Multi-Agent LLM Actor-Critic Framework for Social Robot Navigation. arXiv."},{"key":"ref_303","doi-asserted-by":"crossref","unstructured":"Lin, X., Alam, N., Shuvo, M.I.R., Fime, A.A., and Kim, J.H. (2024). MechLMM: A Collaborative Knowledge Framework for Enhanced Data Fusion in Multi-Robot Systems Using Large Multimodal Models. Proceedings of the 2024 IEEE 8th International Conference on Information and Communication Technology (CICT), Prayagraj, India, 6\u20138 December 2024, IEEE.","DOI":"10.1109\/CICT64037.2024.10899705"},{"key":"ref_304","doi-asserted-by":"crossref","unstructured":"Mandi, Z., Jain, S., and Song, S. (2024). RoCo: Dialectic Multi-Robot Collaboration with Large Language Models. Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13\u201317 May 2024, IEEE.","DOI":"10.1109\/ICRA57147.2024.10610855"},{"key":"ref_305","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1007\/s11701-025-02667-w","article-title":"Robotic Surgical Curriculum for Medical Students: A Scoping Review","volume":"19","author":"Ahmadipour","year":"2025","journal-title":"J. Robot. 
Surg."},{"key":"ref_306","doi-asserted-by":"crossref","first-page":"e018815","DOI":"10.1136\/bmjopen-2017-018815","article-title":"Scoping Review on the use of Socially Assistive Robot Technology in Elderly Care","volume":"8","author":"Abdi","year":"2018","journal-title":"BMJ Open"},{"key":"ref_307","doi-asserted-by":"crossref","unstructured":"Al Bayrakdar, A., Dragone, M., Wojcik, G., McConnell, A., King, M., and Paterson, R. (J. Diabetes Sci. Technol., 2025). Robotics Use in the Care and Management of People Living with Diabetes Mellitus: A Scoping Review, J. Diabetes Sci. Technol., ahead of print.","DOI":"10.1177\/19322968251356298"},{"key":"ref_308","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1080\/10447318.2020.1741118","article-title":"Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy","volume":"36","author":"Shneiderman","year":"2020","journal-title":"Int. J. Hum.-Comput. Interact."},{"key":"ref_309","doi-asserted-by":"crossref","first-page":"467","DOI":"10.7326\/M18-0850","article-title":"PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation","volume":"169","author":"Tricco","year":"2018","journal-title":"Ann. Intern. Med."},{"key":"ref_310","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1007\/s12369-013-0202-2","article-title":"Why Robots? A Survey on the Roles and Benefits of Social Robots in the Therapy of Children with Autism","volume":"5","author":"Cabibihan","year":"2013","journal-title":"Int. J. Soc. 
Robot."}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/15\/3\/55\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T05:19:54Z","timestamp":1772774394000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/15\/3\/55"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,4]]},"references-count":310,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["robotics15030055"],"URL":"https:\/\/doi.org\/10.3390\/robotics15030055","relation":{},"ISSN":["2218-6581"],"issn-type":[{"value":"2218-6581","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,4]]}}}