{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T04:56:36Z","timestamp":1778302596865,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":43,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T00:00:00Z","timestamp":1730073600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/https:\/\/doi.org\/10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["No. 62172001, 92370124, 62276149, 92248303"],"award-info":[{"award-number":["No. 62172001, 92370124, 62276149, 92248303"]}],"id":[{"id":"10.13039\/https:\/\/doi.org\/10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,10,28]]},"DOI":"10.1145\/3664647.3680616","type":"proceedings-article","created":{"date-parts":[[2024,10,26]],"date-time":"2024-10-26T06:59:27Z","timestamp":1729925967000},"page":"8120-8128","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-3300-396X","authenticated-orcid":false,"given":"Shuyuan","family":"Liu","sequence":"first","affiliation":[{"name":"Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-1712-8027","authenticated-orcid":false,"given":"Jiawei","family":"Chen","sequence":"additional","affiliation":[{"name":"Shanghai Key Laboratory of Multidimensional Information Processing, East 
China Normal University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0481-5855","authenticated-orcid":false,"given":"Shouwei","family":"Ruan","sequence":"additional","affiliation":[{"name":"Institute of Artificial Intelligence, Beihang University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8294-6315","authenticated-orcid":false,"given":"Hang","family":"Su","sequence":"additional","affiliation":[{"name":"Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University & Zhongguancun Laboratory, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0387-4806","authenticated-orcid":false,"given":"Zhaoxia","family":"Yin","sequence":"additional","affiliation":[{"name":"Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2024,10,28]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(02)00021-3"},{"key":"e_1_3_2_1_2_1","volume-title":"Openflamingo: An open-source framework for training large autoregressive vision-language models. arXiv preprint arXiv:2308.01390","author":"Awadalla Anas","year":"2023","unstructured":"Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, et al. 2023. Openflamingo: An open-source framework for training large autoregressive vision-language models. arXiv preprint arXiv:2308.01390 (2023)."},{"key":"e_1_3_2_1_3_1","volume-title":"Conference on robot learning. PMLR, 287--318","author":"Brohan Anthony","year":"2023","unstructured":"Anthony Brohan, Yevgen Chebotar, Chelsea Finn, Karol Hausman, Alexander Herzog, Daniel Ho, Julian Ibarz, Alex Irpan, Eric Jang, Ryan Julian, et al. 2023. Do as i can, not as i say: Grounding language in robotic affordances. In Conference on robot learning. 
PMLR, 287--318."},{"key":"e_1_3_2_1_4_1","volume-title":"Daphne Ippolito, Katherine Lee, Florian Tramer, and Ludwig Schmidt.","author":"Carlini Nicholas","year":"2023","unstructured":"Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, and Ludwig Schmidt. 2023. Are aligned neural networks adversarially aligned?arxiv: 2306.15447 [cs.CL]"},{"key":"e_1_3_2_1_5_1","unstructured":"Patrick Chao Alexander Robey Edgar Dobriban Hamed Hassani George J. Pappas and Eric Wong. 2023. Jailbreaking Black Box Large Language Models in Twenty Queries. arxiv: 2310.08419 [cs.LG]"},{"key":"e_1_3_2_1_6_1","volume-title":"Dynamic planning with a llm. arXiv preprint arXiv:2308.06391","author":"Dagan Gautier","year":"2023","unstructured":"Gautier Dagan, Frank Keller, and Alex Lascarides. 2023. Dynamic planning with a llm. arXiv preprint arXiv:2308.06391 (2023)."},{"key":"e_1_3_2_1_7_1","volume-title":"A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily. arXiv preprint arXiv:2311.08268","author":"Ding Peng","year":"2023","unstructured":"Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, and Shujian Huang. 2023. A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily. arXiv preprint arXiv:2311.08268 (2023)."},{"key":"e_1_3_2_1_8_1","volume-title":"How Robust is Google's Bard to Adversarial Image Attacks? arXiv preprint arXiv:2309.11751","author":"Dong Yinpeng","year":"2023","unstructured":"Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, Xiao Yang, Yichi Zhang, Yu Tian, Hang Su, and Jun Zhu. 2023. How Robust is Google's Bard to Adversarial Image Attacks? arXiv preprint arXiv:2309.11751 (2023)."},{"key":"e_1_3_2_1_9_1","volume-title":"Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis. 
arXiv preprint arXiv:2403.11487","author":"Dorbala Vishnu Sashank","year":"2024","unstructured":"Vishnu Sashank Dorbala, Sanjoy Chowdhury, and Dinesh Manocha. 2024. Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis. arXiv preprint arXiv:2403.11487 (2024)."},{"key":"e_1_3_2_1_10_1","volume-title":"James F Mullen Jr, and Dinesh Manocha","author":"Dorbala Vishnu Sashank","year":"2023","unstructured":"Vishnu Sashank Dorbala, James F Mullen Jr, and Dinesh Manocha. 2023. Can an Embodied Agent Find Your 'Cat-shaped Mug'' LLM-Based Zero-Shot Object Navigation. IEEE Robotics and Automation Letters (2023)."},{"key":"e_1_3_2_1_11_1","unstructured":"Wenlong Huang Fei Xia Ted Xiao Harris Chan Jacky Liang Pete Florence Andy Zeng Jonathan Tompson Igor Mordatch Yevgen Chebotar et al. 2022. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608 (2022)."},{"key":"e_1_3_2_1_12_1","unstructured":"Yangsibo Huang Samyak Gupta Mengzhou Xia Kai Li and Danqi Chen. 2023. Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation. arxiv: 2310.06987 [cs.CL]"},{"key":"e_1_3_2_1_13_1","unstructured":"Eric Kolve Roozbeh Mottaghi Winson Han Eli VanderBilt Luca Weihs Alvaro Herrasti Matt Deitke Kiana Ehsani Daniel Gordon Yuke Zhu et al. 2017. Ai2-thor: An interactive 3d environment for visual ai. arXiv preprint arXiv:1712.05474 (2017)."},{"key":"e_1_3_2_1_14_1","volume-title":"Mimic-it: Multi-modal in-context instruction tuning. arXiv preprint arXiv:2306.05425","author":"Li Bo","year":"2023","unstructured":"Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, and Ziwei Liu. 2023. Mimic-it: Multi-modal in-context instruction tuning. 
arXiv preprint arXiv:2306.05425 (2023)."},{"key":"e_1_3_2_1_15_1","volume-title":"Advances in Neural Information Processing Systems","volume":"36","author":"Li Chunyuan","year":"2024","unstructured":"Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao. 2024. Llava-med: Training a large language-and-vision assistant for biomedicine in one day. Advances in Neural Information Processing Systems, Vol. 36 (2024)."},{"key":"e_1_3_2_1_16_1","volume-title":"Deepinception: Hypnotize large language model to be jailbreaker. arXiv preprint arXiv:2311.03191","author":"Li Xuan","year":"2023","unstructured":"Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, and Bo Han. 2023. Deepinception: Hypnotize large language model to be jailbreaker. arXiv preprint arXiv:2311.03191 (2023)."},{"key":"e_1_3_2_1_17_1","volume-title":"Autodan: Generating stealthy jailbreak prompts on aligned large language models. arXiv preprint arXiv:2310.04451","author":"Liu Xiaogeng","year":"2023","unstructured":"Xiaogeng Liu, Nan Xu, Muhao Chen, and Chaowei Xiao. 2023. Autodan: Generating stealthy jailbreak prompts on aligned large language models. arXiv preprint arXiv:2310.04451 (2023)."},{"key":"e_1_3_2_1_18_1","volume-title":"Interactive language: Talking to robots in real time","author":"Lynch Corey","year":"2023","unstructured":"Corey Lynch, Ayzaan Wahid, Jonathan Tompson, Tianli Ding, James Betker, Robert Baruch, Travis Armstrong, and Pete Florence. 2023. Interactive language: Talking to robots in real time. IEEE Robotics and Automation Letters (2023)."},{"key":"e_1_3_2_1_19_1","volume-title":"International Conference on Machine Learning. PMLR, 26311--26325","author":"Nottingham Kolby","year":"2023","unstructured":"Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, and Roy Fox. 2023. 
Do embodied agents dream of pixelated sheep: Embodied decision making using language guided world modelling. In International Conference on Machine Learning. PMLR, 26311--26325."},{"key":"e_1_3_2_1_20_1","unstructured":"Xiangyu Qi Yi Zeng Tinghao Xie Pin-Yu Chen Ruoxi Jia Prateek Mittal and Peter Henderson. 2023. Fine-tuning Aligned Language Models Compromises Safety Even When Users Do Not Intend To!arxiv: 2310.03693 [cs.CL]"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01444"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i17.29858"},{"key":"e_1_3_2_1_23_1","unstructured":"Rusheb Shah Quentin Feuillade-Montixi Soroush Pour Arush Tagade Stephen Casper and Javier Rando. 2023. Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation. arxiv: 2311.03348 [cs.CL]"},{"key":"e_1_3_2_1_24_1","volume-title":"Plan Diffuser: Grounding LLM Planners with Diffusion Models for Robotic Manipulation. In Bridging the Gap between Cognitive Science and Robot Learning in the Real World: Progresses and New Directions.","author":"Sharan SP","year":"2024","unstructured":"SP Sharan, Ruihan Zhao, Zhangyang Wang, Sandeep P Chinchali, et al. 2024. Plan Diffuser: Grounding LLM Planners with Diffusion Models for Robotic Manipulation. In Bridging the Gap between Cognitive Science and Robot Learning in the Real World: Progresses and New Directions."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00280"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-023-00669-7"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3150855"},{"key":"e_1_3_2_1_28_1","volume-title":"Pathasst: Redefining pathology through generative foundation ai assistant for pathology. 
arXiv preprint arXiv:2305.15072","author":"Sun Yuxuan","year":"2023","unstructured":"Yuxuan Sun, Chenglu Zhu, Sunyi Zheng, Kai Zhang, Zhongyi Shui, Xiaoxuan Yu, Yizhi Zhao, Honglin Li, Yunlong Zhang, Ruojia Zhao, et al. 2023. Pathasst: Redefining pathology through generative foundation ai assistant for pathology. arXiv preprint arXiv:2305.15072 (2023)."},{"key":"e_1_3_2_1_29_1","volume-title":"The Twelfth International Conference on Learning Representations.","author":"Szot Andrew","year":"2023","unstructured":"Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Rin Metcalf, Walter Talbott, Natalie Mackraz, R Devon Hjelm, and Alexander T Toshev. 2023. Large language models as generalizable policies for embodied tasks. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s44267-024-00038-x"},{"key":"e_1_3_2_1_31_1","volume-title":"BERTTune: Fine-tuning neural machine translation with BERTScore. arXiv preprint arXiv:2106.02208","author":"Unanue Inigo Jauregi","year":"2021","unstructured":"Inigo Jauregi Unanue, Jacob Parnell, and Massimo Piccardi. 2021. BERTTune: Fine-tuning neural machine translation with BERTScore. arXiv preprint arXiv:2106.02208 (2021)."},{"key":"e_1_3_2_1_32_1","volume-title":"Chatgpt for robotics: Design principles and model abilities. arXiv preprint arXiv:2306.17582","author":"Vemprala Sai","year":"2023","unstructured":"Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor. 2023. Chatgpt for robotics: Design principles and model abilities. arXiv preprint arXiv:2306.17582 (2023)."},{"key":"e_1_3_2_1_33_1","volume-title":"Jailbroken: How Does LLM Safety Training Fail?arxiv: 2307.02483 [cs.LG]","author":"Wei Alexander","year":"2023","unstructured":"Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. 
Jailbroken: How Does LLM Safety Training Fail?arxiv: 2307.02483 [cs.LG]"},{"key":"e_1_3_2_1_34_1","volume-title":"Advances in Neural Information Processing Systems","volume":"36","author":"Wei Alexander","year":"2024","unstructured":"Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2024. Jailbroken: How does llm safety training fail? Advances in Neural Information Processing Systems, Vol. 36 (2024)."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-023-10139-z"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2010.2045169"},{"key":"e_1_3_2_1_37_1","volume-title":"Embodied task planning with large language models. arXiv preprint arXiv:2307.01848","author":"Wu Zhenyu","year":"2023","unstructured":"Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, and Haibin Yan. 2023. Embodied task planning with large language models. arXiv preprint arXiv:2307.01848 (2023)."},{"key":"e_1_3_2_1_38_1","volume-title":"Embodied multi-modal agent trained by an llm from a parallel textworld. arXiv preprint arXiv:2311.16714","author":"Yang Yijun","year":"2023","unstructured":"Yijun Yang, Tianyi Zhou, Kanxue Li, Dapeng Tao, Lusong Li, Li Shen, Xiaodong He, Jing Jiang, and Yuhui Shi. 2023. Embodied multi-modal agent trained by an llm from a parallel textworld. arXiv preprint arXiv:2311.16714 (2023)."},{"key":"e_1_3_2_1_39_1","volume-title":"GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts. arxiv: 2309.10253 [cs.AI]","author":"Yu Jiahao","year":"2023","unstructured":"Jiahao Yu, Xingwei Lin, Zheng Yu, and Xinyu Xing. 2023. GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts. arxiv: 2309.10253 [cs.AI]"},{"key":"e_1_3_2_1_40_1","volume-title":"Building cooperative embodied agents modularly with large language models. 
arXiv preprint arXiv:2307.02485","author":"Zhang Hongxin","year":"2023","unstructured":"Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B Tenenbaum, Tianmin Shu, and Chuang Gan. 2023. Building cooperative embodied agents modularly with large language models. arXiv preprint arXiv:2307.02485 (2023)."},{"key":"e_1_3_2_1_41_1","volume-title":"Chatcad: Towards a universal and reliable interactive cad using llms. arXiv preprint arXiv:2305.15964","author":"Zhao Zihao","year":"2023","unstructured":"Zihao Zhao, Sheng Wang, Jinchen Gu, Yitao Zhu, Lanzhuju Mei, Zixu Zhuang, Zhiming Cui, Qian Wang, and Dinggang Shen. 2023. Chatcad: Towards a universal and reliable interactive cad using llms. arXiv preprint arXiv:2305.15964 (2023)."},{"key":"e_1_3_2_1_42_1","volume-title":"Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds. In The Twelfth International Conference on Learning Representations.","author":"Zheng Sipeng","year":"2023","unstructured":"Sipeng Zheng, Yicheng Feng, Zongqing Lu, et al. 2023. Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_1_43_1","volume-title":"Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043","author":"Zou Andy","year":"2023","unstructured":"Andy Zou, Zifan Wang, J Zico Kolter, and Matt Fredrikson. 2023. Universal and transferable adversarial attacks on aligned language models. 
arXiv preprint arXiv:2307.15043 (2023)."}],"event":{"name":"MM '24: The 32nd ACM International Conference on Multimedia","location":"Melbourne VIC Australia","acronym":"MM '24","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 32nd ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664647.3680616","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3664647.3680616","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:56Z","timestamp":1750295876000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664647.3680616"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,28]]},"references-count":43,"alternative-id":["10.1145\/3664647.3680616","10.1145\/3664647"],"URL":"https:\/\/doi.org\/10.1145\/3664647.3680616","relation":{},"subject":[],"published":{"date-parts":[[2024,10,28]]},"assertion":[{"value":"2024-10-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}