{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,24]],"date-time":"2026-07-24T14:49:52Z","timestamp":1784904592786,"version":"3.55.0"},"reference-count":235,"publisher":"Association for Computing Machinery (ACM)","issue":"7","license":[{"start":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T00:00:00Z","timestamp":1740096000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2025,7,31]]},"abstract":"<jats:p>An Artificial Intelligence (AI) agent is a software entity that autonomously performs tasks or makes decisions based on pre-defined objectives and data inputs. AI agents, capable of perceiving user inputs, reasoning and planning tasks, and executing actions, have seen remarkable advancements in algorithm development and task performance. However, the security challenges they pose remain under-explored and unresolved. This survey delves into the emerging security threats faced by AI agents, categorizing them into four critical knowledge gaps: unpredictability of multi-step user inputs, complexity in internal executions, variability of operational environments, and interactions with untrusted external entities. By systematically reviewing these threats, this article highlights both the progress made and the existing limitations in safeguarding AI agents. 
The insights provided aim to inspire further research into addressing the security threats associated with AI agents, thereby fostering the development of more robust and secure AI agent applications.<\/jats:p>","DOI":"10.1145\/3716628","type":"journal-article","created":{"date-parts":[[2025,2,7]],"date-time":"2025-02-07T11:04:27Z","timestamp":1738926267000},"page":"1-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":168,"title":["AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways"],"prefix":"10.1145","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-5469-0762","authenticated-orcid":false,"given":"Zehang","family":"Deng","sequence":"first","affiliation":[{"name":"Department of Computer Science and Software Engineering, Swinburne University of Technology, Hawthorn, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-9279-3010","authenticated-orcid":false,"given":"Yongjian","family":"Guo","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-7253-8176","authenticated-orcid":false,"given":"Changzhou","family":"Han","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Software Engineering, Swinburne University of Technology, Hawthorn, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6305-1740","authenticated-orcid":false,"given":"Wanlun","family":"Ma","sequence":"additional","affiliation":[{"name":"School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Melbourne, 
Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2028-510X","authenticated-orcid":false,"given":"Junwu","family":"Xiong","sequence":"additional","affiliation":[{"name":"Ant Group CO Ltd, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0655-666X","authenticated-orcid":false,"given":"Sheng","family":"Wen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Software Engineering, Swinburne University of Technology, Hawthorn, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5252-0831","authenticated-orcid":false,"given":"Yang","family":"Xiang","sequence":"additional","affiliation":[{"name":"Swinburne University of Technology, Hawthorn, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,2,21]]},"reference":[{"key":"e_1_3_1_2_2","volume-title":"Instruction Defense","author":"LearnPrompting","year":"2023","unstructured":"LearnPrompting. 2023. Instruction Defense. Retrieved from https:\/\/learnprompting.org\/docs\/prompt_hacking\/defensive_measures\/instruction"},{"key":"e_1_3_1_3_2","volume-title":"Sandwich Defense","author":"LearnPrompting","year":"2023","unstructured":"LearnPrompting. 2023. Sandwich Defense. Retrieved from https:\/\/learnprompting.org\/docs\/prompt_hacking\/defensive_measures\/sandwich_defense"},{"key":"e_1_3_1_4_2","unstructured":"Mahyar Abbasian Iman Azimi Amir M. Rahmani and Ramesh Jain. 2023. Conversational health agents: A personalized LLM-powered agent framework. Retrieved from https:\/\/arxiv.org\/abs\/2310.02374"},{"key":"e_1_3_1_5_2","unstructured":"Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat et\u00a0al. 2023. GPT-4 technical report. 
Retrieved from https:\/\/arxiv.org\/pdf\/2303.08774"},{"key":"e_1_3_1_6_2","unstructured":"Divyansh Agarwal Alexander R. Fabbri Philippe Laban Shafiq Joty Caiming Xiong and Chien-Sheng Wu. 2024. Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions. Retrieved from https:\/\/arxiv.org\/pdf\/2404.16251v1"},{"key":"e_1_3_1_7_2","unstructured":"Gabriel Alon and Michael Kamfonas. 2023. Detecting language model attacks with perplexity. Retrieved from https:\/\/arxiv.org\/pdf\/2308.14132"},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Jacob Andreas. 2022. Language models as agent models. Retrieved from: https:\/\/arxiv.org\/pdf\/2212.01681","DOI":"10.18653\/v1\/2022.findings-emnlp.423"},{"key":"e_1_3_1_9_2","article-title":"Many-shot Jailbreaking","year":"2024","unstructured":"Anthropic. 2024. Many-shot Jailbreaking. Retrieved from https:\/\/www.anthropic.com\/research\/many-shot-jailbreaking","journal-title":"R"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2021.103500"},{"key":"e_1_3_1_11_2","unstructured":"Yuntao Bai Andy Jones Kamal Ndousse Amanda Askell Anna Chen Nova DasSarma Dawn Drain Stanislav Fort Deep Ganguli Tom Henighan et\u00a0al. 2022. Training a helpful and harmless assistant with reinforcement learning from human feedback. Retrieved from https:\/\/arxiv.org\/pdf\/2204.05862"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","unstructured":"Ying Bao Wankun Gong and Kaiwen Yang. 2022. A literature review of human\u2013AI synergy in decision making: From the perspective of affordance actualization theory. Systems 11 9 (2023) 442.","DOI":"10.3390\/systems11090442"},{"key":"e_1_3_1_13_2","unstructured":"Rishabh Bhardwaj and Soujanya Poria. 2023. Language model unalignment: Parametric red-teaming to expose hidden harms and biases. 
Retrieved from https:\/\/arxiv.org\/pdf\/2310.14303"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-3002"},{"key":"e_1_3_1_15_2","article-title":"Language models are few-shot learners","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell et\u00a0al. 2020. Language models are few-shot learners. NeurIPS (2020).","journal-title":"NeurIPS"},{"key":"e_1_3_1_16_2","article-title":"Robust multi-agent reinforcement learning via adversarial regularization: Theoretical foundation and stable algorithms","author":"Bukharin Alexander","year":"2024","unstructured":"Alexander Bukharin, Yan Li, Yue Yu, Qingru Zhang, Zhehui Chen, Simiao Zuo, Chao Zhang, Songan Zhang, and Tuo Zhao. 2024. Robust multi-agent reinforcement learning via adversarial regularization: Theoretical foundation and stable algorithms. NeurIPS (2024).","journal-title":"NeurIPS"},{"key":"e_1_3_1_17_2","volume-title":"Proceedings of the ICML","author":"Carta Thomas","year":"2023","unstructured":"Thomas Carta, Cl\u00e9ment Romac, Thomas Wolf, Sylvain Lamprier, Olivier Sigaud, and Pierre-Yves Oudeyer. 2023. Grounding large language models in interactive environments with online reinforcement learning. In Proceedings of the ICML."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CSDE59766.2023.10487759"},{"key":"e_1_3_1_19_2","unstructured":"Patrick Chao Alexander Robey Edgar Dobriban Hamed Hassani George J. Pappas and Eric Wong. 2023. Jailbreaking black box large language models in twenty queries. Retrieved from https:\/\/arxiv.org\/pdf\/2310.08419"},{"key":"e_1_3_1_20_2","unstructured":"Dake Chen Hanbin Wang Yunhao Huo Yuzhao Li and Haoyang Zhang. 2023. Gamegpt: Multi-agent collaborative framework for game development. 
Retrieved from https:\/\/arxiv.org\/pdf\/2310.08067"},{"key":"e_1_3_1_21_2","doi-asserted-by":"crossref","unstructured":"Mengqi Chen Bin Guo Hao Wang Haoyu Li Qian Zhao Jingqi Liu Yasan Ding Yan Pan and Zhiwen Yu. 2024. The future of cognitive strategy-enhanced persuasive dialogue agents: New perspectives and trends. Retrieved from https:\/\/arxiv.org\/pdf\/2402.04631","DOI":"10.1007\/s11704-024-40057-x"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00068"},{"key":"e_1_3_1_23_2","unstructured":"Xuan Chen Yuzhou Nie Lu Yan Yunshu Mao Wenbo Guo and Xiangyu Zhang. 2024. RL-JACK: Reinforcement learning-powered black-box jailbreaking attack against LLMs. Retrieved from https:\/\/arxiv.org\/pdf\/2406.08725"},{"key":"e_1_3_1_24_2","unstructured":"Yongchao Chen Jacob Arkin Yang Zhang Nicholas Roy and Chuchu Fan. 2023. Scalable multi-robot collaboration with large language models: Centralized or decentralized systems? Retrieved from https:\/\/arxiv.org\/html\/2309.15943v2"},{"key":"e_1_3_1_25_2","unstructured":"Zhaorun Chen Zhen Xiang Chaowei Xiao Dawn Song and Bo Li. 2024. Agentpoison: Red-teaming LLM agents via poisoning memory or knowledge bases. Retrieved from https:\/\/arxiv.org\/pdf\/2407.12784"},{"key":"e_1_3_1_26_2","unstructured":"Steffi Chern Zhen Fan and Andy Liu. 2024. Combating adversarial attacks with multi-agent debate. Retrieved from https:\/\/arxiv.org\/pdf\/2401.05998"},{"key":"e_1_3_1_27_2","article-title":"Deep reinforcement learning from human preferences","volume":"30","author":"Christiano Paul F.","year":"2017","unstructured":"Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. NeurIPS 30 (2017).","journal-title":"NeurIPS"},{"key":"e_1_3_1_28_2","article-title":"Teen\u2019s Parents Sue Character.AI, Claiming its Chatbot Contributed to their Son\u2019s Suicide","year":"2024","unstructured":"CNN. 2024. 
Teen\u2019s Parents Sue Character.AI, Claiming its Chatbot Contributed to their Son\u2019s Suicide. Retrieved from https:\/\/edition.cnn.com\/2024\/10\/30\/tech\/teen-suicide-character-ai-lawsuit\/index.html. Accessed: 2024-12-01.","journal-title":"R"},{"key":"e_1_3_1_29_2","unstructured":"Stav Cohen Ron Bitton and Ben Nassi. 2024. Here comes the AI worm: Unleashing zero-click worms that target GenAI-powered applications. Retrieved from https:\/\/arxiv.org\/pdf\/2403.02817"},{"key":"e_1_3_1_30_2","unstructured":"Maxwell Crouse Ibrahim Abdelaziz Kinjal Basu Soham Dan Sadhana Kumaravel Achille Fokoue Pavan Kapanipathi and Luis Lastras. 2023. Formally specifying the high-level behavior of LLM-based agents. Retrieved from https:\/\/arxiv.org\/pdf\/2310.08535"},{"key":"e_1_3_1_31_2","unstructured":"Chenhang Cui Gelei Deng An Zhang Jingnan Zheng Yicong Li Lianli Gao Tianwei Zhang and Tat-Seng Chua. 2024. Safe+ Safe= Unsafe? Exploring how safe images can be exploited to jailbreak large vision-language models. Retrieved from https:\/\/arxiv.org\/pdf\/2411.11496"},{"key":"e_1_3_1_32_2","unstructured":"Tianyu Cui Yanling Wang Chuanpu Fu Yong Xiao Sijia Li Xinhao Deng Yunpeng Liu Qinglin Zhang Ziyi Qiu Peiyang Li et\u00a0al. 2024. Risk taxonomy mitigation and assessment benchmarks of large language model systems. Retrieved from https:\/\/arxiv.org\/pdf\/2401.05778"},{"key":"e_1_3_1_33_2","doi-asserted-by":"crossref","unstructured":"Luigi De Angelis Francesco Baglivo Guglielmo Arzilli Gaetano Pierpaolo Privitera Paolo Ferragina Alberto Eugenio Tozzi and Caterina Rizzo. 2023. ChatGPT and the rise of large language models: The new AI-driven infodemic threat in public health. Frontiers in Public Health 11 (2023) 1166120.","DOI":"10.3389\/fpubh.2023.1166120"},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Jerry den Hartog and Nicola Zannone. 2018. Security and privacy for innovative automotive applications: A survey. 
Computer Communications 132 (2018) 17\u201341.","DOI":"10.1016\/j.comcom.2018.09.010"},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","unstructured":"Gelei Deng Yi Liu Yuekang Li Kailong Wang Ying Zhang Zefeng Li Haoyu Wang Tianwei Zhang and Yang Liu. 2023. Jailbreaker: Automated jailbreak across multiple large language model chatbots. Retrieved from https:\/\/arxiv.org\/pdf\/2307.08715","DOI":"10.14722\/ndss.2024.24188"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.88"},{"key":"e_1_3_1_37_2","unstructured":"V. Dibia. 2023. Generative AI: Practical steps to reduce hallucination and improve performance of systems built with large language models. In designing with ML: How to build usable machine learning applications. Self-published on designingwithml.com."},{"key":"e_1_3_1_38_2","unstructured":"Yiran Ding Li Lyna Zhang Chengruidong Zhang Yuanyuan Xu Ning Shang Jiahang Xu Fan Yang and Mao Yang. 2024. LongRoPE: Extending LLM context window beyond 2 million tokens. Retrieved from https:\/\/arxiv.org\/pdf\/2402.13753"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3427228.3427264"},{"key":"e_1_3_1_40_2","unstructured":"Tian Dong Guoxing Chen Shaofeng Li Minhui Xue Rayne Holland Yan Meng Zhen Liu and Haojin Zhu. 2023. Unleashing cheapfakes through trojan plugins of large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2312.00374v1"},{"key":"e_1_3_1_41_2","unstructured":"Yingkai Dong Zheng Li Xiangtao Meng Ning Yu and Shanqing Guo. 2024. Jailbreaking text-to-image models with LLM-based agents. Retrieved from https:\/\/arxiv.org\/pdf\/2408.00523"},{"key":"e_1_3_1_42_2","unstructured":"Yilun Du Shuang Li Antonio Torralba Joshua B. Tenenbaum and Igor Mordatch. 2023. Improving factuality and reasoning in language models through multiagent debate. 
Retrieved from https:\/\/arxiv.org\/pdf\/2305.14325"},{"key":"e_1_3_1_43_2","volume-title":"Proceedings of the ICML","author":"Du Yuqing","year":"2023","unstructured":"Yuqing Du, Olivia Watkins, Zihan Wang, C\u00e9dric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, and Jacob Andreas. 2023. Guiding pretraining in reinforcement learning with large language models. In Proceedings of the ICML. PMLR."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.168"},{"key":"e_1_3_1_45_2","unstructured":"Eva Eigner and Thorsten H\u00e4ndler. 2024. Determinants of LLM-assisted decision-making. Retrieved from https:\/\/arxiv.org\/pdf\/2402.17385"},{"key":"e_1_3_1_46_2","article-title":"ChatGPT Plugins: Data Exfiltration Via Images & Cross Plugin Request Forgery","author":"Red Embrace The","year":"2023","unstructured":"Embrace The Red. 2023. ChatGPT Plugins: Data Exfiltration Via Images & Cross Plugin Request Forgery. Retrieved from https:\/\/embracethered.com\/blog\/posts\/2023\/chatgpt-webpilot-data-exfil-via-markdown-injection\/","journal-title":"R"},{"key":"e_1_3_1_47_2","volume-title":"Proceedings of the UbiSec","author":"Esmradi Aysan","year":"2023","unstructured":"Aysan Esmradi, Daniel Wankit Yip, and Chun Fai Chan. 2023. A comprehensive survey of attack techniques, implementation, and mitigation strategies in large language models. In Proceedings of the UbiSec."},{"key":"e_1_3_1_48_2","doi-asserted-by":"crossref","unstructured":"Meta Fundamental AI Research Diplomacy Team (FAIR). Anton Bakhtin Noam Brown Emily Dinan Gabriele Farina Colin Flaherty Daniel Fried et\u00a0al. 2022. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science 378 6624 (2022) 1067\u20131074.","DOI":"10.1126\/science.ade9097"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICITR61062.2023.10382917"},{"key":"e_1_3_1_50_2","unstructured":"Isabel O. Gallegos Ryan A. 
Rossi Joe Barrow Md Mehrab Tanjim Sungchul Kim Franck Dernoncourt Tong Yu Ruiyi Zhang and Nesreen K. Ahmed. 2023. Bias and fairness in large language models: A survey. Retrieved from https:\/\/arxiv.org\/pdf\/2309.00770"},{"key":"e_1_3_1_51_2","doi-asserted-by":"crossref","unstructured":"Chen Gao Xiaochong Lan Zhihong Lu Jinzhu Mao Jinghua Piao Huandong Wang Depeng Jin and Yong Li. 2023. S \\(^{3}\\) : Social-network simulation system with large language model-empowered agents. Retrieved from https:\/\/arxiv.org\/pdf\/2307.14984","DOI":"10.2139\/ssrn.4607026"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.301"},{"key":"e_1_3_1_53_2","unstructured":"Jonas Geiping Alex Stein Manli Shu Khalid Saifullah Yuxin Wen and Tom Goldstein. 2024. Coercing LLMs to do and reveal (almost) anything. Retrieved from https:\/\/arxiv.org\/abs\/2402.14020"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3608134"},{"key":"e_1_3_1_55_2","doi-asserted-by":"crossref","unstructured":"Judy Wawira Gichoya Kaesha Thomas Leo Anthony Celi Nabile Safdar Imon Banerjee John D. Banja Laleh Seyyed-Kalantari Hari Trivedi and Saptarshi Purkayastha. 2023. AI pitfalls and what not to do: Mitigating bias in AI. The British Journal of Radiology 96 1150 (2023) 20230023.","DOI":"10.1259\/bjr.20230023"},{"key":"e_1_3_1_56_2","unstructured":"Kai Greshake Sahar Abdelnabi Shailesh Mishra Christoph Endres Thorsten Holz and Mario Fritz. 2023. More than you\u2019ve asked for: A comprehensive analysis of novel prompt injection threats to application-integrated large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2302.12173"},{"key":"e_1_3_1_57_2","first-page":"79","volume-title":"Proceedings of the AISec","author":"Greshake Kai","year":"2023","unstructured":"Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. 
Not what you\u2019ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the AISec. 79\u201390."},{"key":"e_1_3_1_58_2","unstructured":"Xiangming Gu Xiaosen Zheng Tianyu Pang Chao Du Qian Liu Ye Wang Jing Jiang and Min Lin. 2024. Agent smith: A single image can jailbreak one million multimodal LLM agents exponentially fast. Retrieved from https:\/\/arxiv.org\/pdf\/2402.08567"},{"key":"e_1_3_1_59_2","volume-title":"Proceedings of the SmartSP","author":"Guastalla Michael","year":"2023","unstructured":"Michael Guastalla, Yiyi Li, Arvin Hekmati, and Bhaskar Krishnamachari. 2023. Application of large language models to DDoS attack detection. In Proceedings of the SmartSP."},{"key":"e_1_3_1_60_2","doi-asserted-by":"crossref","unstructured":"Maanak Gupta CharanKumar Akiri Kshitiz Aryal Eli Parker and Lopamudra Prahara. 2023. From ChatGPT to ThreatGPT: Impact of generative AI in cybersecurity and privacy. IEEE Access 11 (2023) 80218\u201380243.","DOI":"10.1109\/ACCESS.2023.3300381"},{"key":"e_1_3_1_61_2","unstructured":"Rui Hao Linmei Hu Weijian Qi Qingliu Wu Yirui Zhang and Liqiang Nie. 2023. ChatLLM network: More brains more intelligence. Retrieved from https:\/\/arxiv.org\/pdf\/2304.12998"},{"key":"e_1_3_1_62_2","unstructured":"Rich Harang. 2023. Securing LLM Systems Against Prompt Injection. Retrieved from https:\/\/developer.nvidia.com\/blog\/securing-llm-systems-against-prompt-injection\/"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.199"},{"key":"e_1_3_1_64_2","volume-title":"Proceedings of the AAAI","author":"Hatalis Kostas","year":"2023","unstructured":"Kostas Hatalis, Despina Christou, Joshua Myers, Steven Jones, Keith Lambert, Adam Amos-Binks, Zohreh Dannenhauer, and Dustin Dannenhauer. 2023. Memory matters: The need to improve long-term memory in LLM-Agents. 
In Proceedings of the AAAI."},{"key":"e_1_3_1_65_2","unstructured":"Keegan Hines Gary Lopez Matthew Hall Federico Zarfati Yonatan Zunger and Emre Kiciman. 2024. Defending against indirect prompt injection attacks with spotlighting. Retrieved from https:\/\/arxiv.org\/pdf\/2403.14720"},{"key":"e_1_3_1_66_2","volume-title":"Proceedings of the ICLR","author":"Hong Sirui","year":"2024","unstructured":"Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and J\u00fcrgen Schmidhuber. 2024. MetaGPT: Meta programming for a multi-agent collaborative framework. In Proceedings of the ICLR."},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.293"},{"key":"e_1_3_1_68_2","volume-title":"Proceedings of the CHI","author":"Hou Yuki","year":"2024","unstructured":"Yuki Hou, Haruki Tamoto, and Homei Miyashita. 2024. \u201cMy agent understands me better\u201d: Integrating dynamic human-like memory recall and consolidation in LLM-based agents. In Proceedings of the CHI."},{"key":"e_1_3_1_69_2","article-title":"UK Cybersecurity agency Warns of Chatbot \u2018Prompt Injection\u2019 Attacks\u2014theguardian.com","author":"farah https:\/\/www.theguardian.com\/profile\/hibaq","year":"2024","unstructured":"https:\/\/www.theguardian.com\/profile\/hibaq farah. 2024. UK Cybersecurity agency Warns of Chatbot \u2018Prompt Injection\u2019 Attacks\u2014theguardian.com. Retrieved from https:\/\/www.theguardian.com\/technology\/2023\/aug\/30\/uk-cybersecurity-agency-warns-of-chatbot-prompt-injection-attacks","journal-title":"R"},{"key":"e_1_3_1_70_2","unstructured":"Bin Hu Chenyang Zhao Pu Zhang Zihao Zhou Yuanhang Yang Zenglin Xu and Bin Liu. 2023. Enabling intelligent interactions between an agent and an LLM: A reinforcement learning approach. 
Retrieved from https:\/\/arxiv.org\/pdf\/2306.03604"},{"key":"e_1_3_1_71_2","doi-asserted-by":"crossref","unstructured":"W. Hua X. Yang M. Jin Z. Li W. Cheng R. Tang and Y. Zhang. 2024. November. Trustagent: Towards safe and trustworthy llm-based agents. In Findings of the Association for Computational Linguistics: EMNLP 2024. 10000\u201310016.","DOI":"10.18653\/v1\/2024.findings-emnlp.585"},{"key":"e_1_3_1_72_2","doi-asserted-by":"crossref","unstructured":"Xiaowei Huang Wenjie Ruan Wei Huang Gaojie Jin Yi Dong Changshun Wu Saddek Bensalem et\u00a0al. 2024. A survey of safety and trustworthiness of large language models through the lens of verification and validation. Artificial Intelligence Review 57 7 (2024) 175.","DOI":"10.1007\/s10462-024-10824-0"},{"key":"e_1_3_1_73_2","unstructured":"Xijie Huang Li Lyna Zhang Kwang-Ting Cheng and Mao Yang. 2023. Boosting LLM reasoning: Push the limits of few-shot learning with reinforced in-context pruning. Retrieved from https:\/\/arxiv.org\/pdf\/2312.08901"},{"key":"e_1_3_1_74_2","unstructured":"Yue Huang Qihui Zhang Lichao Sun et\u00a0al. 2023. Trustgpt: A benchmark for trustworthy and responsible large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2306.11507"},{"key":"e_1_3_1_75_2","volume-title":"Proceedings of the ICLR","author":"Humeau S.","year":"2020","unstructured":"S. Humeau, K. Shuster, M. Lachaux, and J. Weston. 2020. Poly-encoders: Architectures and pre-training strategies for fast and accurate multi-sentence scoring. arXiv. In Proceedings of the ICLR."},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.inlg-main.3"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.74"},{"key":"e_1_3_1_78_2","unstructured":"Neel Jain Avi Schwarzschild Yuxin Wen Gowthami Somepalli John Kirchenbauer Ping-yeh Chiang Micah Goldblum Aniruddha Saha Jonas Geiping and Tom Goldstein. 2023. 
Baseline defenses for adversarial attacks against aligned language models. Retrieved from https:\/\/arxiv.org\/pdf\/2309.00614"},{"key":"e_1_3_1_79_2","doi-asserted-by":"crossref","unstructured":"Ziwei Ji Nayeon Lee Rita Frieske Tiezheng Yu Dan Su Yan Xu Etsuko Ishii Ye Jin Bang Andrea Madotto and Pascale Fung. 2023. Survey of hallucination in natural language generation. ACM Computing Surveys 55 12 (2023) 1\u201338.","DOI":"10.1145\/3571730"},{"key":"e_1_3_1_80_2","unstructured":"Zhenlan Ji Daoyuan Wu Pingchuan Ma Zongjie Li and Shuai Wang. 2024. Testing and understanding erroneous planning in LLM agents through synthesized user inputs. Retrieved from https:\/\/arxiv.org\/pdf\/2404.17833"},{"key":"e_1_3_1_81_2","volume-title":"Proceedings of the ICLR","author":"Jiang Fengqing","year":"2023","unstructured":"Fengqing Jiang, Zhangchen Xu, Luyao Niu, Boxin Wang, Jinyuan Jia, Bo Li, and Radha Poovendran. 2023. Identifying and mitigating vulnerabilities in LLM-Integrated applications. In Proceedings of the ICLR."},{"key":"e_1_3_1_82_2","unstructured":"Mintong Kang Nezihe Merve G\u00fcrel Ning Yu Dawn Song and Bo Li. 2024. C-RAG: Certified generation risks for retrieval-augmented language models. Retrieved from https:\/\/arxiv.org\/pdf\/2402.03181"},{"key":"e_1_3_1_83_2","article-title":"Large language models are zero-shot reasoners","author":"Kojima Takeshi","year":"2022","unstructured":"Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large language models are zero-shot reasoners. NeurIPS (2022).","journal-title":"NeurIPS"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00038"},{"key":"e_1_3_1_85_2","unstructured":"Aounon Kumar Chirag Agarwal Suraj Srinivas Soheil Feizi and Hima Lakkaraju. 2024. Certifying LLM safety against adversarial prompting. 
Retrieved from https:\/\/arxiv.org\/pdf\/2309.02705"},{"key":"e_1_3_1_86_2","article-title":"Weight poisoning attacks on pre-trained models","author":"Kurita Keita","year":"2020","unstructured":"Keita Kurita, Paul Michel, and Graham Neubig. 2020. Weight poisoning attacks on pre-trained models. arXiv (2020).","journal-title":"arXiv"},{"key":"e_1_3_1_87_2","article-title":"Factuality enhanced language models for open-ended text generation","volume":"35","author":"Lee Nayeon","year":"2022","unstructured":"Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Pascale N. Fung, Mohammad Shoeybi, and Bryan Catanzaro. 2022. Factuality enhanced language models for open-ended text generation. NeurIPS 35.","journal-title":"NeurIPS"},{"key":"e_1_3_1_88_2","unstructured":"Patrick Levi and Christoph P. Neumann. 2024. Vocabulary attack to hijack large language model applications. Retrieved from https:\/\/arxiv.org\/pdf\/2404.02637"},{"key":"e_1_3_1_89_2","volume-title":"Proceedings of the NeurIPS","author":"Lewis Patrick","year":"2020","unstructured":"Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K\u00fcttler, Mike Lewis, Wen-tau Yih, Tim Rockt\u00e4schel, et al.. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the NeurIPS."},{"key":"e_1_3_1_90_2","volume-title":"Proceedings of the NeurIPS","author":"Li Guohao","year":"2023","unstructured":"Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. 2023. CAMEL: Communicative agents for \u201dMind\u201d exploration of large language model society. 
In Proceedings of the NeurIPS."},{"key":"e_1_3_1_91_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.272"},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-acl.881"},{"key":"e_1_3_1_93_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.397"},{"key":"e_1_3_1_94_2","unstructured":"Yuanchun Li Hao Wen Weijun Wang Xiangyu Li Yizhen Yuan Guohong Liu Jiacheng Liu Wenxing Xu Xiang Wang Yi Sun et\u00a0al. 2024. Personal LLM agents: Insights and survey about the capability efficiency and security. Retrieved from https:\/\/arxiv.org\/pdf\/2401.05459"},{"key":"e_1_3_1_95_2","unstructured":"Zelong Li Wenyue Hua Hao Wang He Zhu and Yongfeng Zhang. 2024. Formal-LLM: Integrating formal language and natural language for controllable LLM-based agents. Retrieved from https:\/\/arxiv.org\/pdf\/2402.00798"},{"key":"e_1_3_1_96_2","unstructured":"Tian Liang Zhiwei He Wenxiang Jiao Xing Wang Yan Wang Rui Wang Yujiu Yang Zhaopeng Tu and Shuming Shi. 2023. Encouraging divergent thinking in large language models through multi-agent debate. Retrieved from https:\/\/arxiv.org\/pdf\/2305.19118"},{"key":"e_1_3_1_97_2","unstructured":"Zeyi Liao Lingbo Mo Chejian Xu Mintong Kang Jiawei Zhang Chaowei Xiao Yuan Tian Bo Li and Huan Sun. 2024. Eia: Environmental injection attack on generalist web agents for privacy leakage. Retrieved from https:\/\/arxiv.org\/pdf\/2409.11295"},{"key":"e_1_3_1_98_2","volume-title":"Proceedings of the NeurIPS Workshop","author":"Light Jonathan","year":"2023","unstructured":"Jonathan Light, Min Cai, Sheng Shen, and Ziniu Hu. 2023. AvalonBench: Evaluating LLMs playing the game of avalon. In Proceedings of the NeurIPS Workshop."},{"key":"e_1_3_1_99_2","unstructured":"Baihan Lin Djallel Bouneffouf Guillermo Cecchi and Kush R. Varshney. 2023. Towards healthy AI: large language models need therapists too. 
Retrieved from https:\/\/arxiv.org\/pdf\/2304.00416"},{"key":"e_1_3_1_100_2","unstructured":"Jiaju Lin Haoran Zhao Aochi Zhang Yiting Wu Huqiuyue Ping and Qin Chen. 2023. Agentsims: An open-source sandbox for large language model evaluation. Retrieved from https:\/\/arxiv.org\/pdf\/2308.04026"},{"key":"e_1_3_1_101_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58520-4_8"},{"key":"e_1_3_1_102_2","article-title":"Efficient adversarial attacks on online multi-agent reinforcement learning","author":"Liu Guanlin","year":"2023","unstructured":"Guanlin Liu and Lifeng Lai. 2023. Efficient adversarial attacks on online multi-agent reinforcement learning. NeurIPS (2023).","journal-title":"NeurIPS"},{"key":"e_1_3_1_103_2","unstructured":"Sheng Liu Lei Xing and James Zou. 2023. In-context vectors: Making in context learning more effective and controllable through latent space steering. Retrieved from https:\/\/arxiv.org\/pdf\/2311.06668"},{"key":"e_1_3_1_104_2","unstructured":"Tong Liu Zizhuang Deng Guozhu Meng Yuekang Li and Kai Chen. 2023. Demystifying rce vulnerabilities in LLM-integrated apps. Retrieved from https:\/\/arxiv.org\/pdf\/2309.02926"},{"key":"e_1_3_1_105_2","unstructured":"Yi Liu Gelei Deng Yuekang Li Kailong Wang Tianwei Zhang Yepang Liu Haoyu Wang Yan Zheng and Yang Liu. 2023. Prompt injection attack against LLM-integrated applications. Retrieved from https:\/\/arxiv.org\/pdf\/2306.05499"},{"key":"e_1_3_1_106_2","unstructured":"Yi Liu Gelei Deng Zhengzi Xu Yuekang Li Yaowen Zheng Ying Zhang Lida Zhao Tianwei Zhang and Yang Liu. 2023. Jailbreaking chatgpt via prompt engineering: An empirical study. Retrieved from https:\/\/arxiv.org\/pdf\/2305.13860"},{"key":"e_1_3_1_107_2","volume-title":"Proceedings of the USENIX Security","author":"Liu Yupei","year":"2024","unstructured":"Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and benchmarking prompt injection attacks and defenses. 
In Proceedings of the USENIX Security."},{"key":"e_1_3_1_108_2","unstructured":"Yang Liu Yuanshun Yao Jean-Francois Ton Xiaoying Zhang Ruocheng Guo Hao Cheng Yegor Klochkov Muhammad Faaiz Taufiq and Hang Li. 2023. Trustworthy LLMs: A survey and guideline for evaluating large language models\u2019 alignment. Retrieved from https:\/\/arxiv.org\/pdf\/2308.05374"},{"key":"e_1_3_1_109_2","unstructured":"Qinghua Lu Liming Zhu Xiwei Xu Zhenchang Xing Stefan Harrer and Jon Whittle. 2023. Building the future of responsible AI: A reference architecture for designing large language model based agents. Retrieved from https:\/\/arxiv.org\/pdf\/2311.13148"},{"key":"e_1_3_1_110_2","doi-asserted-by":"crossref","unstructured":"Ziqing Lu Guanlin Liu Lifeng Lai and Weiyu Xu. 2024. Camouflage adversarial attacks on multiple agent systems. Retrieved from https:\/\/arxiv.org\/pdf\/2401.17405","DOI":"10.1109\/ISIT57864.2024.10619346"},{"key":"e_1_3_1_111_2","doi-asserted-by":"publisher","DOI":"10.1109\/CISS59072.2024.10480189"},{"key":"e_1_3_1_112_2","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2024.3367737"},{"key":"e_1_3_1_113_2","article-title":"Self-refine: Iterative refinement with self-feedback","volume":"36","author":"Madaan Aman","year":"2024","unstructured":"Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang et\u00a0al. 2024. Self-refine: Iterative refinement with self-feedback. NeurIPS 36.","journal-title":"NeurIPS"},{"key":"e_1_3_1_114_2","unstructured":"Zhao Mandi Shreeya Jain and Shuran Song. 2023. Roco: Dialectic multi-robot collaboration with large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2307.04738"},{"key":"e_1_3_1_115_2","unstructured":"Tula Masterman Sandi Besen Mason Sawtell and Alex Chao. 2024. The landscape of emerging AI agent architectures for reasoning planning and tool calling: A survey. 
Retrieved from https:\/\/arxiv.org\/pdf\/2404.11584"},{"key":"e_1_3_1_116_2","doi-asserted-by":"crossref","unstructured":"Richard May and Kerstin Denecke. 2022. Security privacy and healthcare-related conversational agents: A scoping review. Informatics for Health and Social Care 47 2 (2022) 194\u2013210.","DOI":"10.1080\/17538157.2021.1983578"},{"key":"e_1_3_1_117_2","volume-title":"Proceedings of the EACL","author":"Mehta Nikhil","year":"2024","unstructured":"Nikhil Mehta, Milagro Teruel, Xin Deng, Sergio Figueroa Sanz, Ahmed Awadallah, and Julia Kiseleva. 2024. Improving grounded language understanding in a collaborative environment by interacting with agents through help feedback. In Proceedings of the EACL."},{"key":"e_1_3_1_118_2","unstructured":"Kai Mei Zelong Li Shuyuan Xu Ruosong Ye Yingqiang Ge and Yongfeng Zhang. 2024. AIOS: LLM agent operating system. Retrieved from https:\/\/arxiv.org\/pdf\/2403.16971"},{"key":"e_1_3_1_119_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.741"},{"key":"e_1_3_1_120_2","unstructured":"Lingbo Mo Zeyi Liao Boyuan Zheng Yu Su Chaowei Xiao and Huan Sun. 2024. A trembling house of cards? Mapping adversarial attacks against language agents. Retrieved from https:\/\/arxiv.org\/pdf\/2402.10196"},{"key":"e_1_3_1_121_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.765"},{"key":"e_1_3_1_122_2","unstructured":"Stephen Moskal Sam Laney Erik Hemberg and Una-May O\u2019Reilly. 2023. LLMs killed the script kiddie: How agents supported by large language models change the landscape of network threat testing. Retrieved from https:\/\/arxiv.org\/pdf\/2310.06936"},{"key":"e_1_3_1_123_2","volume-title":"Proceedings of the NeurIPS Workshop","author":"Motwani Sumeet Ramesh","year":"2023","unstructured":"Sumeet Ramesh Motwani, Mikhail Baranchuk, Lewis Hammond, and Christian Schroeder de Witt. 2023. 
A perfect collusion benchmark: How can AI agents be prevented from colluding with information-theoretic undetectability? In Proceedings of the NeurIPS Workshop."},{"key":"e_1_3_1_124_2","doi-asserted-by":"crossref","unstructured":"Hichem Mrabet Sana Belguith Adeeb Alhomoud and Abderrazak Jemai. 2020. A survey of IoT security based on a layered architecture of sensing and data analysis. Sensors 20 13 (2020) 3625.","DOI":"10.3390\/s20133625"},{"key":"e_1_3_1_125_2","volume-title":"Proceedings of the EACL","author":"Muhlgay Dor","year":"2024","unstructured":"Dor Muhlgay, Ori Ram, Inbal Magar, Yoav Levine, Nir Ratner, Yonatan Belinkov, Omri Abend, Kevin Leyton-Brown, Amnon Shashua, and Yoav Shoham. 2024. Generating benchmarks for factuality evaluation of language models. In Proceedings of the EACL."},{"key":"e_1_3_1_126_2","doi-asserted-by":"crossref","unstructured":"Varun Nair Elliot Schumacher Geoffrey Tso and Anitha Kannan. 2023. DERA: Enhancing large language model completions with dialog-enabled resolving agents. Retrieved from https:\/\/arxiv.org\/pdf\/2303.17071","DOI":"10.18653\/v1\/2024.clinicalnlp-1.12"},{"key":"e_1_3_1_127_2","volume-title":"Yohei\u2019s Blog Post","author":"Nakajima Yohei","year":"2022","unstructured":"Yohei Nakajima. 2022. Yohei\u2019s Blog Post. Retrieved from https:\/\/twitter.com\/yoheinakajima\/status\/1582844144640471040"},{"key":"e_1_3_1_128_2","unstructured":"Tai Nguyen and Eric Wong. 2023. In-context example selection with influences. Retrieved from https:\/\/arxiv.org\/pdf\/2302.11042"},{"key":"e_1_3_1_129_2","unstructured":"Aidan O\u2019Gara. 2023. Hoodwinked: Deception and cooperation in a text-based game for language models. 
Retrieved from https:\/\/arxiv.org\/pdf\/2308.01404"},{"key":"e_1_3_1_130_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.329"},{"key":"e_1_3_1_131_2","article-title":"Training language models to follow instructions with human feedback","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray et\u00a0al. 2022. Training language models to follow instructions with human feedback. NeurIPS.","journal-title":"NeurIPS"},{"key":"e_1_3_1_132_2","doi-asserted-by":"crossref","unstructured":"James Jie Pan Jianguo Wang and Guoliang Li. 2024. Survey of vector database management systems. The VLDB Journal 33 5 (2024) 1591\u20131615.","DOI":"10.1007\/s00778-024-00864-x"},{"key":"e_1_3_1_133_2","unstructured":"Yikang Pan Liangming Pan Wenhu Chen Preslav Nakov Min-Yen Kan and William Yang Wang. 2023. On the risk of misinformation pollution with large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2305.13661"},{"key":"e_1_3_1_134_2","doi-asserted-by":"publisher","DOI":"10.1145\/3586183.3606763"},{"key":"e_1_3_1_135_2","doi-asserted-by":"crossref","unstructured":"Peter S. Park Simon Goldstein Aidan O\u2019Gara Michael Chen and Dan Hendrycks. 2024. AI deception: A survey of examples risks and potential solutions. Patterns 5 5 (2024).","DOI":"10.1016\/j.patter.2024.100988"},{"key":"e_1_3_1_136_2","unstructured":"Rodrigo Pedro Daniel Castro Paulo Carreira and Nuno Santos. 2023. From prompt injections to SQL injection attacks: How protected is your LLM-integrated web application? Retrieved from https:\/\/arxiv.org\/pdf\/2308.01990"},{"key":"e_1_3_1_137_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-acl.847"},{"key":"e_1_3_1_138_2","article-title":"Ignore previous prompt: Attack techniques for language models","author":"Perez F\u00e1bio","year":"2022","unstructured":"F\u00e1bio Perez and Ian Ribeiro. 
2022. Ignore previous prompt: Attack techniques for language models. NeurIPS (2022).","journal-title":"NeurIPS"},{"key":"e_1_3_1_139_2","article-title":"Ignore previous prompt: Attack techniques for language models","author":"Perez F\u00e1bio","year":"2022","unstructured":"F\u00e1bio Perez and Ian Ribeiro. 2022. Ignore previous prompt: Attack techniques for language models. Retrieved from https:\/\/arxiv.org\/pdf\/2211.09527","journal-title":"arXiv"},{"key":"e_1_3_1_140_2","unstructured":"Steve Phelps and Rebecca Ranson. 2023. Of Models and Tin Men\u2013a behavioural economics study of principal-agent problems in AI alignment using large-language models. Retrieved from https:\/\/arxiv.org\/pdf\/2307.11137"},{"key":"e_1_3_1_141_2","unstructured":"Lukas P\u00f6hler Valentin Schrader Alexander Ladwein and Florian von Keller. 2024. A technological perspective on misuse of available AI. Retrieved from https:\/\/arxiv.org\/pdf\/2403.15325"},{"key":"e_1_3_1_142_2","doi-asserted-by":"crossref","unstructured":"Harsha Putla Chanakya Patibandla Krishna Pratap Singh and P. Nagabhushan. 2024. A pilot study of observation poisoning on selective reincarnation in multi-agent reinforcement learning. Neural Processing Letters 56 3 (2024) 161.","DOI":"10.1007\/s11063-024-11625-w"},{"key":"e_1_3_1_143_2","unstructured":"Chen Qian Xin Cong Cheng Yang Weize Chen Yusheng Su Juyuan Xu Zhiyuan Liu and Maosong Sun. 2023. Communicative agents for software development. Retrieved from https:\/\/arxiv.org\/pdf\/2307.07924"},{"key":"e_1_3_1_144_2","doi-asserted-by":"crossref","unstructured":"Francisco Quiroga Gabriel Hermosilla German Varas Francisco Alonso and Karla Schr\u00f6der. 2024. RL-Based Sim2Real Enhancements for Autonomous Beach-Cleaning Agents. Applied Sciences 14 11 (2024) 4602.","DOI":"10.3390\/app14114602"},{"key":"e_1_3_1_145_2","unstructured":"Jack W. 
Rae Sebastian Borgeaud Trevor Cai Katie Millican Jordan Hoffmann Francis Song John Aslanides Sarah Henderson Roman Ring Susannah Young et\u00a0al. 2021. Scaling language models: Methods analysis & insights from training Gopher. Retrieved from https:\/\/arxiv.org\/pdf\/2112.11446"},{"key":"e_1_3_1_146_2","unstructured":"Fathima Abdul Rahman and Guang Lu. 2023. A contextualized real-time multimodal emotion recognition for conversational agents using graph convolutional networks in reinforcement learning. Retrieved from https:\/\/arxiv.org\/pdf\/2310.18363"},{"key":"e_1_3_1_147_2","unstructured":"Leonardo Ranaldi and Giulia Pucci. 2023. When large language models contradict humans? Large language models\u2019 sycophantic behaviour. Retrieved from https:\/\/arxiv.org\/pdf\/2311.09410"},{"key":"e_1_3_1_148_2","article-title":"Indirect Prompt Injection via YouTube Transcripts \u00b7 Embrace The Red\u2014embracethered.com","author":"Red Embrace The","year":"2023","unstructured":"Embrace The Red. 2023. Indirect Prompt Injection via YouTube Transcripts \u00b7 Embrace The Red\u2014embracethered.com. Retrieved from https:\/\/embracethered.com\/blog\/posts\/2023\/chatgpt-plugin-youtube-indirect-prompt-injection\/","journal-title":"Embrace The Red"},{"key":"e_1_3_1_149_2","unstructured":"Alexander Robey Zachary Ravichandran Vijay Kumar Hamed Hassani and George J Pappas. 2024. Jailbreaking LLM-controlled robots. Retrieved from https:\/\/arxiv.org\/pdf\/2410.13691"},{"key":"e_1_3_1_150_2","unstructured":"Sippo Rossi Alisia Marianne Michel Raghava Rao Mukkamala and Jason Bennett Thatcher. 2024. An early categorization of prompt injection attacks on large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2402.00898"},{"key":"e_1_3_1_151_2","volume-title":"Proceedings of the ICLR","author":"Ruan Yangjun","year":"2024","unstructured":"Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, and Tatsunori Hashimoto. 2024. 
Identifying the risks of LM agents with an LM-emulated sandbox. In Proceedings of the ICLR."},{"key":"e_1_3_1_152_2","unstructured":"Ahmed Salem Andrew Paverd and Boris K\u00f6pf. 2023. Maatphor: Automated variant analysis for prompt injection attacks. Retrieved from https:\/\/arxiv.org\/pdf\/2312.11513"},{"key":"e_1_3_1_153_2","unstructured":"Sergei Savvov. 2023. Fixing hallucinations in LLMs. Retrieved from https:\/\/betterprogramming.pub\/fixing-hallucinations-in-llms-9ff0fd438e33"},{"key":"e_1_3_1_154_2","article-title":"Toolformer: Language models can teach themselves to use tools","author":"Schick Timo","year":"2024","unstructured":"Timo Schick, Jane Dwivedi-Yu, Roberto Dess\u00ec, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2024. Toolformer: Language models can teach themselves to use tools. NeurIPS.","journal-title":"NeurIPS"},{"key":"e_1_3_1_155_2","unstructured":"Leo Schwinn David Dobre Stephan G\u00fcnnemann and Gauthier Gidel. 2023. Adversarial attacks and defenses in large language models: Old and new threats. Retrieved from https:\/\/arxiv.org\/pdf\/2310.19737"},{"key":"e_1_3_1_156_2","unstructured":"Jose Selvi. 2022. Exploring Prompt Injection Attacks. Retrieved from https:\/\/nccgroup.com\/au\/research-blog\/exploring-prompt-injection-attacks\/"},{"key":"e_1_3_1_157_2","volume-title":"Proceedings of the ACL","author":"Shaikh Omar","year":"2023","unstructured":"Omar Shaikh, Hongxin Zhang, William Held, Michael Bernstein, and Diyi Yang. 2023. On second thought, let\u2019s not think step by step! Bias and toxicity in zero-shot reasoning. In Proceedings of the ACL."},{"key":"e_1_3_1_158_2","unstructured":"Mrinank Sharma Meg Tong Tomasz Korbak David Duvenaud Amanda Askell Samuel R. Bowman Newton Cheng Esin Durmus Zac Hatfield-Dodds Scott R. Johnston et\u00a0al. 2023. Towards understanding sycophancy in language models. 
Retrieved from https:\/\/arxiv.org\/pdf\/2310.13548"},{"key":"e_1_3_1_159_2","doi-asserted-by":"crossref","unstructured":"Xinyue Shen Zeyuan Chen Michael Backes Yun Shen and Yang Zhang. 2024. \u201cDo Anything Now\u201d: Characterizing and evaluating in-the-wild jailbreak prompts on large language models. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. 1671\u20131685.","DOI":"10.1145\/3658644.3670388"},{"key":"e_1_3_1_160_2","unstructured":"Xinyue Shen Yixin Wu Michael Backes and Yang Zhang. 2024. Voice jailbreak attacks against GPT-4o. Retrieved from https:\/\/arxiv.org\/abs\/2405.19103"},{"key":"e_1_3_1_161_2","article-title":"HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face","author":"Shen Yongliang","year":"2024","unstructured":"Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. 2024. HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. NeurIPS.","journal-title":"NeurIPS"},{"key":"e_1_3_1_162_2","doi-asserted-by":"crossref","unstructured":"Chuan Sheng Wanlun Ma Qing-Long Han Wei Zhou Xiaogang Zhu Sheng Wen Yang Xiang and Fei-Yue Wang. 2024. Pager explosion: Cybersecurity insights and afterthoughts. IEEE\/CAA Journal of Automatica Sinica 11 12 (2024) 2359\u20132362.","DOI":"10.1109\/JAS.2024.125034"},{"key":"e_1_3_1_163_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.320"},{"key":"e_1_3_1_164_2","unstructured":"Emily H. Soice Rafael Rocha Kimberlee Cordova Michael Specter and Kevin M. Esvelt. 2023. Can large language models democratize access to dual-use biotechnology? 
Retrieved from https:\/\/arxiv.org\/pdf\/2306.03809"},{"key":"e_1_3_1_165_2","doi-asserted-by":"publisher","DOI":"10.1145\/3372297.3417270"},{"key":"e_1_3_1_166_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00280"},{"key":"e_1_3_1_167_2","article-title":"MarioGPT: Open-ended text2level generation through large language models","author":"Sudhakaran Shyam","year":"2024","unstructured":"Shyam Sudhakaran, Miguel Gonz\u00e1lez-Duque, Matthias Freiberger, Claire Glanois, Elias Najarro, and Sebastian Risi. 2024. MarioGPT: Open-ended text2level generation through large language models. NeurIPS.","journal-title":"NeurIPS"},{"key":"e_1_3_1_168_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i11.26596"},{"key":"e_1_3_1_169_2","unstructured":"Yuxiang Sun Checheng Yu Junjie Zhao Wei Wang and Xianzhong Zhou. 2023. Self generated wargame AI: Double layer agent task planning based on large language model. Retrieved from https:\/\/arxiv.org\/pdf\/2312.01090"},{"key":"e_1_3_1_170_2","unstructured":"Harini Suresh and John V. Guttag. 2019. A framework for understanding unintended consequences of machine learning. Retrieved from https:\/\/arxiv.org\/pdf\/1901.10002"},{"key":"e_1_3_1_171_2","doi-asserted-by":"crossref","unstructured":"Gaurav Suri Lily R. Slater Ali Ziaee and Morgan Nguyen. 2024. Do large language models show decision heuristics similar to humans? A case study using GPT-3.5. Journal of Experimental Psychology: General 153 4 (2024) 1066\u20131075.","DOI":"10.1037\/xge0001547"},{"key":"e_1_3_1_172_2","volume-title":"Proceedings of the ICLR","author":"Tan Weihao","year":"2024","unstructured":"Weihao Tan, Wentao Zhang, Shanqi Liu, Longtao Zheng, Xinrun Wang, and Bo An. 2024. True knowledge comes from practice: Aligning large language models with embodied environments via reinforcement learning. 
In Proceedings of the ICLR."},{"key":"e_1_3_1_173_2","unstructured":"Xiangru Tang Qiao Jin Kunlun Zhu Tongxin Yuan Yichi Zhang Wangchunshu Zhou Meng Qu Yilun Zhao Jian Tang Zhuosheng Zhang et\u00a0al. 2024. Prioritizing safeguarding over autonomy: Risks of LLM agents for science. Retrieved from https:\/\/arxiv.org\/pdf\/2402.04247"},{"key":"e_1_3_1_174_2","volume-title":"Proceedings of the Multi-Agent Security Workshop @ NeurIPS\u201923","author":"Terekhov Mikhail","year":"2023","unstructured":"Mikhail Terekhov, Romain Graux, Eduardo Neville, Denis Rosset, and Gabin Kolly. 2023. Second-order Jailbreaks: Generative agents successfully manipulate through an intermediary. In Proceedings of the Multi-Agent Security Workshop @ NeurIPS\u201923. Retrieved from https:\/\/openreview.net\/forum?id=HPmhaOTseN"},{"key":"e_1_3_1_175_2","unstructured":"Yu Tian Xiao Yang Jingyuan Zhang Yinpeng Dong and Hang Su. 2023. Evil geniuses: Delving into the safety of LLM-based agents. Retrieved from https:\/\/arxiv.org\/pdf\/2311.11855"},{"key":"e_1_3_1_176_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et\u00a0al. 2023. LLaMA: Open and efficient foundation language models. Retrieved from https:\/\/arxiv.org\/pdf\/2302.13971"},{"key":"e_1_3_1_177_2","volume-title":"Proceedings of the ICLR","author":"Toyer Sam","year":"2024","unstructured":"Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, and Stuart Russell. 2024. Tensor trust: Interpretable prompt injection attacks from an online game. In Proceedings of the ICLR."},{"key":"e_1_3_1_178_2","doi-asserted-by":"crossref","unstructured":"Dennis Ulmer Elman Mansimov Kaixiang Lin Justin Sun Xibin Gao and Yi Zhang. 2024. Bootstrapping LLM-based task-oriented dialogue agents via self-talk. 
Retrieved from https:\/\/arxiv.org\/pdf\/2401.05033","DOI":"10.18653\/v1\/2024.findings-acl.566"},{"key":"e_1_3_1_179_2","article-title":"Attention is all you need","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. NeurIPS.","journal-title":"NeurIPS"},{"key":"e_1_3_1_180_2","unstructured":"Bertie Vidgen Adarsh Agrawal Ahmed M. Ahmed Victor Akinwande Namir Al-Nuaimi Najla Alfaraj Elie Alhajjar Lora Aroyo Trupti Bavalatti Borhane Blili-Hamelin et\u00a0al. 2024. Introducing v0.5 of the AI safety benchmark from MLCommons. Retrieved from https:\/\/arxiv.org\/pdf\/2404.12241"},{"key":"e_1_3_1_181_2","unstructured":"Celine Wald and Lukas Pfahler. 2023. Exposing bias in online communities through large-scale language models. Retrieved from https:\/\/arxiv.org\/pdf\/2306.02294"},{"key":"e_1_3_1_182_2","unstructured":"Eric Wallace Kai Xiao Reimar Leike Lilian Weng Johannes Heidecke and Alex Beutel. 2024. The instruction hierarchy: Training LLMs to prioritize privileged instructions. Retrieved from https:\/\/arxiv.org\/pdf\/2404.13208"},{"key":"e_1_3_1_183_2","volume-title":"Proceedings of the ICML","author":"Wan Alexander","year":"2023","unstructured":"Alexander Wan, Eric Wallace, Sheng Shen, and Dan Klein. 2023. Poisoning language models during instruction tuning. In Proceedings of the ICML."},{"key":"e_1_3_1_184_2","volume-title":"Proceedings of the NeurIPS","author":"Wang Boxin","year":"2023","unstructured":"Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, et al. 2023. DecodingTrust: A comprehensive assessment of trustworthiness in GPT models. 
In Proceedings of the NeurIPS."},{"key":"e_1_3_1_185_2","unstructured":"Fengxiang Wang Ranjie Duan Peng Xiao Xiaojun Jia YueFeng Chen Chongwen Wang Jialing Tao Hang Su Jun Zhu and Hui Xue. 2024. MRJ-Agent: An effective jailbreak agent for multi-round dialogue. Retrieved from https:\/\/arxiv.org\/pdf\/2411.03814"},{"key":"e_1_3_1_186_2","article-title":"Self-consistency improves chain of thought reasoning in language models","author":"Wang Xuezhi","year":"2023","unstructured":"Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-consistency improves chain of thought reasoning in language models. In Proceedings of the ICLR.","journal-title":"Proceedings of the ICLR"},{"key":"e_1_3_1_187_2","doi-asserted-by":"crossref","unstructured":"Yuntao Wang Yanghe Pan Miao Yan Zhou Su and Tom H. Luan. 2023. A survey on ChatGPT: AI-generated contents challenges and solutions. IEEE Open Journal of the Computer Society 4 (2023) 280\u2013302.","DOI":"10.1109\/OJCS.2023.3300321"},{"key":"e_1_3_1_188_2","article-title":"Toxicity detection with generative prompt-based inference","author":"Wang Yau-Shian","year":"2022","unstructured":"Yau-Shian Wang and Yingshan Chang. 2022. Toxicity detection with generative prompt-based inference. arXiv (2022).","journal-title":"arXiv"},{"key":"e_1_3_1_189_2","doi-asserted-by":"publisher","DOI":"10.1145\/3627106.3627122"},{"key":"e_1_3_1_190_2","unstructured":"Jerry Wei Da Huang Yifeng Lu Denny Zhou and Quoc V. Le. 2023. Simple synthetic data reduces sycophancy in large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2308.03958"},{"key":"e_1_3_1_191_2","article-title":"Chain-of-thought prompting elicits reasoning in large language models","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou et\u00a0al. 2022. Chain-of-thought prompting elicits reasoning in large language models. 
NeurIPS.","journal-title":"NeurIPS"},{"key":"e_1_3_1_192_2","unstructured":"Jerry Wei Chengrun Yang Xinying Song Yifeng Lu Nathan Hu Dustin Tran Daiyi Peng Ruibo Liu Da Huang Cosmo Du et\u00a0al. 2024. Long-form factuality in large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2403.18802"},{"key":"e_1_3_1_193_2","unstructured":"Zeming Wei Yifei Wang and Yisen Wang. 2023. Jailbreak and guard aligned language models with only few in-context demonstrations. Retrieved from https:\/\/arxiv.org\/pdf\/2310.06387"},{"key":"e_1_3_1_194_2","unstructured":"Roy Weiss Daniel Ayzenshteyn Guy Amit and Yisroel Mirsky. 2024. What was your prompt? A remote keylogging attack on AI assistants. Retrieved from https:\/\/arxiv.org\/pdf\/2403.09751"},{"key":"e_1_3_1_195_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.210"},{"key":"e_1_3_1_196_2","unstructured":"Jules White Quchen Fu Sam Hays Michael Sandborn Carlos Olea Henry Gilbert Ashraf Elnashar Jesse Spencer-Smith and Douglas C. Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with ChatGPT. Retrieved from https:\/\/arxiv.org\/pdf\/2302.11382"},{"key":"e_1_3_1_197_2","unstructured":"Simon Willison. 2023. Delimiters won\u2019t save you from prompt injection. Retrieved from https:\/\/simonwillison.net\/2023\/May\/11\/delimiters-wont-save-you\/"},{"key":"e_1_3_1_198_2","doi-asserted-by":"crossref","unstructured":"David Windridge Henrik Svensson and Serge Thill. 2021. On the utility of dreaming: A general model for how learning in artificial agents can benefit from data hallucination. Adaptive Behavior 29 3 (2021) 267\u2013280.","DOI":"10.1177\/1059712319896489"},{"key":"e_1_3_1_199_2","unstructured":"Yotam Wolf Noam Wies Yoav Levine and Amnon Shashua. 2023. Fundamental limitations of alignment in large language models. 
Retrieved from https:\/\/arxiv.org\/pdf\/2304.11082"},{"key":"e_1_3_1_200_2","unstructured":"Chen Henry Wu Jing Yu Koh Ruslan Salakhutdinov Daniel Fried and Aditi Raghunathan. 2024. Adversarial attacks on multimodal agents. Retrieved from https:\/\/arxiv.org\/pdf\/2406.12814v1"},{"key":"e_1_3_1_201_2","unstructured":"Fangzhou Wu Shutong Wu Yulong Cao and Chaowei Xiao. 2024. WIPI: A new web threat for LLM-driven web agents. Retrieved from https:\/\/arxiv.org\/pdf\/2402.16965"},{"key":"e_1_3_1_202_2","unstructured":"Fangzhou Wu Ning Zhang Somesh Jha Patrick McDaniel and Chaowei Xiao. 2024. A new era in LLM security: Exploring security concerns in real-world LLM-based systems. Retrieved from https:\/\/arxiv.org\/pdf\/2402.18649"},{"key":"e_1_3_1_203_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i9.26240"},{"key":"e_1_3_1_204_2","unstructured":"Zhiheng Xi Wenxiang Chen Xin Guo Wei He Yiwen Ding Boyang Hong Ming Zhang Junzhe Wang Senjie Jin Enyu Zhou et\u00a0al. 2023. The rise and potential of large language model based agents: A survey. Retrieved from https:\/\/arxiv.org\/pdf\/2309.07864"},{"key":"e_1_3_1_205_2","unstructured":"Junlin Xie Zhihong Chen Ruifei Zhang Xiang Wan and Guanbin Li. 2024. Large multimodal agents: A survey. Retrieved from https:\/\/arxiv.org\/pdf\/2402.15116"},{"key":"e_1_3_1_206_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigDIA60676.2023.10429609"},{"key":"e_1_3_1_207_2","unstructured":"Frank Xing. 2024. Designing heterogeneous LLM agents for financial sentiment analysis. Retrieved from https:\/\/arxiv.org\/pdf\/2401.05799"},{"key":"e_1_3_1_208_2","unstructured":"Binfeng Xu Zhiyuan Peng Bowen Lei Subhabrata Mukherjee Yuchen Liu and Dongkuan Xu. 2023. ReWOO: Decoupling reasoning from observations for efficient augmented language models. Retrieved from https:\/\/arxiv.org\/pdf\/2305.18323"},{"key":"e_1_3_1_209_2","unstructured":"Yuzhuang Xu Shuo Wang Peng Li Fuwen Luo Xiaolong Wang Weidong Liu and Yang Liu. 2023. 
Exploring large language models for communication games: An empirical study on werewolf. Retrieved from https:\/\/arxiv.org\/pdf\/2309.04658"},{"key":"e_1_3_1_210_2","unstructured":"Zelai Xu Chao Yu Fei Fang Yu Wang and Yi Wu. 2023. Language agents with reinforcement learning for strategic play in the werewolf game. Retrieved from https:\/\/arxiv.org\/pdf\/2310.18940"},{"key":"e_1_3_1_211_2","unstructured":"Xue Yan Yan Song Xinyu Cui Filippos Christianos Haifeng Zhang David Henry Mguni and Jun Wang. 2023. Ask more know better: Reinforce-learned prompt questions for decision making with large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2310.18127"},{"key":"e_1_3_1_212_2","unstructured":"Wenkai Yang Xiaohan Bi Yankai Lin Sishuo Chen Jie Zhou and Xu Sun. 2024. Watch out for your agents! Investigating backdoor threats to LLM-based agents. Retrieved from https:\/\/arxiv.org\/pdf\/2402.11208"},{"key":"e_1_3_1_213_2","unstructured":"Yong Yang Xuhong Zhang Yi Jiang Xi Chen Haoyu Wang Shouling Ji and Zonghui Wang. 2024. PRSA: Prompt reverse stealing attacks against large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2402.19200"},{"key":"e_1_3_1_214_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP48485.2024.10448041"},{"key":"e_1_3_1_215_2","article-title":"Tree of thoughts: Deliberate problem solving with large language models","author":"Yao Shunyu","year":"2024","unstructured":"Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2024. Tree of thoughts: Deliberate problem solving with large language models. NeurIPS (2024).","journal-title":"NeurIPS"},{"key":"e_1_3_1_216_2","doi-asserted-by":"crossref","unstructured":"Yifan Yao Jinhao Duan Kaidi Xu Yuanfang Cai Zhibo Sun and Yue Zhang. 2024. A survey on large language model (LLM) security and privacy: The good the bad and the ugly. 
High-Confidence Computing (2024) 100211.","DOI":"10.1016\/j.hcc.2024.100211"},{"key":"e_1_3_1_217_2","unstructured":"Jingwei Yi Yueqi Xie Bin Zhu Keegan Hines Emre Kiciman Guangzhong Sun Xing Xie and Fangzhao Wu. 2023. Benchmarking and defending against indirect prompt injection attacks on large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2312.14197"},{"key":"e_1_3_1_218_2","unstructured":"Jiahao Yu Xingwei Lin and Xinyu Xing. 2023. GPTFuzzer: Red teaming large language models with auto-generated jailbreak prompts. Retrieved from https:\/\/arxiv.org\/pdf\/2309.10253"},{"key":"e_1_3_1_219_2","unstructured":"Zhiyuan Yu Xiaogeng Liu Shunning Liang Zach Cameron Chaowei Xiao and Ning Zhang. 2024. Don\u2019t listen to me: Understanding and exploring jailbreak prompts of large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2403.17336"},{"key":"e_1_3_1_220_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-emnlp.79"},{"key":"e_1_3_1_221_2","doi-asserted-by":"crossref","unstructured":"Shenglai Zeng Jiankun Zhang Pengfei He Yue Xing Yiding Liu Han Xu Jie Ren Shuaiqiang Wang Dawei Yin Yi Chang et\u00a0al. 2024. The good and the bad: Exploring privacy issues in retrieval-augmented generation (RAG). Retrieved from https:\/\/arxiv.org\/pdf\/2402.16893","DOI":"10.18653\/v1\/2024.findings-acl.267"},{"key":"e_1_3_1_222_2","unstructured":"Yifan Zeng Yiran Wu Xiao Zhang Huazheng Wang and Qingyun Wu. 2024. AutoDefense: Multi-agent LLM defense against jailbreak attacks. Retrieved from https:\/\/arxiv.org\/pdf\/2403.04783"},{"key":"e_1_3_1_223_2","doi-asserted-by":"crossref","unstructured":"Qiusi Zhan Zhixiang Liang Zifan Ying and Daniel Kang. 2024. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. 
Retrieved from https:\/\/arxiv.org\/pdf\/2403.02691","DOI":"10.18653\/v1\/2024.findings-acl.624"},{"key":"e_1_3_1_224_2","volume-title":"Proceedings of the NeurIPS Workshop","author":"Zhang Hongxin","year":"2023","unstructured":"Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua Tenenbaum, Tianmin Shu, and Chuang Gan. 2023. Building cooperative embodied agents modularly with large language models. In Proceedings of the NeurIPS Workshop."},{"key":"e_1_3_1_225_2","unstructured":"Wanpeng Zhang and Zongqing Lu. 2023. RLAdapter: Bridging large language models to reinforcement learning in open worlds. Retrieved from https:\/\/arxiv.org\/pdf\/2309.17176"},{"key":"e_1_3_1_226_2","doi-asserted-by":"crossref","unstructured":"Xinyu Zhang Huiyu Xu Zhongjie Ba Zhibo Wang Yuan Hong Jian Liu Zhan Qin and Kui Ren. 2024. PrivacyAsst: Safeguarding user privacy in tool-using large language model agents. IEEE Transactions on Dependable and Secure Computing 21 6 (2024) 5242\u20135258.","DOI":"10.1109\/TDSC.2024.3372777"},{"key":"e_1_3_1_227_2","unstructured":"Yiming Zhang Nicholas Carlini and Daphne Ippolito. 2024. Effective prompt extraction from language models. Retrieved from https:\/\/arxiv.org\/pdf\/2307.06865"},{"key":"e_1_3_1_228_2","unstructured":"Yuqi Zhang Liang Ding Lefei Zhang and Dacheng Tao. 2024. Intention analysis prompting makes large language models a good jailbreak defender. Retrieved from https:\/\/arxiv.org\/pdf\/2401.06561"},{"key":"e_1_3_1_229_2","unstructured":"Yue Zhang Yafu Li Leyang Cui Deng Cai Lemao Liu Tingchen Fu Xinting Huang Enbo Zhao Yu Zhang Yulong Chen et\u00a0al. 2023. Siren\u2019s song in the AI ocean: A survey on hallucination in large language models. Retrieved from https:\/\/arxiv.org\/pdf\/2309.01219"},{"key":"e_1_3_1_230_2","unstructured":"Zeyu Zhang Xiaohe Bo Chen Ma Rui Li Xu Chen Quanyu Dai Jieming Zhu Zhenhua Dong and Ji-Rong Wen. 2024. A survey on the memory mechanism of large language model based agents. 
Retrieved from https:\/\/arxiv.org\/pdf\/2404.13501"},{"key":"e_1_3_1_231_2","unstructured":"Zhexin Zhang Junxiao Yang Pei Ke and Minlie Huang. 2023. Defending large language models against jailbreaking attacks through goal prioritization. Retrieved from https:\/\/arxiv.org\/pdf\/2311.09096"},{"key":"e_1_3_1_232_2","unstructured":"Qinlin Zhao Jindong Wang Yixuan Zhang Yiqiao Jin Kaijie Zhu Hao Chen and Xing Xie. 2023. CompeteAI: Understanding the competition behaviors in large language model-based agents. Retrieved from https:\/\/arxiv.org\/pdf\/2310.17512"},{"key":"e_1_3_1_233_2","doi-asserted-by":"crossref","unstructured":"Wei Zhou Xiaogang Zhu Qing-Long Han Lin Li Xiao Chen Sheng Wen and Yang Xiang. 2024. The security of using large language models: A survey with emphasis on ChatGPT. IEEE\/CAA Journal of Automatica Sinica 12 1 (2024) 1\u201326.","DOI":"10.1109\/JAS.2024.124983"},{"key":"e_1_3_1_234_2","volume-title":"Proceedings of the ICLR","author":"Zhou Yiyang","year":"2024","unstructured":"Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, and Huaxiu Yao. 2024. Analyzing and mitigating object hallucination in large vision-language models. In Proceedings of the ICLR."},{"key":"e_1_3_1_235_2","doi-asserted-by":"crossref","unstructured":"Xiaogang Zhu Wei Zhou Qing-Long Han Wanlun Ma Sheng Wen and Yang Xiang. 2025. When software security meets large language models: A survey. IEEE\/CAA Journal of Automatica Sinica 12 2 (2025) 317\u2013334.","DOI":"10.1109\/JAS.2024.124971"},{"key":"e_1_3_1_236_2","unstructured":"Wei Zou Runpeng Geng Binghui Wang and Jinyuan Jia. 2024. PoisonedRAG: Knowledge poisoning attacks to retrieval-augmented generation of large language models. 
Retrieved from https:\/\/arxiv.org\/pdf\/2402.07867"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716628","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3716628","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:19:11Z","timestamp":1750295951000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716628"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,21]]},"references-count":235,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7,31]]}},"alternative-id":["10.1145\/3716628"],"URL":"https:\/\/doi.org\/10.1145\/3716628","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,21]]},"assertion":[{"value":"2024-06-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-03","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}