{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,11]],"date-time":"2026-07-11T17:21:02Z","timestamp":1783790462857,"version":"3.55.0"},"publisher-location":"New York, NY, USA","reference-count":39,"publisher":"ACM","funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CMMI-2038215, CNS-2321532, 2112562"],"award-info":[{"award-number":["CMMI-2038215, CNS-2321532, 2112562"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,13]]},"DOI":"10.1145\/3733800.3763264","type":"proceedings-article","created":{"date-parts":[[2025,12,1]],"date-time":"2025-12-01T10:39:10Z","timestamp":1764585550000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["LLM Safeguard is a Double-Edged Sword: Exploiting False Positives for Denial-of-Service Attacks"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2598-5988","authenticated-orcid":false,"given":"Qingzhao","family":"Zhang","sequence":"first","affiliation":[{"name":"University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4172-123X","authenticated-orcid":false,"given":"Ziyang","family":"Xiong","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9844-2055","authenticated-orcid":false,"given":"Morley","family":"Mao","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, Michigan, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,12]]},"reference":[{"key":"e_1_3_3_2_2_2","unstructured":"Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia\u00a0Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat et\u00a0al. 2023. Gpt-4 technical report. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2303.08774 (2023)."},{"key":"e_1_3_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Rana Alabdan. 2020. Phishing attacks survey: Types vectors and technical approaches. Future internet 12 10 (2020) 168.","DOI":"10.3390\/fi12100168"},{"key":"e_1_3_3_2_4_2","unstructured":"Rishabh Bhardwaj and Soujanya Poria. 2023. Red-teaming large language models using chain of utterances for safety-alignment. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.09662 (2023)."},{"key":"e_1_3_3_2_5_2","unstructured":"Federico Bianchi Mirac Suzgun Giuseppe Attanasio Paul R\u00f6ttger Dan Jurafsky Tatsunori Hashimoto and James Zou. 2023. Safety-tuned llamas: Lessons from improving the safety of large language models that follow instructions. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.07875 (2023)."},{"key":"e_1_3_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP54263.2024.00179"},{"key":"e_1_3_3_2_7_2","unstructured":"Patrick Chao Alexander Robey Edgar Dobriban Hamed Hassani George\u00a0J Pappas and Eric Wong. 2023. Jailbreaking black box large language models in twenty queries. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.08419 (2023)."},{"key":"e_1_3_3_2_8_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde De\u00a0Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman et\u00a0al. 2021. Evaluating large language models trained on code. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2107.03374 (2021)."},{"key":"e_1_3_3_2_9_2","unstructured":"Wei-Lin Chiang Zhuohan Li Zi Lin Ying Sheng Zhanghao Wu Hao Zhang Lianmin Zheng Siyuan Zhuang Yonghao Zhuang Joseph\u00a0E Gonzalez et\u00a0al. 2023. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality. See https:\/\/vicuna. lmsys. org (accessed 14 April 2023) 2 3 (2023) 6."},{"key":"e_1_3_3_2_10_2","doi-asserted-by":"crossref","unstructured":"Edoardo Debenedetti Jie Zhang Mislav Balunovic Luca Beurer-Kellner Marc Fischer and Florian Tram\u00e8r. 2024. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents. Advances in Neural Information Processing Systems 37 (2024) 82895\u201382920.","DOI":"10.52202\/079017-2636"},{"key":"e_1_3_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Zhichen Dong Zhanhui Zhou Chao Yang Jing Shao and Yu Qiao. 2024. Attacks defenses and evaluations for llm conversation safety: A survey. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.09283 (2024).","DOI":"10.18653\/v1\/2024.naacl-long.375"},{"key":"e_1_3_3_2_12_2","unstructured":"Kuofeng Gao Tianyu Pang Chao Du Yong Yang Shu-Tao Xia and Min Lin. 2024. Denial-of-service poisoning attacks against large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.10760 (2024)."},{"key":"e_1_3_3_2_13_2","unstructured":"Suyu Ge Chunting Zhou Rui Hou Madian Khabsa Yi-Chia Wang Qifan Wang Jiawei Han and Yuning Mao. 2023. Mart: Improving llm safety with multi-round automatic red-teaming. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2311.07689 (2023)."},{"key":"e_1_3_3_2_14_2","unstructured":"Zhengmian Hu Gang Wu Saayan Mitra Ruiyi Zhang Tong Sun Heng Huang and Vishy Swaminathan. 2023. Token-level adversarial prompt detection based on perplexity measures and contextual information. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2311.11509 (2023)."},{"key":"e_1_3_3_2_15_2","unstructured":"Hakan Inan Kartikeya Upasani Jianfeng Chi Rashi Rungta Krithika Iyer Yuning Mao Michael Tontchev Qing Hu Brian Fuller Davide Testuggine et\u00a0al. 2023. Llama guard: Llm-based input-output safeguard for human-ai conversations. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2312.06674 (2023)."},{"key":"e_1_3_3_2_16_2","unstructured":"Jiabao Ji Bairu Hou Alexander Robey George\u00a0J Pappas Hamed Hassani Yang Zhang Eric Wong and Shiyu Chang. 2024. Defending large language models against jailbreak attacks via semantic smoothing. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.16192 (2024)."},{"key":"e_1_3_3_2_17_2","first-page":"2","volume-title":"Proceedings of NAACL-HLT","volume":"1","author":"Jacob Devlin","year":"2019","unstructured":"Jacob Devlin Ming-Wei\u00a0Chang Kenton\u00a0Lee and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, Vol.\u00a01. 2."},{"key":"e_1_3_3_2_18_2","unstructured":"Mantas Mazeika Long Phan Xuwang Yin Andy Zou Zifan Wang Norman Mu Elham Sakhaee Nathaniel Li Steven Basart Bo Li et\u00a0al. 2024. Harmbench: A standardized evaluation framework for automated red teaming and robust refusal. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.04249 (2024)."},{"key":"e_1_3_3_2_19_2","unstructured":"MITRE Corporation. 2024. CVE-2024-3570. https:\/\/www.cve.org\/CVERecord?id=CVE-2024-3570. Accessed: 2024-09-15."},{"key":"e_1_3_3_2_20_2","unstructured":"MITRE Corporation. 2024. CVE-2024-4181. https:\/\/www.cve.org\/CVERecord?id=CVE-2024-4181. Accessed: 2024-09-15."},{"key":"e_1_3_3_2_21_2","unstructured":"MITRE Corporation. 2024. CVE-2024-5211. https:\/\/www.cve.org\/CVERecord?id=CVE-2024-5211. Accessed: 2024-09-15."},{"key":"e_1_3_3_2_22_2","unstructured":"MITRE Corporation. 2024. CVE-2024-5826. 
https:\/\/www.cve.org\/CVERecord?id=CVE-2024-5826. Accessed: 2024-09-15."},{"key":"e_1_3_3_2_23_2","unstructured":"OpenAI. 2024. Moderation API Overview. https:\/\/platform.openai.com\/docs\/guides\/moderation\/overview Accessed: 2024-09-15."},{"key":"e_1_3_3_2_24_2","doi-asserted-by":"crossref","unstructured":"Traian Rebedea Razvan Dinu Makesh Sreedhar Christopher Parisien and Jonathan Cohen. 2023. Nemo guardrails: A toolkit for controllable and safe llm applications with programmable rails. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.10501 (2023).","DOI":"10.18653\/v1\/2023.emnlp-demo.40"},{"key":"e_1_3_3_2_25_2","doi-asserted-by":"crossref","unstructured":"N Reimers. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1908.10084 (2019).","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_3_2_26_2","unstructured":"Alexander Robey Eric Wong Hamed Hassani and George\u00a0J Pappas. 2023. Smoothllm: Defending large language models against jailbreaking attacks. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.03684 (2023)."},{"key":"e_1_3_3_2_27_2","doi-asserted-by":"crossref","unstructured":"Taylor Shin Yasaman Razeghi Robert\u00a0L Logan\u00a0IV Eric Wallace and Sameer Singh. 2020. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2010.15980 (2020).","DOI":"10.18653\/v1\/2020.emnlp-main.346"},{"key":"e_1_3_3_2_28_2","unstructured":"Nisan Stiennon Long Ouyang Jeffrey Wu Daniel Ziegler Ryan Lowe Chelsea Voss Alec Radford Dario Amodei and Paul\u00a0F Christiano. 2020. Learning to summarize with human feedback. Advances in Neural Information Processing Systems 33 (2020) 3008\u20133021."},{"key":"e_1_3_3_2_29_2","unstructured":"Yashar Talebirad and Amirhossein Nadiri. 2023. Multi-agent collaboration: Harnessing the power of intelligent llm agents. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2306.03314 (2023)."},{"key":"e_1_3_3_2_30_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et\u00a0al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2302.13971 (2023)."},{"key":"e_1_3_3_2_31_2","unstructured":"A Vaswani. 2017. Attention is all you need. Advances in Neural Information Processing Systems (2017)."},{"key":"e_1_3_3_2_32_2","unstructured":"Eric\u00a0J. Wang. 2023. Alpaca-LoRA. https:\/\/github.com\/tloen\/alpaca-lora"},{"key":"e_1_3_3_2_33_2","doi-asserted-by":"crossref","unstructured":"Jiongxiao Wang Jiazhao Li Yiquan Li Xiangyu Qi Junjie Hu Sharon Li Patrick McDaniel Muhao Chen Bo Li and Chaowei Xiao. 2024. Backdooralign: Mitigating fine-tuning based jailbreak attack with backdoor enhanced safety alignment. Advances in Neural Information Processing Systems 37 (2024) 5210\u20135243.","DOI":"10.52202\/079017-0169"},{"key":"e_1_3_3_2_34_2","unstructured":"Xinyuan Wang Victor Shea-Jay Huang Renmiao Chen Hao Wang Chengwei Pan Lei Sha and Minlie Huang. 2024. BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.09804 (2024)."},{"key":"e_1_3_3_2_35_2","unstructured":"Fangzhou Wu Ning Zhang Somesh Jha Patrick McDaniel and Chaowei Xiao. 2024. A new era in llm security: Exploring security concerns in real-world llm-based systems. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.18649 (2024)."},{"key":"e_1_3_3_2_36_2","unstructured":"Zhiyuan Yu Xiaogeng Liu Shunning Liang Zach Cameron Chaowei Xiao and Ning Zhang. 2024. Don\u2019t Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2403.17336 (2024)."},{"key":"e_1_3_3_2_37_2","unstructured":"Zhuowen Yuan Zidi Xiong Yi Zeng Ning Yu Ruoxi Jia Dawn Song and Bo Li. 2024. Rigorllm: Resilient guardrails for large language models against undesired content. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2403.13031 (2024)."},{"key":"e_1_3_3_2_38_2","unstructured":"Boyang Zhang Yicong Tan Yun Shen Ahmed Salem Michael Backes Savvas Zannettou and Yang Zhang. 2024. Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2407.20859 (2024)."},{"key":"e_1_3_3_2_39_2","unstructured":"Wanjun Zhong Ruixiang Cui Yiduo Guo Yaobo Liang Shuai Lu Yanlin Wang Amin Saied Weizhu Chen and Nan Duan. 2023. Agieval: A human-centric benchmark for evaluating foundation models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2304.06364 (2023)."},{"key":"e_1_3_3_2_40_2","unstructured":"Andy Zou Zifan Wang Nicholas Carlini Milad Nasr J\u00a0Zico Kolter and Matt Fredrikson. 2023. Universal and transferable adversarial attacks on aligned language models. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.15043 (2023)."}],"event":{"name":"LAMPS '25: Proceedings of the 2025 Workshop on Large AI Systems and Models with Privacy and Security Analysis","location":"Taipei, Taiwan","acronym":"LAMPS '25","sponsor":["SIGSAC ACM Special Interest Group on Security, Audit, and Control"]},"container-title":["Proceedings of the 2025 Workshop on Large AI Systems and Models with Privacy and Security Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3733800.3763264","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,1]],"date-time":"2025-12-01T10:39:22Z","timestamp":1764585562000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3733800.3763264"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,13]]},"references-count":39,"alternative-id":["10.1145\/3733800.3763264","10.1145\/3733800"],"URL":"https:\/\/doi.org\/10.1145\/3733800.3763264","relation":{},"subject":[],"published":{"date-parts":[[2025,10,13]]},"assertion":[{"value":"2025-12-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}