{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T16:04:26Z","timestamp":1781366666452,"version":"3.54.1"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGKDD Explor. Newsl."],"published-print":{"date-parts":[[2025,7,7]]},"abstract":"<jats:p>Jailbreak attacks represent one of the most sophisticated threats to the security of large language models (LLMs). To deal with such risks, we introduce an innovative framework that can help evaluate the effectiveness of jailbreak attacks on LLMs. Unlike traditional binary evaluations focusing solely on the robustness of LLMs, our method assesses the attacking prompts' effectiveness. We present two distinct evaluation frameworks: a coarse-grained evaluation and a fine-grained evaluation. Each framework uses a scoring range from 0 to 1, offering unique perspectives and allowing for the assessment of attack effectiveness in different scenarios. Additionally, we develop a comprehensive ground truth dataset specifically tailored for jailbreak prompts. This dataset is a crucial benchmark for our current study and provides a foundational resource for future research. By comparing with traditional evaluation methods, our study shows that the current results align with baseline metrics while offering a more nuanced and fine-grained assessment. It also helps identify potentially harmful attack prompts that might appear harmless in traditional evaluations. Overall, our work establishes a solid foundation for assessing a broader range of attack prompts in prompt injection.<\/jats:p>","DOI":"10.1145\/3748239.3748242","type":"journal-article","created":{"date-parts":[[2025,7,7]],"date-time":"2025-07-07T19:19:42Z","timestamp":1751915982000},"page":"10-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models"],"prefix":"10.1145","volume":"27","author":[{"given":"Dong","family":"Shu","sequence":"first","affiliation":[{"name":"Northwestern University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chong","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Liverpool"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mingyu","family":"Jin","sequence":"additional","affiliation":[{"name":"Rutgers University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zihao","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Liverpool"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lingyao","family":"Li","sequence":"additional","affiliation":[{"name":"University of South Florida"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,7,7]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Zheng et al., \"Prompt injection attack against llm-integrated applications,\" arXiv preprint arXiv:2306.05499","author":"Liu Y.","year":"2023","unstructured":"Y. Liu, G. Deng, Y. Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y. Liu, H. Wang, Y. Zheng et al., \"Prompt injection attack against llm-integrated applications,\" arXiv preprint arXiv:2306.05499, 2023."},{"key":"e_1_2_1_2_1","volume-title":"Multilingual jailbreak challenges in large language models,\" arXiv preprint arXiv:2310.06474","author":"Deng Y.","year":"2023","unstructured":"Y. Deng, W. Zhang, S. J. Pan, and L. Bing, \"Multilingual jailbreak challenges in large language models,\" arXiv preprint arXiv:2310.06474, 2023."},{"key":"e_1_2_1_3_1","unstructured":"P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E. Wong, \"Jailbreaking black box large language models in twenty queries,\" arXiv preprint arXiv:2310.08419, 2023."},{"key":"e_1_2_1_4_1","volume-title":"Jailbreaking attack against multimodal large language model,\" arXiv preprint arXiv:2402.02309","author":"Niu Z.","year":"2024","unstructured":"Z. Niu, H. Ren, X. Gao, G. Hua, and R. Jin, \"Jailbreaking attack against multimodal large language model,\" arXiv preprint arXiv:2402.02309, 2024."},{"key":"e_1_2_1_5_1","volume-title":"Jailbreaking large language models against moderation guardrails via cipher characters,\" arXiv preprint arXiv:2405.20413","author":"Jin H.","year":"2024","unstructured":"H. Jin, A. Zhou, J. D. Menke, and H. Wang, \"Jailbreaking large language models against moderation guardrails via cipher characters,\" arXiv preprint arXiv:2405.20413, 2024."},{"key":"e_1_2_1_6_1","volume-title":"Defending large language models against jailbreak attacks via semantic smoothing,\" arXiv preprint arXiv:2402.16192","author":"Ji J.","year":"2024","unstructured":"J. Ji, B. Hou, A. Robey, G. J. Pappas, H. Hassani, Y. Zhang, E. Wong, and S. Chang, \"Defending large language models against jailbreak attacks via semantic smoothing,\" arXiv preprint arXiv:2402.16192, 2024. A. Robey, E. Wong, H. Hassani, and G. J. Pappas, \"Smoothllm: Defending large language models against jailbreaking attacks,\" 2023."},{"key":"e_1_2_1_7_1","volume-title":"Don't listen to me: Understanding and exploring jailbreak prompts of large language models,\" arXiv preprint arXiv:2403.17336","author":"Yu Z.","year":"2024","unstructured":"Z. Yu, X. Liu, S. Liang, Z. Cameron, C. Xiao, and N. Zhang, \"Don't listen to me: Understanding and exploring jailbreak prompts of large language models,\" arXiv preprint arXiv:2403.17336, 2024."},{"key":"e_1_2_1_8_1","volume-title":"A survey on large language model (llm) security and privacy: The good, the bad, and the ugly,\" High-Confidence Computing","author":"Yao Y.","year":"2024","unstructured":"Y. Yao, J. Duan, K. Xu, Y. Cai, Z. Sun, and Y. Zhang, \"A survey on large language model (llm) security and privacy: The good, the bad, and the ugly,\" High-Confidence Computing, 2024."},{"key":"e_1_2_1_9_1","volume-title":"Fine-tuning aligned language models compromises safety, even when users do not intend to!\" arXiv preprint arXiv:2310.03693","author":"Qi X.","year":"2023","unstructured":"X. Qi, Y. Zeng, T. Xie, P.-Y. Chen, R. Jia, P. Mittal, and P. Henderson, \"Fine-tuning aligned language models compromises safety, even when users do not intend to!\" arXiv preprint arXiv:2310.03693, 2023."},{"key":"e_1_2_1_10_1","volume-title":"Jailbreaking leading safety-aligned llms with simple adaptive attacks,\" arXiv preprint arXiv:2404.02151","author":"Andriushchenko M.","year":"2024","unstructured":"M. Andriushchenko, F. Croce, and N. Flammarion, \"Jailbreaking leading safety-aligned llms with simple adaptive attacks,\" arXiv preprint arXiv:2404.02151, 2024."},{"key":"e_1_2_1_11_1","volume-title":"Mixture-of-agents enhances large language model capabilities,\" arXiv preprint arXiv:2406.04692","author":"Wang J.","year":"2024","unstructured":"J. Wang, J. Wang, B. Athiwaratkun, C. Zhang, and J. Zou, \"Mixture-of-agents enhances large language model capabilities,\" arXiv preprint arXiv:2406.04692, 2024."},{"key":"e_1_2_1_12_1","volume-title":"Dong et al., \"A survey of large language models,\" arXiv preprint arXiv:2303.18223","author":"Zhao W. X.","year":"2023","unstructured":"W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong et al., \"A survey of large language models,\" arXiv preprint arXiv:2303.18223, 2023."},{"key":"e_1_2_1_13_1","volume-title":"Wang et al., \"A survey on evaluation of large language models,\" ACM Transactions on Intelligent Systems and Technology","author":"Chang Y.","year":"2024","unstructured":"Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang et al., \"A survey on evaluation of large language models,\" ACM Transactions on Intelligent Systems and Technology, 2024."},{"key":"e_1_2_1_14_1","first-page":"1830","volume-title":"Aug. 2024","author":"Jin M.","year":"2024","unstructured":"M. Jin, Q. Yu, D. Shu, H. Zhao, W. Hua, Y. Meng, Y. Zhang, and M. Du, \"The impact of reasoning step length on large language models,\" in Findings of the Association for Computational Linguistics ACL 2024, Bangkok, Thailand and virtual meeting, Aug. 2024, pp. 1830--1842. [Online]. Available: https:\/\/aclanthology.org\/2024.findings-acl.108"},{"key":"e_1_2_1_15_1","volume-title":"Evil geniuses: Delving into the safety of llm-based agents,\" arXiv preprint arXiv:2311.11855","author":"Tian Y.","year":"2023","unstructured":"Y. Tian, X. Yang, J. Zhang, Y. Dong, and H. Su, \"Evil geniuses: Delving into the safety of llm-based agents,\" arXiv preprint arXiv:2311.11855, 2023."},{"key":"e_1_2_1_16_1","volume-title":"et al., \"Decodingtrust: A comprehensive assessment of trustworthiness in gpt models","author":"Wang B.","year":"2023","unstructured":"B. Wang, W. Chen, and H. P. et al., \"Decodingtrust: A comprehensive assessment of trustworthiness in gpt models,\" 2023."},{"key":"e_1_2_1_17_1","volume-title":"do anything now\": Characterizing and evaluating in-the-wild jailbreak prompts on large language models,\" arXiv preprint arXiv:2308.03825","author":"Shen X.","year":"2023","unstructured":"X. Shen, Z. Chen, M. Backes, Y. Shen, and Y. Zhang, \"\"do anything now\": Characterizing and evaluating in-the-wild jailbreak prompts on large language models,\" arXiv preprint arXiv:2308.03825, 2023."},{"key":"e_1_2_1_18_1","volume-title":"Ray et al., \"Training language models to follow instructions with human feedback,\" Advances in Neural Information Processing Systems","author":"Ouyang L.","year":"2022","unstructured":"L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., \"Training language models to follow instructions with human feedback,\" Advances in Neural Information Processing Systems, 2022."},{"issue":"17","key":"e_1_2_1_19_1","first-page":"750","volume":"38","author":"Zhou Z.","year":"2024","unstructured":"Z. Zhou, Q. Wang, M. Jin, J. Yao, J. Ye, W. Liu, W. Wang, X. Huang, and K. Huang, \"Mathattack: Attacking large language models towards math solving ability,\" in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 17, 2024, pp. 19 750--19 758.","journal-title":"\"Mathattack: Attacking large language models towards math solving ability,\" in Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"e_1_2_1_20_1","volume-title":"Beyond text: Improving llm's decision making for robot navigation via vocal cues,\" arXiv preprint arXiv:2402.03494","author":"Sun X.","year":"2024","unstructured":"X. Sun, H. Meng, S. Chakraborty, A. S. Bedi, and A. Bera, \"Beyond text: Improving llm's decision making for robot navigation via vocal cues,\" arXiv preprint arXiv:2402.03494, 2024."},{"key":"e_1_2_1_21_1","volume-title":"A dynamic environment to evaluate attacks and defenses for llm agents,\" arXiv preprint arXiv:2406.13352","author":"Debenedetti E.","year":"2024","unstructured":"E. Debenedetti, J. Zhang, M. Balunovi\u0107, L. Beurer-Kellner, M. Fischer, and F. Tram\u00e8r, \"Agentdojo: A dynamic environment to evaluate attacks and defenses for llm agents,\" arXiv preprint arXiv:2406.13352, 2024."},{"key":"e_1_2_1_22_1","volume-title":"A new era in llm security: Exploring security concerns in real-world llm-based systems,\" arXiv preprint arXiv:2402.18649","author":"Wu F.","year":"2024","unstructured":"F. Wu, N. Zhang, S. Jha, P. McDaniel, and C. Xiao, \"A new era in llm security: Exploring security concerns in real-world llm-based systems,\" arXiv preprint arXiv:2402.18649, 2024."},{"key":"e_1_2_1_23_1","volume-title":"Autodan: Generating stealthy jailbreak prompts on aligned large language models","author":"Liu X.","year":"2023","unstructured":"X. Liu, N. Xu, M. Chen, and C. Xiao, \"Autodan: Generating stealthy jailbreak prompts on aligned large language models,\" 2023."},{"key":"e_1_2_1_24_1","volume-title":"Guiding large language models via directional stimulus prompting,\" Advances in Neural Information Processing Systems","author":"Li Z.","year":"2024","unstructured":"Z. Li, B. Peng, P. He, M. Galley, J. Gao, and X. Yan, \"Guiding large language models via directional stimulus prompting,\" Advances in Neural Information Processing Systems, vol. 36, 2024."},{"key":"e_1_2_1_25_1","first-page":"941","volume-title":"Goal-guided generative prompt injection attack on large language models,\" in 2024 IEEE International Conference on Data Mining (ICDM)","author":"Zhang C.","year":"2024","unstructured":"C. Zhang, M. Jin, Q. Yu, C. Liu, H. Xue, and X. Jin, \"Goal-guided generative prompt injection attack on large language models,\" in 2024 IEEE International Conference on Data Mining (ICDM), 2024, pp. 941--946."},{"key":"e_1_2_1_26_1","volume-title":"Target-driven attack for large language models,\" in ECAI","author":"Zhang C.","year":"2024","unstructured":"C. Zhang, M. Jin, D. Shu, T. Wang, D. Liu, and X. Jin, \"Target-driven attack for large language models,\" in ECAI, 2024."},{"key":"e_1_2_1_27_1","volume-title":"Empower the llm to safeguard itself","author":"Wang Z.","year":"2023","unstructured":"Z. Wang, F. Yang, L. Wang, P. Zhao, H. Wang, L. Chen, Q. Lin, and K.-F. Wong, \"Self-guard: Empower the llm to safeguard itself,\" 2023."},{"key":"e_1_2_1_28_1","volume-title":"Jailbreaking chatgpt via prompt engineering: An empirical study","author":"Liu Y.","year":"2023","unstructured":"Y. Liu, G. Deng, Z. Xu, Y. Li, Y. Zheng, Y. Zhang, L. Zhao, T. Zhang, and Y. Liu, \"Jailbreaking chatgpt via prompt engineering: An empirical study,\" 2023."},{"key":"e_1_2_1_29_1","volume-title":"Does refusal training in llms generalize to the past tense?\" arXiv preprint arXiv:2407.11969","author":"Andriushchenko M.","year":"2024","unstructured":"M. Andriushchenko and N. Flammarion, \"Does refusal training in llms generalize to the past tense?\" arXiv preprint arXiv:2407.11969, 2024."},{"key":"e_1_2_1_30_1","volume-title":"Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts,\" arXiv preprint arXiv:2309.10253","author":"Yu J.","year":"2023","unstructured":"J. Yu, X. Lin, and X. Xing, \"Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts,\" arXiv preprint arXiv:2309.10253, 2023."},{"key":"e_1_2_1_31_1","first-page":"3","volume-title":"B. Chen","author":"Wang T.","year":"2024","unstructured":"T. Wang, Z. Fang, H. Xue, C. Zhang, M. Jin, W. Xu, D. Shu, S. Yang, Z. Wang, and D. Liu, \"Large vision-language model security: A survey,\" in Frontiers in Cyber Security, B. Chen, X. Fu, and M. Huang, Eds. Singapore: Springer Nature Singapore, 2024, pp. 3--22."},{"key":"e_1_2_1_32_1","volume-title":"Counterfactual explainable incremental prompt attack analysis on large language models,\" arXiv preprint arXiv:2407.09292","author":"Shu D.","year":"2024","unstructured":"D. Shu, M. Jin, T. Chen, C. Zhang, and Y. Zhang, \"Counterfactual explainable incremental prompt attack analysis on large language models,\" arXiv preprint arXiv:2407.09292, 2024."}],"container-title":["ACM SIGKDD Explorations Newsletter"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3748239.3748242","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,7]],"date-time":"2025-07-07T19:20:39Z","timestamp":1751916039000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3748239.3748242"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,7]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,7,7]]}},"alternative-id":["10.1145\/3748239.3748242"],"URL":"https:\/\/doi.org\/10.1145\/3748239.3748242","relation":{},"ISSN":["1931-0145","1931-0153"],"issn-type":[{"value":"1931-0145","type":"print"},{"value":"1931-0153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,7]]},"assertion":[{"value":"2025-07-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}