{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,8]],"date-time":"2026-05-08T20:42:56Z","timestamp":1778272976070,"version":"3.51.4"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2024,12,9]],"date-time":"2024-12-09T00:00:00Z","timestamp":1733702400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"name":"Strategic Priority Research Program of Chinese Academy of Sciences","award":["XDC02030200"],"award-info":[{"award-number":["XDC02030200"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62202466"],"award-info":[{"award-number":["62202466"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004739","name":"Youth Innovation Promotion Association CAS","doi-asserted-by":"publisher","award":["2022159"],"award-info":[{"award-number":["2022159"]}],"id":[{"id":"10.13039\/501100004739","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences"},{"name":"Beijing Key Laboratory of Network Security and Protection Technology"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>This paper delves into a possible security flaw in large language models (LLMs), particularly in their capacity to identify malicious intent within intricate or ambiguous inquiries. We have discovered that LLMs might overlook the malicious nature of highly veiled requests, even without alterations to the malevolent text in those queries, thus exposing a significant weakness in their content analysis systems. 
To be specific, we pinpoint and scrutinize two aspects of this vulnerability: (i) LLMs\u2019 diminished capability to perceive maliciousness when parsing extremely obscured queries, and (ii) LLMs\u2019 inability to discern malicious intent in queries that have been intentionally altered to increase their ambiguity by modifying the malevolent content itself. To illustrate and tackle this problem, we propose a theoretical framework and analytical strategy, and introduce a novel black-box jailbreak attack technique called IntentObfuscator. This technique exploits the identified vulnerability by concealing the genuine intentions behind user prompts, thereby compelling LLMs to inadvertently produce restricted content and circumvent their inherent content safety protocols. We elaborate on two specific applications within this framework: \u201cObscure Intention\u201d and \u201cCreate Ambiguity,\u201d which skillfully manipulate the complexity and ambiguity of queries to effectively dodge the detection of malicious intent. We empirically confirm the efficacy of the IntentObfuscator approach across various models, including ChatGPT-3.5, ChatGPT-4, Qwen, and Baichuan, achieving an average jailbreak success rate of 69.21%. Remarkably, our tests on ChatGPT-3.5, boasting 100 million weekly active users, yielded an impressive success rate of 83.65%. Additionally, we verify our approach across a range of sensitive content categories, including graphic violence, racism, sexism, political sensitivity, cybersecurity threats, and criminal techniques, further highlighting the considerable impact of our findings on refining \u201cRed Team\u201d tactics against LLM content security frameworks.<\/jats:p>","DOI":"10.1093\/comjnl\/bxae124","type":"journal-article","created":{"date-parts":[[2024,12,11]],"date-time":"2024-12-11T01:56:23Z","timestamp":1733882183000},"page":"460-478","source":"Crossref","is-referenced-by-count":7,"title":["Can LLMs deeply detect complex malicious queries? 
A framework for jailbreaking via obfuscating intent"],"prefix":"10.1093","volume":"68","author":[{"given":"Shang","family":"Shang","sequence":"first","affiliation":[{"name":"Institute of Information Engineering , Chinese Academy of Sciences, No. 19 Shucun Rd. Haidian District, 100085 Beijing,","place":["China"]},{"name":"School of Cyber Security , University of Chinese Academy of Sciences, NO. 19 Yuquan Rd. Shijingshan District, 100049 Beijing,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xinqiang","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Cyber Security , University of Chinese Academy of Sciences, NO. 19 Yuquan Rd. Shijingshan District, 100049 Beijing,","place":["China"]},{"name":"China Electronics Standardization Institute , No. 1, Andingmen East Street, Dongcheng District, 100007 Beijing,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhongjiang","family":"Yao","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering , Chinese Academy of Sciences, No. 19 Shucun Rd. Haidian District, 100085 Beijing,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yepeng","family":"Yao","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering , Chinese Academy of Sciences, No. 19 Shucun Rd. Haidian District, 100085 Beijing,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liya","family":"Su","sequence":"additional","affiliation":[{"name":"Security Lab , JD Cloud, Beijing,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zijing","family":"Fan","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering , Chinese Academy of Sciences, No. 19 Shucun Rd. 
Haidian District, 100085 Beijing,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaodan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering , Chinese Academy of Sciences, No. 19 Shucun Rd. Haidian District, 100085 Beijing,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhengwei","family":"Jiang","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering , Chinese Academy of Sciences, No. 19 Shucun Rd. Haidian District, 100085 Beijing,","place":["China"]},{"name":"School of Cyber Security , University of Chinese Academy of Sciences, NO. 19 Yuquan Rd. Shijingshan District, 100049 Beijing,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,12,9]]},"reference":[{"key":"2025052712503569600_ref1","article-title":"Language models are few-shot learners","volume-title":"Language models are few-shot learners.","author":"Brown","year":"2020"},{"key":"2025052712503569600_ref2","volume-title":"What Is the Size of the Training Set for GPT-3","author":"Abcarter","year":"2023"},{"key":"2025052712503569600_ref3","article-title":"Prompting4Debugging: Red-teaming text-to-image diffusion models by finding problematic prompts","volume-title":"ICML","author":"Chin","year":"2024"},{"key":"2025052712503569600_ref4","article-title":"RatGPT: Turning online LLMs into proxies for malware attacks","author":"Beckerich","year":"2023"},{"key":"2025052712503569600_ref5","article-title":"Spear phishing with large language models","author":"Hazell","year":"2023"},{"key":"2025052712503569600_ref6","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1145\/3627106.3627122","article-title":"A first look at toxicity injection attacks on open-domain Chatbots","volume-title":"Proceedings of the 39th Annual Computer Security Applications 
Conference","author":"Weeks","year":"2023"},{"key":"2025052712503569600_ref7","doi-asserted-by":"crossref","DOI":"10.1145\/3607199.3607237","article-title":"Understanding multi-turn toxic Behaviors in open-domain Chatbots","volume-title":"Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses","author":"Chen","year":"2023"},{"key":"2025052712503569600_ref8","volume-title":"Google\u2019s Secure AI Framework (SAIF)","author":"Google, Inc","year":"2023"},{"key":"2025052712503569600_ref9","volume-title":"Content Policy","author":"OpenAI, Inc","year":"2022"},{"key":"2025052712503569600_ref10","volume-title":"Cloud Natural Language","author":"Google, Inc","year":"2023"},{"key":"2025052712503569600_ref11","article-title":"\u201cdo anything now\u201d: Characterizing and evaluating In-the-wild jailbreak prompts on large language models","author":"Shen","year":"2023"},{"key":"2025052712503569600_ref12","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1038\/s41586-023-06647-8","article-title":"Role play with large language models","volume":"623","author":"Shanahan","year":"2023","journal-title":"Nature"},{"key":"2025052712503569600_ref13","article-title":"Jailbreaking ChatGPT via prompt engineering: An empirical study","author":"Liu","year":"2023"},{"key":"2025052712503569600_ref14","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2023.findings-emnlp.272","article-title":"Multi-step jailbreaking privacy attacks on ChatGPT","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2023","author":"Li","year":"2023"},{"key":"2025052712503569600_ref15","article-title":"Universal and transferable adversarial attacks on aligned language models","author":"Zou","year":"2023"},{"key":"2025052712503569600_ref16","volume-title":"Jailbreakchat","author":"Alexalbert","year":"2023"},{"key":"2025052712503569600_ref17","article-title":"Adapting large language models for content moderation: Pitfalls in data engineering 
and supervised fine-tuning","author":"Ma","year":"2023"},{"key":"2025052712503569600_ref18","article-title":"Training a helpful and harmless assistant with reinforcement learning from human feedback","author":"Bai","year":"2022"},{"key":"2025052712503569600_ref19","article-title":"Jailbroken: How does LLM safety training fail?","volume-title":"Advances in Neural Information Processing Systems.","author":"Wei","year":"2023"},{"key":"2025052712503569600_ref20","article-title":"Training language models to follow instructions with human feedback","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"Ouyang","year":"2024"},{"key":"2025052712503569600_ref21","article-title":"Pretraining language models with human preferences","volume-title":"Proceedings of the 40th International Conference on Machine Learning.","author":"Korbak","year":"2023"},{"key":"2025052712503569600_ref22","article-title":"Improving alignment of dialogue agents via targeted human judgements","author":"Glaese","year":"2022"},{"key":"2025052712503569600_ref23","article-title":"Prompting4Debugging: Red-teaming text-to-image diffusion models by finding problematic prompts","author":"Chin","year":"2023"},{"key":"2025052712503569600_ref24","article-title":"Red-teaming the stable diffusion safety filter","author":"Rando","year":"2022"},{"key":"2025052712503569600_ref25","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2022.emnlp-main.225","article-title":"Red teaming language models with language models","volume-title":"Conference on Empirical Methods in Natural Language Processing","author":"Perez","year":"2022"},{"key":"2025052712503569600_ref26","article-title":"FuzzLLM: A novel and universal fuzzing framework for proactively discovering jailbreak vulnerabilities in large language 
models","author":"Yao","year":"2023"},{"key":"2025052712503569600_ref27","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2023.findings-emnlp.143","article-title":"Attack prompt generation for red teaming and defending large language models","volume-title":"Conference on Empirical Methods in Natural Language Processing","author":"Deng","year":"2023"},{"key":"2025052712503569600_ref28","doi-asserted-by":"publisher","first-page":"80218","DOI":"10.1109\/ACCESS.2023.3300381","article-title":"From ChatGPT to ThreatGPT: Impact of generative AI in cybersecurity and privacy","volume":"11","author":"Gupta","year":"2023","journal-title":"IEEE Access"},{"key":"2025052712503569600_ref29","article-title":"GPTFUZZER: Red teaming large language models with auto-generated jailbreak prompts","author":"Yu","year":"2023"},{"key":"2025052712503569600_ref30","article-title":"Jailbreaker: Automated jailbreak across multiple large language model Chatbots","author":"Deng","year":"2023"},{"key":"2025052712503569600_ref31","article-title":"AutoDAN: Generating stealthy jailbreak prompts on aligned large language models","author":"Liu","year":"2023"},{"key":"2025052712503569600_ref32","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/D18-1316","article-title":"Generating natural language adversarial examples","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.","author":"Alzantot","year":"2018"},{"key":"2025052712503569600_ref33","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/P19-1103","article-title":"Generating natural language adversarial examples through probability weighted word saliency","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.","author":"Ren","year":"2019"},{"key":"2025052712503569600_ref34","article-title":"Adversarial GLUE: A multi-task benchmark for robustness evaluation of language models","volume-title":"Thirty-fifth Conference on Neural Information Processing 
Systems Datasets and Benchmarks Track (Round 2)","author":"Wang","year":"2021"},{"key":"2025052712503569600_ref35","doi-asserted-by":"crossref","DOI":"10.3390\/app14167150","article-title":"Open sesame! Universal black box jailbreaking of large language models","volume-title":"ICLR 2024 Workshop on Secure and Trustworthy Large Language Models","author":"Lapid","year":"2024"},{"key":"2025052712503569600_ref36","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2023.findings-emnlp.388","article-title":"ASSERT: Automated safety scenario red teaming for evaluating the robustness of large language models","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2023","author":"Mei","year":"2023"},{"key":"2025052712503569600_ref37","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2024.emnlp-main.41","article-title":"FLIRT: Feedback loop In-context red teaming","volume-title":"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing","author":"Mehrabi","year":"2024"},{"key":"2025052712503569600_ref38","article-title":"Are aligned neural networks adversarially aligned?","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"Carlini","year":"2024"},{"key":"2025052712503569600_ref39","article-title":"SmoothLLM: Defending large language models against jailbreaking attacks","author":"Robey","year":"2023"},{"key":"2025052712503569600_ref40","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2023.acl-long.646","article-title":"Query-efficient black-box red teaming via Bayesian optimization","author":"Lee","year":"2023"},{"key":"2025052712503569600_ref41","doi-asserted-by":"crossref","DOI":"10.1145\/3605764.3623985","article-title":"Not what You\u2019ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection","volume-title":"Proceedings of the 16th ACM Workshop on Artificial Intelligence and 
Security","author":"Greshake","year":"2023"},{"key":"2025052712503569600_ref42","article-title":"Multilingual jailbreak challenges in large language models","author":"Deng","year":"2023"},{"key":"2025052712503569600_ref43","article-title":"Red-teaming large language models using chain of utterances for safety-alignment","author":"Bhardwaj","year":"2023"},{"key":"2025052712503569600_ref44","article-title":"Prompt packer: Deceiving LLMs through compositional instruction with hidden attacks","author":"Jiang","year":"2023"},{"key":"2025052712503569600_ref45","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2024.findings-emnlp.813","article-title":"DrAttack: Prompt decomposition and reconstruction makes powerful LLM Jailbreakers","author":"Li","year":"2024"},{"key":"2025052712503569600_ref46","volume-title":"GPT-3.5 Turbo","author":"OpenAI, Inc"},{"key":"2025052712503569600_ref47","volume-title":"GPT-4 and GPT-4 Turbo","author":"OpenAI, Inc"},{"key":"2025052712503569600_ref48","article-title":"Qwen technical report","author":"Bai","year":"2023"},{"key":"2025052712503569600_ref49","article-title":"Baichuan 2: Open large-scale language models","author":"Yang","year":"2023"},{"key":"2025052712503569600_ref50","article-title":"JADE: A linguistics-based safety evaluation platform for large language models","author":"Zhang","year":"2023"}],"container-title":["The Computer 
Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/5\/460\/61006766\/bxae124.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/5\/460\/61006766\/bxae124.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T16:51:10Z","timestamp":1748364670000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/68\/5\/460\/7919595"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,9]]},"references-count":50,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2024,12,9]]},"published-print":{"date-parts":[[2025,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxae124","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"value":"0010-4620","type":"print"},{"value":"1460-2067","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,5]]},"published":{"date-parts":[[2024,12,9]]}}}