{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:06:11Z","timestamp":1750309571697,"version":"3.41.0"},"reference-count":20,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T00:00:00Z","timestamp":1735603200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Queue"],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>The vulnerability of LLMs to hallucination, prompt injection, and jailbreaks poses a significant but surmountable challenge to their widespread adoption and responsible use. We have argued that these problems are inherent, certainly in the present generation of models and likely in LLMs per se, and so our approach can never be based on eliminating them; rather, we should apply strategies of \"defense in depth\" to mitigate them, and when building and using these systems, do so on the assumption that they will sometimes fail in these directions.<\/jats:p>","DOI":"10.1145\/3711679","type":"journal-article","created":{"date-parts":[[2025,1,21]],"date-time":"2025-01-21T23:18:56Z","timestamp":1737501536000},"page":"38-61","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["The Price of Intelligence"],"prefix":"10.1145","volume":"22","author":[{"given":"Mark","family":"Russinovich","sequence":"first","affiliation":[{"name":"Microsoft Azure"}]},{"given":"Ahmed","family":"Salem","sequence":"additional","affiliation":[{"name":"MSRC (Microsoft Security Response Center)"}]},{"given":"Santiago","family":"Zanella-B\u00e9guelin","sequence":"additional","affiliation":[{"name":"Microsoft Azure Research, Cambridge, UK"}]},{"given":"Yonatan","family":"Zunger","sequence":"additional","affiliation":[{"name":"Microsoft"}]}],"member":"320","published-online":{"date-parts":[[2025,1,21]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Hallucination rates and reference accuracy of ChatGPT and Bard for systematic reviews: comparative study. Journal of Medical Internet Research 26","author":"Chelli M.","year":"2024","unstructured":"Chelli, M., Descamps, J., Lavou\u00e9, V., Trojani, C., Azar, M., Deckert, M., Raynier, J.L., Clowez, G., Boileau, P., Ruetsch-Chelli, C. 2024. Hallucination rates and reference accuracy of ChatGPT and Bard for systematic reviews: comparative study. Journal of Medical Internet Research 26; https:\/\/www.jmir.org\/2024\/1\/e53164\/."},{"key":"e_1_2_1_2_1","unstructured":"Chern I-C. Chern S. Chen S. Yuan W. Feng K. Zhou C. He J. Neubig G. Liu P. 2023. FacTool: factuality detection in generative AI ? a tool augmented framework for multi-task and multi-domain scenarios; https:\/\/arxiv.org\/abs\/2307.13528."},{"key":"e_1_2_1_3_1","unstructured":"Forbes G. Levin E. Beltagy I. 2023. Metric ensembles for hallucination detection; https:\/\/arxiv.org\/abs\/2310.10495."},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Ji Z. Lee N. Frieske R. Yu T. Su D. Xu Y. Ishii E. Bang Y. Chen D. Dai W. Chan H. S. Madotto A. Fung P. 2022. Survey of hallucination in natural language generation. 
ACM Computing Surveys 55(12) 1?38; https:\/\/dl.acm.org\/doi\/10.1145\/3571730.","DOI":"10.1145\/3571730"},{"key":"e_1_2_1_5_1","article-title":"How do people react to AI failure? Automation bias, algorithmic aversion, and perceived controllability","author":"Jones-Jang S. Mo.","year":"2023","unstructured":"Jones-Jang, S. Mo., Park, Y. J. 2023. How do people react to AI failure? Automation bias, algorithmic aversion, and perceived controllability. Journal of Computer-Mediated Communication 28(1); https:\/\/academic.oup.com\/jcmc\/article\/28\/1\/zmac029\/6827859.","journal-title":"Journal of Computer-Mediated Communication 28(1); https:\/\/academic.oup.com\/jcmc\/article\/28\/1\/zmac029\/6827859."},{"key":"e_1_2_1_6_1","volume-title":"Let's verify step by step","author":"Lightman H.","year":"2005","unstructured":"Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., Cobbe, K. 2023. Let's verify step by step; https:\/\/arxiv.org\/abs\/2305.20050."},{"key":"e_1_2_1_7_1","volume-title":"Complacency and bias in human use of automation: an attentional integration. Human Factors 52(3), 381?410","author":"Parasuraman R.","year":"1872","unstructured":"Parasuraman, R., Manzey, D. H. 2010. Complacency and bias in human use of automation: an attentional integration. Human Factors 52(3), 381?410; https:\/\/journals.sagepub.com\/doi\/10.1177\/0018720810376055."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/HRI.2016.7451740"},{"key":"e_1_2_1_9_1","unstructured":"Weller O. Chang B. MacAvaney S. Lo K. Cohan A. Van Durme B. Lawrie D. Soldaini L. 2024. FollowIR: evaluating and teaching information retrieval models to follow instructions. https:\/\/arxiv.org\/abs\/2403.15246."},{"key":"e_1_2_1_10_1","volume-title":"Complacency and automation bias in the use of imperfect automation. Human Factors 57(5), 728?739","author":"Wickens C. D.","year":"1872","unstructured":"Wickens, C. D., Clegg, B. A., Vieane, A. Z., Sebok, A. L. 2015. Complacency and automation bias in the use of imperfect automation. Human Factors 57(5), 728?739; https:\/\/journals.sagepub.com\/doi\/10.1177\/0018720815581940."},{"key":"e_1_2_1_11_1","unstructured":"Zhang T. Patil S. G. Jain N. Shen S. Zaharia M. Stoica I. Gonzalez J. E. 2024. RAFT: adapting language model to domain specific RAG; https:\/\/arxiv.org\/abs\/2403.10131."},{"key":"e_1_2_1_12_1","unstructured":"Gu et al. (2023). Exploring the role of instruction tuning in mitigating prompt injection attacks in large language model. https:\/\/arxiv.org\/abs\/2306.10783. (Claude 3.5 Sonnet hallucinated this reference. The paper does not exist; the link points to a paper on Astrophysics)"},{"key":"e_1_2_1_13_1","unstructured":"Hines K. Lopez G. Hall M. Zarfati F. Zunger Y. Kiciman E. 2024. Defending against indirect prompt injection attacks with spotlighting; https:\/\/arxiv.org\/abs\/2403.14720."},{"key":"e_1_2_1_14_1","unstructured":"Liu Y. Deng G. Li Y. Wang K. Wang Z. Wang X. Zhang T. Liu Y. Wang H. Zheng Y. Liu Y. 2023. Prompt injection attack against LLM-integrated applications; https:\/\/arxiv.org\/abs\/2306.05499."},{"key":"e_1_2_1_15_1","unstructured":"Wallace E. Xiao K. Leike R. Weng L. Heidecke J. Beutel A. 2024. The instruction hierarchy: training LLMs to prioritize privileged instructions; https:\/\/arxiv.org\/abs\/2404.13208."},{"key":"e_1_2_1_16_1","volume-title":"Mitigating Skeleton Key, a new type of generative AI jailbreak technique. 
Microsoft Security blog","author":"Russinovich M.","year":"2024","unstructured":"Russinovich, M. 2024. Mitigating Skeleton Key, a new type of generative AI jailbreak technique. Microsoft Security blog; https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/26\/mitigating-skeleton-key-a-new-type-of-generative-ai-jailbreak-technique\/."},{"key":"e_1_2_1_17_1","volume-title":"now write an article about that: the Crescendo multi-turn LLM jailbreak attack","author":"Russinovich M.","year":"1833","unstructured":"Russinovich, M., Salem, A., Eldan, R. 2024. Great, now write an article about that: the Crescendo multi-turn LLM jailbreak attack; https:\/\/arxiv.org\/abs\/2404.01833."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3658644.3670388"},{"key":"e_1_2_1_19_1","series-title":"January 25","volume-title":"Trolls have flooded X with graphic Taylor Swift AI fakes. The Verge","author":"Weatherbed J.","year":"2024","unstructured":"Weatherbed, J. 2024. Trolls have flooded X with graphic Taylor Swift AI fakes. The Verge (January 25); https:\/\/www.theverge.com\/2024\/1\/25\/24050334\/x-twitter-taylor-swift-ai-fake-images-trending."},{"key":"e_1_2_1_20_1","unstructured":"Zou A. Wang Z. Carlini N. Nasr M. Kolter J. Z. Fredrikson M. 2023. Universal and transferable adversarial attacks on aligned language models; https:\/\/arxiv.org\/abs\/2307.15043."}],"container-title":["Queue"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3711679","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3711679","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:19:15Z","timestamp":1750295955000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3711679"}},"subtitle":["Three risks inherent in LLMs"],"short-title":[],"issued":{"date-parts":[[2024,12,31]]},"references-count":20,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3711679"],"URL":"https:\/\/doi.org\/10.1145\/3711679","relation":{},"ISSN":["1542-7730","1542-7749"],"issn-type":[{"type":"print","value":"1542-7730"},{"type":"electronic","value":"1542-7749"}],"subject":[],"published":{"date-parts":[[2024,12,31]]},"assertion":[{"value":"2025-01-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}