{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T08:15:44Z","timestamp":1778573744743,"version":"3.51.4"},"reference-count":76,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T00:00:00Z","timestamp":1778544000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T00:00:00Z","timestamp":1778544000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005713","name":"Technische Universit\u00e4t M\u00fcnchen","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005713","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["AI Ethics"],"published-print":{"date-parts":[[2026,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Current AI safety discourse still focuses disproportionately on visible failures, including obvious harms, dramatic misuse, and hypothetical catastrophic scenarios. That focus is incomplete. In deployed systems, many of the most consequential failures are quieter: plausible rather than spectacular, distributed across components rather than localized in a single output, and normalized by workflows before they are recognized as hazards. We argue that a central safety challenge in modern AI systems is increasingly not only whether a model emits a harmful response, but whether the broader socio-technical system preserves the conditions under which errors remain visible, contestable, containable, and recoverable. 
We propose a five-layer framework for diagnosing these hidden risks: (1)\n                    <jats:italic>epistemic integrity<\/jats:italic>\n                    , concerning whether evidence and uncertainty are represented honestly enough to support calibrated reliance; (2)\n                    <jats:italic>control integrity<\/jats:italic>\n                    , concerning whether authority, permissions, and action boundaries remain robust under attack and optimization; (3)\n                    <jats:italic>temporal integrity<\/jats:italic>\n                    , concerning whether safety holds across sessions, memory updates, and deployment drift; (4)\n                    <jats:italic>organizational integrity<\/jats:italic>\n                    , concerning whether institutions retain the capacity to audit, assign responsibility, and intervene effectively; and (5)\n                    <jats:italic>ecosystem integrity<\/jats:italic>\n                    , concerning whether AI systems preserve rather than erode the information environment on which future oversight depends. Across these layers, we identify under-recognized risk patterns, including overreliance, uncertainty and legitimacy laundering in retrieval, prompt injection, reward hacking, memory poisoning, evaluation deception, fictional human oversight, synthetic evidence pollution, and model collapse. 
We conclude with actionable design and governance recommendations and a research agenda for shifting AI safety from narrow model-centric evaluation toward socio-technical reliability.\n                  <\/jats:p>","DOI":"10.1007\/s43681-026-01132-0","type":"journal-article","created":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T08:03:36Z","timestamp":1778573016000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["The safety failures we are not instrumenting: a perspective on hidden safety-critical challenges in modern AI systems"],"prefix":"10.1007","volume":"6","author":[{"given":"Gjergji","family":"Kasneci","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Enkelejda","family":"Kasneci","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,5,12]]},"reference":[{"key":"1132_CR1","unstructured":"Abercrombie, G., Benbouzid, D., Giudici, P., Golpayegani, D., Hernandez, J., Noro, P., Pandit, H., Paraschou, E., Pownall, C., Prajapati, J., et al.: A collaborative, human-centred taxonomy of AI, algorithmic, and automation harms. (2024). https:\/\/doi.org\/10.48550\/arXiv.2407.01294. arXiv preprint arXiv:2407.01294"},{"key":"1132_CR2","doi-asserted-by":"crossref","unstructured":"Alemohammad, S., Casco-Rodriguez, J., Luzi, L., Humayun, A.I., Babaei, H., LeJeune, D., Siahkoohi, A., Baraniuk, R.: Self-consuming generative models go mad. In: The Twelfth International Conference on Learning Representations. (2023). 
https:\/\/openreview.net\/forum?id=ShjMHfmPs0","DOI":"10.52591\/lxai202312101"},{"issue":"4","key":"1132_CR3","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1007\/s11023-022-09611-z","volume":"33","author":"K Alfrink","year":"2023","unstructured":"Alfrink, K., Keller, I., Kortuem, G., Doorn, N.: Contestable AI by design: towards a framework. Mind. Mach. 33(4), 613\u2013639 (2023). https:\/\/doi.org\/10.1007\/s11023-022-09611-z","journal-title":"Mind. Mach."},{"key":"1132_CR4","unstructured":"Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Man\u00e9, D.: Concrete problems in AI safety. (2016). https:\/\/doi.org\/10.48550\/arXiv.1606.06565. arXiv preprint arXiv:1606.06565"},{"key":"1132_CR5","doi-asserted-by":"publisher","unstructured":"Bo, J.Y., Wan, S., Anderson, A.: To rely or not to rely? Evaluating interventions for appropriate reliance on large language models. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1\u201323. (2025). https:\/\/doi.org\/10.1145\/3706598.3714097","DOI":"10.1145\/3706598.3714097"},{"key":"1132_CR6","unstructured":"Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J.Q., Demszky, D., Liang, P.: On the Opportunities and Risks of Foundation Models. (2021). https:\/\/doi.org\/10.48550\/arXiv.2108.07258. arXiv preprint arXiv:2108.07258"},{"key":"1132_CR7","unstructured":"Briesch, M., Sobania, D., Rothlauf, F.: Large language models suffer from their own output: an analysis of the self-consuming training loop. arXiv preprint arXiv:2311.16822 (2023). 
https:\/\/openreview.net\/forum?id=SaOxhcDCM3"},{"key":"1132_CR8","doi-asserted-by":"crossref","unstructured":"Bucinca, Z., Malaya, M.B., Gajos, K.Z.: To trust or to think: cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. In: Proceedings of the ACM on Human\u2013computer Interaction 5(CSCW1), 1\u201321 (2021). https:\/\/doi.org\/10.1145\/3449287","DOI":"10.1145\/3449287"},{"key":"1132_CR9","doi-asserted-by":"publisher","unstructured":"Cao, H., Jing, S., Wang, Y., Peng, Z., Bai, Z., Cao, Z., Fang, M., Feng, F., Liu, J., Wang, B., Yang, T., Huo, J., Gao, Y., Meng, F., Yang, X., Deng, C., Feng, J.: SafeDialBench: a fine-grained safety evaluation benchmark for large language models in multi-turn dialogues with diverse jailbreak attacks. In: The Fourteenth International Conference on Learning Representations. (2026). https:\/\/doi.org\/10.48550\/arXiv.2502.11090","DOI":"10.48550\/arXiv.2502.11090"},{"key":"1132_CR10","doi-asserted-by":"publisher","unstructured":"Casper, S., Ezell, C., Siegmann, C., Kolt, N., Curtis, T.L., Bucknall, B., Haupt, A., Wei, K., Scheurer, J., Hobbhahn, M., Sharkey, L., Krishna, S., Von Hagen, M., Alberti, S., Chan, A., Sun, Q., Gerovitch, M., Bau, D., Tegmark, M., Hadfield-Menell, D.: Black-box access is insufficient for rigorous AI audits. In: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pp. 2254\u20132272. (2024). https:\/\/doi.org\/10.1145\/3630106.3659037","DOI":"10.1145\/3630106.3659037"},{"key":"1132_CR11","doi-asserted-by":"crossref","unstructured":"Chen, Z., Xiang, Z., Xiao, C., Song, D., Li, B.: AgentPoison: red-teaming LLM agents via poisoning memory or knowledge bases. In: The Thirty-eighth Annual Conference on Neural Information Processing Systems. (2024). https:\/\/openreview.net\/forum?id=Y841BRW9rY","DOI":"10.52202\/079017-4136"},{"key":"1132_CR12","unstructured":"Chua, J., ety in generative AI large language models: a survey. (2024). 
https:\/\/doi.org\/10.48550\/arXiv.2407.18369. arXiv preprint arXiv:2407.18369"},{"key":"1132_CR13","doi-asserted-by":"publisher","unstructured":"Cobbe, J., Lee, M.S.A., Singh, J.: Reviewable automated decision-making: a framework for accountable algorithmic systems. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 598\u2013609. (2021). https:\/\/doi.org\/10.1145\/3442188.3445921","DOI":"10.1145\/3442188.3445921"},{"key":"1132_CR14","doi-asserted-by":"publisher","unstructured":"Cobbe, J., Veale, M., Singh, J.: Understanding accountability in algorithmic supply chains. In: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 1186\u20131197. (2023). https:\/\/doi.org\/10.1145\/3593013.3594073","DOI":"10.1145\/3593013.3594073"},{"key":"1132_CR15","doi-asserted-by":"publisher","unstructured":"Debenedetti, E., Shumailov, I., Fan, T., Hayes, J., Carlini, N., Fabian, D., Kern, C., Shi, C., Terzis, A., Tram\u00e8r, F.: Defeating prompt injections by design. arXiv preprint arXiv:2503.18813 (2025). https:\/\/doi.org\/10.48550\/arXiv.2503.18813","DOI":"10.48550\/arXiv.2503.18813"},{"key":"1132_CR16","doi-asserted-by":"publisher","unstructured":"Debenedetti, E., Zhang, J., Balunovic, M., Beurer-Kellner, L., Fischer, M., Tram\u00e8r, F.: AgentDojo: a dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. In: The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. (2024). https:\/\/doi.org\/10.52202\/079017-2636","DOI":"10.52202\/079017-2636"},{"key":"1132_CR17","doi-asserted-by":"publisher","unstructured":"Deng, J., Cheng, J., Sun, H., Zhang, Z., Huang, M.: Towards safer generative language models: a survey on safety risks, evaluations, and improvements. arXiv preprint arXiv:2302.09270 (2023). 
https:\/\/doi.org\/10.48550\/arXiv.2302.09270","DOI":"10.48550\/arXiv.2302.09270"},{"key":"1132_CR18","doi-asserted-by":"publisher","unstructured":"Denison, C., MacDiarmid, M., Barez, F., Duvenaud, D., Kravec, S., Marks, S., Schiefer, N., Soklaski, R., Tamkin, A., Kaplan, J., Shlegeris, B., Bowman, S.R., Perez, E., Hubinger, E.: Sycophancy to subterfuge: investigating reward-tampering in large language models. arXiv preprint arXiv:2406.10162 (2024). https:\/\/doi.org\/10.48550\/arXiv.2406.10162","DOI":"10.48550\/arXiv.2406.10162"},{"key":"1132_CR19","unstructured":"European Union. Regulation (EU) 2024\/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union. (2024). https:\/\/eur-lex.europa.eu\/legal-content\/EN\/TXT\/?uri=CELEX:32024R1689"},{"key":"1132_CR20","unstructured":"Ferbach, D., Bertrand, Q., Bose, A.J., Gidel, G.: Self-consuming generative models with curated data provably optimize human preferences. In: The Thirty-eighth Annual Conference on Neural Information Processing Systems. (2024). https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2024\/hash\/b9e88ae0308cf82d0b0f634ddbdf809a-Abstract-Conference.html"},{"key":"1132_CR21","doi-asserted-by":"publisher","unstructured":"Gao, T., Yen, H., Yu, J., Chen, D.: Enabling large language models to generate text with citations. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 6465\u20136488. (2023). https:\/\/doi.org\/10.18653\/v1\/2023.emnlp-main.398","DOI":"10.18653\/v1\/2023.emnlp-main.398"},{"key":"1132_CR22","unstructured":"Gerstgrasser, M., Schaeffer, R., Dey, A., Rafailov, R., Korbak, T., Sleight, H., Agrawal, R., Hughes, J., Pai, D.B., Gromov, A., Roberts, D., Yang, D., Donoho, D.L., Koyejo, S.: Is model collapse inevitable? Breaking the curse of recursion by accumulating real and synthetic data. 
In: First Conference on Language Modeling. (2024). https:\/\/openreview.net\/forum?id=5B2K4LRgmz"},{"issue":"1","key":"1132_CR23","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1136\/amiajnl-2011-000089","volume":"19","author":"K Goddard","year":"2012","unstructured":"Goddard, K., Roudsari, A., Wyatt, J.C.: Automation bias: a systematic review of frequency, effect mediators, and mitigators. J. Am. Med. Inform. Assoc. 19(1), 121\u2013127 (2012). https:\/\/doi.org\/10.1136\/amiajnl-2011-000089","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1132_CR24","doi-asserted-by":"publisher","unstructured":"Greenblatt, R., Denison, C., Wright, B., Roger, F., MacDiarmid, M., Marks, S., Treutlein, J., Belonax, T., Chen, J., Duvenaud, D., et al.: Alignment faking in large language models. arXiv preprint arXiv:2412.14093 (2024). https:\/\/doi.org\/10.48550\/arXiv.2412.14093","DOI":"10.48550\/arXiv.2412.14093"},{"key":"1132_CR25","doi-asserted-by":"publisher","unstructured":"Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M.: Not what you\u2019ve signed up for: compromising real-world LLM-integrated applications with indirect prompt injection. In: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pp. 79\u201390. (2023). https:\/\/doi.org\/10.1145\/3605764.3623985","DOI":"10.1145\/3605764.3623985"},{"issue":"4","key":"1132_CR26","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1038\/s42256-025-01020-y","volume":"7","author":"B Gyevnar","year":"2025","unstructured":"Gyevnar, B., Kasirzadeh, A.: AI safety for everyone. Nat. Mach. Intell. 7(4), 531\u2013542 (2025). https:\/\/doi.org\/10.1038\/s42256-025-01020-y","journal-title":"Nat. Mach. Intell."},{"key":"1132_CR27","doi-asserted-by":"publisher","unstructured":"Habli, I., Hawkins, R., Paterson, C., Ryan, P., Jia, Y., Sujan, M., McDermid, J.: The big argument for AI safety cases. arXiv preprint arXiv:2503.11705 (2025). 
https:\/\/doi.org\/10.48550\/arXiv.2503.11705","DOI":"10.48550\/arXiv.2503.11705"},{"issue":"6","key":"1132_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3773080","volume":"58","author":"F He","year":"2025","unstructured":"He, F., Zhu, T., Ye, D., Liu, B., Zhou, W., Yu, P.S.: The emerged security and privacy of LLM agent: a survey with case studies. ACM Comput. Surv. 58(6), 1 (2025). https:\/\/doi.org\/10.1145\/3773080","journal-title":"ACM Comput. Surv."},{"key":"1132_CR29","doi-asserted-by":"publisher","unstructured":"Ibrahim, L., Collins, K.M., Kim, S.S., Reuel, A., Lamparth, M., Feng, K., Ahmad, L., Soni, P., Kattan, A.E., Stein, M., et al.: Measuring and mitigating overreliance is necessary for building human-compatible AI. arXiv preprint arXiv:2509.08010 (2025). https:\/\/doi.org\/10.48550\/arXiv.2509.08010","DOI":"10.48550\/arXiv.2509.08010"},{"key":"1132_CR30","doi-asserted-by":"publisher","unstructured":"Jiang, Z., Xu, F.F., Gao, L., Sun, Z., Liu, Q., Dwivedi-Yu, J., Yang, Y., Callan, J., Neubig, G.: Active retrieval augmented generation. In: The 2023 Conference on Empirical Methods in Natural Language Processing. (2023). https:\/\/doi.org\/10.18653\/v1\/2023.emnlp-main.495","DOI":"10.18653\/v1\/2023.emnlp-main.495"},{"key":"1132_CR31","doi-asserted-by":"publisher","unstructured":"Kim, S.S., Vaughan, J.W., Liao, Q.V., Lombrozo, T., Russakovsky, O.: Fostering appropriate reliance on large language models: the role of explanations, sources, and inconsistencies. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1\u201319. (2025). https:\/\/doi.org\/10.1145\/3706598.3714020","DOI":"10.1145\/3706598.3714020"},{"key":"1132_CR32","unstructured":"Kirchhof, M., Kasneci, G., Kasneci, E.: Position: uncertainty quantification needs reassessment for large language model agents. In: Proceedings of the 42nd International Conference on Machine Learning, 267, 81665\u201381677. (2025). 
https:\/\/proceedings.mlr.press\/v267\/kirchhof25b.html"},{"key":"1132_CR33","doi-asserted-by":"publisher","first-page":"108352","DOI":"10.1016\/j.chb.2024.108352","volume":"160","author":"A Klingbeil","year":"2024","unstructured":"Klingbeil, A., Gr\u00fctzner, C., Schreck, P.: Trust and reliance on AI\u2013An experimental study on the extent and costs of overreliance on AI. Comput. Hum. Behav. 160, 108352 (2024). https:\/\/doi.org\/10.1016\/j.chb.2024.108352","journal-title":"Comput. Hum. Behav."},{"key":"1132_CR34","unstructured":"Krakovna, V., Uesato, J., Mikulik, V., Rahtz, M., Everitt, T., Kumar, R., Kenton, Z., Leike, J., Legg, S.: Specification gaming: the flip side of AI ingenuity. DeepMind Blog, 3, 40\u201353. (2020). https:\/\/deepmind.google\/blog\/specification-gaming-the-flip-side-of-ai-ingenuity\/"},{"key":"1132_CR35","doi-asserted-by":"publisher","unstructured":"Krishna, S., Krishna, K., Mohananey, A., Schwarcz, S., Stambler, A., Upadhyay, S., Faruqui, M.: Fact, fetch, and reason: a unified evaluation of retrieval-augmented generation. In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 4745\u20134759. (2025). https:\/\/doi.org\/10.18653\/v1\/2025.naacl-long.243","DOI":"10.18653\/v1\/2025.naacl-long.243"},{"key":"1132_CR36","unstructured":"Langosco, L.L.D., Koch, J., Sharkey, L.D., Pfau, J., Krueger, D.: Goal misgeneralization in deep reinforcement learning. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. (eds.) Proceedings of the 39th International Conference on Machine Learning (pp.\u00a012004\u201312019, Vol.\u00a0162). PMLR. (2022). 
https:\/\/proceedings.mlr.press\/v162\/langosco22a.html"},{"key":"1132_CR37","volume-title":"Engineering a Safer World: Systems Thinking Applied to Safety","author":"NG Leveson","year":"2016","unstructured":"Leveson, N.G.: Engineering a Safer World: Systems Thinking Applied to Safety. MIT Press (2016)"},{"key":"1132_CR38","doi-asserted-by":"publisher","unstructured":"Li, J., Yang, Y., Zhang, R., Liao, Q.V., Song, T., Xu, Z., Lee, Y.-c.: Understanding the effects of miscalibrated AI confidence on user trust, reliance, and decision efficacy. arXiv preprint arXiv:2402.07632 (2024). https:\/\/doi.org\/10.48550\/arXiv.2402.07632","DOI":"10.48550\/arXiv.2402.07632"},{"key":"1132_CR39","doi-asserted-by":"publisher","unstructured":"Li, M., Bickersteth, W., Tang, N., Cranor, L., Hong, J., Shen, H., Heidari, H.: A closer look at the existing risks of generative AI: mapping the who, what, and how of real-world incidents. In: Proceedings of the AAAI\/ACM Conference on AI, Ethics, and Society 8(2), 1561\u20131573 (2025). https:\/\/doi.org\/10.1609\/aies.v8i2.36655","DOI":"10.1609\/aies.v8i2.36655"},{"key":"1132_CR40","doi-asserted-by":"publisher","unstructured":"Li, N., Han, Z., Steneker, I., Primack, W., Goodside, R., Zhang, H., Wang, Z., Menghini, C., Yue, S.: LLM defenses are not robust to multi-turn human jailbreaks yet. arXiv preprint arXiv:2408.15221 (2024). https:\/\/doi.org\/10.48550\/arXiv.2408.15221","DOI":"10.48550\/arXiv.2408.15221"},{"key":"1132_CR41","doi-asserted-by":"publisher","unstructured":"Li, X., Yu, S., Pan, M., Sun, Y., Li, B., Song, D., Lin, X., Shi, W.: Unsafer in many turns: benchmarking and defending multi-turn safety risks in tool-using agents. arXiv preprint arXiv:2602.13379 (2026). 
https:\/\/doi.org\/10.48550\/arXiv.2602.13379","DOI":"10.48550\/arXiv.2602.13379"},{"key":"1132_CR42","unstructured":"Liu, X., Yu, H., Zhang, H., Xu, Y., Lei, X., Lai, H., Gu, Y., Ding, H., Men, K., Yang, K., Zhang, S., Deng, X., Zeng, A., Du, Z., Zhang, C., Shen, S., Zhang, T., Su, Y., Sun, H., Tang, J.: AgentBench: evaluating LLMs as agents. In: The Twelfth International Conference on Learning Representations. (2024). https:\/\/openreview.net\/forum?id=zAdUB0aCTQ"},{"key":"1132_CR43","doi-asserted-by":"publisher","unstructured":"Liu, X., Yu, Z., Zhang, Y., Zhang, N., Xiao, C.: Automatic and universal prompt injection attacks against large language models. arXiv preprint arXiv:2403.04957 (2024). https:\/\/doi.org\/10.48550\/arXiv.2403.04957","DOI":"10.48550\/arXiv.2403.04957"},{"key":"1132_CR44","doi-asserted-by":"publisher","unstructured":"Lynch, A., Wright, B., Larson, C., Ritchie, S.J., Mindermann, S., Hubinger, E., Perez, E., Troy, K.: Agentic misalignment: how LLMs could be insider threats. arXiv preprint arXiv:2510.05179 (2025). https:\/\/doi.org\/10.48550\/arXiv.2510.05179","DOI":"10.48550\/arXiv.2510.05179"},{"key":"1132_CR45","unstructured":"Ma, C., Zhang, J., Zhu, Z., Yang, C., Yang, Y., Jin, Y., Lan, Z., Kong, L., He, J.: AgentBoard: an analytical evaluation board of multi-turn LLM agents. In: The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. (2024). https:\/\/openreview.net\/forum?id=4S8agvKjle"},{"key":"1132_CR46","unstructured":"National Cyber Security Centre. Prompt injection is not SQL injection (it may be worse). NCSC blog (2025). https:\/\/www.ncsc.gov.uk\/blog-post\/prompt-injection-is-not-sql-injection"},{"key":"1132_CR47","unstructured":"National Institute of Standards and Technology. Artificial intelligence risk management framework (AI RMF 1.0). NIST, (AI 100-1). https:\/\/nvlpubs.nist.gov\/nistpubs\/ai\/nist.ai.100-1.pdf. 
(2023)"},{"key":"1132_CR48","unstructured":"National Institute of Standards and Technology. Artificial intelligence risk management framework: generative artificial intelligence profile. NIST, (AI 600-1). (2024). https:\/\/nvlpubs.nist.gov\/nistpubs\/ai\/NIST.AI.600-1.pdf"},{"key":"1132_CR49","doi-asserted-by":"publisher","unstructured":"Ni, B., Liu, Z., Wang, L., Lei, Y., Zhao, Y., Cheng, X., Zeng, Q., Dong, L., Xia, Y., Kenthapadi, K., et al.: Towards trustworthy retrieval augmented generation for large language models: a survey. arXiv preprint arXiv:2502.06872 (2025). https:\/\/doi.org\/10.48550\/arXiv.2502.06872","DOI":"10.48550\/arXiv.2502.06872"},{"issue":"3","key":"1132_CR50","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1177\/0018720810376055","volume":"52","author":"R Parasuraman","year":"2010","unstructured":"Parasuraman, R., Manzey, D.H.: Complacency and bias in human use of automation: an attentional integration. Hum. Factors 52(3), 381\u2013410 (2010). https:\/\/doi.org\/10.1177\/0018720810376055","journal-title":"Hum. Factors"},{"issue":"2","key":"1132_CR51","doi-asserted-by":"publisher","first-page":"230","DOI":"10.1518\/001872097778543886","volume":"39","author":"R Parasuraman","year":"1997","unstructured":"Parasuraman, R., Riley, V.: Humans and automation: use, misuse, disuse, abuse. Hum. Factors 39(2), 230\u2013253 (1997). https:\/\/doi.org\/10.1518\/001872097778543886","journal-title":"Hum. Factors"},{"key":"1132_CR52","doi-asserted-by":"publisher","unstructured":"Raji, I.D., Smart, A., White, R.N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., Barnes, P.: Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 33\u201344. (2020). 
https:\/\/doi.org\/10.1145\/3351095.3372873","DOI":"10.1145\/3351095.3372873"},{"key":"1132_CR53","doi-asserted-by":"publisher","DOI":"10.4324\/9781315543543","author":"J Reason","year":"1997","unstructured":"Reason, J.: Managing the risks of organizational accidents. Ashgate (1997). https:\/\/doi.org\/10.4324\/9781315543543","journal-title":"Ashgate"},{"key":"1132_CR54","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3331846","author":"Y Rong","year":"2024","unstructured":"Rong, Y., Leemann, T., Nguyen, T.-T., Fiedler, L., Qian, P., Unhelkar, V., Seidel, T., Kasneci, G., Kasneci, E.: Towards human-centered explainable AI: a survey of user studies for model explanations. IEEE Trans. Pattern Anal. Mach. Intell. (2024). https:\/\/doi.org\/10.1109\/TPAMI.2023.3331846","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"1132_CR55","doi-asserted-by":"publisher","first-page":"1074","DOI":"10.1109\/TLT.2025.3630117","volume":"18","author":"Y Rong","year":"2025","unstructured":"Rong, Y., Se\u00dfler, K., G\u00f6zl\u00fckl\u00fc, E., Kasneci, E.: Benchmarking in-context learning strategies of large language models for math reasoning tasks. IEEE Trans. Learn. Technol. 18, 1074\u20131082 (2025). https:\/\/doi.org\/10.1109\/TLT.2025.3630117","journal-title":"IEEE Trans. Learn. Technol."},{"key":"1132_CR56","doi-asserted-by":"publisher","unstructured":"Rossi, S., Michel, A.M., Mukkamala, R.R., Thatcher, J.B.: An early categorization of prompt injection attacks on large language models. arXiv preprint arXiv:2402.00898 (2024). https:\/\/doi.org\/10.48550\/arXiv.2402.00898","DOI":"10.48550\/arXiv.2402.00898"},{"key":"1132_CR57","doi-asserted-by":"publisher","unstructured":"Santoni de Sio, F., Van den Hoven, J.: Meaningful human control over autonomous systems: a philosophical account. Front. Robot. AI, 5, 323836. (2018). 
https:\/\/doi.org\/10.3389\/frobt.2018.00015","DOI":"10.3389\/frobt.2018.00015"},{"key":"1132_CR58","doi-asserted-by":"publisher","unstructured":"Scheurer, J., Balesni, M., Hobbhahn, M.: Large language models can strategically deceive their users when put under pressure. In: ICLR 2024 Workshop on Large Language Model (LLM) Agents. (2024). https:\/\/doi.org\/10.48550\/arXiv.2311.07590","DOI":"10.48550\/arXiv.2311.07590"},{"key":"1132_CR59","doi-asserted-by":"publisher","unstructured":"Shelby, R., Rismani, S., Henne, K., Moon, A., Rostamzadeh, N., Nicholas, P., Yilla-Akbari, N., Gallegos, J., Smart, A., Garcia, E., Virk, G.: Sociotechnical harms of algorithmic systems: scoping a taxonomy for harm reduction. In: Proceedings of the 2023 AAAI\/ACM Conference on AI, Ethics, and Society, pp. 723\u2013741. (2023). https:\/\/doi.org\/10.1145\/3600211.3604673","DOI":"10.1145\/3600211.3604673"},{"issue":"8022","key":"1132_CR60","doi-asserted-by":"publisher","first-page":"755","DOI":"10.1038\/s41586-024-07566-y","volume":"631","author":"I Shumailov","year":"2024","unstructured":"Shumailov, I., Shumaylov, Z., Zhao, Y., Papernot, N., Anderson, R., Gal, Y.: AI models collapse when trained on recursively generated data. Nature 631(8022), 755\u2013759 (2024). https:\/\/doi.org\/10.1038\/s41586-024-07566-y","journal-title":"Nature"},{"key":"1132_CR61","doi-asserted-by":"publisher","unstructured":"Song, M., Sim, S.H., Bhardwaj, R., Chieu, H.L., Majumder, N., Poria, S.: Measuring and enhancing trustworthiness of LLMs in RAG through grounded attributions and learning to refuse. arXiv preprint arXiv:2409.11242 (2024). 
https:\/\/doi.org\/10.48550\/arXiv.2409.11242","DOI":"10.48550\/arXiv.2409.11242"},{"issue":"2","key":"1132_CR62","doi-asserted-by":"publisher","first-page":"100099","DOI":"10.1016\/j.chbah.2024.100099","volume":"2","author":"N Spatola","year":"2024","unstructured":"Spatola, N.: The efficiency-accountability tradeoff in AI integration: effects on human performance and over-reliance. Comput. Hum. Behav. Artif. Hum. 2(2), 100099 (2024). https:\/\/doi.org\/10.1016\/j.chbah.2024.100099","journal-title":"Comput. Hum. Behav. Artif. Hum."},{"key":"1132_CR63","doi-asserted-by":"publisher","unstructured":"Sun, X., Zhang, D., Yang, D., Zou, Q., Li, H.: Multi-turn context jailbreak attack on large language models from first principles. arXiv preprint arXiv:2408.04686 (2024). https:\/\/doi.org\/10.48550\/arXiv.2408.04686","DOI":"10.48550\/arXiv.2408.04686"},{"key":"1132_CR64","doi-asserted-by":"publisher","unstructured":"Sunil, B.D., Sinha, I., Maheshwari, P., Todmal, S., Mallik, S., Mishra, S.: Memory poisoning attack and defense on memory based LLM-agents. arXiv preprint arXiv:2601.05504 (2026). https:\/\/doi.org\/10.48550\/arXiv.2601.05504","DOI":"10.48550\/arXiv.2601.05504"},{"issue":"1","key":"1132_CR65","doi-asserted-by":"publisher","first-page":"040013","DOI":"10.1063\/5.0222987","volume":"3194","author":"X Suo","year":"2024","unstructured":"Suo, X.: Signed-prompt: a new approach to prevent prompt injection attacks against LLM-integrated applications. AIP Conf. Proc. 3194(1), 040013 (2024). https:\/\/doi.org\/10.1063\/5.0222987","journal-title":"AIP Conf. Proc."},{"issue":"CSCW1","key":"1132_CR66","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3579605","volume":"7","author":"H Vasconcelos","year":"2023","unstructured":"Vasconcelos, H., J\u00f6rke, M., Grunde-McLaughlin, M., Gerstenberg, T., Bernstein, M.S., Krishna, R.: Explanations can reduce overreliance on AI systems during decision-making. Proc. ACM Hum.-Comput. Interact. 7(CSCW1), 1\u201338 (2023). 
https:\/\/doi.org\/10.1145\/3579605","journal-title":"Proc. ACM Hum.-Comput. Interact."},{"key":"1132_CR67","doi-asserted-by":"publisher","unstructured":"Wallat, J., Heuss, M., de Rijke, M., Anand, A.: Correctness is not faithfulness in retrieval augmented generation attributions. In: Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR), pp. 22\u201332. (2025). https:\/\/doi.org\/10.1145\/3731120.3744592","DOI":"10.1145\/3731120.3744592"},{"key":"1132_CR68","doi-asserted-by":"publisher","unstructured":"Wang, B., He, W., Zeng, S., Xiang, Z., Xing, Y., Tang, J., He, P.: Unveiling privacy risks in LLM agent memory. In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 25241\u201325260. (2025). https:\/\/doi.org\/10.18653\/v1\/2025.acl-long.1227","DOI":"10.18653\/v1\/2025.acl-long.1227"},{"key":"1132_CR69","unstructured":"Weick, K.E., Sutcliffe, K.M.: Managing the Unexpected: Resilient Performance in an Age of Uncertainty (2nd\u00a0ed.). Jossey-Bass. (2007)"},{"key":"1132_CR70","doi-asserted-by":"publisher","unstructured":"Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., et al.: Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359, 10. (2021). https:\/\/doi.org\/10.48550\/arXiv.2112.04359","DOI":"10.48550\/arXiv.2112.04359"},{"key":"1132_CR71","doi-asserted-by":"publisher","unstructured":"Xiong, Z., Lin, Y., Xie, W., He, P., Liu, Z., Tang, J., Lakkaraju, H., Xiang, Z.: How memory management impacts LLM agents: an empirical study of experience-following behavior. arXiv preprint arXiv:2505.16067 (2025). 
https:\/\/doi.org\/10.48550\/arXiv.2505.16067","DOI":"10.48550\/arXiv.2505.16067"},{"key":"1132_CR72","doi-asserted-by":"publisher","unstructured":"Yu, E., Li, J., Liao, M., Wang, S., Zuchen, G., Mi, F., Hong, L.: CoSafe: evaluating large language model safety in multi-turn dialogue coreference. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 17494\u201317508. (2024). https:\/\/doi.org\/10.18653\/v1\/2024.emnlp-main.968","DOI":"10.18653\/v1\/2024.emnlp-main.968"},{"key":"1132_CR73","doi-asserted-by":"publisher","unstructured":"Yu, H., Kim, D., Kim, Y.-B.: Retrieval collapses when AI pollutes the web. arXiv preprint arXiv:2602.16136 (2026). https:\/\/doi.org\/10.48550\/arXiv.2602.16136","DOI":"10.48550\/arXiv.2602.16136"},{"issue":"6","key":"1132_CR74","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3748302","volume":"43","author":"Z Zhang","year":"2025","unstructured":"Zhang, Z., Dai, Q., Bo, X., Ma, C., Li, R., Chen, X., Zhu, J., Dong, Z., Wen, J.-R.: A survey on the memory mechanism of large language model-based agents. ACM Trans. Inf. Syst. 43(6), 1 (2025). https:\/\/doi.org\/10.1145\/3748302","journal-title":"ACM Trans. Inf. Syst."},{"key":"1132_CR75","doi-asserted-by":"publisher","unstructured":"Zhou, Y., Liu, Y., Li, X., Jin, J., Qian, H., Liu, Z., Li, C., Dou, Z., Ho, T.-Y., Yu, P.S.: Trustworthiness in retrieval-augmented generation systems: a survey. arXiv preprint arXiv:2409.10102 (2024). https:\/\/doi.org\/10.48550\/arXiv.2409.10102","DOI":"10.48550\/arXiv.2409.10102"},{"key":"1132_CR76","doi-asserted-by":"publisher","unstructured":"Zhu, Y., Jin, T., Pruksachatkun, Y., Zhang, A.K., Liu, S., Cui, S., Kapoor, S., Longpre, S., Meng, K., Weiss, R., Barez, F., Gupta, R., Dhamala, J., Merizian, J., Giulianelli, M., Coppock, H., Ududec, C., Kellermann, A., Sekhon, J.\u00a0S., Kang, D.: Establishing best practices in building rigorous agentic benchmarks. 
In: The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track. (2025). https:\/\/doi.org\/10.48550\/arXiv.2507.02825","DOI":"10.48550\/arXiv.2507.02825"}],"container-title":["AI and Ethics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s43681-026-01132-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s43681-026-01132-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s43681-026-01132-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T08:04:19Z","timestamp":1778573059000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s43681-026-01132-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,12]]},"references-count":76,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,6]]}},"alternative-id":["1132"],"URL":"https:\/\/doi.org\/10.1007\/s43681-026-01132-0","relation":{},"ISSN":["2730-5953","2730-5961"],"issn-type":[{"value":"2730-5953","type":"print"},{"value":"2730-5961","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,12]]},"assertion":[{"value":"14 March 2026","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 April 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 May 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no 
conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"295"}}