{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,3]],"date-time":"2026-05-03T03:01:57Z","timestamp":1777777317352,"version":"3.51.4"},"reference-count":110,"publisher":"Emerald","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,12,14]]},"abstract":"<jats:p>Every major technical invention resurfaces the dual-use dilemma\u2014the new technology has the potential to be used for good as well as for harm. Generative AI (GenAI) techniques, such as large language models (LLMs) and diffusion models, have shown remarkable capabilities (e.g., in-context learning, code-completion, and text-to-image generation and editing). However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks.<\/jats:p>\n                  <jats:p>This monograph reports the findings of a workshop held at Google (co-organized by Stanford University and the University of Wisconsin-Madison) on the dual-use dilemma posed by GenAI. This work is not meant to be comprehensive, but is rather an attempt to synthesize some of the interesting findings from the workshop. We discuss short-term and long-term goals for the community on this topic. 
We hope this work provides both a launching point for a discussion on this important topic as well as interesting problems that the research community can work to address.<\/jats:p>","DOI":"10.1561\/3300000041","type":"journal-article","created":{"date-parts":[[2023,12,14]],"date-time":"2023-12-14T05:36:07Z","timestamp":1702532167000},"page":"1-52","source":"Crossref","is-referenced-by-count":40,"title":["Identifying and Mitigating the Security Risks of Generative AI"],"prefix":"10.1108","volume":"6","author":[{"given":"Clark","family":"Barrett","sequence":"first","affiliation":[{"name":"Stanford University ,","place":["USA"]}]},{"given":"Brad","family":"Boyd","sequence":"additional","affiliation":[{"name":"Stanford University ,","place":["USA"]}]},{"given":"Elie","family":"Bursztein","sequence":"additional","affiliation":[{"name":"Google ,","place":["USA"]}]},{"given":"Nicholas","family":"Carlini","sequence":"additional","affiliation":[{"name":"Google ,","place":["USA"]}]},{"given":"Brad","family":"Chen","sequence":"additional","affiliation":[{"name":"Google ,","place":["USA"]}]},{"given":"Jihye","family":"Choi","sequence":"additional","affiliation":[{"name":"University of Wisconsin , ,","place":["Madison, USA"]}]},{"given":"Amrita Roy","family":"Chowdhury","sequence":"additional","affiliation":[{"name":"University of California , ,","place":["San Diego, USA"]}]},{"given":"Mihai","family":"Christodorescu","sequence":"additional","affiliation":[{"name":"Google ,","place":["USA"]}]},{"given":"Anupam","family":"Datta","sequence":"additional","affiliation":[{"name":"Truera ,","place":["USA"]}]},{"given":"Soheil","family":"Feizi","sequence":"additional","affiliation":[{"name":"University of Maryland, College Park ,","place":["USA"]}]},{"given":"Kathleen","family":"Fisher","sequence":"additional","affiliation":[{"name":"DARPA ,","place":["USA"]}]},{"given":"Tatsunori","family":"Hashimoto","sequence":"additional","affiliation":[{"name":"Stanford University 
,","place":["USA"]}]},{"given":"Dan","family":"Hendrycks","sequence":"additional","affiliation":[{"name":"Center for AI Safety ,","place":["USA"]}]},{"given":"Somesh","family":"Jha","sequence":"additional","affiliation":[{"name":"University of Wisconsin , ,","place":["Madison, USA"]}]},{"given":"Daniel","family":"Kang","sequence":"additional","affiliation":[{"name":"University of Illinois, Urbana Champaign ,","place":["USA"]}]},{"given":"Florian","family":"Kerschbaum","sequence":"additional","affiliation":[{"name":"University of Waterloo ,","place":["Canada"]}]},{"given":"Eric","family":"Mitchell","sequence":"additional","affiliation":[{"name":"Stanford University ,","place":["USA"]}]},{"given":"John","family":"Mitchell","sequence":"additional","affiliation":[{"name":"Stanford University ,","place":["USA"]}]},{"given":"Zulfikar","family":"Ramzan","sequence":"additional","affiliation":[{"name":"Aura Labs ,","place":["USA"]}]},{"given":"Khawaja","family":"Shams","sequence":"additional","affiliation":[{"name":"Google ,","place":["USA"]}]},{"given":"Dawn","family":"Song","sequence":"additional","affiliation":[{"name":"University of California , ,","place":["Berkeley, USA"]}]},{"given":"Ankur","family":"Taly","sequence":"additional","affiliation":[{"name":"Google ,","place":["USA"]}]},{"given":"Diyi","family":"Yang","sequence":"additional","affiliation":[{"name":"Stanford University ,","place":["USA"]}]}],"member":"140","published-online":{"date-parts":[[2023,12,14]]},"reference":[{"key":"2026033014313556000_ref001","unstructured":"*** Your\n          \n          ai\n          model might be telling you this is not a cat,\n          url:\n          https:\/\/art-demo.mybluemix.net\/."},{"key":"2026033014313556000_ref002","unstructured":"***\n          , Authors Guild letter seeks compensation from AI companies for using authors\u2019 writings in AI,2023. 
url: https:\/\/chatgptiseatingtheworld.com\/2023\/07\/19\/authors-guild-letter-seeks-compensation-from-ai-companies-for-using-authors-writings-in-ai\/."},{"key":"2026033014313556000_ref003","unstructured":"***\n          , ChaosGPT: Empowering GPT with internet and memory to destroy humanity,2023. url: https:\/\/www.youtube.com\/watch?v=g7YJIpkk7KM."},{"key":"2026033014313556000_ref004","unstructured":"***\n          , Securing the future of GenAI: Mitigating security risks,2023. url:https:\/\/sites.google.com\/view\/genai-risk-workshop."},{"key":"2026033014313556000_ref005","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1207\/S15327051HCI1523_5","article-title":"The intellectual challenge of CSCW: The gap between social requirements and technical feasibility,","author":"Ackerman","year":"2000"},{"key":"2026033014313556000_ref006","doi-asserted-by":"publisher","first-page":"1341","DOI":"10.1007\/978-3-030-44041-1_114","article-title":"Generating sentiment-preserving fake online reviews using neural language models and their human- and machine-based detection,","author":"Adelani","year":"2020","journal-title":"Advanced Information Networking and Applications: Proceedings of the 34th International Conference on Advanced Information Networking and Applications (AINA-2020),"},{"key":"2026033014313556000_ref007","author":"Alon","year":"2023","journal-title":"Detecting language model attacks with perplexity,"},{"key":"2026033014313556000_ref008","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1007\/3-540-45496-9_14","article-title":"Natural language watermarking: Design, analysis, and a proof-of-concept implementation,","author":"Atallah","year":"2001","journal-title":"Information Hiding: 4th International Workshop, IH 2001 Pittsburgh, PA, USA, April 25\u201327, 2001 Proceedings 4,"},{"key":"2026033014313556000_ref009","unstructured":"Y.\n              Bai\n            , S.Kadavath, S.Kundu, A.Askell, J.Kernion, A.Jones, A.Chen, A.Goldie, 
A.Mirhoseini, C.McKinnon, C.Chen, C.Olsson, C.Olah, D.Hernandez, D.Drain, D.Ganguli, D.Li, E.Tran-Johnson, E.Perez, J.Kerr, J.Mueller, J.Ladish, J.Landau, K.Ndousse, K.Lukosuite, L.Lovitt, M.Sellitto, N.Elhage, N.Schiefer, N.Mercado, N.DasSarma, R.Lasenby, R.Larson, S.Ringer, S.Johnston, S.Kravec, S.El Showk, S.Fort, T.Lanham, T.Telleen-Lawton, T.Conerly, T.Henighan, T.Hume, S. R.Bowman, Z.Hatfield-Dodds, B.Mann, D.Amodei, N.Joseph, S.McCandlish, T.Brown, and J.Kaplan, Constitutional AI: Harmlessness from AI feedback,2022. url:https:\/\/arxiv.org\/abs\/2212.08073."},{"key":"2026033014313556000_ref010","unstructured":"A.\n              Bakhtin\n            , S.Gross, M.Ott, Y.Deng, M.Ranzato, and A.Szlam, Real or fake? Learning to discriminate machine from human generated text,2019. url:https:\/\/arxiv.org\/abs\/1906.03351."},{"key":"2026033014313556000_ref011","unstructured":"A.\n              Belanger\n            \n          , OpenAI, Google will watermark AI-generated content to hinder deepfakes, misinfo,2023. url:https:\/\/arstechnica.com\/ai\/2023\/07\/openai-google-will-watermark-ai-generated-content-to-hinder-deepfakes-misinfo\/."},{"key":"2026033014313556000_ref012","unstructured":"M.\n              Bohannon\n            \n          , Lawyer used ChatGPT in court\u2014And cited fake cases. A judge is considering sanctions,2023. url:https:\/\/www.forbes.com\/sites\/mollybohannon\/2023\/06\/08\/lawyer-used-chatgpt-in-court-and-cited-fake-cases-a-judge-is-considering-sanctions\/?sh=1175e6e87c7f."},{"key":"2026033014313556000_ref013","unstructured":"A. M.\n              Bran\n            , S.Cox, A. D.White, and P.Schwaller, ChemCrow: Augmenting large-language models with chemistry tools,2023. url:https:\/\/arxiv.org\/abs\/2304.05376."},{"key":"2026033014313556000_ref014","unstructured":"B.\n              Brittain\n            \n          , Lawsuit says OpenAI violated US authors\u2019 copyrights to train AI chatbot,2023. 
url:https:\/\/www.reuters.com\/legal\/lawsuit-says-openai-violated-us-authors-copyrights-train-ai-chatbot-2023-06-29\/."},{"key":"2026033014313556000_ref015","unstructured":"M.\n              Buiten\n            \n          , Product liability for defective AI,2023. url:https:\/\/ssrn.com\/abstract=4515202."},{"key":"2026033014313556000_ref016","doi-asserted-by":"crossref","unstructured":"N.\n              Carlini\n            , M.Jagielski, C. A.Choquette-Choo, D.Paleka, W.Pearce, H.Anderson, A.Terzis, K.Thomas, and F.Tram\u00e8r, Poisoning web-scale training datasets is practical,2023. url:https:\/\/arxiv.org\/abs\/2302.10149.","DOI":"10.1109\/SP54263.2024.00179"},{"key":"2026033014313556000_ref017","doi-asserted-by":"crossref","unstructured":"N.\n              Carlini\n            , M.Nasr, C. A.Choquette-Choo, M.Jagielski, I.Gao, A.Awadalla, P. W.Koh, D.Ippolito, K.Lee, F.Tram\u00e8r, and L.Schmidt, Are aligned neural networks adversarially aligned? 2023. url:https:\/\/arxiv.org\/abs\/2306.15447.","DOI":"10.52202\/075280-2687"},{"key":"2026033014313556000_ref018","unstructured":"N.\n              Carlini\n             and A.Terzis, \u201cPoisoning and backdooring contrastive learning,\u201d in International Conference on Learning Representations,2022. url:https:\/\/openreview.net\/forum?id=iC4UHbQ01Mp."},{"key":"2026033014313556000_ref019","article-title":"Extracting training data from large language models,","author":"Carlini","year":"2020","journal-title":"USENIX Security Symposium,"},{"key":"2026033014313556000_ref020","unstructured":"M.\n              Christ\n            , S.Gunn, and O.Zamir, Undetectable watermarks for language models, Cryptology ePrint Archive, Paper 2023\/763, 2023. url:https:\/\/eprint.iacr.org\/2023\/763."},{"key":"2026033014313556000_ref021","unstructured":"DARPA Public Affairs\n          , DARPA announces research teams selected to semantic forensics program,2021. 
url:https:\/\/www.darpa.mil\/news-events\/2021-03-02."},{"key":"2026033014313556000_ref022","unstructured":"R.\n              Durall\n            , M.Keuper, F.-J. Pfreundt, and J.Keuper, Unmasking deepfakes with simple features,2019. url:https:\/\/arxiv.org\/abs\/1911.00686."},{"key":"2026033014313556000_ref023","doi-asserted-by":"crossref","unstructured":"T.\n              Fagni\n            , F.Falchi, M.Gambini, A.Martella, and M.Tesconi, \u201cTweepfake: About detecting deepfake tweets,\u201d PLOS ONE, vol. 16, no. 5, pp. 1\u201316, 2021. url:10.1371\/journal.pone.0251415.","DOI":"10.1371\/journal.pone.0251415"},{"key":"2026033014313556000_ref024","doi-asserted-by":"crossref","unstructured":"P.\n              Fernandez\n            , G.Couairon, H.J\u00e9gou, M.Douze, and T.Furon, The stable signature: Rooting watermarks in latent diffusion models,2023. url:https:\/\/arxiv.org\/abs\/2303.15435.","DOI":"10.1109\/ICCV51070.2023.02053"},{"key":"2026033014313556000_ref025","first-page":"3247","article-title":"Leveraging frequency analysis for deep fake image recognition,","author":"Frank","year":"2020","journal-title":"International Conference on Machine Learning,"},{"key":"2026033014313556000_ref026","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1007\/s11023-020-09539-2","article-title":"Artificial intelligence, values, and alignment,","author":"Gabriel","year":"2020","journal-title":"Minds and Machines,"},{"key":"2026033014313556000_ref027","unstructured":"M.\n              Gahntz\n             and C.Pershan, How the EU can take on the challenge posed by general-purpose AI systems,2022. 
url:https:\/\/assets.mofo-prod.net\/network\/documents\/AI-Act_Mozilla-GPAI-Brief_Kx1ktuk.pdf."},{"key":"2026033014313556000_ref028","unstructured":"D.\n              Ganguli\n            , L.Lovitt, J.Kernion, A.Askell, Y.Bai, S.Kadavath, B.Mann, E.Perez, N.Schiefer, K.Ndousse, A.Jones, S.Bowman, A.Chen, T.Conerly, N.DasSarma, D.Drain, N.Elhage, S.El-Showk, S.Fort, Z.Hatfield-Dodds, T.Henighan, D.Hernandez, T.Hume, J.Jacobson, S.Johnston, S.Kravec, C.Olsson, S.Ringer, E.Tran-Johnson, D.Amodei, T.Brown, N.Joseph, S.McCandlish, C.Olah, J.Kaplan, and J.Clark, Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned,2022. url:https:\/\/arxiv.org\/abs\/2209.07858."},{"key":"2026033014313556000_ref029","doi-asserted-by":"crossref","unstructured":"L.\n              Gao\n            , Z.Dai, P.Pasupat, A.Chen, A. T.Chaganty, Y.Fan, V.Zhao, N.Lao, H.Lee, D.-C. Juan, and K.Guu, \u201cRARR: Researching and revising what language models say, using language models,\u201d in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),A.Rogers, J.Boyd-Graber, and N.Okazaki, Eds., Toronto, Canada: Association for Computational Linguistics, pp. 16477\u201316508, Jul.2023. 
url:https:\/\/aclanthology.org\/2023.acl-long.910.","DOI":"10.18653\/v1\/2023.acl-long.910"},{"key":"2026033014313556000_ref030","doi-asserted-by":"publisher","first-page":"3356","DOI":"10.18653\/v1\/2020.findings-emnlp.301","article-title":"RealToxicityPrompts: Evaluating neural toxic degeneration in language models,","author":"Gehman","year":"2020","journal-title":"Findings of the Association for Computational Linguistics: EMNLP 2020,"},{"key":"2026033014313556000_ref031","doi-asserted-by":"crossref","unstructured":"S.\n              Gehrmann\n            , H.Strobelt, and A.Rush, \u201cGLTR: Statistical detection and visualization of generated text,\u201d in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations,M. R.Costa-juss\u00e0 andE.Alfonseca, Eds., Florence, Italy: Association for Computational Linguistics, pp. 111\u2013116, Jul.2019. url:https:\/\/aclanthology.org\/P19-3019.","DOI":"10.18653\/v1\/P19-3019"},{"key":"2026033014313556000_ref032","unstructured":"D.\n              Glukhov\n            , I.Shumailov, Y.Gal, N.Papernot, and V.Papyan, LLM censorship: A machine learning challenge or a computer security problem?2023. url:https:\/\/arxiv.org\/abs\/2307.10719."},{"key":"2026033014313556000_ref033","unstructured":"Google\n          , Bard,2023. url:https:\/\/bard.google.com\/."},{"key":"2026033014313556000_ref034","doi-asserted-by":"publisher","first-page":"666","DOI":"10.1109\/CVPRW50498.2020.00341","article-title":"Deepfake detection by analyzing convolutional traces,","author":"Guarnera","year":"2020","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops,"},{"key":"2026033014313556000_ref035","doi-asserted-by":"crossref","unstructured":"J.\n              He\n             and M.Vechev, Large language models for code: Security hardening and adversarial testing,2023. 
url:https:\/\/arxiv.org\/abs\/2302.05319.","DOI":"10.1145\/3576915.3623175"},{"key":"2026033014313556000_ref036","doi-asserted-by":"crossref","unstructured":"O.\n              Honovich\n            , R.Aharoni, J.Herzig, H.Taitelbaum, D.Kukliansy, V.Cohen, T.Scialom, I.Szpektor, A.Hassidim, and Y.Matias, \u201cTRUE: Re-evaluating factual consistency evaluation,\u201d in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,M.Carpuat, M.-C.de Marneffe, and I. V.Meza Ruiz, Eds., Seattle, United States: Association for Computational Linguistics, pp. 3905\u20133920, Jul. 2022, url:https:\/\/aclanthology.org\/2022.naacl-main.287.","DOI":"10.18653\/v1\/2022.naacl-main.287"},{"key":"2026033014313556000_ref037","doi-asserted-by":"publisher","first-page":"588","DOI":"10.18653\/v1\/2021.naacl-main.49","article-title":"The importance of modeling social factors of language: Theory and practice,","author":"Hovy","year":"2021","journal-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,"},{"key":"2026033014313556000_ref038","doi-asserted-by":"crossref","unstructured":"D.\n              Ippolito\n            , D.Duckworth, C.Callison-Burch, and D.Eck, \u201cAutomatic detection of generated text is easiest when humans are fooled,\u201d in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics,D.Jurafsky, J.Chai, N.Schluter, and J.Tetreault, Eds., Online: Association for Computational Linguistics, pp. 1808\u20131822, Jul.2020. url:https:\/\/aclanthology.org\/2020.acl-main.164.","DOI":"10.18653\/v1\/2020.acl-main.164"},{"key":"2026033014313556000_ref039","unstructured":"A.\n              Jain\n            , C.Adiole, S.Chaudhuri, T.Reps, and C.Jermaine, Tuning models of code with compiler-generated reinforcement learning feedback,2023. 
url:https:\/\/arxiv.org\/abs\/2305.18341."},{"key":"2026033014313556000_ref040","doi-asserted-by":"crossref","unstructured":"G.\n              Jawahar\n            , M.Abdul-Mageed, and L.Lakshmanan V.S., \u201cAutomatic detection of machine generated text: A critical survey,\u201d in Proceedings of the 28th International Conference on Computational Linguistics, D.Scott, N.Bel, and C.Zong, Eds., Barcelona, Spain (Online): International Committee on Computational Linguistics, pp. 2296\u20132309, Dec.2020. url:https:\/\/aclanthology.org\/2020.coling-main.208.","DOI":"10.18653\/v1\/2020.coling-main.208"},{"key":"2026033014313556000_ref041","doi-asserted-by":"crossref","unstructured":"Z.\n              Jiang\n            , J.Zhang, and N. Z.Gong, Evading watermark based detection of AI-generated content,2023. url:https:\/\/arxiv.org\/abs\/2305.03807.","DOI":"10.1145\/3576915.3623189"},{"key":"2026033014313556000_ref042","doi-asserted-by":"crossref","unstructured":"D.\n              Kang\n            , X.Li, I.Stoica, C.Guestrin, M.Zaharia, and T.Hashimoto, \u201cExploiting programmatic behavior of LLMs: Dual-use through standard security attacks,\u201d in The Second Workshop on New Frontiers in Adversarial Machine Learning,2023. url:https:\/\/openreview.net\/forum?id=eXwzgiXYM8.","DOI":"10.1109\/SPW63631.2024.00018"},{"key":"2026033014313556000_ref043","author":"Katzenbeisser","year":"2016","journal-title":"Information Hiding."},{"key":"2026033014313556000_ref044","unstructured":"D.\n              Kelley\n            \n          , WormGPT \u2013 the Generative AI tool cybercriminals are using to launch business email compromise attacks,2023. 
url:https:\/\/slashnext.com\/blog\/wormgpt-the-generative-ai-tool-cybercriminals-are-using-to-launch-business-email-compromise-attacks\/."},{"key":"2026033014313556000_ref045","unstructured":"J.\n              Kirchenbauer\n            , J.Geiping, Y.Wen, J.Katz, I.Miers, and T.Goldstein, \u201cA watermark for large language models,\u201d in Proceedings of the 40th International Conference on Machine Learning,A.Krause, E.Brunskill, K.Cho, B.Engelhardt, S.Sabato, and J.Scarlett, Eds., ser. Proceedings of Machine Learning Research, vol. 202, pp. 17061\u201317084, PMLR, 23\u201329 Jul 2023. url:https:\/\/proceedings.mlr.press\/v202\/kirchenbauer23a.html."},{"key":"2026033014313556000_ref046","unstructured":"J.\n              Kirchenbauer\n            , J.Geiping, Y.Wen, M.Shu, K.Saifullah, K.Kong, K.Fernando, A.Saha, M.Goldblum, and T.Goldstein, On the reliability of watermarks for large language models,2023. url:https:\/\/arxiv.org\/abs\/2306.04634."},{"key":"2026033014313556000_ref047","unstructured":"K.\n              Krishna\n            , Y.Song, M.Karpinska, J. F.Wieting, and M.Iyyer, \u201cParaphrasing evades detectors of AI-generated text, but retrieval is an effective defense,\u201d in Thirty-Seventh Conference on Neural Information Processing Systems,2023. url:https:\/\/openreview.net\/forum?id=WbFhFvjjKj."},{"key":"2026033014313556000_ref048","unstructured":"R.\n              Krishnan\n            \n          , FraudGPT: The villain avatar of ChatGPT,2023. url:https:\/\/netenrich.com\/blog\/fraudgpt-the-villain-avatar-of-chatgpt."},{"key":"2026033014313556000_ref049","unstructured":"R.\n              Kuditipudi\n            , J.Thickstun, T.Hashimoto, and P.Liang, Robust distortion-free watermarks for language models,2023. 
url:https:\/\/arxiv.org\/abs\/2307.15593."},{"key":"2026033014313556000_ref050","doi-asserted-by":"crossref","unstructured":"W.\n              Liang\n            , M.Yuksekgonul, Y.Mao, E.Wu, and J.Zou, \u201cGPT detectors are biased against non-native English writers,\u201d Patterns, vol. 4, no. 7, p. 100779, 2023. url:https:\/\/www.sciencedirect.com\/science\/article\/pii\/S2666389923001307.","DOI":"10.1016\/j.patter.2023.100779"},{"key":"2026033014313556000_ref051","unstructured":"Q. V.\n              Liao\n             and Z.Xiao, Rethinking model evaluation as narrowing the socio-technical gap,2023. url:https:\/\/arxiv.org\/abs\/2306.03100."},{"key":"2026033014313556000_ref052","unstructured":"Y.\n              Liu\n            , M.Ott, N.Goyal, J.Du, M.Joshi, D.Chen, O.Levy, M.Lewis, L.Zettlemoyer, and V.Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach,2019. url:https:\/\/arxiv.org\/abs\/1907.11692."},{"key":"2026033014313556000_ref053","first-page":"8060","article-title":"Global texture enhancement for fake face detection in the wild,","author":"Liu","year":"2020","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition,"},{"key":"2026033014313556000_ref054","article-title":"PTW: Pivotal tuning watermarking for pre-trained image generators,","author":"Lukas","year":"2023","journal-title":"32nd USENIX Security Symposium,"},{"key":"2026033014313556000_ref055","unstructured":"A.\n              Madaan\n            , N.Tandon, P.Gupta, S.Hallinan, L.Gao, S.Wiegreffe, U.Alon, N.Dziri, S.Prabhumoye, Y.Yang, S.Gupta, B. P.Majumder, K.Hermann, S.Welleck, A.Yazdanbakhsh, and P.Clark, Self-Refine: Iterative refinement with self-feedback,2023. url:https:\/\/arxiv.org\/abs\/2303.17651."},{"key":"2026033014313556000_ref056","unstructured":"Makyen, Temporary policy: Generative AI (e.g., ChatGPT) is banned\n          , 2022. 
url:https:\/\/meta.stackoverflow.com\/questions\/421831\/temporary-policy-generative-ai-e-g-chatgpt-is-banned."},{"key":"2026033014313556000_ref057","doi-asserted-by":"publisher","first-page":"384","DOI":"10.1109\/MIPR.2018.00084","article-title":"Detection of gan-generated fake images over social networks,","author":"Marra","year":"2018","journal-title":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR),"},{"key":"2026033014313556000_ref058","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/WIFS47025.2019.9035099","article-title":"Incremental learning for the detection and classification of GAN-generated images,","author":"Marra","year":"2019","journal-title":"2019 IEEE International Workshop on Information Forensics and Security (WIFS),"},{"key":"2026033014313556000_ref059","unstructured":"S.\n              McCloskey\n             and M.Albright, Detecting GAN-generated imagery using color cues,2018. url:https:\/\/arxiv.org\/abs\/1812.08247."},{"key":"2026033014313556000_ref060","unstructured":"J.\n              Menick\n            , M.Trebacz, V.Mikulik, J.Aslanides, F.Song, M.Chadwick, M.Glaese, S.Young, L.Campbell-Gillingham, G.Irving, and N.McAleese, Teaching language models to support answers with verified quotes,2022. url:https:\/\/arxiv.org\/abs\/2203.11147."},{"key":"2026033014313556000_ref061","unstructured":"Meta\n          , Introducing Llama 2,2023. url:https:\/\/ai.meta.com\/llama\/."},{"key":"2026033014313556000_ref062","unstructured":"Midjourney\n          , Midjourney,2023. url:https:\/\/www.midjourney.com\/."},{"key":"2026033014313556000_ref063","unstructured":"Midjourney\n          , Stable diffusion,2023. url:https:\/\/stablediffusionweb.com\/."},{"key":"2026033014313556000_ref064","unstructured":"E.\n              Mitchell\n            , Y.Lee, A.Khazatsky, C. 
D.Manning, and C.Finn, \u201cDetectGPT: Zero-shot machine-generated text detection using probability curvature,\u201d in Proceedings of the 40th International Conference on Machine Learning, ser. ICML\u201923, Honolulu, Hawaii, USA: JMLR.org, 2023. url:https:\/\/arxiv.org\/abs\/2301.11305."},{"key":"2026033014313556000_ref065","doi-asserted-by":"publisher","first-page":"532","DOI":"10.2352\/ISSN.2470-1173.2019.5.MWSF-532","author":"Nataraj","year":"2019","journal-title":"Electronic Imaging,"},{"key":"2026033014313556000_ref066","unstructured":"E.\n              Nijkamp\n            , H.Hayashi, T.Xie, C.Xia, B.Pang, R.Meng, W.Kryscinski, L.Tu, M.Bhat, S.Yavuz, C.Xing, J.Vig, L.Murakhovska, C.-S. Wu, Y.Zhou, S. R.Joty, C.Xiong, and S.Savarese, Long sequence modeling with XGen: A 7B LLM trained on 8K input sequence length,2023. url:https:\/\/blog.salesforceairesearch.com\/xgen\/."},{"key":"2026033014313556000_ref067","doi-asserted-by":"publisher","first-page":"24 480","DOI":"10.1109\/CVPR52729.2023.02345","article-title":"Towards universal fake image detectors that generalize across generative models,","author":"Ojha","year":"2023","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition,"},{"key":"2026033014313556000_ref068","unstructured":"OpenAI\n          , GPT-4 is OpenAI\u2019s most advanced system, producing safer and more useful responses,2023. url:https:\/\/openai.com\/gpt-4."},{"key":"2026033014313556000_ref069","unstructured":"OpenAI\n          , ChatGPT: Optimizing language models for dialogue,2022. url:https:\/\/openai.com\/blog\/chatgpt\/."},{"key":"2026033014313556000_ref070","unstructured":"OpenAI\n          , DALL-E: Creating images from text,2023. url:https:\/\/openai.com\/research\/dall-e."},{"key":"2026033014313556000_ref071","unstructured":"OpenAI\n          , GPT-4 technical report,2023. 
url:https:\/\/cdn.openai.com\/papers\/gpt-4.pdf."},{"key":"2026033014313556000_ref072","author":"OpenAI","year":"2019","journal-title":"GPT-2: 1.5b release,"},{"key":"2026033014313556000_ref073","doi-asserted-by":"crossref","unstructured":"L.\n              Ouyang\n            , J.Wu, X.Jiang, D.Almeida, C.Wainwright, P.Mishkin, C.Zhang, S.Agarwal, K.Slama, A.Ray, J.Schulman, J.Hilton, F.Kelton, L.Miller, M.Simens, A.Askell, P.Welinder, P. F.Christiano, J.Leike, and R.Lowe, \u201cTraining language models to follow instructions with human feedback,\u201d in Advances in Neural Information Processing Systems,S.Koyejo, S.Mohamed, A.Agarwal, D.Belgrave, K.Cho, and A.Oh, Eds., Curran Associates, Inc., vol. 35, pp. 27730\u201327744, 2022. url:https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/b1efde53be364a73914f58805a001731-Paper-Conference.pdf.","DOI":"10.52202\/068431-2011"},{"key":"2026033014313556000_ref074","doi-asserted-by":"crossref","unstructured":"H.\n              Pearce\n            , B.Ahmad, B.Tan, B.Dolan-Gavitt, and R.Karri, \u201cAsleep at the keyboard? assessing the security of GitHub Copilot\u2019s code contributions,\u201d in 2022 IEEE Symposium on Security and Privacy (SP), pp. 754\u2013768, 2022. url:https:\/\/arxiv.org\/abs\/2108.09293.","DOI":"10.1109\/SP46214.2022.9833571"},{"key":"2026033014313556000_ref075","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.225","article-title":"Red teaming language models with language models","author":"Perez","year":"2022","journal-title":"Conference on Empirical Methods in Natural Language Processing"},{"key":"2026033014313556000_ref076","doi-asserted-by":"crossref","unstructured":"O.\n              Ram\n            , Y.Levine, I.Dalmedigos, D.Muhlgay, A.Shashua, K.Leyton-Brown, and Y.Shoham, In-context retrieval-augmented language models,2023. 
url:https:\/\/arxiv.org\/abs\/2302.00083.","DOI":"10.1162\/tacl_a_00605"},{"key":"2026033014313556000_ref077","doi-asserted-by":"crossref","unstructured":"H.\n              Rashkin\n            , V.Nikolaev, M.Lamm, L.Aroyo, M.Collins, D.Das, S.Petrov, G. S.Tomar, I.Turc, and D.Reitter, \u201cMeasuring attribution in natural language generation models,\u201d Computational Linguistics, pp. 1\u201364, Aug.2023. url:10.1162\/coli_a_00486.","DOI":"10.1162\/coli_a_00490"},{"key":"2026033014313556000_ref078","unstructured":"J.\n              Ricker\n            , S.Damm, T.Holz, and A.Fischer, Towards the detection of diffusion model deepfakes,2023. url:https:\/\/arxiv.org\/abs\/2210.14571."},{"key":"2026033014313556000_ref079","unstructured":"V. S.\n              Sadasivan\n            , A.Kumar, S.Balasubramanian, W.Wang, and S.Feizi, Can AI-generated text be reliably detected?2023. url:https:\/\/arxiv.org\/abs\/2303.11156."},{"key":"2026033014313556000_ref080","unstructured":"M.\n              Sellman\n            \n          , My AI: Snapchat chatbot coaches \u2018girl, 13\u2019 on losing virginity,2023. url:https:\/\/www.thetimes.co.uk\/article\/my-ai-snapchat-chatbot-coaches-girl-13-on-losing-virginity-dj7p6268b."},{"key":"2026033014313556000_ref081","doi-asserted-by":"crossref","unstructured":"Z.\n              Sha\n            , Z.Li, N.Yu, and Y.Zhang, DE-FAKE: Detection and attribution of fake images generated by text-to-image generation models,2023. url:https:\/\/arxiv.org\/abs\/2210.06998.","DOI":"10.1145\/3576915.3616588"},{"key":"2026033014313556000_ref082","unstructured":"I.\n              Shumailov\n            , Z.Shumaylov, Y.Zhao, Y.Gal, N.Papernot, and R.Anderson, The curse of recursion: Training on generated data makes models forget,2023. 
url:https:\/\/arxiv.org\/abs\/2305.17493."},{"key":"2026033014313556000_ref083","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/10447318.2023.2225931","article-title":"ChatGPT: More than a \u2018weapon of mass deception\u2019 ethical challenges and responses from the human-centered artificial intelligence (HCAI) perspective,","author":"G.-B. A. J. G. Sison","year":"2023","journal-title":"International Journal of Human\u2013Computer Interaction,"},{"key":"2026033014313556000_ref084","unstructured":"I.\n              Solaiman\n            , M.Brundage, J.Clark, A.Askell, A.Herbert-Voss, J.Wu, A.Radford, G.Krueger, J. W.Kim, S.Kreps, et al., Release strategies and the social impacts of language models,2019. url:https:\/\/arxiv.org\/abs\/1908.09203."},{"key":"2026033014313556000_ref085","unstructured":"I.\n              Solaiman\n             and C.Dennison, \u201cProcess for adapting language models to society (PALMS) with values-targeted datasets,\u201d in Advances in Neural Information Processing Systems, M.Ranzato, A.Beygelzimer,  Y.Dauphin, P.Liang, and J. W.Vaughan, Eds., Curran Associates, Inc., vol. 34, pp. 5861\u20135873, 2021. url:https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/2e855f9489df0712b4bd8ea9e2848c5a-Paper.pdf."},{"key":"2026033014313556000_ref086","unstructured":"N.\n              Stiennon\n            , L.Ouyang, J.Wu, D.Ziegler, R.Lowe, C.Voss, A.Radford, D.Amodei, and P. F.Christiano, \u201cLearning to summarize with human feedback,\u201d in Advances in Neural Information Processing Systems,H.Larochelle,  M.Ranzato, R.Hadsell, M.Balcan,and H.Lin, Eds., Curran Associates, Inc., vol. 33, pp. 3008\u20133021, 2020. 
url:https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2020\/file\/1f89885d556929e98d3ef9b86448f951-Paper.pdf."},{"key":"2026033014313556000_ref087","unstructured":"R.\n              Taori\n             and T.Hashimoto, \u201cData feedback loops: Model-driven amplification of dataset biases,\u201d in Proceedings of the 40th International Conference on Machine Learning,A.Krause, E.Brunskill,  K.Cho,  B.Engelhardt,  S.Sabato, and J.Scarlett, Eds., ser. Proceedings of Machine Learning Research, vol. 202, pp. 33883\u201333920, PMLR, 2023. url:https:\/\/proceedings.mlr.press\/v202\/taori23a.html."},{"key":"2026033014313556000_ref088","unstructured":"The White House\n          , Executive order on the safe, secure, and trustworthy development and use of artificial intelligence,2023. url:https:\/\/www.whitehouse.gov\/briefing-room\/presidential-actions\/2023\/10\/30\/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence\/."},{"key":"2026033014313556000_ref089","unstructured":"The White House\n          , FACT SHEET: Biden-Harris administration announces national cyber workforce and education strategy, unleashing America\u2019s cyber talent,2023. url:https:\/\/www.whitehouse.gov\/briefing-room\/statements-releases\/2023\/07\/31\/fact-sheet-biden-harris-administration-announces-national-cyber-workforce-and-education-strategy-unleashing-americas-cyber-talent\/."},{"key":"2026033014313556000_ref090","unstructured":"C.\n              Troncoso\n             and B.Preneel, Detecting child sexual abuse 
material shouldn\u2019t be done at any cost,2023. url:https:\/\/www.euronews.com\/2023\/07\/04\/detecting-child-sexual-abuse-material-shouldnt-be-done-at-any-cost."},{"key":"2026033014313556000_ref091","unstructured":"A.\n              Vaswani\n            , N.Shazeer, N.Parmar, J.Uszkoreit, L.Jones, A. N.Gomez, L.Kaiser, and I.Polosukhin, \u201cAttention is all you need,\u201d in Advances in Neural Information Processing Systems,I.Guyon,  U. V.Luxburg,  S.Bengio,  H.Wallach,  R.Fergus, S.Vishwanathan, and R.Garnett, Eds., Curran Associates, Inc., vol. 30, 2017. url:https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/file\/-Paper.pdf."},{"key":"2026033014313556000_ref092","unstructured":"J.\n              Wang\n            , X.HU, W.Hou, H.Chen, R.Zheng, Y.Wang, L.Yang, W.Ye, H.Huang, X.Geng, B.Jiao, Y.Zhang, and X.Xie, \u201cOn the robustness of ChatGPT: An adversarial and out-of-distribution perspective,\u201d in ICLR 2023 Workshop on Trustworthy and Reliable Large-Scale Machine Learning Models,2023. url:https:\/\/openreview.net\/forum?id=uw6HSkgoM29."},{"key":"2026033014313556000_ref093","doi-asserted-by":"publisher","first-page":"8695","DOI":"10.1109\/CVPR42600.2020.00872","article-title":"CNN-generated images are surprisingly easy to spot for now,","author":"Wang","year":"2020","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"2026033014313556000_ref094","unstructured":"J.\n              Wei\n            , Y.Tay, R.Bommasani, C.Raffel, B.Zoph, S.Borgeaud, D.Yogatama, M.Bosma, D.Zhou, D.Metzler, E. H.Chi, T.Hashimoto, O.Vinyals, P.Liang, J.Dean, and W.Fedus, \u201cEmergent abilities of large language models,\u201d Transactions on Machine Learning Research,2022. 
url:https:\/\/openreview.net\/forum?id=yzkSU5zdwD."},{"key":"2026033014313556000_ref095","article-title":"Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions,","author":"Weiss","year":"2019","journal-title":"Technology Science,"},{"key":"2026033014313556000_ref096","doi-asserted-by":"crossref","unstructured":"J.\n              Welbl\n            , A.Glaese, J.Uesato, S.Dathathri, J.Mellor, L. A.Hendricks, K.Anderson, P.Kohli, B.Coppin, and P.-S. Huang, \u201cChallenges in detoxifying language models,\u201d in Findings of the Association for Computational Linguistics: EMNLP 2021, M.-F.Moens,  X.Huang,  L.Specia, and  S. W.-T.Yih, Eds., pp. 2447\u20132469, Punta Cana, Dominican Republic: Association for Computational Linguistics, Nov. 2021. url:https:\/\/aclanthology.org\/2021.findings-emnlp.210.","DOI":"10.18653\/v1\/2021.findings-emnlp.210"},{"key":"2026033014313556000_ref097","unstructured":"Wikipedia contributors\n          , Dual-use technology\u2014Wikipedia, the free encyclopedia,2023. url:https:\/\/en.wikipedia.org\/w\/index.php?title=Dual-use_technology&oldid=1167047934."},{"key":"2026033014313556000_ref098","unstructured":"Wikipedia contributors\n          , Large language model\u2014Wikipedia, the free encyclopedia,2023. url:https:\/\/en.wikipedia.org\/w\/index.php?title=Large_language_model&oldid=1184758575."},{"key":"2026033014313556000_ref099","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1117\/12.2039213","article-title":"Linguistic steganography on Twitter: Hierarchical language modeling with manual interaction,","author":"Wilson","year":"2014","journal-title":"Media Watermarking, Security, and Forensics 2014,"},{"key":"2026033014313556000_ref100","unstructured":"C.\n              Xiang\n            \n          , \u201cHe would still be here\u201d: Man dies by suicide after talking with AI chatbot, widow says,2023. 
url:https:\/\/www.vice.com\/en\/article\/pkadgm\/man-dies-by-suicide-after-talking-with-ai-chatbot-widow-says."},{"key":"2026033014313556000_ref101","unstructured":"J.\n              Xu\n            , D.Ju, M.Li, Y.-L. Boureau, J.Weston, and E.Dinan, Recipes for safety in open-domain chatbots,2021. url:https:\/\/arxiv.org\/abs\/2010.07079."},{"key":"2026033014313556000_ref102","unstructured":"S.\n              Yao\n            , J.Zhao, D.Yu, N.Du, I.Shafran, K. R.Narasimhan, and Y.Cao, \u201cReAct: Synergizing reasoning and acting in language models,\u201d in The Eleventh International Conference on Learning Representations,2023. url:https:\/\/openreview.net\/forum?id=WE_vluYUL-X."},{"key":"2026033014313556000_ref103","unstructured":"T.\n              Zhang\n            , V.Kishore, F.Wu, K. Q.Weinberger, and Y.Artzi, \u201cBERTScore: Evaluating text generation with BERT,\u201d in International Conference on Learning Representations,2020. url:https:\/\/openreview.net\/forum?id=SkeHuCVFDr."},{"key":"2026033014313556000_ref104","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/WIFS47025.2019.9035107","article-title":"Detecting and simulating artifacts in gan fake images,","author":"Zhang","year":"2019","journal-title":"2019 IEEE International Workshop on Information Forensics and Security (WIFS),"},{"key":"2026033014313556000_ref105","doi-asserted-by":"crossref","unstructured":"J.\n              Zhao\n            , T.Wang, M.Yatskar, V.Ordonez, and K.-W. Chang, \u201cMen also like shopping: Reducing gender bias amplification using corpus-level constraints,\u201d in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing,M.Palmer,  R.Hwa, and S.Riedel,Eds.,Copenhagen, Denmark: Association for Computational Linguistics, pp. 2979\u20132989, Sep. 2017. 
url:https:\/\/aclanthology.org\/D17-1323.","DOI":"10.18653\/v1\/D17-1323"},{"key":"2026033014313556000_ref106","unstructured":"X.\n              Zhao\n            , P.Ananth, L.Li, and Y.-X.Wang, Provable robust watermarking for AI-generated text,2023. url:https:\/\/arxiv.org\/abs\/2306.17439."},{"key":"2026033014313556000_ref107","unstructured":"X.\n              Zhao\n            , Y.-X. Wang, and L.Li, \u201cProtecting language generation models via invisible watermarking,\u201d in Proceedings of the 40th International Conference on Machine Learning, ser. ICML\u201923, Honolulu, Hawaii, USA: JMLR.org, 2023. url:https:\/\/dl.acm.org\/doi\/10.5555\/3618408.3620182."},{"key":"2026033014313556000_ref108","doi-asserted-by":"crossref","unstructured":"K.\n              Zhu\n            , J.Wang, J.Zhou, Z.Wang, H.Chen, Y.Wang, L.Yang, W.Ye, N. Z.Gong, Y.Zhang, and X.Xie, PromptBench: Towards evaluating the robustness of large language models on adversarial prompts,2023. url:https:\/\/arxiv.org\/abs\/2306.04528.","DOI":"10.1145\/3689217.3690621"},{"key":"2026033014313556000_ref109","unstructured":"C.\n              Ziems\n            , W.Held, O.Shaikh, J.Chen, Z.Zhang, and D.Yang, Can large language models transform computational social science?2023. url:https:\/\/arxiv.org\/abs\/2305.03514."},{"key":"2026033014313556000_ref110","unstructured":"A.\n              Zou\n            , Z.Wang, J. Z.Kolter, and M.Fredrikson, Universal and transferable adversarial attacks on aligned language models,2023. 
url:https:\/\/arxiv.org\/abs\/2307.15043."}],"container-title":["Foundations and Trends\u00ae in Privacy and Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftsec\/article-pdf\/6\/1\/1\/10970757\/3300000041en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftsec\/article-pdf\/6\/1\/1\/10970757\/3300000041en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T18:53:12Z","timestamp":1777488792000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftsec\/article\/6\/1\/1\/1324327\/Identifying-and-Mitigating-the-Security-Risks-of"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,14]]},"references-count":110,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,12,14]]}},"URL":"https:\/\/doi.org\/10.1561\/3300000041","relation":{},"ISSN":["2474-1558","2474-1566"],"issn-type":[{"value":"2474-1558","type":"print"},{"value":"2474-1566","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,14]]}}}