{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T11:59:55Z","timestamp":1758283195276,"version":"3.44.0"},"reference-count":66,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,5,21]],"date-time":"2025-05-21T00:00:00Z","timestamp":1747785600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,21]],"date-time":"2025-05-21T00:00:00Z","timestamp":1747785600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004004","name":"Universit\u00e0 degli Studi di Trento","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004004","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["AI Ethics"],"published-print":{"date-parts":[[2025,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The ethical implications and potentials for misuse of Generative Artificial Intelligence are increasingly worrying topics. This paper explores how easily the default ethical guardrails of ChatGPT, using its latest customization features, can be bypassed by simple prompts and fine-tuning, that can be effortlessly accessed by the broad public. This malevolently altered version of ChatGPT, nicknamed \u201cRogueGPT\u201d, responded with worrying behaviours, beyond those triggered by jailbreak prompts. We conduct an empirical study of RogueGPT responses, assessing its flexibility in answering questions pertaining to what should be disallowed usage. Our findings raise significant concerns about the model\u2019s knowledge about topics like illegal drug production, torture methods and terrorism. 
The ease of driving ChatGPT astray, coupled with its global accessibility, highlights severe issues regarding the data quality used for training the foundational model and the implementation of ethical safeguards. We thus underline the responsibilities and dangers of user-driven modifications, and the broader effects that these may have on the design of safeguarding and ethical modules implemented by AI programmers. Disclaimer. This paper contains examples of harmful language. Reader discretion is recommended.<\/jats:p>","DOI":"10.1007\/s43681-025-00750-4","type":"journal-article","created":{"date-parts":[[2025,5,21]],"date-time":"2025-05-21T05:48:56Z","timestamp":1747806536000},"page":"4945-4966","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["RogueGPT: transforming ChatGPT-4 into a rogue AI with dis-ethical tuning"],"prefix":"10.1007","volume":"5","author":[{"given":"Alessio","family":"Buscemi","sequence":"first","affiliation":[]},{"given":"Daniele","family":"Proverbio","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,21]]},"reference":[{"unstructured":"OpenAI: Gpt-4 technical report. OpenAI Blog 1 (2023)","key":"750_CR1"},{"unstructured":"Reid, A.: Gemini: a revolutionary language model. Google AI (2024)","key":"750_CR2"},{"unstructured":"Touvron, H., et al.: Llama: Large language model. Meta AI (2023)","key":"750_CR3"},{"unstructured":"Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners (2020). arXiv preprint arXiv:2005.14165","key":"750_CR4"},{"issue":"8","key":"750_CR5","first-page":"9","volume":"1","author":"A Radford","year":"2019","unstructured":"Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. 
OpenAI blog 1(8), 9 (2019)","journal-title":"OpenAI blog"},{"unstructured":"Devlin, J., et al.: Bert: Pre-training of deep bidirectional transformers for language understanding (2018). arXiv preprint arXiv:1810.04805","key":"750_CR6"},{"unstructured":"Buscemi, A., Proverbio, D.: ChatGPT vs Gemini vs LLaMA on Multilingual Sentiment Analysis (2024)","key":"750_CR7"},{"unstructured":"Buscemi, A.: A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages (2023)","key":"750_CR8"},{"unstructured":"Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative Adversarial Text to Image Synthesis (2016)","key":"750_CR9"},{"unstructured":"Yang, Z., et al.: XLNet: generalized autoregressive pretraining for language understanding. NeurIPS (2019)","key":"750_CR10"},{"unstructured":"Solaiman, I., et al.: Release strategies and the social impacts of language models (2019). arXiv preprint arXiv:1908.09203","key":"750_CR11"},{"doi-asserted-by":"crossref","unstructured":"Mitchell, M., et al.: Model cards for model reporting. FAT* (2019)","key":"750_CR12","DOI":"10.1145\/3287560.3287596"},{"issue":"8","key":"750_CR13","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735\u20131780 (1997)","journal-title":"Neural Comput."},{"unstructured":"Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104\u20133112 (2014)","key":"750_CR14"},{"unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 
5998\u20136008 (2017)","key":"750_CR15"},{"issue":"1","key":"750_CR16","first-page":"2","volume":"12","author":"A Radford","year":"2018","unstructured":"Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training. OpenAI preprint 12(1), 2 (2018)","journal-title":"OpenAI preprint"},{"key":"750_CR17","first-page":"1","volume":"1","author":"OpenAI","year":"2020","unstructured":"OpenAI: Language models are few-shot learners. OpenAI Blog 1, 1\u201315 (2020)","journal-title":"OpenAI Blog"},{"unstructured":"Wang, J., Hu, X., Hou, W., Chen, H., Zheng, R., Wang, Y., Yang, L., Huang, H., Ye, W., Geng, X., et al.: On the robustness of ChatGPT: an adversarial and out-of-distribution perspective (2023). arXiv preprint arXiv:2302.12095","key":"750_CR18"},{"unstructured":"Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: GPT-4 technical report (2023). arXiv preprint arXiv:2303.08774","key":"750_CR19"},{"unstructured":"Buscemi, A., Proverbio, D.: Large Language Models\u2019 Detection of Political Orientation in Newspapers (2024)","key":"750_CR20"},{"doi-asserted-by":"crossref","unstructured":"Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S.: On the dangers of stochastic parrots: Can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (2021)","key":"750_CR21","DOI":"10.1145\/3442188.3445922"},{"unstructured":"Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., et al.: On the opportunities and risks of foundation models (2021). arXiv preprint arXiv:2108.07258","key":"750_CR22"},{"doi-asserted-by":"crossref","unstructured":"Raji, I.D., et al.: Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. 
FAT* (2020)","key":"750_CR23","DOI":"10.1145\/3351095.3372873"},{"doi-asserted-by":"crossref","unstructured":"Zhou, J., Zhang, Y., Luo, Q., Parker, A.G., De\u00a0Choudhury, M.: Synthetic lies: understanding AI-generated misinformation and evaluating algorithmic and human solutions. In: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1\u201320 (2023)","key":"750_CR24","DOI":"10.1145\/3544548.3581318"},{"doi-asserted-by":"crossref","unstructured":"Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., Hashimoto, T.: Exploiting programmatic behavior of LLMs: dual-use through standard security attacks (2023). arXiv preprint arXiv:2302.05733","key":"750_CR25","DOI":"10.1109\/SPW63631.2024.00018"},{"doi-asserted-by":"crossref","unstructured":"Qu, Y., Shen, X., He, X., Backes, M., Zannettou, S., Zhang, Y.: Unsafe diffusion: on the generation of unsafe images and hateful memes from text-to-image models. In: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pp. 3403\u20133417 (2023)","key":"750_CR26","DOI":"10.1145\/3576915.3616679"},{"doi-asserted-by":"crossref","unstructured":"Amini, A., et al.: Uncovering and mitigating algorithmic bias through learned latent structure. NeurIPS (2020)","key":"750_CR27","DOI":"10.1145\/3306618.3314243"},{"unstructured":"Mohseni, S., et al.: Multidisciplinary approaches to mitigating bias in AI: lessons from medicine, criminology, and HCI. FAccT (2021)","key":"750_CR28"},{"unstructured":"Zellers, R., et al.: Defending against neural fake news. NeurIPS (2019)","key":"750_CR29"},{"doi-asserted-by":"crossref","unstructured":"Gehman, S., et al.: Realtoxicityprompts: evaluating neural toxic degeneration in language models. FAT* (2020)","key":"750_CR30","DOI":"10.18653\/v1\/2020.findings-emnlp.301"},{"unstructured":"Weidinger, L., et al.: Ethical and social risks of foundation models (2021). 
arXiv preprint arXiv:2110.04301","key":"750_CR31"},{"unstructured":"Floridi, L., Cowls, J.: The ethical framework for artificial intelligence: a comprehensive overview. AI Ethics (2020)","key":"750_CR32"},{"key":"750_CR33","volume-title":"Superintelligence: Paths, Dangers, Strategies","author":"N Bostrom","year":"2014","unstructured":"Bostrom, N.: Superintelligence: Paths, Dangers, Strategies. Oxford University Press, Oxford (2014)"},{"unstructured":"Hendrycks, D., et al.: Aligning AI with shared human values. arXiv preprint arXiv:2008.02275 (2020)","key":"750_CR34"},{"doi-asserted-by":"crossref","unstructured":"Krause, B., et al.: GeDi: Generative discriminator guided sequence generation (2020). arXiv preprint arXiv:2009.06367","key":"750_CR35","DOI":"10.18653\/v1\/2021.findings-emnlp.424"},{"doi-asserted-by":"crossref","unstructured":"Dinan, E., et al.: Build it break it fix it for dialogue safety: robustness from adversarial human attack (2019). arXiv preprint arXiv:1908.06083","key":"750_CR36","DOI":"10.18653\/v1\/D19-1461"},{"unstructured":"Lee, S., et al.: Talk to me: design and evaluation of conversational agents for mental health support. CHI (2021)","key":"750_CR37"},{"unstructured":"McCullough, M., et al.: Ethical implications of large language models in AI systems. AI Ethics (2021)","key":"750_CR38"},{"issue":"1","key":"750_CR39","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1186\/s13054-023-04473-y","volume":"27","author":"M Salvagno","year":"2023","unstructured":"Salvagno, M., Taccone, F.S., Gerli, A.G.: Artificial intelligence hallucinations. Crit. Care 27(1), 180 (2023)","journal-title":"Crit. Care"},{"doi-asserted-by":"crossref","unstructured":"Liu, Y., Deng, G., Xu, Z., Li, Y., Zheng, Y., Zhang, Y., Zhao, L., Zhang, T., Wang, K., Liu, Y.: Jailbreaking ChatGPT via prompt engineering: An empirical study (2023). 
arXiv preprint arXiv:2305.13860","key":"750_CR40","DOI":"10.1145\/3663530.3665021"},{"doi-asserted-by":"crossref","unstructured":"Shen, X., Chen, Z., Backes, M., Shen, Y., Zhang, Y.: \u201cDo anything now\u201d: characterizing and evaluating in-the-wild jailbreak prompts on large language models (2023). arXiv preprint arXiv:2308.03825","key":"750_CR41","DOI":"10.1145\/3658644.3670388"},{"issue":"12","key":"750_CR42","doi-asserted-by":"publisher","first-page":"1486","DOI":"10.1038\/s42256-023-00765-8","volume":"5","author":"Y Xie","year":"2023","unstructured":"Xie, Y., Yi, J., Shao, J., Curl, J., Lyu, L., Chen, Q., Xie, X., Wu, F.: Defending ChatGPT against jailbreak attack via self-reminders. Nat. Mach. Intell. 5(12), 1486\u20131496 (2023)","journal-title":"Nat. Mach. Intell."},{"unstructured":"Zhou, W., Wang, X., Xiong, L., Xia, H., Gu, Y., Chai, M., Zhu, F., Huang, C., Dou, S., Xi, Z., et al.: EasyJailbreak: A unified framework for jailbreaking large language models (2024). arXiv preprint arXiv:2403.12171","key":"750_CR43"},{"unstructured":"OpenAI: Moderation\u2014OpenAI API. https:\/\/platform.openai.com\/docs\/guides\/moderation\/overview. Accessed 10 June 2024","key":"750_CR44"},{"issue":"3","key":"750_CR45","doi-asserted-by":"publisher","first-page":"699","DOI":"10.1007\/s43681-023-00258-9","volume":"3","author":"E Prem","year":"2023","unstructured":"Prem, E.: From ethical AI frameworks to tools: a review of approaches. AI Ethics 3(3), 699\u2013716 (2023)","journal-title":"AI Ethics"},{"key":"750_CR46","first-page":"1","volume":"28","author":"R Capurro","year":"2020","unstructured":"Capurro, R.: The age of artificial intelligences: a personal reflection. Int. Rev. Inf. Ethics 28, 1\u201321 (2020)","journal-title":"Int. Rev. Inf. Ethics"},{"doi-asserted-by":"crossref","unstructured":"Von\u00a0Foerster, H., Foerster, H.: Ethics and Second-Order Cybernetics. Understanding Understanding: Essays on cybernetics and cognition, pp. 
287\u2013304 (2003)","key":"750_CR47","DOI":"10.1007\/0-387-21722-3_14"},{"key":"750_CR48","doi-asserted-by":"publisher","DOI":"10.1093\/acprof:oso\/9780190498511.001.0001","volume-title":"Technology and the Virtues: A Philosophical Guide to a Future Worth Wanting","author":"S Vallor","year":"2016","unstructured":"Vallor, S.: Technology and the Virtues: A Philosophical Guide to a Future Worth Wanting. Oxford University Press, Oxford (2016)"},{"unstructured":"Proverbio, D., Rausch, K.: Swarm ethics and systems thinking. House of Ethics (2023)","key":"750_CR49"},{"unstructured":"Kant, I.: Groundwork of the Metaphysics of Morals, Revised edn. Cambridge University Press, Cambridge (1785)","key":"750_CR50"},{"doi-asserted-by":"crossref","unstructured":"Bentham, J.: An Introduction to the Principles of Morals and Legislation, Reprint 1996 edn. Oxford University Press, Oxford (1789)","key":"750_CR51","DOI":"10.1093\/oseo\/instance.00077240"},{"key":"750_CR52","volume-title":"Utilitarianism, Reprint","author":"JS Mill","year":"1861","unstructured":"Mill, J.S.: Utilitarianism, Reprint, 1998th edn. Oxford University Press, London (1861)","edition":"1998"},{"key":"750_CR53","first-page":"102700","volume":"74","author":"BC Stahl","year":"2024","unstructured":"Stahl, B.C., Eke, D.: The ethics of ChatGPT-exploring the ethical issues of an emerging technology. Int. J. Inf. Manag. 74, 102700 (2024)","journal-title":"Int. J. Inf. Manag."},{"key":"750_CR54","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-68677-4","volume-title":"Artificial General Intelligence","author":"B Goertzel","year":"2007","unstructured":"Goertzel, B., et al.: Artificial General Intelligence. Springer, Berlin (2007)"},{"key":"750_CR55","first-page":"184","volume":"1","author":"E Yudkowsky","year":"2008","unstructured":"Yudkowsky, E.: Artificial intelligence as a positive and negative factor in global risk. Glob. Catastr. Risks 1, 184 (2008)","journal-title":"Glob. Catastr. 
Risks"},{"key":"750_CR56","volume-title":"Superintelligence: Paths, Dangers, Strategies","author":"N Bostrom","year":"2014","unstructured":"Bostrom, N.: Superintelligence: Paths, Dangers, Strategies. Oxford University Press, Oxford (2014)"},{"doi-asserted-by":"crossref","unstructured":"Islam, M.J., Nguyen, G., Pan, R., Rajan, H.: A comprehensive study on deep learning bug characteristics. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 510\u2013520 (2019)","key":"750_CR57","DOI":"10.1145\/3338906.3338955"},{"unstructured":"Yao, J.-Y., Ning, K.-P., Liu, Z.-H., Ning, M.-N., Yuan, L.: LLM lies: Hallucinations are not bugs, but features as adversarial examples (2023). arXiv preprint arXiv:2310.01469","key":"750_CR58"},{"doi-asserted-by":"crossref","unstructured":"Fagbohun, O., Harrison, R.M., Dereventsov, A.: An empirical categorization of prompting techniques for large language models: A practitioner\u2019s guide (2024). arXiv preprint arXiv:2402.14837","key":"750_CR59","DOI":"10.51219\/JAIMLD\/Oluwole-Fagbohun\/15"},{"unstructured":"DAIR.AI: Prompt Engineering Guide. https:\/\/www.promptingguide.ai\/","key":"750_CR60"},{"issue":"01","key":"750_CR61","doi-asserted-by":"publisher","first-page":"016","DOI":"10.1055\/s-0039-1677908","volume":"28","author":"F Wang","year":"2019","unstructured":"Wang, F., Preininger, A.: AI in health: state of the art, challenges, and future directions. Yearb. Med. Inform. 28(01), 016\u2013026 (2019)","journal-title":"Yearb. Med. Inform."},{"key":"750_CR62","doi-asserted-by":"publisher","first-page":"106021","DOI":"10.1016\/j.jfludis.2023.106021","volume":"78","author":"AHR Jokar","year":"2023","unstructured":"Jokar, A.H.R., Roche, S., Karimi, H.: Stuttering on Instagram: What is the focus of stuttering-related Instagram posts and how do users engage with them? J. Fluency Disord. 78, 106021 (2023)","journal-title":"J. 
Fluency Disord."},{"doi-asserted-by":"crossref","unstructured":"Rubin, V.L.: Deception detection and rumor debunking for social media. In: The SAGE Handbook of Social Media Research Methods, p. 342. SAGE, London (2017)","key":"750_CR63","DOI":"10.4135\/9781473983847.n21"},{"unstructured":"OpenAI: GPTs Data Privacy FAQs. https:\/\/help.openai.com\/en\/articles\/8554402-gpts-data-privacy-faqs. Accessed 10 June 2024","key":"750_CR64"},{"unstructured":"Rafieyan, D., Chowdhury, H.: OpenAI destroyed a trove of books used to train AI models. The employees who collected the data are gone. (2024). https:\/\/www.businessinsider.com\/openai-destroyed-ai-training-datasets-lawsuit-authors-books-copyright-2024-5","key":"750_CR65"},{"unstructured":"European Parliament: EU AI Act: first regulation on artificial intelligence (2023). https:\/\/www.europarl.europa.eu\/topics\/en\/article\/20230601STO93804\/eu-ai-act-first-regulation-on-artificial-intelligence","key":"750_CR66"}],"container-title":["AI and Ethics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s43681-025-00750-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s43681-025-00750-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s43681-025-00750-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T08:34:29Z","timestamp":1758270869000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s43681-025-00750-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,21]]},"references-count":66,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,10]]}},"alternative-id":["750"],"URL":"https:\/\/doi.org\/10.1007\/s43681-025-00750-4","relation":{},"ISSN":["2730-5953","2730-5961"],"issn-type":[{"type":"print","value":"2730-5953"},{"type":"electronic","value":"2730-5961"}],"subject":[],"published":{"date-parts":[[2025,5,21]]},"assertion":[{"value":"31 January 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 May 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 May 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}