{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T09:36:21Z","timestamp":1761989781040,"version":"3.41.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Nowadays, developers increasingly rely on solutions powered by Large Language Models (LLM) to assist them with their coding tasks. This makes it crucial to align these tools with human values to prevent malicious misuse.  \nIn this paper,  \nwe propose a comprehensive framework  \nfor assessing the potential harmfulness  \nof LLMs within the software engineering domain.  \nWe begin by developing a taxonomy of potentially harmful software engineering scenarios  \nand subsequently, create a dataset of prompts based on this taxonomy.  \nTo systematically assess the responses,  \nwe design and validate  \nan automatic evaluator  \nthat classifies the outputs  \nof a variety of LLMs  \nboth open-source and closed-source models,  \nas well as general-purpose and code-specific LLMs.  \nFurthermore, we investigate the impact of  \nmodels' size, architecture family, and alignment strategies  \non their tendency to generate harmful content.  \n% Results  \nThe results show significant disparities in the alignment of various LLMs for harmlessness.  \nWe find that  \nsome models  \nand model families, such as Openhermes,  \nare more harmful than others  \nand that code-specific models  \ndo not perform better  \nthan their general-purpose counterparts.  \nNotably, some fine-tuned models  \nperform significantly worse than their base-models  \ndue to their design choices.  \nOn the other side,  \nwe find that  \nlarger models tend to be more helpful  \nand are less likely to respond with harmful information.  \nThese results highlight the importance of targeted alignment strategies tailored to the unique challenges of software engineering tasks  \nand provide a foundation for future work in this critical area.<\/jats:p>","DOI":"10.1145\/3729380","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:15:34Z","timestamp":1750346134000},"page":"2477-2499","source":"Crossref","is-referenced-by-count":1,"title":["Code Red! 
On the Harmfulness of Applying Off-the-Shelf Large Language Models to Programming Tasks"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7338-2044","authenticated-orcid":false,"given":"Ali","family":"Al-Kaswan","sequence":"first","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-9430-6233","authenticated-orcid":false,"given":"Sebastian","family":"Deatc","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-6686-6008","authenticated-orcid":false,"given":"Beg\u00fcm","family":"Ko\u00e7","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4850-3312","authenticated-orcid":false,"given":"Arie","family":"van Deursen","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5093-5523","authenticated-orcid":false,"given":"Maliheh","family":"Izadi","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Toufique Ahmed Premkumar Devanbu Christoph Treude and Michael Pradel. 2025. Can LLMs Replace Manual Annotation of Software Engineering Artifacts?","DOI":"10.1109\/MSR66628.2025.00086"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330701"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/SANER56733.2023.00033"},{"key":"e_1_2_1_4_1","volume-title":"2023 IEEE\/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE). 9\u201310","author":"Al-Kaswan Ali","year":"2023","unstructured":"Ali Al-Kaswan and Maliheh Izadi. 2023. The (ab) use of open source code to train large language models. In 2023 IEEE\/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE). 9\u201310."},{"key":"e_1_2_1_5_1","volume-title":"Erik Jenner, Stephen Casper, and Oliver Sourbut.","author":"Anwar Usman","year":"2024","unstructured":"Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, and Oliver Sourbut. 2024. Foundational Challenges in Assuring Alignment and Safety of Large Language Models. CoRR."},{"key":"e_1_2_1_6_1","unstructured":"Amanda Askell Yuntao Bai Anna Chen Dawn Drain Deep Ganguli Tom Henighan Andy Jones Nicholas Joseph Ben Mann and Nova DasSarma. 2021. A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","unstructured":"Mohammad Atari Mona J Xue Peter S Park Dami\u00e1n E Blasi and Joseph Henrich. 2023. Which Humans? https:\/\/doi.org\/10.31234\/osf.io\/5b26t 10.31234\/osf.io\/5b26t","DOI":"10.31234\/osf.io"},{"key":"e_1_2_1_8_1","unstructured":"Yuntao Bai Andy Jones Kamal Ndousse Amanda Askell Anna Chen Nova DasSarma Dawn Drain Stanislav Fort Deep Ganguli and Tom Henighan. 2022. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862."},{"key":"e_1_2_1_9_1","volume-title":"ICLR 2025 Workshop on Foundation Models in the Wild. 
https:\/\/openreview.net\/forum?id=bwx3VNjLzX","author":"Betley Jan","year":"2025","unstructured":"Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Mart\u00edn Soto, Nathan Labenz, and Owain Evans. 2025. Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs. In ICLR 2025 Workshop on Foundation Models in the Wild. https:\/\/openreview.net\/forum?id=bwx3VNjLzX"},{"key":"e_1_2_1_10_1","unstructured":"Manish Bhatt Sahana Chennabasappa Cyrus Nikolaidis Shengye Wan Ivan Evtimov Dominik Gabi Daniel Song Faizan Ahmad Cornelius Aschermann and Lorenzo Fontana. 2023. Purple llama cyberseceval: A secure coding benchmark for language models. arXiv preprint arXiv:2312.04724."},{"key":"e_1_2_1_11_1","unstructured":"Miles Brundage Shahar Avin Jasmine Wang Haydn Belfield Gretchen Krueger Gillian Hadfield Heidy Khlaaf Jingying Yang Helen Toner and Ruth Fong. 2020. Toward trustworthy AI development: mechanisms for supporting verifiable claims. arXiv preprint arXiv:2004.07213."},{"key":"e_1_2_1_12_1","volume-title":"Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, and Greg Brockman.","author":"Chen Mark","year":"2021","unstructured":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, and Greg Brockman. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374."},{"key":"e_1_2_1_13_1","volume-title":"A coefficient of agreement for nominal scales. Educational and psychological measurement, 20, 1","author":"Cohen Jacob","year":"1960","unstructured":"Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement, 20, 1 (1960), 37\u201346."},{"key":"e_1_2_1_14_1","first-page":"1","article-title":"Auto-sklearn 2.0: Hands-free automl via meta-learning","volume":"23","author":"Feurer Matthias","year":"2022","unstructured":"Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, and Frank Hutter. 2022. Auto-sklearn 2.0: Hands-free automl via meta-learning. Journal of Machine Learning Research, 23, 261 (2022), 1\u201361.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_15_1","unstructured":"Deep Ganguli Liane Lovitt Jackson Kernion Amanda Askell Yuntao Bai Saurav Kadavath Ben Mann Ethan Perez Nicholas Schiefer and Kamal Ndousse. 2022. Red Teaming Language Models to Reduce Harms: Methods Scaling Behaviors and Lessons Learned. CoRR."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2023.3300381"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.3390\/a15110418"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1080\/19312450709336664"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3695988"},{"key":"e_1_2_1_20_1","unstructured":"Hakan Inan Kartikeya Upasani Jianfeng Chi Rashi Rungta Krithika Iyer Yuning Mao Michael Tontchev Qing Hu Brian Fuller and Davide Testuggine. 2023. Llama guard: Llm-based input-output safeguard for human-ai conversations. 
arXiv preprint arXiv:2312.06674."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510172"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639138"},{"key":"e_1_2_1_23_1","volume-title":"Advances in Neural Information Processing Systems","author":"Ji Jiaming","year":"2023","unstructured":"Jiaming Ji, Mickel Liu, Josef Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, and Yaodong Yang. 2023. BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset. In Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.). 36, Curran Associates, Inc., 24678\u201324704. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/file\/4dbb61cb68671edc4ca3712d70083b9f-Paper-Datasets_and_Benchmarks.pdf"},{"key":"e_1_2_1_24_1","volume-title":"Enrico Panai, Julija Kalpokiene, and Donald Jay Bertulfo.","author":"Johnson Rebecca L","year":"2022","unstructured":"Rebecca L Johnson, Giada Pistilli, Natalia Men\u00e9dez-Gonz\u00e1lez, Leslye Denisse Dias Duran, Enrico Panai, Julija Kalpokiene, and Donald Jay Bertulfo. 2022. The Ghost in the Machine has an American accent: value conflict in GPT-3. arXiv preprint arXiv:2203.07785."},{"key":"e_1_2_1_25_1","unstructured":"Oliver Klingefjord Ryan Lowe and Joe Edelman. 2024. What are human values and how do we align AI to them? arXiv preprint arXiv:2404.10636."},{"key":"e_1_2_1_26_1","volume-title":"Estimating the reliability, systematic error and random error of interval data. Educational and psychological measurement, 30, 1","author":"Krippendorff Klaus","year":"1970","unstructured":"Klaus Krippendorff. 1970. Estimating the reliability, systematic error and random error of interval data. Educational and psychological measurement, 30, 1 (1970), 61\u201370."},{"key":"e_1_2_1_27_1","first-page":"2278","article-title":"Malware and malware detection techniques: A survey","volume":"2","author":"Landage Jyoti","year":"2013","unstructured":"Jyoti Landage and M. P. Wankhade. 2013. Malware and malware detection techniques: A survey. International Journal of Engineering Research and Technology (IJERT), 2, 12 (2013), 2278\u20130181.","journal-title":"International Journal of Engineering Research and Technology (IJERT)"},{"key":"e_1_2_1_28_1","unstructured":"Jan Leike. 2022. Distinguishing three alignment taxes. https:\/\/aligned.substack.com\/p\/three-alignment-taxes"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","unstructured":"Yujia Li David Choi Junyoung Chung Nate Kushman Julian Schrittwieser R\u00e9mi Leblond Tom Eccles James Keeling Felix Gimeno Agustin Dal Lago Thomas Hubert Peter Choy Cyprien de Masson d\u2019Autume Igor Babuschkin Xinyun Chen Po-Sen Huang Johannes Welbl Sven Gowal Alexey Cherepanov James Molloy Daniel J. Mankowitz Esme Sutherland Robson Pushmeet Kohli Nando de Freitas Koray Kavukcuoglu and Oriol Vinyals. 2022. Competition-level code generation with AlphaCode. Science 378 6624 (2022) 1092\u20131097. https:\/\/doi.org\/10.1126\/science.abq1158 arxiv:https:\/\/www.science.org\/doi\/pdf\/10.1126\/science.abq1158. 10.1126\/science.abq1158","DOI":"10.1126\/science.abq1158"},{"key":"e_1_2_1_30_1","volume-title":"Advances in Neural Information Processing Systems","author":"Liu Yan","year":"2023","unstructured":"Yan Liu, Xiaokang Chen, Yan Gao, Zhe Su, Fengji Zhang, Daoguang Zan, Jian-Guang Lou, Pin-Yu Chen, and Tsung-Yi Ho. 2023. 
Uncovering and Quantifying Social Biases in Code Generation. In Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.). 36, Curran Associates, Inc., 2368\u20132380. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/file\/071a637d41ea290ac4360818a8323f33-Paper-Conference.pdf"},{"key":"e_1_2_1_31_1","volume-title":"CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).","author":"Lu Shuai","year":"2021","unstructured":"Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, and Duyu Tang. 2021. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2015.2465386"},{"key":"e_1_2_1_33_1","unstructured":"Maximilian Mozes Xuanli He Bennett Kleinberg and Lewis D Griffin. 2023. Use of LLMs for Illicit Purposes: Threats Prevention Measures and Vulnerabilities. CoRR."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2019.100199"},{"key":"e_1_2_1_35_1","unstructured":"Tianhao Shen Renren Jin Yufei Huang Chuang Liu Weilong Dong Zishan Guo Xinwei Wu Yan Liu and Deyi Xiong. 2023. Large language model alignment: A survey. arXiv preprint arXiv:2309.15025."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2022.11.049"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5815\/ijeme.2018.02.03"},{"key":"e_1_2_1_38_1","volume-title":"ALERT: A Comprehensive Benchmark for Assessing Large Language Models","author":"Tedeschi Simone","year":"2024","unstructured":"Simone Tedeschi, Felix Friedrich, Patrick Schramowski, Kristian Kersting, Roberto Navigli, Huu Nguyen, and Bo Li. 2024. ALERT: A Comprehensive Benchmark for Assessing Large Language Models\u2019 Safety through Red Teaming. CoRR."},{"key":"e_1_2_1_39_1","unstructured":"Teknium. 2023. OpenHermes 2.5: An Open Dataset of Synthetic Data for Generalist LLM Assistants. https:\/\/huggingface.co\/datasets\/teknium\/OpenHermes-2.5"},{"key":"e_1_2_1_40_1","unstructured":"Ryan Teknium Jeffrey Quesnelle and Chen Guang. 2024. Hermes 3 Technical Report. arXiv preprint arXiv:2408.11857."},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Daphne Theodorakopoulos Frederic Stahl and Marius Lindauer. 2024. Hyperparameter importance analysis for multi-objective automl. arXiv preprint arXiv:2405.07640.","DOI":"10.3233\/FAIA240602"},{"key":"e_1_2_1_42_1","volume-title":"Luke Bates, Daniel Korat, Moshe Wasserblat, and Oren Pereg.","author":"Tunstall Lewis","year":"2022","unstructured":"Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, and Oren Pereg. 2022. Efficient few-shot learning without prompts. arXiv preprint arXiv:2209.11055."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_44_1","volume-title":"2023 IEEE International Conference on Medical Artificial Intelligence (MedAI). 284\u2013289","author":"Wang Jianxun","year":"2023","unstructured":"Jianxun Wang and Yixiang Chen. 2023. A Review on Code Generation with LLMs: Application and Evaluation. 
In 2023 IEEE International Conference on Medical Artificial Intelligence (MedAI). 284\u2013289."},{"key":"e_1_2_1_45_1","volume-title":"Do-Not-Answer: Evaluating Safeguards in LLMs. In Findings of the Association for Computational Linguistics: EACL 2024","author":"Wang Yuxia","year":"2024","unstructured":"Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, and Timothy Baldwin. 2024. Do-Not-Answer: Evaluating Safeguards in LLMs. In Findings of the Association for Computational Linguistics: EACL 2024, Yvette Graham and Matthew Purver (Eds.). Association for Computational Linguistics, St. Julian\u2019s, Malta. 896\u2013911. https:\/\/aclanthology.org\/2024.findings-eacl.61"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3520312.3534862"},{"key":"e_1_2_1_47_1","volume-title":"A Survey of Automatic Source Code Summarization. Symmetry, 14, 3","author":"Zhang Chunyan","year":"2022","unstructured":"Chunyan Zhang, Junchao Wang, Qinglei Zhou, Ting Xu, Ke Tang, Hairen Gui, and Fudong Liu. 2022. A Survey of Automatic Source Code Summarization. Symmetry, 14, 3 (2022), issn:2073-8994 https:\/\/www.mdpi.com\/2073-8994\/14\/3\/471"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729380","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:16:59Z","timestamp":1750346219000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729380"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":47,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3729380"],"URL":"https:\/\/doi.org\/10.1145\/3729380","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}
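For reference, the record above is a standard Crossref REST API "work" message (note the status / message-type / message envelope), so the same metadata can be re-fetched and inspected programmatically. Below is a minimal Python sketch, assuming network access, the third-party requests library, and the public api.crossref.org endpoint; the mailto address is a placeholder to replace with your own, per Crossref's polite-pool convention.

import requests

# Fetch the Crossref work record for the DOI shown in the record above.
doi = "10.1145/3729380"
url = "https://api.crossref.org/works/" + doi
# Supplying a mailto parameter identifies the caller and routes the
# request to Crossref's "polite" pool (you@example.org is a placeholder).
resp = requests.get(url, params={"mailto": "you@example.org"}, timeout=30)
resp.raise_for_status()

# The payload mirrors this document: an envelope whose "message" field
# holds the bibliographic record itself.
msg = resp.json()["message"]
print(msg["title"][0])          # "Code Red! On the Harmfulness of Applying ..."
print(msg["DOI"])               # 10.1145/3729380
print(msg["references-count"])  # 47
# The abstract is JATS-flavored XML; strip the <jats:p> wrapper for plain text.
abstract = msg.get("abstract", "").replace("<jats:p>", "").replace("</jats:p>", "")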