{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T17:05:47Z","timestamp":1770915947055,"version":"3.50.1"},"reference-count":43,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T00:00:00Z","timestamp":1759795200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Interdisciplinary Research Center for Intelligent Secure Systems (IRC-ISS) at King Fahd University of Petroleum & Minerals (KFUPM)"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Commit messages are vital for traceability, maintenance, and onboarding in modern software projects, yet their quality is frequently inconsistent. Recent large language models (LLMs) can transform code diffs into natural language summaries, offering a path to more consistent and informative commit messages. This paper makes two contributions: (i) it provides a systematic survey of automated commit message generation with LLMs, critically comparing prompt-only, fine-tuned, and retrieval-augmented approaches; and (ii) it specifies a transparent, agent-based evaluation blueprint centered on CommitBench. Unlike prior reviews, we include a detailed dataset audit, preprocessing impacts, evaluation metrics, and error taxonomy. The protocol defines dataset usage and splits, prompting and context settings, scoring and selection rules, and reporting guidelines (results by project, language, and commit type), along with an error taxonomy to guide qualitative analysis. Importantly, this work emphasizes methodology and design rather than presenting new empirical benchmarking results. 
The blueprint is intended to support reproducibility and comparability in future studies.<\/jats:p>","DOI":"10.3390\/computers14100427","type":"journal-article","created":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T12:46:56Z","timestamp":1759841216000},"page":"427","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["LLMs for Commit Messages: A Survey and an Agent-Based Evaluation Protocol on CommitBench"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-5706-895X","authenticated-orcid":false,"given":"Mohamed Mehdi","family":"Trigui","sequence":"first","affiliation":[{"name":"Information & Computer Science Department (ICS), King Fahd University of Petroleum & Minerals (KFUPM), Dhahran 31261, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6760-3506","authenticated-orcid":false,"given":"Wasfi G.","family":"Al-Khatib","sequence":"additional","affiliation":[{"name":"Information & Computer Science Department (ICS), King Fahd University of Petroleum & Minerals (KFUPM), Dhahran 31261, Saudi Arabia"},{"name":"Interdisciplinary Research Center for Intelligent Secure Systems (IRC-ISS), King Fahd University of Petroleum & Minerals (KFUPM), Dhahran 31261, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1109\/TSE.2024.3364675","article-title":"Automatic commit message generation: A critical review and directions for future work","volume":"50","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Li, J., and Ahmed, I. (2023, January 15\u201316). Commit message matters: Investigating impact and evolution of commit message quality. 
Proceedings of the 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia.","DOI":"10.1109\/ICSE48619.2023.00076"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Tian, Y., Zhang, Y., Stol, K.-J., Jiang, L., and Liu, H. (2022, January 21\u201329). What makes a good commit message?. Proceedings of the 44th International Conference on Software Engineering (ICSE \u201922), Pittsburgh, PA, USA.","DOI":"10.1145\/3510003.3510205"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3208","DOI":"10.1109\/TSE.2024.3478317","article-title":"Automated commit message generation with large language models: An empirical study and beyond","volume":"50","author":"Xue","year":"2024","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3695988","article-title":"Large language models for software engineering: A systematic literature review","volume":"33","author":"Hou","year":"2024","journal-title":"ACM Trans. Softw. Eng. Methodol."},{"key":"ref_6","unstructured":"Lopes, C.V., Klotzman, V.I., Ma, I., and Ahmed, I. (2024). Commit messages in the age of large language models. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, L., Zhao, J., Wang, C., and Liang, P. (2024, January 12\u201315). Using large language models for commit message generation: A preliminary study. Proceedings of the 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Rovaniemi, Finland.","DOI":"10.1109\/SANER60148.2024.00020"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Jiang, S., Armaly, A., and McMillan, C. (November, January 30). Automatically generating commit messages from diffs using neural machine translation. 
Proceedings of the 2017 32nd IEEE\/ACM International Conference on Automated Software Engineering (ASE), Urbana, IL, USA.","DOI":"10.1109\/ASE.2017.8115626"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Liu, Z., Xia, X., Hassan, A.E., Lo, D., Xing, Z., and Wang, X. (2018, January 3\u20137). Neural-machine-translation-based commit message generation: How far are we?. Proceedings of the 33rd ACM\/IEEE International Conference on Automated Software Engineering, Montpellier, France.","DOI":"10.1145\/3238147.3238190"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/j.neucom.2021.05.039","article-title":"Coregen: Contextualized code representation learning for commit message generation","volume":"459","author":"Nie","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"He, Y., Wang, L., Wang, K., Zhang, Y., Zhang, H., and Li, Z. (2023, January 17\u201321). Come: Commit message generation with modification embedding. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, WA, USA.","DOI":"10.1145\/3597926.3598096"},{"key":"ref_12","unstructured":"Zhang, Q., Fang, C., Xie, Y., Zhang, Y., Yang, Y., Sun, W., Yu, S., and Chen, Z. (2023). A survey on large language models for software engineering. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1145\/3643760","article-title":"Only diff is not enough: Generating commit messages leveraging reasoning and action of large language model","volume":"1","author":"Li","year":"2024","journal-title":"Proc. ACM Softw. Eng."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Beining, Y., Alassane, S., Fraysse, G., and Cherrared, S. (2024, January 28\u201331). Generating commit messages for configuration files in 5G network deployment using LLMs. 
Proceedings of the 2024 20th International Conference on Network and Service Management (CNSM), Prague, Czech Republic.","DOI":"10.23919\/CNSM62983.2024.10814636"},{"key":"ref_15","unstructured":"Pandya, K. (2024). Automated Software Compliance Using Smart Contracts and Large Language Models in Continuous Integration and Continuous Deployment with DevSecOps. [Master\u2019s Thesis, Arizona State University]."},{"key":"ref_16","unstructured":"Kruger, J. (2024). Embracing DevOps Release Management: Strategies and Tools to Accelerate Continuous Delivery and Ensure Quality Software Deployment, Packt Publishing Ltd."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Schall, M., Czinczoll, T., and De Melo, G. (2024, January 12\u201315). Commitbench: A benchmark for commit message generation. Proceedings of the 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Rovaniemi, Finland.","DOI":"10.1109\/SANER60148.2024.00080"},{"key":"ref_18","unstructured":"Huang, Z., Huang, Y., Chen, X., Zhou, X., Yang, C., and Zheng, Z. (November, January 27). An empirical study on learning-based techniques for explicit and implicit commit messages generation. Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering, Sacramento, CA, USA."},{"key":"ref_19","first-page":"1","article-title":"The current challenges of software engineering in the era of large language models","volume":"34","author":"Gao","year":"2025","journal-title":"ACM Trans. Softw. Eng. Methodol."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Palakodeti, V.K., and Heydarnoori, A. (2025). Automated generation of commit messages in software repositories. arXiv.","DOI":"10.18293\/DMSVIVA2024-145"},{"key":"ref_21","unstructured":"Bektas, A. (2024). Large Language Models in Software Engineering: A Critical Review of Evaluation Strategies. 
[Master\u2019s Thesis, Freie Universit\u00e4t Berlin]."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, Y., Chen, J., Bi, T., Grundy, J., Wang, Y., Yu, J., Chen, T., Tang, Y., and Zheng, Z. (2024). An empirical study on low-code programming using traditional vs large language model support. arXiv.","DOI":"10.2139\/ssrn.5277058"},{"key":"ref_23","unstructured":"Don, R.G.G. (2024). Comparative Research on Code Vulnerability Detection: Open-Source vs. Proprietary Large Language Models and Lstm Neural Network. [Master\u2019s Thesis, Unitec Institute of Technology]."},{"key":"ref_24","unstructured":"Sultana, S., Afreen, S., and Eisty, N.U. (2024). Code vulnerability detection: A comparative analysis of emerging large language models. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1016\/j.future.2024.07.029","article-title":"ChatOps for microservice systems: A low-code approach using service composition and large language models","volume":"161","author":"Wang","year":"2024","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_26","unstructured":"Bogomolov, E., Eliseeva, A., Galimzyanov, T., Glukhov, E., Shapkin, A., Tigina, M., Golubev, Y., Kovrigin, A., van Deursen, A., and Izadi, M. (2024). Long code arena: A set of benchmarks for long-context code models. arXiv."},{"key":"ref_27","unstructured":"Zhao, Y., Luo, Z., Tian, Y., Lin, H., Yan, W., Li, A., and Ma, J. (2024). Codejudge-eval: Can large language models be good judges in code understanding?. arXiv."},{"key":"ref_28","unstructured":"Cao, J., Chan, Y.-K., Ling, Z., Wang, W., Li, S., Liu, M., Wang, C., Yu, B., He, P., and Wang, S. (2025). How should I build a benchmark?. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"7","DOI":"10.32362\/2500-316X-2025-13-2-7-17","article-title":"Dataset collection for automatic generation of commit messages","volume":"13","author":"Kosyanenko","year":"2025","journal-title":"Russ. Technol. 
J."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3188","DOI":"10.1109\/TSE.2024.3475375","article-title":"Exploring the effectiveness of LLMs in automated logging statement generation: An empirical study","volume":"50","author":"Li","year":"2024","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"81","DOI":"10.63282\/3050-9416.IJAIBDCMS-V5I4P109","article-title":"Intelligent automation: Leveraging LLMs in DevOps toolchains","volume":"5","author":"Allam","year":"2024","journal-title":"Int. J. AI Bigdata Comput. Manag. Stud."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ragothaman, H., and Udayakumar, S.K. (2024, January 22\u201323). Optimizing service deployments with NLP-based infrastructure code generation\u2014An automation framework. Proceedings of the 2024 IEEE 2nd International Conference on Electrical Engineering, Computer and Information Technology (ICEECIT), Jember, Indonesia.","DOI":"10.1109\/ICEECIT63698.2024.10859822"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.55524\/ijircst.2025.13.4.1","article-title":"A review of generative AI and DevOps pipelines: CI\/CD, agentic automation, MLOps integration, and large language models","volume":"13","author":"Joshi","year":"2025","journal-title":"Int. J. Innov. Res. Comput. Sci. Technol."},{"key":"ref_34","unstructured":"Coban, S., Mattukat, A., and Slupczynski, A. (2024). Full-Scale Software Engineering. [Master\u2019s Thesis, RWTH Aachen University]."},{"key":"ref_35","unstructured":"Krishna, A., and Meda, V. (2025). AI Integration in Software Development and Operations, Springer."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gandhi, A., De, S., Chechik, M.P., Pandit, V., Kiehn, M., Chee, M.C., and Bedasso, Y. (2025, January 27\u201328). Automated codebase reconciliation using large language models. 
Proceedings of the 2025 IEEE\/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge), Ottawa, ON, Canada.","DOI":"10.1109\/Forge66646.2025.00011"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Cihan, U., Haratian, V., \u0130\u00e7\u00f6z, A., G\u00fcl, M.K., Devran, O., Bayendur, E.F., U\u00e7ar, B.M., and T\u00fcz\u00fcn, E. (2024). Automated code review in practice. arXiv.","DOI":"10.1109\/ICSE-SEIP66354.2025.00043"},{"key":"ref_38","unstructured":"Parveen, R. (2025). Investigating T-BERT for Automated Issue\u2013Commit Link Recovery. [Master\u2019s Thesis, University of Tampere]."},{"key":"ref_39","unstructured":"Jaju, I. (2023). Maximizing DevOps Scalability in Complex Software Systems. [Master\u2019s Thesis, Uppsala University]."},{"key":"ref_40","first-page":"7421","article-title":"Machine learning algorithms in DevOps: Optimizing software development and deployment workflows with precision","volume":"2582","author":"Kolawole","year":"2025","journal-title":"Int. J. Res. Publ. Rev."},{"key":"ref_41","unstructured":"Zhang, X., Muralee, S., Cherupattamoolayil, S., and Machiry, A. (August, January 30). On the effectiveness of large language models for GitHub workflows. Proceedings of the 19th International Conference on Availability, Reliability and Security, Vienna, Austria."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1186\/s41239-023-00386-6","article-title":"Automatic feedback and assessment of team-coding assignments in a DevOps context","volume":"20","author":"Rojo","year":"2023","journal-title":"Int. J. Educ. Technol. High. Educ."},{"key":"ref_43","unstructured":"Cellamare, F.P. (2025). AI-Driven Unit Test Generation. [Ph.D. 
Thesis, Politecnico di Torino]."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/10\/427\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:50:03Z","timestamp":1760035803000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/10\/427"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,7]]},"references-count":43,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["computers14100427"],"URL":"https:\/\/doi.org\/10.3390\/computers14100427","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,7]]}}}