{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T06:17:49Z","timestamp":1770877069642,"version":"3.50.1"},"reference-count":28,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T00:00:00Z","timestamp":1769904000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004055","name":"King Fahd University of Petroleum and Minerals","doi-asserted-by":"publisher","award":["INSS2522"],"award-info":[{"award-number":["INSS2522"]}],"id":[{"id":"10.13039\/501100004055","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Commit messages are essential for understanding software evolution and maintaining traceability of projects; however, their quality varies across repositories. Recent Large Language Models provide a promising path to automate this task by generating concise context-sensitive commit messages directly from code diffs. This paper provides a comparative study of three paradigms of large language models: zero-shot prompting, retrieval-augmented generation, and fine-tuning, using the large-scale CommitBench dataset that spans six programming languages. We assess the performance of the models with automatic metrics, namely BLEU, ROUGE-L, METEOR, and Adequacy, and a human assessment of 100 commits. In the latter, experienced developers rated each generated commit message for Adequacy and Fluency on a five-point Likert scale. The results show that fine-tuning and domain adaptation yield models that perform consistently better than general-purpose baselines across all evaluation metrics, thus generating commit messages with higher semantic adequacy and clearer phrasing than zero-shot approaches. The correlation analysis suggests that the Adequacy and BLEU scores are closer to human judgment, while ROUGE-L and METEOR tend to underestimate the quality in cases where the models generate stylistically diverse or paraphrased outputs. Finally, the study outlines a conceptual integration pathway for incorporating such models into software development workflows, emphasizing a human-in-the-loop approach for quality assurance.<\/jats:p>","DOI":"10.3390\/computers15020087","type":"journal-article","created":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T10:48:01Z","timestamp":1770115681000},"page":"87","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["AI-Driven Code Documentation: Comparative Evaluation of LLMs for Commit Message Generation"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-5706-895X","authenticated-orcid":false,"given":"Mohamed Mehdi","family":"Trigui","sequence":"first","affiliation":[{"name":"Information & Computer Science Department (ICS), King Fahd University of Petroleum & Minerals (KFUPM), Dhahran 31261, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6760-3506","authenticated-orcid":false,"given":"Wasfi G.","family":"Al-Khatib","sequence":"additional","affiliation":[{"name":"Information & Computer Science Department (ICS), King Fahd University of Petroleum & Minerals (KFUPM), Dhahran 31261, Saudi Arabia"},{"name":"Interdisciplinary Research Center for Intelligent Secure Systems (IRC-ISS), King Fahd University of Petroleum & Minerals (KFUPM), Dhahran 31261, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-7624-1491","authenticated-orcid":false,"given":"Mohammad","family":"Amro","sequence":"additional","affiliation":[{"name":"Information & Computer Science Department (ICS), King Fahd University of Petroleum & Minerals (KFUPM), Dhahran 31261, Saudi Arabia"},{"name":"Interdisciplinary Research Center for Intelligent Secure Systems (IRC-ISS), King Fahd University of Petroleum & Minerals (KFUPM), Dhahran 31261, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9206-4583","authenticated-orcid":false,"given":"Fatma","family":"Mallouli","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Deanship of Preparatory Year and Supporting Studies, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,2,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1109\/TSE.2024.3364675","article-title":"Automatic commit message generation: A critical review and directions for future work","volume":"50","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Li, J., and Ahmed, I. (2023). Commit message matters: Investigating impact and evolution of commit message quality. Proceedings of the 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 15\u201316 May 2023, IEEE.","DOI":"10.1109\/ICSE48619.2023.00076"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Tian, Y., Zhang, Y., Stol, K.-J., Jiang, L., and Liu, H. (2022). What makes a good commit message?. Proceedings of the 44th International Conference on Software Engineering (ICSE \u201922), Pittsburgh, PA, USA, 21\u201329 May 2022, ACM.","DOI":"10.1145\/3510003.3510205"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3208","DOI":"10.1109\/TSE.2024.3478317","article-title":"Automated commit message generation with large language models: An empirical study and beyond","volume":"50","author":"Xue","year":"2024","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_5","unstructured":"Zhang, Q., Fang, C., Xie, Y., Zhang, Y., Yang, Y., Sun, W., Yu, S., and Chen, Z. (2023). A survey on large language models for software engineering. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., Luo, X., Lo, D., Grundy, J., and Wang, H. (2024). Large language models for software engineering: A systematic literature review. ACM Trans. Softw. Eng. Methodol., 33.","DOI":"10.1145\/3695988"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Schall, M., Czinczoll, T., and De Melo, G. (2024). Commitbench: A benchmark for commit message generation. Proceedings of the 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Rovaniemi, Finland, 12\u201315 March 2024, IEEE.","DOI":"10.1109\/SANER60148.2024.00080"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"7","DOI":"10.32362\/2500-316X-2025-13-2-7-17","article-title":"Dataset collection for automatic generation of commit messages","volume":"13","author":"Kosyanenko","year":"2025","journal-title":"Russ. Technol. J."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Jiang, S., Armaly, A., and McMillan, C. (2017). Automatically generating commit messages from diffs using neural machine translation. Proceedings of the 2017 32nd IEEE\/ACM International Conference on Automated Software Engineering (ASE), Urbana, IL, USA, 30 October\u20133 November 2017, IEEE.","DOI":"10.1109\/ASE.2017.8115626"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Liu, Z., Xia, X., Hassan, A.E., Lo, D., Xing, Z., and Wang, X. (2018, January 3\u20137). Neural-machine-translation-based commit message generation: How far are we?. Proceedings of the 33rd ACM\/IEEE International Conference on Automated Software Engineering, Montpellier, France.","DOI":"10.1145\/3238147.3238190"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/j.neucom.2021.05.039","article-title":"Coregen: Contextualized code representation learning for commit message generation","volume":"459","author":"Nie","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"He, Y., Wang, L., Wang, K., Zhang, Y., Zhang, H., and Li, Z. (2023, January 17\u201321). Come: Commit message generation with modification embedding. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, WA, USA.","DOI":"10.1145\/3597926.3598096"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"102027","DOI":"10.1016\/j.aei.2023.102027","article-title":"Adaptive variational autoencoding generative adversarial networks for rolling bearing fault diagnosis","volume":"56","author":"Wang","year":"2023","journal-title":"Adv. Eng. Inform."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Qu, X., Liu, Z., Wu, C.Q., Hou, A., Yin, X., and Chen, Z. (2024). MFGAN: Multimodal fusion for industrial anomaly detection using attention-based autoencoder and generative adversarial network. Sensors, 24.","DOI":"10.3390\/s24020637"},{"key":"ref_15","unstructured":"Lopes, C.V., Klotzman, V.I., Ma, I., and Ahmed, I. (2024). Commit messages in the age of large language models. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1145\/3643760","article-title":"Only diff is not enough: Generating commit messages leveraging reasoning and action of large language model","volume":"1","author":"Li","year":"2024","journal-title":"Proc. ACM Softw. Eng."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Gao, C., Hu, X., Gao, S., Xia, X., and Jin, Z. (2025). The current challenges of software engineering in the era of large language models. ACM Trans. Softw. Eng. Methodol., 34.","DOI":"10.1145\/3712005"},{"key":"ref_18","unstructured":"Huang, Z., Huang, Y., Chen, X., Zhou, X., Yang, C., and Zheng, Z. (November, January 27). An empirical study on learning-based techniques for explicit and implicit commit messages generation. Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering, Sacramento, CA, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhang, L., Zhao, J., Wang, C., and Liang, P. (2024). Using large language models for commit message generation: A preliminary study. Proceedings of the 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Rovaniemi, Finland, 12\u201315 March 2024, IEEE.","DOI":"10.1109\/SANER60148.2024.00020"},{"key":"ref_20","unstructured":"BalancedCommitBench Authors (2026). BalancedCommitBench: A Language-Balanced Commit Messages Subset Extracted from CommitBench, Zenodo."},{"key":"ref_21","unstructured":"Bogomolov, E., Eliseeva, A., Galimzyanov, T., Glukhov, E., Shapkin, A., Tigina, M., Golubev, Y., Kovrigin, A., van Deursen, A., and Izadi, M. (2024). Long code arena: A set of benchmarks for long-context code models. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.55524\/ijircst.2025.13.4.1","article-title":"A review of generative AI and DevOps pipelines: CI\/CD, agentic automation, MLOps integration, and large language models","volume":"13","author":"Joshi","year":"2025","journal-title":"Int. J. Innov. Res. Comput. Sci. Technol."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"81","DOI":"10.63282\/3050-9416.IJAIBDCMS-V5I4P109","article-title":"Intelligent automation: Leveraging LLMs in DevOps toolchains","volume":"5","author":"Allam","year":"2024","journal-title":"Int. J. AI Bigdata Comput. Manag. Stud."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002, January 7\u201312). BLEU: A method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA, USA.","DOI":"10.3115\/1073083.1073135"},{"key":"ref_25","unstructured":"Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. Proceedings of Text Summarization Branches Out: ACL Workshop, Association for Computational Linguistics."},{"key":"ref_26","unstructured":"Banerjee, S., and Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization, Association for Computational Linguistics."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xu, S., Yao, Y., Xu, F., Gu, T., Tong, H., and Lu, J. (2019, January 10\u201316). Commit message generation for source code changes. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China.","DOI":"10.24963\/ijcai.2019\/552"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Trigui, M.M., and Al-Khatib, W.G. (2025). LLMs for Commit Messages: A Survey and an Agent-Based Evaluation Protocol on CommitBench. Computers, 14.","DOI":"10.3390\/computers14100427"}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/15\/2\/87\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T05:25:08Z","timestamp":1770873908000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/15\/2\/87"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,1]]},"references-count":28,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["computers15020087"],"URL":"https:\/\/doi.org\/10.3390\/computers15020087","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,1]]}}}