{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T23:15:59Z","timestamp":1776122159813,"version":"3.50.1"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2024,6,4]],"date-time":"2024-06-04T00:00:00Z","timestamp":1717459200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62072112"],"award-info":[{"award-number":["62072112"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Scientific and Technological innovation action plan of Shanghai Science and Technology Committee","award":["22511102202"],"award-info":[{"award-number":["22511102202"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>Commit messages are natural language descriptions of code changes, which are important for software evolution such as code understanding and maintenance. However, previous methods are trained on the entire dataset without considering the fact that a portion of commit messages adhere to good practice (i.e., good-practice commits), while the rest do not. On the basis of our empirical study, we discover that training on good-practice commits significantly contributes to the commit message generation. Motivated by this finding, we propose a novel knowledge-aware denoising learning method called KADEL. Considering that good-practice commits constitute only a small proportion of the dataset, we align the remaining training samples with these good-practice commits. To achieve this, we propose a model that learns the commit knowledge by training on good-practice commits. This knowledge model enables supplementing more information for training samples that do not conform to good practice. However, since the supplementary information may contain noise or prediction errors, we propose a dynamic denoising training method. This method composes a distribution-aware confidence function and a dynamic distribution list, which enhances the effectiveness of the training process. 
Experimental results on the whole MCMD dataset demonstrate that our method overall achieves state-of-the-art performance compared with previous methods.<\/jats:p>","DOI":"10.1145\/3643675","type":"journal-article","created":{"date-parts":[[2024,1,29]],"date-time":"2024-01-29T12:57:22Z","timestamp":1706533042000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation"],"prefix":"10.1145","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1800-1904","authenticated-orcid":false,"given":"Wei","family":"Tao","sequence":"first","affiliation":[{"name":"Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-9883-5621","authenticated-orcid":false,"given":"Yucheng","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Macau, Macau, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7761-7269","authenticated-orcid":false,"given":"Yanlin","family":"Wang","sequence":"additional","affiliation":[{"name":"Sun Yat-Sen University, Zhuhai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3063-9425","authenticated-orcid":false,"given":"Hongyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Chongqing University, Chongqing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3018-3824","authenticated-orcid":false,"given":"Haofen","family":"Wang","sequence":"additional","affiliation":[{"name":"Tongji University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3339-8751","authenticated-orcid":false,"given":"Wenqiang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2024,6,4]]},"reference":[{"key":"e_1_3_2_2_2","series-title":"Proceedings of Machine Learning Research","first-page":"312","volume-title":"Proceedings of the 36th International Conference on Machine Learning, ICML 2019","volume":"97","author":"Arazo Eric","year":"2019","unstructured":"Eric Arazo, Diego Ortego, Paul Albert, Noel E. O\u2019Connor, and Kevin McGuinness. 2019. Unsupervised label noise modeling and loss correction. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). Proceedings of Machine Learning Research, Vol. 97, PMLR, 312\u2013321. Retrieved from http:\/\/proceedings.mlr.press\/v97\/arazo19a.html"},{"key":"e_1_3_2_3_2","first-page":"65","volume-title":"Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization@ACL 2005","author":"Banerjee Satanjeev","year":"2005","unstructured":"Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization@ACL 2005, Jade Goldstein, Alon Lavie, Chin-Yew Lin, and Clare R. Voss (Eds.). Association for Computational Linguistics, 65\u201372. 
Retrieved from https:\/\/aclanthology.org\/W05-0909\/"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/2901739.2903496"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/p19-1470"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/p19-1175"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/1858996.1859005"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3092703.3098230"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1299"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/SCAM.2014.14"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1977.tb01600.x"},{"key":"e_1_3_2_12_2","first-page":"4171","volume-title":"Proceedings of the NAACL-HLT (1)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the NAACL-HLT (1). Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510069"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00078"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3608134"},{"key":"e_1_3_2_16_2","unstructured":"Bo Han Quanming Yao Tongliang Liu Gang Niu Ivor W. Tsang James T. Kwok and Masashi Sugiyama. 2020. A survey of label-noise representation learning: Past present and future. arXiv:2011.04406. Retrieved from https:\/\/arxiv.org\/abs\/2011.04406"},{"key":"e_1_3_2_17_2","first-page":"8536","volume-title":"Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018 (NeurIPS\u201918)","author":"Han Bo","year":"2018","unstructured":"Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor W. Tsang, and Masashi Sugiyama. 2018. Co-teaching: Robust training of deep neural networks with extremely noisy labels. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018 (NeurIPS\u201918), Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.). NeurIPS Foundation, 8536\u20138546. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/a19744e268754fb0148b017647355b7b-Abstract.html"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597926.3598096"},{"key":"e_1_3_2_19_2","first-page":"10477","volume-title":"Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018 (NeurIPS\u201918)","author":"Hendrycks Dan","year":"2018","unstructured":"Dan Hendrycks, Mantas Mazeika, Duncan Wilson, and Kevin Gimpel. 2018. Using trusted data to train deep networks on labels corrupted by severe noise. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018 (NeurIPS\u201918), Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.). NeurIPS Foundation, 10477\u201310486. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/ad554d8c3b06d6b97ee76a2448bd7913-Abstract.html"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPC.2009.5090025"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3502853"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-020-0496-0"},{"key":"e_1_3_2_23_2","series-title":"Proceedings of Machine Learning Research","first-page":"2309","volume-title":"Proceedings of the 35th International Conference on Machine Learning (ICML\u201918)","volume":"80","author":"Jiang Lu","year":"2018","unstructured":"Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. 2018. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In Proceedings of the 35th International Conference on Machine Learning (ICML\u201918), Jennifer G. Dy and Andreas Krause (Eds.). Proceedings of Machine Learning Research, Vol. 80, PMLR, 2309\u20132318. Retrieved from http:\/\/proceedings.mlr.press\/v80\/jiang18c.html"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00162"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2017.8115626"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/33.3.239"},{"key":"e_1_3_2_27_2","volume-title":"Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020","author":"Lewis Patrick S. H.","year":"2020","unstructured":"Patrick S. H. Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K\u00fcttler, Mike Lewis, Wen-tau Yih, Tim Rockt\u00e4schel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Hugo Larochelle, Marc\u2019Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). NeurIPS Foundation, Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/6b493230205f780e1bc26945df7481e5-Abstract.html"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00076"},{"key":"e_1_3_2_29_2","unstructured":"Lin Chin-Yew. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Text Summarization Branches Out. ACL."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSR.2019.00056"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2020.3038681"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2456899"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238190"},{"key":"e_1_3_2_34_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations, ICLR 2019","author":"Loshchilov Ilya","year":"2019","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019. OpenReview.net. 
Retrieved from https:\/\/openreview.net\/forum?id=Bkg6RiCqY7"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/w18-6513"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-2045"},{"key":"e_1_3_2_37_2","volume-title":"Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021","author":"Lu Shuai","year":"2021","unstructured":"Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. 2021. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, Joaquin Vanschoren and Sai-Kit Yeung (Eds.). NeurIPS Foundation, Retrieved from https:\/\/datasets-benchmarks-proceedings.neurips.cc\/paper\/2021\/hash\/c16a5320fa475530d9583c34fd356ef5-Abstract-round1.html"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.27"},{"key":"e_1_3_2_39_2","unstructured":"Wei Ma Shangqing Liu Wenhan Wang Qiang Hu Ye Liu Cen Zhang Liming Nie and Yang Liu. 2023. The scope of ChatGPT in software engineering: A thorough investigation. arXiv:2305.12138. Retrieved from https:\/\/arxiv.org\/abs\/2305.12138"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2858821"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPC.2013.6613830"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.05.039"},{"key":"e_1_3_2_43_2","volume-title":"Introducing ChatGPT","year":"2022","unstructured":"OpenAI. 2022. Introducing ChatGPT. Technical Report. OpenAI. [Online]. Retrieved from https:\/\/openai.com\/blog\/chatgpt"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884847"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.240"},{"key":"e_1_3_2_47_2","unstructured":"Alec Radford Narasimhan Karthik Tim Salimans and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. (2018)."},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2020.106332"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33013027"},{"issue":"5","key":"e_1_3_2_50_2","first-page":"1","article-title":"The truth of the F-measure","volume":"1","year":"2007","unstructured":"Yutaka Sasaki. 2007. The truth of the F-measure. Teach Tutor Mater 1, 5 (2007), 1\u20135.","journal-title":"Teach Tutor Mater"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1099"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.372"},{"key":"e_1_3_2_53_2","article-title":"Best practices for prompt engineering with OpenAI API","author":"Shieh Jessica","year":"2023","unstructured":"Jessica Shieh. 2023. Best practices for prompt engineering with OpenAI API. OpenAI, February. 
Retrieved Nov 10, 2023 from https:\/\/help.openai.com\/en\/articles\/6654000-best-practices-for-prompt-engineering-with-openai-api","journal-title":"OpenAI, February."},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME52107.2021.00018"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-022-10219-1"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510205"},{"key":"e_1_3_2_57_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. 2023. LLaMA: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-8643"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2015.229"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-3007"},{"key":"e_1_3_2_61_2","unstructured":"Chenglin Wang Yucheng Zhou Guodong Long Xiaodong Wang and Xiaowei Xu. 2022. Unsupervised knowledge graph construction and event-centric knowledge infusion for scientific NLI. arXiv:2210.15248. Retrieved from https:\/\/arxiv.org\/abs\/2210.15248"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/3464689"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01374"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/552"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-020-05559-3"},{"key":"e_1_3_2_67_2","unstructured":"Zibin Zheng Kaiwen Ning Jiachi Chen Yanlin Wang Wenqing Chen Lianghong Guo and Weicheng Wang. 2023. Towards an understanding of large language models in software engineering tasks. arXiv:2308.11396. Retrieved from https:\/\/arxiv.org\/abs\/2308.11396"},{"key":"e_1_3_2_68_2","unstructured":"Zibin Zheng Kaiwen Ning Yanlin Wang Jingwen Zhang Dewu Zheng Mingxi Ye and Jiachi Chen. 2023. A survey of large language models for code: Evolution benchmarking and future trends. arXiv:2311.10372. 
Retrieved from https:\/\/arxiv.org\/abs\/2311.10372"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.403"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.FINDINGS-ACL.332"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643675","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3643675","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:05:33Z","timestamp":1750291533000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643675"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,4]]},"references-count":69,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3643675"],"URL":"https:\/\/doi.org\/10.1145\/3643675","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,4]]},"assertion":[{"value":"2023-07-20","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-15","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
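
The block above is a single work record as returned by the Crossref REST API (note "message-type":"work" and the payload nested under "message"). Below is a minimal sketch of how such a record can be fetched and a few of its fields read back, assuming only the public https://api.crossref.org/works/{DOI} endpoint and the field layout visible in the record itself. The User-Agent contact address is a placeholder (Crossref's "polite pool" convention asks clients to identify themselves), not a real address.

import json
import urllib.request

DOI = "10.1145/3643675"  # DOI of the record shown above

# Crossref asks API users to identify themselves via a mailto in the
# User-Agent; the address below is a placeholder, not a real contact.
req = urllib.request.Request(
    f"https://api.crossref.org/works/{DOI}",
    headers={"User-Agent": "example-client/0.1 (mailto:you@example.org)"},
)
with urllib.request.urlopen(req) as resp:
    work = json.load(resp)["message"]  # the work object sits under "message"

# Titles and container titles are arrays in the Crossref schema, as above.
print(work["title"][0])                # KADEL: Knowledge-Aware Denoising ...
print(work["container-title"][0])      # ACM Transactions on Software ...
print(work["DOI"], "-", work["type"])  # 10.1145/3643675 - journal-article
print(", ".join(f'{a["given"]} {a["family"]}' for a in work["author"]))
print("references:", work["reference-count"])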