{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T03:02:32Z","timestamp":1775617352612,"version":"3.50.1"},"reference-count":73,"publisher":"Association for Computing Machinery (ACM)","license":[{"start":{"date-parts":[[2024,7,1]],"date-time":"2024-07-01T00:00:00Z","timestamp":1719792000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"abstract":"<jats:p>\n            GitHub commits, which record the code changes with natural language messages for description, play a critical role in software developers\u2019 comprehension of software evolution. Due to their importance in software development, several learning-based works are conducted for GitHub commits, such as commit message generation and security patch identification. However, most existing works focus on customizing specialized neural networks for different tasks. Inspired by the superiority of code pre-trained models, which has confirmed their effectiveness across different downstream tasks, to promote the development of open-source software community, we first collect a large-scale commit benchmark including over\n            <jats:bold>7.99<\/jats:bold>\n            million commits across\n            <jats:bold>7<\/jats:bold>\n            programming languages. Based on this benchmark, we present CommitBART, a pre-trained encoder-decoder Transformer model for GitHub commits. The model is pre-trained by three categories (i.e., denoising objectives, cross-modal generation, and contrastive learning) for six pre-training tasks to learn commit fragment representations. Our model is evaluated on one understanding task and three generation tasks for commits. The comprehensive experiments on these tasks demonstrate that CommitBART significantly outperforms previous pre-trained works for code. Further analysis also reveals that each pre-training task enhances the model performance.\n          <\/jats:p>","DOI":"10.1145\/3674731","type":"journal-article","created":{"date-parts":[[2024,7,1]],"date-time":"2024-07-01T16:54:10Z","timestamp":1719852850000},"update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Automated Commit Intelligence by Pre-training"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5598-4006","authenticated-orcid":false,"given":"Shangqing","family":"Liu","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2263-7383","authenticated-orcid":false,"given":"Yanzhou","family":"Li","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1288-6502","authenticated-orcid":false,"given":"Xiaofei","family":"Xie","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0044-466X","authenticated-orcid":false,"given":"Wei","family":"Ma","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6388-2571","authenticated-orcid":false,"given":"Guozhu","family":"Meng","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering, Chinese Academy of Sciences, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7300-9215","authenticated-orcid":false,"given":"Yang","family":"Liu","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,7]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021","author":"Ahmad Wasi Uddin","year":"2021","unstructured":"Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2021. Unified Pre-training for Program Understanding and Generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-T\u00fcr, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou (Eds.). Association for Computational Linguistics, 2655\u20132668. https:\/\/doi.org\/10.18653\/V1\/2021.NAACL-MAIN.211"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software. 143\u2013153","author":"Allamanis Miltiadis","year":"2019","unstructured":"Miltiadis Allamanis. 2019. The adverse effects of code duplication in machine learning models of code. In Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software. 143\u2013153."},{"key":"e_1_2_1_3_1","unstructured":"Authors. 2023. Automated Commit Intelligence by Pre-training. https:\/\/github.com\/Lyz1213\/CommitBart."},{"key":"e_1_2_1_4_1","volume-title":"et\u00a0al","author":"Buratti Luca","year":"2020","unstructured":"Luca Buratti, Saurabh Pujar, Mihaela Bornea, Scott McCarley, Yunhui Zheng, Gaetano Rossiello, Alessandro Morari, Jim Laredo, Veronika Thost, Yufan Zhuang, et\u00a0al. 2020. Exploring software naturalness through neural language models. arXiv preprint arXiv:2006.12641 (2020)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","first-page":"1385","DOI":"10.1109\/TSE.2020.3020502","article-title":"Codit: Code editing with tree-based neural models","volume":"48","author":"Chakraborty Saikat","year":"2020","unstructured":"Saikat Chakraborty, Yangruibo Ding, Miltiadis Allamanis, and Baishakhi Ray. 2020. Codit: Code editing with tree-based neural models. IEEE Transactions on Software Engineering 48, 4 (2020), 1385\u20131399.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_2_1_6_1","volume-title":"On Multi-Modal Learning of Editing Source Code. In 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 443\u2013455","author":"Chakraborty Saikat","year":"2021","unstructured":"Saikat Chakraborty and Baishakhi Ray. 2021. On Multi-Modal Learning of Editing Source Code. In 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 443\u2013455."},{"key":"e_1_2_1_7_1","unstructured":"ChatGPT. 2022. Do the OpenAI API models have knowledge of current events? https:\/\/help.openai.com\/en\/articles\/6639781-do-the-openai-api-models-have-knowledge-of-current-events."},{"key":"e_1_2_1_8_1","volume-title":"Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et\u00a0al.","author":"Chen Mark","year":"2021","unstructured":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et\u00a0al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021)."},{"key":"e_1_2_1_9_1","volume-title":"Identifying Vulnerability Patches by Comprehending Code Commits with Comprehensive Change Contexts. arXiv preprint arXiv:2310.02530","author":"Chen Tianyu","year":"2023","unstructured":"Tianyu Chen, Lin Li, Taotao Qian, Zeyu Wang, Guangtai Liang, Ding Li, Qianxiang Wang, and Tao Xie. 2023. Identifying Vulnerability Patches by Comprehending Code Commits with Comprehensive Change Contexts. arXiv preprint arXiv:2310.02530 (2023)."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171\u20134186. https:\/\/doi.org\/10.18653\/v1\/n19-1423"},{"key":"e_1_2_1_11_1","volume-title":"Revisiting Learning-based Commit Message Generation. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 794\u2013805","author":"Dong Jinhao","year":"2023","unstructured":"Jinhao Dong, Yiling Lou, Dan Hao, and Lin Tan. 2023. Revisiting Learning-based Commit Message Generation. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 794\u2013805."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 44th International Conference on Software Engineering. 970\u2013981","author":"Dong Jinhao","year":"2022","unstructured":"Jinhao Dong, Yiling Lou, Qihao Zhu, Zeyu Sun, Zhilin Li, Wenjie Zhang, and Dan Hao. 2022. FIRA: fine-grained graph-based code change representation for automated commit message generation. In Proceedings of the 44th International Conference on Software Engineering. 970\u2013981."},{"key":"e_1_2_1_13_1","volume-title":"CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL","volume":"1547","author":"Feng Zhangyin","year":"2020","unstructured":"Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL, Vol. EMNLP 2020), Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 1536\u20131547. https:\/\/doi.org\/10.18653\/V1\/2020.FINDINGS-EMNLP.139"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event \/ Punta Cana, Dominican Republic","author":"Gao Tianyu","year":"2021","unstructured":"Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event \/ Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 6894\u20136910. https:\/\/doi.org\/10.18653\/V1\/2021.EMNLP-MAIN.552"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022","author":"Guo Daya","year":"2022","unstructured":"Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. 2022. UniXcoder: Unified Cross-Modal Pre-training for Code Representation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, 7212\u20137225. https:\/\/doi.org\/10.18653\/V1\/2022.ACL-LONG.499"},{"key":"e_1_2_1_16_1","volume-title":"GraphCodeBERT: Pre-training Code Representations with Data Flow. In 9th International Conference on Learning Representations, ICLR 2021","author":"Guo Daya","year":"2021","unstructured":"Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin B. Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. 2021. GraphCodeBERT: Pre-training Code Representations with Data Flow. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https:\/\/openreview.net\/forum?id=jLoC4ez43PZ"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering. 1\u201313","author":"Guo Qi","year":"2024","unstructured":"Qi Guo, Junming Cao, Xiaofei Xie, Shangqing Liu, Xiaohong Li, Bihuan Chen, and Xin Peng. 2024. Exploring the potential of chatgpt in automated code refinement: An empirical study. In Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering. 1\u201313."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 792\u2013803","author":"He Yichen","year":"2023","unstructured":"Yichen He, Liran Wang, Kaiyi Wang, Yupeng Zhang, Hang Zhang, and Zhoujun Li. 2023. COME: Commit Message Generation with Modification Embedding. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 792\u2013803."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 37th IEEE\/ACM International Conference on Automated Software Engineering. 1\u201311","author":"Hern\u00e1ndez L\u00f3pez Jos\u00e9 Antonio","year":"2022","unstructured":"Jos\u00e9 Antonio Hern\u00e1ndez L\u00f3pez, Martin Weyssow, Jes\u00fas S\u00e1nchez Cuadrado, and Houari Sahraoui. 2022. AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models. In Proceedings of the 37th IEEE\/ACM International Conference on Automated Software Engineering. 1\u201311."},{"key":"e_1_2_1_20_1","volume-title":"Codesearchnet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436","author":"Husain Hamel","year":"2019","unstructured":"Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436 (2019)."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event \/ Punta Cana, Dominican Republic","author":"Jain Paras","year":"2021","unstructured":"Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph Gonzalez, and Ion Stoica. 2021. Contrastive Code Representation Learning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event \/ Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 5954\u20135971. https:\/\/doi.org\/10.18653\/V1\/2021.EMNLP-MAIN.482"},{"key":"e_1_2_1_22_1","volume-title":"2017 32nd IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 135\u2013146","author":"Jiang Siyuan","year":"2017","unstructured":"Siyuan Jiang, Ameer Armaly, and Collin McMillan. 2017. Automatically generating commit messages from diffs using neural machine translation. In 2017 32nd IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 135\u2013146."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, UAI 2021","author":"Jiang Xue","year":"2021","unstructured":"Xue Jiang, Zhuoran Zheng, Chen Lyu, Liang Li, and Lei Lyu. 2021. TreeBERT: A tree-based pre-trained model for programming language. In Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, UAI 2021, Virtual Event, 27-30 July 2021 (Proceedings of Machine Learning Research, Vol. 161), Cassio P. de Campos, Marloes H. Maathuis, and Erik Quaeghebeur (Eds.). AUAI Press, 54\u201363. https:\/\/proceedings.mlr.press\/v161\/jiang21a.html"},{"key":"e_1_2_1_24_1","volume-title":"Commitbert: Commit message generation using pre-trained programming language model. arXiv preprint arXiv:2105.14242","author":"Jung Tae-Hwan","year":"2021","unstructured":"Tae-Hwan Jung. 2021. Commitbert: Commit message generation using pre-trained programming language model. arXiv preprint arXiv:2105.14242 (2021)."},{"key":"e_1_2_1_25_1","volume-title":"2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 1332\u20131336","author":"Karmakar Anjan","year":"2021","unstructured":"Anjan Karmakar and Romain Robbes. 2021. What do pre-trained code models know about code?. In 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 1332\u20131336."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020","author":"Lewis Mike","year":"2020","unstructured":"Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel R. Tetreault (Eds.). Association for Computational Linguistics, 7871\u20137880. https:\/\/doi.org\/10.18653\/V1\/2020.ACL-MAIN.703"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3597207","article-title":"Codeeditor: Learning to edit source code with pre-trained models","volume":"32","author":"Li Jia","year":"2023","unstructured":"Jia Li, Ge Li, Zhuo Li, Zhi Jin, Xing Hu, Kechi Zhang, and Zhiyi Fu. 2023. Codeeditor: Learning to edit source code with pre-trained models. ACM Transactions on Software Engineering and Methodology 32, 6 (2023), 1\u201322.","journal-title":"ACM Transactions on Software Engineering and Methodology"},{"key":"e_1_2_1_29_1","volume-title":"Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, et\u00a0al.","author":"Li Raymond","year":"2023","unstructured":"Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, et\u00a0al. 2023. StarCoder: may the source be with you! arXiv preprint arXiv:2305.06161 (2023)."},{"key":"e_1_2_1_30_1","volume-title":"Agustin Dal Lago, et\u00a0al","author":"Li Yujia","year":"2022","unstructured":"Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, R\u00e9mi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et\u00a0al. 2022. Competition-level code generation with alphacode. Science 378, 6624 (2022), 1092\u20131097."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023","author":"Li Yanzhou","year":"2023","unstructured":"Yanzhou Li, Shangqing Liu, Kangjie Chen, Xiaofei Xie, Tianwei Zhang, and Yang Liu. 2023. Multi-target Backdoor Attacks for Code Pre-trained Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, Anna Rogers, Jordan L. Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, 7236\u20137254. https:\/\/doi.org\/10.18653\/V1\/2023.ACL-LONG.399"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1035\u20131047","author":"Li Zhiyu","year":"2022","unstructured":"Zhiyu Li, Shuai Lu, Daya Guo, Nan Duan, Shailesh Jannu, Grant Jenks, Deep Majumder, Jared Green, Alexey Svyatkovskiy, Shengyu Fu, et\u00a0al. 2022. Automating code review activities by large-scale pre-training. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1035\u20131047."},{"key":"e_1_2_1_33_1","volume-title":"CCT5: A Code-Change-Oriented Pre-Trained Model. arXiv preprint arXiv:2305.10785","author":"Lin Bo","year":"2023","unstructured":"Bo Lin, Shangwen Wang, Zhongxin Liu, Yepang Liu, Xin Xia, and Xiaoguang Mao. 2023. CCT5: A Code-Change-Oriented Pre-Trained Model. arXiv preprint arXiv:2305.10785 (2023)."},{"key":"e_1_2_1_34_1","volume-title":"Automated Code Editing with Search-Generate-Modify. arXiv preprint arXiv:2306.06490","author":"Liu Changshu","year":"2023","unstructured":"Changshu Liu, Pelin Cetin, Yogesh Patodia, Saikat Chakraborty, Yangruibo Ding, and Baishakhi Ray. 2023. Automated Code Editing with Search-Generate-Modify. arXiv preprint arXiv:2306.06490 (2023)."},{"key":"e_1_2_1_35_1","volume-title":"2019 IEEE\/ACM 16th International Conference on Mining Software Repositories (MSR). IEEE, 299\u2013309","author":"Liu Qin","year":"2019","unstructured":"Qin Liu, Zihe Liu, Hongming Zhu, Hongfei Fan, Bowen Du, and Yu Qian. 2019. Generating commit messages from diffs using pointer-generator network. In 2019 IEEE\/ACM 16th International Conference on Mining Software Repositories (MSR). IEEE, 299\u2013309."},{"key":"e_1_2_1_36_1","volume-title":"9th International Conference on Learning Representations, ICLR 2021","author":"Liu Shangqing","year":"2021","unstructured":"Shangqing Liu, Yu Chen, Xiaofei Xie, Jing Kai Siow, and Yang Liu. 2021. Retrieval-Augmented Generation for Code Summarization via Hybrid GNN. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https:\/\/openreview.net\/forum?id=zv-typ1gPxA"},{"key":"e_1_2_1_37_1","volume-title":"Nie Lun Yiu, and Yang Liu","author":"Liu Shangqing","year":"2020","unstructured":"Shangqing Liu, Cuiyun Gao, Sen Chen, Nie Lun Yiu, and Yang Liu. 2020. ATOM: Commit message generation based on abstract syntax tree and hybrid ranking. IEEE Transactions on Software Engineering (2020)."},{"key":"e_1_2_1_38_1","volume-title":"45th IEEE\/ACM International Conference on Software Engineering, ICSE 2023","author":"Liu Shangqing","year":"2023","unstructured":"Shangqing Liu, Bozhi Wu, Xiaofei Xie, Guozhu Meng, and Yang Liu. 2023. ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning. In 45th IEEE\/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023. IEEE, 2476\u20132487. https:\/\/doi.org\/10.1109\/ICSE48619.2023.00207"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 33rd ACM\/IEEE International Conference on Automated Software Engineering. 373\u2013384","author":"Liu Zhongxin","year":"2018","unstructured":"Zhongxin Liu, Xin Xia, Ahmed E Hassan, David Lo, Zhenchang Xing, and Xinyu Wang. 2018. Neural-machine-translation-based commit message generation: how far are we?. In Proceedings of the 33rd ACM\/IEEE International Conference on Automated Software Engineering. 373\u2013384."},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021","author":"Lu Shuai","year":"2021","unstructured":"Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. 2021. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual, Joaquin Vanschoren and Sai-Kit Yeung (Eds.). https:\/\/datasets-benchmarks-proceedings.neurips.cc\/paper\/2021\/hash\/c16a5320fa475530d9583c34fd356ef5-Abstract-round1.html"},{"key":"e_1_2_1_41_1","volume-title":"Is Self-Attention Powerful to Learn Code Syntax and Semantics? arXiv preprint arXiv:2212.10017","author":"Ma Wei","year":"2022","unstructured":"Wei Ma, Mengjie Zhao, Xiaofei Xie, Qiang Hu, Shangqing Liu, Jie Zhang, Wenhan Wang, and Yang Liu. 2022. Is Self-Attention Powerful to Learn Code Syntax and Semantics? arXiv preprint arXiv:2212.10017 (2022)."},{"key":"e_1_2_1_42_1","volume-title":"Ratnadira Widyasari, Chengran Yang, Zhipeng Zhao, Bowen Xu, Jiayuan Zhou, Xin Xia, Ahmed E Hassan, et\u00a0al.","author":"Nguyen Truong Giang","year":"2023","unstructured":"Truong Giang Nguyen, Thanh Le-Cong, Hong Jin Kang, Ratnadira Widyasari, Chengran Yang, Zhipeng Zhao, Bowen Xu, Jiayuan Zhou, Xin Xia, Ahmed E Hassan, et\u00a0al. 2023. Multi-Granularity Detector for Vulnerability Fixes. IEEE Transactions on Software Engineering (2023)."},{"key":"e_1_2_1_43_1","volume-title":"CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In The Eleventh International Conference on Learning Representations, ICLR 2023","author":"Nijkamp Erik","year":"2023","unstructured":"Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2023. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net. https:\/\/openreview.net\/pdf?id=iaYcJKpY2B_"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the 44th International Conference on Software Engineering. 2006\u20132018","author":"Niu Changan","year":"2022","unstructured":"Changan Niu, Chuanyi Li, Vincent Ng, Jidong Ge, Liguo Huang, and Bin Luo. 2022. Spt-code: Sequence-to-sequence pre-training for learning source code representations. In Proceedings of the 44th International Conference on Software Engineering. 2006\u20132018."},{"key":"e_1_2_1_45_1","unstructured":"OpenAI. 2022. ChatGPT: Optimizing Language Models for Dialogue. https:\/\/chat.openai.com."},{"key":"e_1_2_1_46_1","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J Liu, et\u00a0al. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 140 (2020), 1\u201367.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_47_1","volume-title":"Benchmarking Language Models for Code Syntax Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2022","author":"Shen Da","year":"2022","unstructured":"Da Shen, Xinyun Chen, Chenguang Wang, Koushik Sen, and Dawn Song. 2022. Benchmarking Language Models for Code Syntax Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, 3071\u20133093. https:\/\/doi.org\/10.18653\/V1\/2022.FINDINGS-EMNLP.224"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates","author":"Shi Ensheng","year":"2022","unstructured":"Ensheng Shi, Yanlin Wang, Wei Tao, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Hongbin Sun. 2022. RACE: Retrieval-augmented Commit Message Generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, 5520\u20135530. https:\/\/doi.org\/10.18653\/V1\/2022.EMNLP-MAIN.372"},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 107\u2013119","author":"Shi Lin","year":"2022","unstructured":"Lin Shi, Fangwen Mu, Xiao Chen, Song Wang, Junjie Wang, Ye Yang, Ge Li, Xin Xia, and Qing Wang. 2022. Are we building on the rock? on the importance of data preprocessing for code summarization. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 107\u2013119."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 44th International Conference on Software Engineering. 1609\u20131620","author":"Sun Zhensu","year":"2022","unstructured":"Zhensu Sun, Li Li, Yan Liu, Xiaoning Du, and Li Li. 2022. On the importance of building high-quality training datasets for neural code search. In Proceedings of the 44th International Conference on Software Engineering. 1609\u20131620."},{"key":"e_1_2_1_51_1","volume-title":"28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering","author":"Svyatkovskiy Alexey","year":"2020","unstructured":"Alexey Svyatkovskiy, Shao Kun Deng, Shengyu Fu, and Neel Sundaresan. 2020. IntelliCode compose: code generation using transformer. In ESEC\/FSE \u201920: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020, Prem Devanbu, Myra B. Cohen, and Thomas Zimmermann (Eds.). ACM, 1433\u20131443. https:\/\/doi.org\/10.1145\/3368089.3417058"},{"key":"e_1_2_1_52_1","volume-title":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 126\u2013136","author":"Tao Wei","year":"2021","unstructured":"Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, and Wenqiang Zhang. 2021. On the evaluation of commit message generation models: an experimental study. In 2021 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 126\u2013136."},{"key":"e_1_2_1_53_1","volume-title":"MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution. arXiv preprint arXiv:2403.17927","author":"Tao Wei","year":"2024","unstructured":"Wei Tao, Yucheng Zhou, Wenqiang Zhang, and Yu Cheng. 2024. MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution. arXiv preprint arXiv:2403.17927 (2024)."},{"key":"e_1_2_1_54_1","volume-title":"et\u00a0al","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et\u00a0al. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv preprint arXiv:2307.09288 (2023)."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2022","author":"Troshin Sergey","year":"2022","unstructured":"Sergey Troshin and Nadezhda Chirkova. 2022. Probing Pretrained Models of Source Codes. In Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2022, Abu Dhabi, United Arab Emirates (Hybrid), December 8, 2022, Jasmijn Bastings, Yonatan Belinkov, Yanai Elazar, Dieuwke Hupkes, Naomi Saphra, and Sarah Wiegreffe (Eds.). Association for Computational Linguistics, 371\u2013383. https:\/\/doi.org\/10.18653\/V1\/2022.BLACKBOXNLP-1.31"},{"key":"e_1_2_1_56_1","volume-title":"44th IEEE\/ACM 44th International Conference on Software Engineering, ICSE 2022","author":"Tufano Rosalia","year":"2022","unstructured":"Rosalia Tufano, Simone Masiero, Antonio Mastropaolo, Luca Pascarella, Denys Poshyvanyk, and Gabriele Bavota. 2022. Using Pre-Trained Models to Boost Code Review Automation. In 44th IEEE\/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 2291\u20132302. https:\/\/doi.org\/10.1145\/3510003.3510621"},{"key":"e_1_2_1_57_1","volume-title":"2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 163\u2013174","author":"Tufano Rosalia","year":"2021","unstructured":"Rosalia Tufano, Luca Pascarella, Michele Tufano, Denys Poshyvanyk, and Gabriele Bavota. 2021. Towards automating code review activities. In 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 163\u2013174."},{"key":"e_1_2_1_58_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998\u20136008. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"e_1_2_1_59_1","volume-title":"Proceedings of the 44th International Conference on Software Engineering. 2377\u20132388","author":"Wan Yao","year":"2022","unstructured":"Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, and Hai Jin. 2022. What do they capture? a structural analysis of pre-trained language models for source code. In Proceedings of the 44th International Conference on Software Engineering. 2377\u20132388."},{"key":"e_1_2_1_60_1","volume-title":"2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2409\u20132426","author":"Wang Shu","year":"2023","unstructured":"Shu Wang, Xinda Wang, Kun Sun, Sushil Jajodia, Haining Wang, and Qi Li. 2023. GraphSPD: Graph-based security patch detection with enriched code semantics. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2409\u20132426."},{"key":"e_1_2_1_61_1","volume-title":"MILCOM 2021-2021 IEEE Military Communications Conference (MILCOM). IEEE, 595\u2013600","author":"Wang Xinda","year":"2021","unstructured":"Xinda Wang, Shu Wang, Pengbin Feng, Kun Sun, Sushil Jajodia, Sanae Benchaaboun, and Frank Geck. 2021. Patchrnn: A deep learning-based system for security patch identification. In MILCOM 2021-2021 IEEE Military Communications Conference (MILCOM). IEEE, 595\u2013600."},{"key":"e_1_2_1_62_1","volume-title":"Syncobert: Syntax-guided multi-modal contrastive pre-training for code representation. arXiv preprint arXiv:2108.04556","author":"Wang Xin","year":"2021","unstructured":"Xin Wang, Yasheng Wang, Fei Mi, Pingyi Zhou, Yao Wan, Xiao Liu, Li Li, Hao Wu, Jin Liu, and Xin Jiang. 2021. Syncobert: Syntax-guided multi-modal contrastive pre-training for code representation. arXiv preprint arXiv:2108.04556 (2021)."},{"key":"e_1_2_1_63_1","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event \/ Punta Cana, Dominican Republic","author":"Wang Yue","year":"2021","unstructured":"Yue Wang, Weishi Wang, Shafiq R. Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event \/ Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 8696\u20138708. https:\/\/doi.org\/10.18653\/V1\/2021.EMNLP-MAIN.685"},{"key":"e_1_2_1_64_1","volume-title":"Magicoder: Source code is all you need. arXiv preprint arXiv:2312.02120","author":"Wei Yuxiang","year":"2023","unstructured":"Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, and Lingming Zhang. 2023. Magicoder: Source code is all you need. arXiv preprint arXiv:2312.02120 (2023)."},{"key":"e_1_2_1_65_1","volume-title":"Enhancing security patch identification by capturing structures in commits","author":"Wu Bozhi","year":"2022","unstructured":"Bozhi Wu, Shangqing Liu, Ruitao Feng, Xiaofei Xie, Jingkai Siow, and Shang-Wei Lin. 2022. Enhancing security patch identification by capturing structures in commits. IEEE Transactions on Dependable and Secure Computing (2022)."},{"key":"e_1_2_1_66_1","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition. 3733\u20133742","author":"Wu Zhirong","year":"2018","unstructured":"Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. 2018. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3733\u20133742."},{"key":"e_1_2_1_67_1","unstructured":"Shengbin Xu Yuan Yao Feng Xu Tianxiao Gu Hanghang Tong and Jian Lu. 2019. Commit message generation for source code changes. In IJCAI."},{"key":"e_1_2_1_68_1","volume-title":"2014 IEEE international conference on software maintenance and evolution. IEEE, 406\u2013410","author":"Yamauchi Kenji","year":"2014","unstructured":"Kenji Yamauchi, Jiachen Yang, Keisuke Hotta, Yoshiki Higo, and Shinji Kusumoto. 2014. Clustering commits for understanding the intents of implementation. In 2014 IEEE international conference on software maintenance and evolution. IEEE, 406\u2013410."},{"key":"e_1_2_1_69_1","volume-title":"Proceedings of the 44th International Conference on Software Engineering. 1482\u20131493","author":"Yang Zhou","year":"2022","unstructured":"Zhou Yang, Jieke Shi, Junda He, and David Lo. 2022. Natural attack for pre-trained models of code. In Proceedings of the 44th International Conference on Software Engineering. 1482\u20131493."},{"key":"e_1_2_1_70_1","volume-title":"Yves Le Traon, and Yang Liu","author":"Zhang Jie","year":"2023","unstructured":"Jie Zhang, Wei Ma, Qiang Hu, Xiaofei Xie, Yves Le Traon, and Yang Liu. 2023. RNNS: Representation Nearest Neighbor Search Black-Box Attack on Code Models. arXiv preprint arXiv:2305.05896 (2023)."},{"key":"e_1_2_1_71_1","volume-title":"37th IEEE\/ACM International Conference on Automated Software Engineering. 1\u201312","author":"Zhang Jiyang","year":"2022","unstructured":"Jiyang Zhang, Sheena Panthaplackel, Pengyu Nie, Junyi Jessy Li, and Milos Gligoric. 2022. CoditT5: Pretraining for Source Code and Natural Language Editing. In 37th IEEE\/ACM International Conference on Automated Software Engineering. 1\u201312."},{"key":"e_1_2_1_72_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3468854","article-title":"SPI: Automated Identification of Security Patches via Commits","volume":"31","author":"Zhou Yaqin","year":"2021","unstructured":"Yaqin Zhou, Jing Kai Siow, Chenyu Wang, Shangqing Liu, and Yang Liu. 2021. SPI: Automated Identification of Security Patches via Commits. ACM Transactions on Software Engineering and Methodology (TOSEM) 31, 1 (2021), 1\u201327.","journal-title":"ACM Transactions on Software Engineering and Methodology (TOSEM)"},{"key":"e_1_2_1_73_1","volume-title":"2023 IEEE\/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA). IEEE, 345\u2013351","author":"Zuo Fei","year":"2023","unstructured":"Fei Zuo, Xin Zhang, Yuqi Song, Junghwan Rhee, and Jicheng Fu. 2023. Commit Message Can Help: Security Patch Detection in Open Source Software via Transformer. In 2023 IEEE\/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA). IEEE, 345\u2013351."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674731","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3674731","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:57:50Z","timestamp":1750294670000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674731"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7]]},"references-count":73,"alternative-id":["10.1145\/3674731"],"URL":"https:\/\/doi.org\/10.1145\/3674731","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7]]},"assertion":[{"value":"2023-11-12","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-05","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"3674731"}}