{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T02:12:59Z","timestamp":1775873579688,"version":"3.50.1"},"reference-count":80,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","license":[{"start":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T00:00:00Z","timestamp":1720742400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["CNS-2112471"],"award-info":[{"award-number":["CNS-2112471"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["N6600120C4020"],"award-info":[{"award-number":["N6600120C4020"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000183","name":"Army Research Office","doi-asserted-by":"publisher","award":["W911NF2110081"],"award-info":[{"award-number":["W911NF2110081"]}],"id":[{"id":"10.13039\/100000183","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2024,7,12]]},"abstract":"<jats:p>\n                    Code summaries are pivotal in software engineering, serving to improve code readability, maintainability, and collaboration. While recent advancements in Large Language Models (LLMs) have opened new avenues for automatic code summarization, existing metrics for evaluating summary quality, such as BLEU and BERTScore, have notable limitations. 
Specifically, these existing metrics either fail to capture the nuances of semantic meaning in summaries or are further limited in understanding domain-specific terminologies and expressions prevalent in code summaries. In this paper, we present <jats:sc>Sim<\/jats:sc>LLM, a novel LLM-based approach designed to more precisely evaluate the semantic similarity of code summaries. Built upon an autoregressive LLM using a specialized pretraining task on permutated inputs and a pooling-based pairwise similarity measure, <jats:sc>Sim<\/jats:sc>LLM overcomes the shortcomings of existing metrics. Our empirical evaluations demonstrate that <jats:sc>Sim<\/jats:sc>LLM not only outperforms existing metrics but also shows a significantly high correlation with human ratings.<\/jats:p>","DOI":"10.1145\/3660769","type":"journal-article","created":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T10:22:09Z","timestamp":1720779729000},"page":"1376-1399","source":"Crossref","is-referenced-by-count":9,"title":["SimLLM: Calculating Semantic Similarity in Code Summaries using a Large Language Model-Based Approach"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6525-2821","authenticated-orcid":false,"given":"Xin","family":"Jin","sequence":"first","affiliation":[{"name":"The Ohio State University, Columbus, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6527-5994","authenticated-orcid":false,"given":"Zhiqiang","family":"Lin","sequence":"additional","affiliation":[{"name":"The Ohio State University, Columbus, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,7,12]]},"reference":[{"key":"e_1_3_1_2_1","unstructured":"2024. CodeXGLUE. https:\/\/microsoft.github.io\/CodeXGLUE\/. Accessed: 2024-02-20."},{"key":"e_1_3_1_3_1","unstructured":"2024. EvalPlus Leaderboard. 
https:\/\/evalplus.github.io\/leaderboard.html. Accessed: 2024-02-20."},{"key":"e_1_3_1_4_1","doi-asserted-by":"publisher","unstructured":"Wasi Uddin Ahmad Saikat Chakraborty Baishakhi Ray and Kai-Wei Chang. 2021. Unified pre-training for program understanding and generation. arXiv preprint arXiv:2103.06333 (2021). https:\/\/doi.org\/10.48550\/arXiv.2103.06333 10.48550\/arXiv.2103.06333","DOI":"10.48550\/arXiv.2103.06333"},{"key":"e_1_3_1_5_1","unstructured":"Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and\/or summarization. 65\u201372."},{"key":"e_1_3_1_6_1","doi-asserted-by":"publisher","unstructured":"Victoria Bobicev and Marina Sokolova. 2017. Inter-annotator agreement in sentiment analysis: machine learning perspective. In International Conference Recent Advances in Natural Language Processing. 97\u2013102. https:\/\/doi.org\/10.26615\/978-954-452-049-6_015 10.26615\/978-954-452-049-6_015","DOI":"10.26615\/978-954-452-049-6_015"},{"key":"e_1_3_1_7_1","doi-asserted-by":"publisher","unstructured":"Daniel Cer Yinfei Yang Sheng-yi Kong Nan Hua Nicole Limtiaco Rhomni St John Noah Constant Mario Guajardo-Cespedes Steve Yuan Chris Tar et al. 2018. Universal sentence encoder. arXiv preprint arXiv:1803.11175 (2018). https:\/\/doi.org\/10.48550\/arXiv.1803.11175 10.48550\/arXiv.1803.11175","DOI":"10.48550\/arXiv.1803.11175"},{"key":"e_1_3_1_8_1","doi-asserted-by":"publisher","unstructured":"Saikat Chakraborty Toufique Ahmed Yangruibo Ding Premkumar T Devanbu and Baishakhi Ray. 2022. Natgen: generative pre-training by \u201cnaturalizing\u201d source code. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 18\u201330. 
https:\/\/doi.org\/10.1145\/3540250.3549162 10.1145\/3540250.3549162","DOI":"10.1145\/3540250.3549162"},{"key":"e_1_3_1_9_1","unstructured":"Stanley F Chen Douglas Beeferman and Roni Rosenfeld. 1998. Evaluation metrics for language models. (1998)."},{"key":"e_1_3_1_10_1","doi-asserted-by":"publisher","unstructured":"Yu Cheng Duo Wang Pan Zhou and Tao Zhang. 2017. A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282 (2017). https:\/\/doi.org\/10.48550\/arXiv.1710.09282 10.48550\/arXiv.1710.09282","DOI":"10.48550\/arXiv.1710.09282"},{"key":"e_1_3_1_11_1","doi-asserted-by":"publisher","unstructured":"Elizabeth Clark Asli Celikyilmaz and Noah A Smith. 2019. Sentence mover\u2019s similarity: Automatic evaluation for multisentence texts. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2748\u20132760. https:\/\/doi.org\/10.18653\/v1\/P19-1264 10.18653\/v1\/P19-1264","DOI":"10.18653\/v1\/P19-1264"},{"key":"e_1_3_1_12_1","doi-asserted-by":"publisher","unstructured":"Alexis Conneau and Douwe Kiela. 2018. Senteval: An evaluation toolkit for universal sentence representations. arXiv preprint arXiv:1803.05449 (2018). https:\/\/doi.org\/10.48550\/arXiv.1803.05449 10.48550\/arXiv.1803.05449","DOI":"10.48550\/arXiv.1803.05449"},{"key":"e_1_3_1_13_1","doi-asserted-by":"publisher","unstructured":"Alexis Conneau Douwe Kiela Holger Schwenk Loic Barrault and Antoine Bordes. 2017. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364 (2017). https:\/\/doi.org\/10.48550\/arXiv.1705.02364 10.48550\/arXiv.1705.02364","DOI":"10.48550\/arXiv.1705.02364"},{"key":"e_1_3_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.softx.2024.101677"},{"key":"e_1_3_1_15_1","doi-asserted-by":"publisher","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. 
Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018). https:\/\/doi.org\/10.48550\/arXiv.1810.04805 10.48550\/arXiv.1810.04805","DOI":"10.48550\/arXiv.1810.04805"},{"key":"e_1_3_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.113679"},{"key":"e_1_3_1_17_1","doi-asserted-by":"publisher","unstructured":"Zhangyin Feng Daya Guo Duyu Tang Nan Duan Xiaocheng Feng Ming Gong Linjun Shou Bing Qin Ting Liu Daxin Jiang et al. 2020. Codebert: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155 (2020). https:\/\/doi.org\/10.48550\/arXiv.2002.08155 10.48550\/arXiv.2002.08155","DOI":"10.48550\/arXiv.2002.08155"},{"key":"e_1_3_1_18_1","article-title":"Policy shaping: Integrating human feedback with reinforcement learning","volume":"26","author":"Griffith Shane","year":"2013","unstructured":"Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L Isbell, and Andrea L Thomaz. 2013. Policy shaping: Integrating human feedback with reinforcement learning. Advances in neural information processing systems 26 (2013).","journal-title":"Advances in neural information processing systems"},{"key":"e_1_3_1_19_1","doi-asserted-by":"publisher","unstructured":"Daya Guo Shuo Ren Shuai Lu Zhangyin Feng Duyu Tang Shujie Liu Long Zhou Nan Duan Alexey Svyatkovskiy Shengyu Fu et al. 2020. Graphcodebert: Pre-training code representations with data flow. arXiv preprint arXiv:2009.08366 (2020). https:\/\/doi.org\/10.48550\/arXiv.2009.08366 10.48550\/arXiv.2009.08366","DOI":"10.48550\/arXiv.2009.08366"},{"key":"e_1_3_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/WCRE.2010.13"},{"key":"e_1_3_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP46215.2023.10179298"},{"key":"e_1_3_1_22_1","doi-asserted-by":"publisher","unstructured":"Sakib Haque Zachary Eberhart Aakash Bansal and Collin McMillan. 2022. 
Semantic similarity metrics for evaluating source code summarization. In Proceedings of the 30th IEEE\/ACM International Conference on Program Comprehension. 36\u201347. https:\/\/doi.org\/10.1145\/3524610.3527909 10.1145\/3524610.3527909","DOI":"10.1145\/3524610.3527909"},{"key":"e_1_3_1_23_1","doi-asserted-by":"publisher","unstructured":"Hamel Husain Ho-Hsiang Wu Tiferet Gazit Miltiadis Allamanis and Marc Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436 (2019). https:\/\/doi.org\/10.48550\/arXiv.1909.09436 10.48550\/arXiv.1909.09436","DOI":"10.48550\/arXiv.1909.09436"},{"key":"e_1_3_1_24_1","doi-asserted-by":"publisher","unstructured":"Xin Jin Jonathan Larson Weiwei Yang and Zhiqiang Lin. 2023. Binary code summarization: Benchmarking chatgpt\/gpt-4 and other large language models. arXiv preprint arXiv:2312.09601 (2023). https:\/\/doi.org\/10.48550\/arXiv.2312.09601 10.48550\/arXiv.2312.09601","DOI":"10.48550\/arXiv.2312.09601"},{"key":"e_1_3_1_25_1","doi-asserted-by":"publisher","unstructured":"Xin Jin and Zhiqiang Lin. 2024. SimLLM artifact. https:\/\/doi.org\/10.5281\/zenodo.11095396 10.5281\/zenodo.11095396","DOI":"10.5281\/zenodo.11095396"},{"key":"e_1_3_1_26_1","doi-asserted-by":"publisher","unstructured":"Xin Jin Kexin Pei Jun Yeon Won and Zhiqiang Lin. 2022. Symlm: Predicting function names in stripped binaries via context-sensitive execution-aware code embeddings. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 1631\u20131645. https:\/\/doi.org\/10.1145\/3548606.3560612 10.1145\/3548606.3560612","DOI":"10.1145\/3548606.3560612"},{"key":"e_1_3_1_27_1","doi-asserted-by":"publisher","unstructured":"Xin Jin and Yuchen Wang. 2023. Understand legal documents with contextualized large language models. arXiv preprint arXiv:2303.12135 (2023). 
https:\/\/doi.org\/10.48550\/arXiv.2303.12135 10.48550\/arXiv.2303.12135","DOI":"10.48550\/arXiv.2303.12135"},{"key":"e_1_3_1_28_1","doi-asserted-by":"publisher","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). https:\/\/doi.org\/10.48550\/arXiv.1412.6980 10.48550\/arXiv.1412.6980","DOI":"10.48550\/arXiv.1412.6980"},{"key":"e_1_3_1_29_1","doi-asserted-by":"publisher","unstructured":"Denis Kocetkov Raymond Li Loubna Ben Allal Jia Li Chenghao Mou Carlos Mu\u00f1oz Ferrandis Yacine Jernite Margaret Mitchell Sean Hughes Thomas Wolf Dzmitry Bahdanau Leandro von Werra and Harm de Vries. 2022. The Stack: 3 TB of permissively licensed source code. Preprint (2022). https:\/\/doi.org\/10.48550\/arXiv.2211.15533 10.48550\/arXiv.2211.15533","DOI":"10.48550\/arXiv.2211.15533"},{"key":"e_1_3_1_30_1","unstructured":"Klaus Krippendorff. 2011. Computing Krippendorff\u2019s alpha-reliability. (2011)."},{"key":"e_1_3_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2018.05.003"},{"key":"e_1_3_1_32_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1902.01954"},{"key":"e_1_3_1_33_1","doi-asserted-by":"publisher","unstructured":"Alexander LeClair and Collin McMillan. 2019. Recommendations for datasets for source code summarization. arXiv preprint arXiv:1904.02660 (2019). https:\/\/doi.org\/10.48550\/arXiv.1904.02660 10.48550\/arXiv.1904.02660","DOI":"10.48550\/arXiv.1904.02660"},{"key":"e_1_3_1_34_1","article-title":"Neural word embedding as implicit matrix factorization","volume":"27","author":"Levy Omer","year":"2014","unstructured":"Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. Advances in neural information processing systems 27 (2014).","journal-title":"Advances in neural information processing systems"},{"key":"e_1_3_1_35_1","unstructured":"Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. 
In Text summarization branches out. 74\u201381."},{"key":"e_1_3_1_36_1","doi-asserted-by":"publisher","unstructured":"Shangqing Liu Yanzhou Li Xiaofei Xie and Yang Liu. 2022. Commitbart: A large pre-trained model for github commits. arXiv preprint arXiv:2208.08100 (2022). https:\/\/doi.org\/10.48550\/arXiv.2208.08100 10.48550\/arXiv.2208.08100","DOI":"10.48550\/arXiv.2208.08100"},{"key":"e_1_3_1_37_1","doi-asserted-by":"publisher","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019). https:\/\/doi.org\/10.48550\/arXiv.1907.11692 10.48550\/arXiv.1907.11692","DOI":"10.48550\/arXiv.1907.11692"},{"key":"e_1_3_1_38_1","doi-asserted-by":"publisher","unstructured":"Alexandra Luccioni and Joseph Viviano. 2021. What\u2019s in the box? an analysis of undesirable content in the Common Crawl corpus. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 182\u2013189. https:\/\/doi.org\/10.18653\/v1\/2021.acl-short.24 10.18653\/v1\/2021.acl-short.24","DOI":"10.18653\/v1\/2021.acl-short.24"},{"key":"e_1_3_1_39_1","doi-asserted-by":"publisher","unstructured":"Sabrina J Mielke Zaid Alyafeai Elizabeth Salesky Colin Raffel Manan Dey Matthias Gall\u00e9 Arun Raja Chenglei Si Wilson Y Lee Beno\u00eet Sagot et al. 2021. Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP. arXiv preprint arXiv:2112.10508 (2021). https:\/\/doi.org\/10.48550\/arXiv.2112.10508 10.48550\/arXiv.2112.10508","DOI":"10.48550\/arXiv.2112.10508"},{"key":"e_1_3_1_40_1","doi-asserted-by":"publisher","unstructured":"Tomas Mikolov Kai Chen Greg Corrado and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. 
arXiv preprint arXiv:1301.3781 (2013). https:\/\/doi.org\/10.48550\/arXiv.1301.3781 10.48550\/arXiv.1301.3781","DOI":"10.48550\/arXiv.1301.3781"},{"key":"e_1_3_1_41_1","unstructured":"OpenAI. 2022. Introducing ChatGPT. https:\/\/openai.com\/blog\/chatgpt. Accessed: 2024-02-20."},{"key":"e_1_3_1_42_1","doi-asserted-by":"publisher","unstructured":"Myle Ott Sergey Edunov Alexei Baevski Angela Fan Sam Gross Nathan Ng David Grangier and Michael Auli. 2019. fairseq: A fast extensible toolkit for sequence modeling. arXiv preprint arXiv:1904.01038 (2019). https:\/\/doi.org\/10.48550\/arXiv.1904.01038 10.48550\/arXiv.1904.01038","DOI":"10.48550\/arXiv.1904.01038"},{"key":"e_1_3_1_43_1","unstructured":"Myle Ott and et. al. 2023. Fast BPE. https:\/\/github.com\/glample\/fastBPE. 09-12-2023."},{"key":"e_1_3_1_44_1","doi-asserted-by":"publisher","unstructured":"Yu Pan Zhichao Xu Levi Taiji Li Yunhe Yang and Mu Zhang. 2023. Automated generation of security-centric descriptions for smart contract bytecode. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1244\u20131256. https:\/\/doi.org\/10.1145\/3597926.3598132 10.1145\/3597926.3598132","DOI":"10.1145\/3597926.3598132"},{"key":"e_1_3_1_45_1","doi-asserted-by":"crossref","unstructured":"Kishore Papineni Salim Roukos Todd Ward and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311\u2013318.","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_3_1_46_1","article-title":"Pytorch: An imperative style, high-performance deep learning library","volume":"32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. 
Advances in neural information processing systems 32 (2019).","journal-title":"Advances in neural information processing systems"},{"key":"e_1_3_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_3_1_48_1","doi-asserted-by":"publisher","unstructured":"Jeffrey Pennington Richard Socher and Christopher D Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 1532\u20131543. https:\/\/doi.org\/10.3115\/v1\/D14-1162 10.3115\/v1\/D14-1162","DOI":"10.3115\/v1\/D14-1162"},{"issue":"8","key":"e_1_3_1_49_1","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.","journal-title":"OpenAI blog"},{"key":"e_1_3_1_50_1","unstructured":"Peter A Rankel John Conroy Hoa Trang Dang and Ani Nenkova. 2013. A decade of automatic content evaluation of news summaries: Reassessing the state of the art. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 131\u2013136."},{"key":"e_1_3_1_51_1","doi-asserted-by":"publisher","unstructured":"Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084 (2019). https:\/\/doi.org\/10.48550\/arXiv.1908.10084 10.48550\/arXiv.1908.10084","DOI":"10.48550\/arXiv.1908.10084"},{"key":"e_1_3_1_52_1","doi-asserted-by":"publisher","DOI":"10.1108\/00220410410560582"},{"key":"e_1_3_1_53_1","doi-asserted-by":"publisher","unstructured":"Devjeet Roy Sarah Fakhoury and Venera Arnaoudova. 2021. Reassessing automatic evaluation metrics for code summarization tasks. 
In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1105\u20131116. https:\/\/doi.org\/10.1145\/3468264.3468588 10.1145\/3468264.3468588","DOI":"10.1145\/3468264.3468588"},{"key":"e_1_3_1_54_1","doi-asserted-by":"publisher","unstructured":"Baptiste Roziere Jonas Gehring Fabian Gloeckle Sten Sootla Itai Gat Xiaoqing Ellen Tan Yossi Adi Jingyu Liu Tal Remez J\u00e9r\u00e9my Rapin et al. 2023. Code llama: Open foundation models for code. arXiv preprint arXiv:2308.12950 (2023). https:\/\/doi.org\/10.48550\/arXiv.2308.12950 10.48550\/arXiv.2308.12950","DOI":"10.48550\/arXiv.2308.12950"},{"key":"e_1_3_1_55_1","first-page":"859","volume-title":"LREC","author":"Sabou Marta","year":"2014","unstructured":"Marta Sabou, Kalina Bontcheva, Leon Derczynski, and Arno Scharl. 2014. Corpus annotation through crowdsourcing: Towards best practice guidelines. In LREC. Citeseer, 859\u2013866."},{"key":"e_1_3_1_56_1","doi-asserted-by":"publisher","unstructured":"Rico Sennrich Barry Haddow and Alexandra Birch. 2015. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909 (2015). https:\/\/doi.org\/10.48550\/arXiv.1508.07909 10.48550\/arXiv.1508.07909","DOI":"10.48550\/arXiv.1508.07909"},{"key":"e_1_3_1_57_1","doi-asserted-by":"publisher","unstructured":"Tushar Sharma Maria Kechagia Stefanos Georgiou Rohit Tiwari Indira Vats Hadi Moazen and Federica Sarro. 2021. A survey on machine learning techniques for source code analysis. arXiv preprint arXiv:2110.09610 (2021). https:\/\/doi.org\/10.48550\/arXiv.2110.09610 10.48550\/arXiv.2110.09610","DOI":"10.48550\/arXiv.2110.09610"},{"key":"e_1_3_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/QRS54544.2021.00108"},{"key":"e_1_3_1_59_1","doi-asserted-by":"publisher","unstructured":"Lin Shi Fangwen Mu Xiao Chen Song Wang Junjie Wang Ye Yang Ge Li Xin Xia and Qing Wang. 2022. Are we building on the rock? 
on the importance of data preprocessing for code summarization. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 107\u2013119. https:\/\/doi.org\/10.1145\/3540250.3549145 10.1145\/3540250.3549145","DOI":"10.1145\/3540250.3549145"},{"key":"e_1_3_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2003.1223656"},{"key":"e_1_3_1_61_1","unstructured":"Jikyoeng Son Joonghyuk Hahn HyeonTae Seo and Yo-Sub Han. 2022. Boosting code summarization by embedding code structures. In Proceedings of the 29th International Conference on Computational Linguistics. 5966\u20135977."},{"key":"e_1_3_1_62_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2004.09297"},{"key":"e_1_3_1_63_1","doi-asserted-by":"publisher","unstructured":"Jing Su Chufeng Jiang Xin Jin Yuxin Qiao Tingsong Xiao Hongda Ma Rong Wei Zhi Jing Jiajun Xu and Junhong Lin. 2024. Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review. arXiv preprint arXiv:2402.10350 (2024). https:\/\/doi.org\/10.48550\/arXiv.2402.10350 10.48550\/arXiv.2402.10350","DOI":"10.48550\/arXiv.2402.10350"},{"key":"e_1_3_1_64_1","doi-asserted-by":"publisher","unstructured":"Weisong Sun Chunrong Fang Yudu You Yun Miao Yi Liu Yuekang Li Gelei Deng Shenghan Huang Yuchen Chen Quanjun Zhang et al. 2023. Automatic Code Summarization via ChatGPT: How Far Are We? arXiv preprint arXiv:2305.12865 (2023). https:\/\/doi.org\/10.48550\/arXiv.2305.12865 10.48550\/arXiv.2305.12865","DOI":"10.48550\/arXiv.2305.12865"},{"key":"e_1_3_1_65_1","doi-asserted-by":"publisher","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023). 
https:\/\/doi.org\/10.48550\/arXiv.2307.09288 10.48550\/arXiv.2307.09288","DOI":"10.48550\/arXiv.2307.09288"},{"key":"e_1_3_1_66_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-019-0686-2"},{"key":"e_1_3_1_67_1","doi-asserted-by":"publisher","unstructured":"Chaozheng Wang Yuanhang Yang Cuiyun Gao Yun Peng Hongyu Zhang and Michael R Lyu. 2023. Prompt Tuning in Code Intelligence: An Experimental Evaluation. IEEE Transactions on Software Engineering (2023). https:\/\/doi.org\/10.1109\/TSE.2023.3313881 10.1109\/TSE.2023.3313881","DOI":"10.1109\/TSE.2023.3313881"},{"key":"e_1_3_1_68_1","doi-asserted-by":"publisher","unstructured":"Deze Wang Zhouyang Jia Shanshan Li Yue Yu Yun Xiong Wei Dong and Xiangke Liao. 2022. Bridging pre-trained models and downstream tasks for source code understanding. In Proceedings of the 44th International Conference on Software Engineering. 287\u2013298. https:\/\/doi.org\/10.48550\/arXiv.2112.02268 10.48550\/arXiv.2112.02268","DOI":"10.48550\/arXiv.2112.02268"},{"key":"e_1_3_1_69_1","doi-asserted-by":"publisher","unstructured":"Kexin Wang Nils Reimers and Iryna Gurevych. 2021a. Tsdae: Using transformer-based sequential denoising auto-encoder for unsupervised sentence embedding learning. arXiv preprint arXiv:2104.06979 (2021). https:\/\/doi.org\/10.48550\/arXiv.2104.06979 10.48550\/arXiv.2104.06979","DOI":"10.48550\/arXiv.2104.06979"},{"key":"e_1_3_1_70_1","doi-asserted-by":"publisher","unstructured":"Kexin Wang Nandan Thakur Nils Reimers and Iryna Gurevych. 2021b. Gpl: Generative pseudo labeling for unsupervised domain adaptation of dense retrieval. arXiv preprint arXiv:2112.07577 (2021). https:\/\/doi.org\/10.48550\/arXiv.2112.07577 10.48550\/arXiv.2112.07577","DOI":"10.48550\/arXiv.2112.07577"},{"key":"e_1_3_1_71_1","doi-asserted-by":"publisher","unstructured":"Yue Wang Weishi Wang Shafiq Joty and Steven CH Hoi. 2021c. Codet5: Identifier-aware unified pre-trained encoderdecoder models for code understanding and generation. 
arXiv preprint arXiv:2109.00859 (2021). https:\/\/doi.org\/10.48550\/arXiv.2109.00859 10.48550\/arXiv.2109.00859","DOI":"10.48550\/arXiv.2109.00859"},{"key":"e_1_3_1_72_1","doi-asserted-by":"publisher","unstructured":"Yuxiang Wei Zhe Wang Jiawei Liu Yifeng Ding and Lingming Zhang. 2023. Magicoder: Source code is all you need. arXiv preprint arXiv:2312.02120 (2023). https:\/\/doi.org\/10.48550\/arXiv.2312.02120 10.48550\/arXiv.2312.02120","DOI":"10.48550\/arXiv.2312.02120"},{"key":"e_1_3_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/SANER.2015.7081848"},{"key":"e_1_3_1_74_1","doi-asserted-by":"publisher","unstructured":"Wei Wu Houfeng Wang Tianyu Liu and Shuming Ma. 2018. Phrase-level self-attention networks for universal sentence encoding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 3729\u20133738. https:\/\/doi.org\/10.18653\/v1\/D18-1408 10.18653\/v1\/D18-1408","DOI":"10.18653\/v1\/D18-1408"},{"key":"e_1_3_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3505243"},{"key":"e_1_3_1_76_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1906.08237"},{"key":"e_1_3_1_77_1","doi-asserted-by":"publisher","unstructured":"Yang You Jing Li Sashank Reddi Jonathan Hseu Sanjiv Kumar Srinadh Bhojanapalli Xiaodan Song James Demmel Kurt Keutzer and Cho-Jui Hsieh. 2019. Large batch optimization for deep learning: Training bert in 76 minutes. arXiv preprint arXiv:1904.00962 (2019). https:\/\/doi.org\/10.48550\/arXiv.1904.00962 10.48550\/arXiv.1904.00962","DOI":"10.48550\/arXiv.1904.00962"},{"key":"e_1_3_1_78_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco_a_01199"},{"key":"e_1_3_1_79_1","doi-asserted-by":"publisher","unstructured":"Tianyi Zhang Varsha Kishore Felix Wu Kilian Q Weinberger and Yoav Artzi. 2019. Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675 (2019). 
https:\/\/doi.org\/10.48550\/arXiv.1904.09675 10.48550\/arXiv.1904.09675","DOI":"10.48550\/arXiv.1904.09675"},{"key":"e_1_3_1_80_1","doi-asserted-by":"publisher","unstructured":"Shuyan Zhou Uri Alon Sumit Agarwal and Graham Neubig. 2023. Codebertscore: Evaluating code generation with pretrained models of code. arXiv preprint arXiv:2302.05527 (2023). https:\/\/doi.org\/10.48550\/arXiv.2302.05527 10.48550\/arXiv.2302.05527","DOI":"10.48550\/arXiv.2302.05527"},{"key":"e_1_3_1_81_1","volume-title":"CRC standard probability and statistics tables and formulae","author":"Zwillinger Daniel","year":"1999","unstructured":"Daniel Zwillinger and Stephen Kokoska. 1999. CRC standard probability and statistics tables and formulae. Crc Press."}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660769","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3660769","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3660769","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T08:05:17Z","timestamp":1770192317000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660769"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,12]]},"references-count":80,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2024,7,12]]}},"alternative-id":["10.1145\/3660769"],"URL":"https:\/\/doi.org\/10.1145\/3660769","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,12]]}}}