{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T12:09:37Z","timestamp":1771330177575,"version":"3.50.1"},"reference-count":110,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,4,9]],"date-time":"2022-04-09T00:00:00Z","timestamp":1649462400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62072399 and U19B2042"],"award-info":[{"award-number":["62072399 and U19B2042"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"MoE Engineering Research Center of Digital Library"},{"name":"Chinese Knowledge Center for Engineering Sciences and Technology"},{"name":"National Engineering Research Center for Big Data Technology and System"},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62102157"],"award-info":[{"award-number":["62102157"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2022,7,31]]},"abstract":"<jats:p>Source code representation learning is the basis of applying artificial intelligence to many software engineering tasks such as code clone detection, algorithm classification, and code summarization. 
Recently, many works have tried to improve the performance of source code representation from various perspectives, e.g., introducing the structural information of programs into latent representation. However, when dealing with rapidly expanded unlabeled cross-language source code datasets from the Internet, there are still two issues. Firstly, deep learning models for many code-specific tasks still suffer from the lack of high-quality labels. Secondly, the structural differences among programming languages make it more difficult to process multiple languages in a single neural architecture.<\/jats:p>\n          <jats:p>\n            To address these issues, in this article, we propose a novel\n            <jats:underline>Cross<\/jats:underline>\n            -language\n            <jats:underline>Code<\/jats:underline>\n            representation with a large-scale pre-training (\n            <jats:sc>XCode<\/jats:sc>\n            ) method. Concretely, we propose to use several abstract syntax trees and ELMo-enhanced variational autoencoders to obtain multiple pre-trained source code language models trained on about 1.5 million code snippets. To fully utilize the knowledge across programming languages, we further propose a Shared Encoder-Decoder (SED) architecture which uses the multi-teacher single-student method to transfer knowledge from the aforementioned pre-trained models to the distilled SED. The pre-trained models and SED will cooperate to better represent the source code. For evaluation, we examine our approach on three typical downstream cross-language tasks, i.e., source code translation, code clone detection, and code-to-code search, on a real-world dataset composed of programming exercises with multiple solutions. Experimental results demonstrate the effectiveness of our proposed approach on cross-language code representations. 
Meanwhile, our approach performs significantly better than several code representation baselines on different downstream tasks in terms of multiple automatic evaluation metrics.\n          <\/jats:p>","DOI":"10.1145\/3506696","type":"journal-article","created":{"date-parts":[[2022,4,9]],"date-time":"2022-04-09T13:52:24Z","timestamp":1649512344000},"page":"1-44","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["<scp>XCode<\/scp>\n            : Towards Cross-Language Code Representation with Large-Scale Pre-Training"],"prefix":"10.1145","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4726-7867","authenticated-orcid":false,"given":"Zehao","family":"Lin","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guodun","family":"Li","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingfeng","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yue","family":"Deng","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiangji","family":"Zeng","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yin","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University, 
Hangzhou, Zhejiang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yao","family":"Wan","sequence":"additional","affiliation":[{"name":"School of Computer Sci. and Tech., Huazhong University of Science and Technology, Wuhan, Hubei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,4,9]]},"reference":[{"key":"e_1_3_3_2_2","first-page":"265","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, November 2-4, 2016","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, November 2-4, 2016, Kimberly Keeton and Timothy Roscoe (Eds.), USENIX Association, 265\u2013283. Retrieved from https:\/\/www.usenix.org\/conference\/osdi16\/technical-sessions\/presentation\/abadi."},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3212695"},{"key":"e_1_3_3_4_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations","author":"Alon Uri","year":"2019","unstructured":"Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2019. code2seq: Generating sequences from structured representations of code. In Proceedings of the 7th International Conference on Learning Representations. OpenReview.net. 
Retrieved from https:\/\/openreview.net\/forum?id=H1gKYo09tX."},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3296979.3192412"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290353"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2021.111133"},{"key":"e_1_3_3_8_2","first-page":"1672","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics","author":"Bahuleyan Hareesh","year":"2018","unstructured":"Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, and Pascal Poupart. 2018. Variational attention for sequence-to-sequence models. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, Santa Fe, New Mexico, 1672\u20131682. Retrieved from https:\/\/aclanthology.org\/C18-1142."},{"key":"e_1_3_3_9_2","first-page":"65","volume-title":"Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization@ACL 2005, Ann Arbor, Michigan, June 29, 2005","author":"Banerjee Satanjeev","year":"2005","unstructured":"Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization@ACL 2005, Ann Arbor, Michigan, June 29, 2005, Jade Goldstein, Alon Lavie, Chin-Yew Lin, and Clare R. Voss (Eds.), Association for Computational Linguistics, 65\u201372. https:\/\/www.aclweb.org\/anthology\/W05-0909\/."},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2019.00021"},{"key":"e_1_3_3_11_2","unstructured":"Avishkar Bhoopchand Tim Rockt\u00e4schel Earl T. Barr and Sebastian Riedel. 2016. Learning python code suggestion with a sparse pointer network. arXiv:1611.08307. 
Retrieved from http:\/\/arxiv.org\/abs\/1611.08307."},{"key":"e_1_3_3_12_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle, Marc\u2019Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html."},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00109"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3340458"},{"key":"e_1_3_3_15_2","first-page":"10390","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada","author":"Casale Francesco Paolo","year":"2018","unstructured":"Francesco Paolo Casale, Adrian V. Dalca, Luca Saglietti, Jennifer Listgarten, and Nicol\u00f3 Fusi. 2018. Gaussian process prior variational autoencoders. 
In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada, Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.). 10390\u201310401. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/1c336b8080f82bcc2cd2499b4c57261d-Abstract.html."},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394112"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1599"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1254"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3240471"},{"key":"e_1_3_3_20_2","first-page":"2552","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada","author":"Chen Xinyun","year":"2018","unstructured":"Xinyun Chen, Chang Liu, and Dawn Song. 2018. Tree-to-tree neural networks for program translation. In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada, Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.), 2552\u20132562. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/d759175de8ea5b1d9a2660e45554894f-Abstract.html."},{"key":"e_1_3_3_21_2","first-page":"7570","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, The 32nd Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The 10th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, February 7-12, 2020","author":"Chi Zewen","year":"2020","unstructured":"Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, and Heyan Huang. 2020. Cross-lingual natural language generation via pre-training. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, The 32nd Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The 10th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, February 7-12, 2020. AAAI Press, 7570\u20137577. Retrieved from https:\/\/aaai.org\/ojs\/index.php\/AAAI\/article\/view\/6256."},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00916"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSM.2013.85"},{"key":"e_1_3_3_24_2","first-page":"7057","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada","author":"Conneau Alexis","year":"2019","unstructured":"Alexis Conneau and Guillaume Lample. 2019. Cross-lingual language model pretraining. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d\u2019Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.), 7057\u20137067. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/c04c19c2c2474dbf5f7ac4372c5b9af1-Abstract.html."},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3213846.3213848"},{"key":"e_1_3_3_26_2","series-title":"Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California","first-page":"1475","volume":"97","author":"Cvitkovic Milan","year":"2019","unstructured":"Milan Cvitkovic, Badal Singh, and Animashree Anandkumar. 2019. Open vocabulary learning on source code with a graph-structured cache. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), PMLR, 1475\u20131485. Retrieved from http:\/\/proceedings.mlr.press\/v97\/cvitkovic19b.html."},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3428293"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1982.1653939"},{"key":"e_1_3_3_29_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, June 2-7, 2019, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, June 2-7, 2019, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.), Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_3_30_2","first-page":"422","volume-title":"Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 11-16, 2020","author":"Eric Mihail","year":"2020","unstructured":"Mihail Eric, Rahul Goel, Shachi Paul, Abhishek Sethi, Sanchit Agarwal, Shuyang Gao, Adarsh Kumar, Anuj Kumar Goyal, Peter Ku, and Dilek Hakkani-T\u00fcr. 2020. MultiWOZ 2.1: A consolidated multi-domain dialogue dataset with state corrections and state tracking baselines. In Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 11-16, 2020, Nicoletta Calzolari, Fr\u00e9d\u00e9ric B\u00e9chet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, H\u00e9l\u00e8ne Mazo, Asunci\u00f3n Moreno, Jan Odijk, and Stelios Piperidis (Eds.), European Language Resources Association, 422\u2013428."},{"key":"e_1_3_3_31_2","volume-title":"Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings","author":"Fan Yang","year":"2018","unstructured":"Yang Fan, Fei Tian, Tao Qin, Xiang-Yang Li, and Tie-Yan Liu. 2018. Learning to teach. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. 
Retrieved from https:\/\/openreview.net\/forum?id=HJewuJWCZ."},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1101"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-58347-1_10"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2019.00025"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3401026"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2017.8115618"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180167"},{"key":"e_1_3_3_39_2","unstructured":"Daya Guo Shuo Ren Shuai Lu Zhangyin Feng Duyu Tang Shujie Liu Long Zhou Nan Duan Alexey Svyatkovskiy Shengyu Fu Michele Tufano Shao Kun Deng Colin B. Clement Dawn Drain Neel Sundaresan Jian Yin Daxin Jiang and Ming Zhou. 2021. GraphCodeBERT: Pre-training code representations with data flow. In 9th International Conference on Learning Representations ICLR 2021 Virtual Event Austria May 3\u20137 2021. OpenReview.net."},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2012.6227135"},{"key":"e_1_3_3_42_2","unstructured":"Geoffrey E. Hinton Oriol Vinyals and Jeffrey Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531. 
Retrieved from http:\/\/arxiv.org\/abs\/1503.02531."},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.282"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2017.8268911"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3196321.3196334"},{"key":"e_1_3_3_47_2","series-title":"Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017","first-page":"1587","volume":"70","author":"Hu Zhiting","year":"2017","unstructured":"Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P. Xing. 2017. Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 (Proceedings of Machine Learning Research, Vol. 70), Doina Precup and Yee Whye Teh (Eds.), PMLR, 1587\u20131596. Retrieved from http:\/\/proceedings.mlr.press\/v70\/hu17e.html."},{"key":"e_1_3_3_48_2","first-page":"52","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada","author":"Huang Huaibo","year":"2018","unstructured":"Huaibo Huang, Zhihang Li, Ran He, Zhenan Sun, and Tieniu Tan. 2018. IntroVAE: Introspective variational autoencoders for photographic image synthesis. In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada, Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.), 52\u201363. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/093f65e080a295f8076b1c5722a46aa2-Abstract.html."},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2019.2920771"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3341525.3387362"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_3_52_2","first-page":"946","volume-title":"Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018","author":"Kim Kisub","year":"2018","unstructured":"Kisub Kim, Dongsun Kim, Tegawend\u00e9 F. Bissyand\u00e9, Eunjong Choi, Li Li, Jacques Klein, and Yves Le Traon. 2018. FaCoY: A code-to-code search engine. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018, Michel Chaudron, Ivica Crnkovic, Marsha Chechik, and Mark Harman (Eds.), ACM, 946\u2013957."},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1139"},{"key":"e_1_3_3_55_2","volume-title":"Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational bayes. In Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_3_56_2","first-page":"255","volume-title":"Convolutional Networks for Images, Speech, and Time Series","author":"LeCun Yann","year":"1998","unstructured":"Yann LeCun and Yoshua Bengio. 1998. Convolutional Networks for Images, Speech, and Time Series. 
MIT Press, Cambridge, MA, 255\u2013258."},{"key":"e_1_3_3_57_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00067"},{"key":"e_1_3_3_58_2","first-page":"74","volume-title":"Proceedings of the Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Text Summarization Branches Out. Association for Computational Linguistics, Barcelona, Spain, 74\u201381."},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1463"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2020.110547"},{"key":"e_1_3_3_61_2","first-page":"2873","volume-title":"Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, 9-15 July 2016","author":"Liu Pengfei","year":"2016","unstructured":"Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2016. Recurrent neural network for text classification with multi-task learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, 9-15 July 2016, Subbarao Kambhampati (Ed.), IJCAI\/AAAI Press, 2873\u20132879."},{"key":"e_1_3_3_62_2","unstructured":"Shangqing Liu Cuiyun Gao Sen Chen Lun Yiu Nie and Yang Liu. 2020. ATOM: Commit message generation based on abstract syntax tree and hybrid ranking. IEEE Transactions on Software Engineering 1 (2020) 1\u20131."},{"key":"e_1_3_3_63_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs\/1907.11692."},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.587"},{"key":"e_1_3_3_65_2","unstructured":"Shuai Lu Daya Guo Shuo Ren Junjie Huang Alexey Svyatkovskiy Ambrosio Blanco Colin B. 
Clement Dawn Drain Daxin Jiang Duyu Tang Ge Li Lidong Zhou Linjun Shou Long Zhou Michele Tufano Ming Gong Ming Zhou Nan Duan Neel Sundaresan Shao Kun Deng Shengyu Fu and Shujie Liu. 2021. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)."},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2019.2934906"},{"key":"e_1_3_3_67_2","volume-title":"Proceedings of the 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, May 2-4, 2013, Workshop Track Proceedings","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of the 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, May 2-4, 2013, Workshop Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.), http:\/\/arxiv.org\/abs\/1301.3781."},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10139"},{"key":"e_1_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00099"},{"key":"e_1_3_3_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/1082983.1083143"},{"key":"e_1_3_3_71_2","first-page":"422","volume-title":"Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2019, Hangzhou, China, February 24-27, 2019","author":"Nghi Bui D. Q.","year":"2019","unstructured":"Bui D. Q. Nghi, Yijun Yu, and Lingxiao Jiang. 2019. Bilateral dependency neural networks for cross-language algorithm classification. In Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2019, Hangzhou, China, February 24-27, 2019, Xinyu Wang, David Lo, and Emad Shihab (Eds.). 
IEEE, 422\u2013433."},{"key":"e_1_3_3_72_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, July 6-12, 2002, Philadelphia, PA","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, July 6-12, 2002, Philadelphia, PA, 311\u2013318. Retrieved from http:\/\/www.aclweb.org\/anthology\/P02-1040.pdf."},{"key":"e_1_3_3_73_2","first-page":"8024","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Z. Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d\u2019Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.), 8024\u20138035. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/bdbca288fee7f92f2bfa9f7012727740-Abstract.html."},{"key":"e_1_3_3_74_2","first-page":"8608","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, the 32nd Innovative Applications of Artificial Intelligence Conference, IAAI 2020, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, February 7-12, 2020","author":"Peng Shuke","year":"2020","unstructured":"Shuke Peng, Feng Ji, Zehao Lin, Shaobo Cui, Haiqing Chen, and Yin Zhang. 2020. MTSS: Learn from multiple domain teachers and become a multi-domain dialogue expert. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, the 32nd Innovative Applications of Artificial Intelligence Conference, IAAI 2020, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, February 7-12, 2020. AAAI Press, 8608\u20138615. Retrieved from https:\/\/aaai.org\/ojs\/index.php\/AAAI\/article\/view\/6384."},{"key":"e_1_3_3_75_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2020.106330"},{"key":"e_1_3_3_76_2","doi-asserted-by":"publisher","DOI":"10.3390\/app10082973"},{"key":"e_1_3_3_77_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual","author":"Rozi\u00e8re Baptiste","year":"2020","unstructured":"Baptiste Rozi\u00e8re, Marie-Anne Lachaux, Lowik Chanussot, and Guillaume Lample. 2020. Unsupervised translation of programming languages. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle, Marc\u2019Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/ed23fbf18c2cd35f8c7f8de44f85c08d-Abstract.html."},{"key":"e_1_3_3_78_2","doi-asserted-by":"publisher","DOI":"10.1038\/323533a0"},{"key":"e_1_3_3_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_3_3_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2019.8851751"},{"key":"e_1_3_3_81_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"e_1_3_3_82_2","series-title":"Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California","first-page":"6105","volume":"97","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), PMLR, 6105\u20136114. Retrieved from http:\/\/proceedings.mlr.press\/v97\/tan19a.html."},{"key":"e_1_3_3_83_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, May 6-9, 2019","author":"Tan Xu","year":"2019","unstructured":"Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, and Tie-Yan Liu. 2019. Multilingual neural machine translation with knowledge distillation. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, May 6-9, 2019. OpenReview.net. 
Retrieved from https:\/\/openreview.net\/forum?id=S1gUsoR9YX."},{"key":"e_1_3_3_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2977362"},{"key":"e_1_3_3_85_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00791-008-0120-2"},{"key":"e_1_3_3_86_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2017.11.008"},{"key":"e_1_3_3_87_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual","author":"Tran Chau","year":"2020","unstructured":"Chau Tran, Yuqing Tang, Xian Li, and Jiatao Gu. 2020. Cross-lingual retrieval for iterative self-supervised training. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle, Marc\u2019Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/1763ea5a7e72dd7ee64073c2dda7a7a8-Abstract.html."},{"key":"e_1_3_3_88_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340544"},{"key":"e_1_3_3_89_2","first-page":"5998","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA,","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 5998\u20136008. 
Retrieved from http:\/\/papers.nips.cc\/paper\/7181-attention-is-all-you-need."},{"key":"e_1_3_3_90_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"e_1_3_3_91_2","unstructured":"S. VenkataKeerthy, R. Aggarwal, S. Jain, Maunendra Sankar Desarkar, Ramakrishna Upadrasta, and Y. N. Srikant. 2019. IR2Vec: A flow analysis based scalable infrastructure for program encodings. CoRR abs\/1909.06228."},{"key":"e_1_3_3_92_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00012"},{"key":"e_1_3_3_93_2","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238206"},{"key":"e_1_3_3_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.3025580"},{"issue":"4","key":"e_1_3_3_95_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3409331","article-title":"Modular tree network for source code representation learning","volume":"29","author":"Wang Wenhan","year":"2020","unstructured":"Wenhan Wang, Ge Li, Sijie Shen, Xin Xia, and Zhi Jin. 2020. Modular tree network for source code representation learning. ACM Transactions on Software Engineering and Methodology 29, 4 (2020), 1\u201323.","journal-title":"ACM Transactions on Software Engineering and Methodology"},{"key":"e_1_3_3_96_2","unstructured":"Wenhan Wang, Kechi Zhang, Ge Li, and Zhi Jin. 2020. Learning to represent programs with heterogeneous graphs. CoRR."},{"key":"e_1_3_3_97_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00679"},{"key":"e_1_3_3_98_2","article-title":"SynCoBERT: Syntax-guided multi-modal contrastive pre-training for code representation","author":"Wang Xin","year":"2021","unstructured":"Xin Wang, Yasheng Wang, Fei Mi, Pingyi Zhou, Yao Wan, Xiao Liu, Li Li, Hao Wu, Jin Liu, and Xin Jiang. 2021. SynCoBERT: Syntax-guided multi-modal contrastive pre-training for code representation. arXiv:2108.04556. 
Retrieved from https:\/\/arxiv.org\/abs\/2108.04556.","journal-title":"arXiv:2108.04556"},{"key":"e_1_3_3_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSR.2015.38"},{"key":"e_1_3_3_100_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2018.08.002"},{"key":"e_1_3_3_101_2","first-page":"1225","article-title":"Vulnerability detection for source code using contextual LSTM","author":"Xu A.","year":"2018","unstructured":"A. Xu, T. Dai, Huajun Chen, Zhe Ming, and W. Li. 2018. Vulnerability detection for source code using contextual LSTM. In Proceedings of the 2018 5th International Conference on Systems and Informatics. 1225\u20131230.","journal-title":"Proceedings of the 2018 5th International Conference on Systems and Informatics"},{"key":"e_1_3_3_102_2","doi-asserted-by":"publisher","DOI":"10.1145\/2970276.2970357"},{"key":"e_1_3_3_103_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1480"},{"key":"e_1_3_3_104_2","unstructured":"Yanming Yang, Xin Xia, David Lo, and John C. Grundy. 2021. A survey on deep learning for software engineering. ACM Comput. Surv. (2021)."},{"key":"e_1_3_3_105_2","first-page":"5753","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Proceedings of the Advances in Neural Information Processing Systems. 
5753\u20135763."},{"key":"e_1_3_3_106_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1070"},{"key":"e_1_3_3_107_2","doi-asserted-by":"publisher","DOI":"10.1109\/CyberC.2016.42"},{"key":"e_1_3_3_108_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.391"},{"key":"e_1_3_3_109_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00049"},{"key":"e_1_3_3_110_2","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3236068"},{"key":"e_1_3_3_111_2","first-page":"10197","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Zhou Yaqin","year":"2019","unstructured":"Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In Proceedings of the Advances in Neural Information Processing Systems. 10197\u201310207."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3506696","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3506696","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:50Z","timestamp":1750191110000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3506696"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,9]]},"references-count":110,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,7,31]]}},"alternative-id":["10.1145\/3506696"],"URL":"https:\/\/doi.org\/10.1145\/3506696","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,9]]},"assertion":[{
"value":"2020-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-04-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}