{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,30]],"date-time":"2026-06-30T15:59:21Z","timestamp":1782835161222,"version":"3.54.5"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,1,16]],"date-time":"2025-01-16T00:00:00Z","timestamp":1736985600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62176053, 62376130"],"award-info":[{"award-number":["62176053, 62376130"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100007129","name":"Natural Science Foundation of Shandong Province","doi-asserted-by":"crossref","award":["ZR2022MF333"],"award-info":[{"award-number":["ZR2022MF333"]}],"id":[{"id":"10.13039\/501100007129","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Program of New Twenty Policies for Universities of Jinan","award":["202333008"],"award-info":[{"award-number":["202333008"]}]},{"name":"Big Data Computing Center of Southeast University"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>\n            Efficient code search techniques are crucial in accelerating software development by aiding developers in locating specific code snippets and understanding code functionalities. This study investigates code search methodologies, focusing on the emerging significance of semantic consistency in data augmentation techniques. 
While existing approaches predominantly enhance raw data, often requiring additional preprocessing and incurring higher training costs, this research introduces a pioneering method operating at the code and query representation levels. By bypassing the need for extensive data processing, this novel approach fosters an interactive alignment between code and query, augmenting the semantic coherence crucial for effective code search. An extensive empirical evaluation of a diverse dataset across multiple programming languages substantiates the efficacy of this approach in significantly enhancing code search model performance compared to traditional methodologies. The implementation is publicly available on GitHub,\n            <jats:xref ref-type=\"fn\">\n              <jats:sup>1<\/jats:sup>\n            <\/jats:xref>\n            offering an accessible resource for further exploration and application.\n          <\/jats:p>","DOI":"10.1145\/3686151","type":"journal-article","created":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T16:11:15Z","timestamp":1722528675000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["SECON: Maintaining Semantic Consistency in Data Augmentation for Code Search"],"prefix":"10.1145","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9294-2841","authenticated-orcid":false,"given":"Xu","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, Southeast University, Nanjing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5547-8680","authenticated-orcid":false,"given":"Zexu","family":"Lin","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Key Laboratory of New Generation 
Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, Southeast University, Nanjing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-4012-2720","authenticated-orcid":false,"given":"Xiaoyu","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, Southeast University, Nanjing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0339-8573","authenticated-orcid":false,"given":"Jianlei","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1840-3540","authenticated-orcid":false,"given":"Wenpeng","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7702-9387","authenticated-orcid":false,"given":"De-Yu","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, Southeast University, Nanjing, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,1,16]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"2655","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Ahmad Wasi","year":"2021","unstructured":"Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2021. Unified pre-training for program understanding and generation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2655\u20132668."},{"key":"e_1_3_2_3_2","volume-title":"Proceedings of the 10th International Conference on Learning Representations","author":"Bardes Adrien","year":"2022","unstructured":"Adrien Bardes, Jean Ponce, and Yann Lecun. 2022. VICReg: Variance-invariance-covariance regularization for self-supervised learning. In Proceedings of the 10th International Conference on Learning Representations."},{"key":"e_1_3_2_4_2","first-page":"511","volume-title":"Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Bui Nghi DQ","year":"2021","unstructured":"Nghi DQ Bui, Yijun Yu, and Lingxiao Jiang. 2021. Self-supervised contrastive learning for code retrieval and summarization via semantic-preserving transformations. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 511\u2013521."},{"key":"e_1_3_2_5_2","first-page":"964","volume-title":"Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering","author":"Cambronero Jose","year":"2019","unstructured":"Jose Cambronero, Hongyu Li, Seohyun Kim, Koushik Sen, and Satish Chandra. 2019. When deep learning met code search. 
In Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 964\u2013974."},{"key":"e_1_3_2_6_2","volume-title":"Convolutional Neural Network for Sentence Classification","author":"Chen Yahui","year":"2015","unstructured":"Yahui Chen. 2015. Convolutional Neural Network for Sentence Classification. Master\u2019s thesis. University of Waterloo."},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1145\/3524610.3527889","volume-title":"Proceedings of the 30th IEEE\/ACM International Conference on Program Comprehension","author":"Cheng Yi","year":"2022","unstructured":"Yi Cheng and Li Kuang. 2022. CSRS: Code search with relevance matching and semantic matching. In Proceedings of the 30th IEEE\/ACM International Conference on Program Comprehension, 533\u2013542."},{"key":"e_1_3_2_8_2","first-page":"1724","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Cho Kyunghyun","year":"2014","unstructured":"Kyunghyun Cho, Bart van Merri\u00ebnboer, \u00c7a\u011flar Gul\u00e7ehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder\u2013decoder for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 1724\u20131734."},{"key":"e_1_3_2_9_2","first-page":"5490","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Choi YunSeok","year":"2022","unstructured":"YunSeok Choi, Hyojun Kim, and Jee-Hyong Lee. 2022. TABS: Efficient textual adversarial attack for pre-trained NL code model using semantic beam search. 
In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 5490\u20135498."},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1109\/SANER53432.2022.00055","volume-title":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","author":"Deng Zhongyang","year":"2022","unstructured":"Zhongyang Deng, Ling Xu, Chao Liu, Meng Yan, Zhou Xu, and Yan Lei. 2022. Fine-grained co-attentive representation learning for semantic code search. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 396\u2013407."},{"key":"e_1_3_2_11_2","first-page":"4171","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1, Long and Short Papers, 4171\u20134186."},{"key":"e_1_3_2_12_2","unstructured":"Hande Dong Jiayi Lin Yichong Leng Jiawei Chen and Yutao Xie. 2023. Retriever and ranker framework with probabilistic hard negative sampling for code search. arXiv:2305.04508. Retrieved from https:\/\/arxiv.org\/abs\/2305.04508"},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","first-page":"1536","DOI":"10.18653\/v1\/2020.findings-emnlp.139","volume-title":"Proceedings of the Conference on Findings of the Association for Computational Linguistics (EMNLP \u201920)","author":"Feng Zhangyin","year":"2020","unstructured":"Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. 
CodeBERT: A pre-trained model for programming and natural languages. In Proceedings of the Conference on Findings of the Association for Computational Linguistics (EMNLP \u201920), 1536\u20131547."},{"key":"e_1_3_2_14_2","first-page":"528","volume-title":"Proceedings of the International Conference on Software Engineering & Knowledge Engineering (SEKE)","author":"Gu Lian","year":"2021","unstructured":"Lian Gu, Zihui Wang, Jiaxin Liu, Yating Zhang, Dong Yang, and Wei Dong. 2021. MACA: A residual network with multi-attention and core attributes for code search. In Proceedings of the International Conference on Software Engineering & Knowledge Engineering (SEKE), 528\u2013531."},{"key":"e_1_3_2_15_2","first-page":"933","volume-title":"Proceedings of the 40th International Conference on Software Engineering","author":"Gu Xiaodong","year":"2018","unstructured":"Xiaodong Gu, Hongyu Zhang, and Sunghun Kim. 2018. Deep code search. In Proceedings of the 40th International Conference on Software Engineering, 933\u2013944."},{"key":"e_1_3_2_16_2","first-page":"7212","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics","volume":"1","author":"Guo Daya","year":"2022","unstructured":"Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. 2022. UniXcoder: Unified cross-modal pre-training for code representation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Vol. 1, Long Papers, 7212\u20137225."},{"key":"e_1_3_2_17_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Guo Daya","year":"2020","unstructured":"Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, LIU Shujie, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. 2020. GraphCodeBERT: Pre-training code representations with data flow. 
In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_18_2","first-page":"994","volume-title":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","author":"Hu Fan","year":"2023","unstructured":"Fan Hu, Yanlin Wang, Lun Du, Xirong Li, Hongyu Zhang, Shi Han, and Dongmei Zhang. 2023. Revisiting code search in a two-stage paradigm. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 994\u20131002."},{"key":"e_1_3_2_19_2","first-page":"13722","volume-title":"Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING \u201924)","author":"Hu Xiaoyu","year":"2024","unstructured":"Xiaoyu Hu, Xu Zhang, Zexu Lin, and Deyu Zhou. 2024. Reduce redundancy then rerank: Enhancing code summarization with a novel pipeline framework. In Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING \u201924), 13722\u201313733."},{"key":"e_1_3_2_20_2","unstructured":"Kevin H Huang Peter Orbanz and Morgane Austern. 2022. Quantifying the effects of data augmentation. arXiv:2202.09134. Retrieved from https:\/\/arxiv.org\/abs\/2202.09134"},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1145\/3609437.3609439","volume-title":"Proceedings of the 14th Asia-Pacific Symposium on Internetware","author":"Huang Xiangbing","year":"2023","unstructured":"Xiangbing Huang, Yingwei Ma, Haifang Zhou, Zhijie Jiang, Yuanliang Zhang, Teng Wang, and Shanshan Li. 2023. Towards better multilingual code search through cross-lingual contrastive learning. 
In Proceedings of the 14th Asia-Pacific Symposium on Internetware, 22\u201332."},{"key":"e_1_3_2_22_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Humeau Samuel","year":"2020","unstructured":"Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2020. Poly-encoders: Architectures and pre-training strategies for fast and accurate multi-sentence scoring. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_23_2","unstructured":"Hamel Husain Ho-Hsiang Wu Tiferet Gazit Miltiadis Allamanis and Marc Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic code search. arXiv:1909.09436. Retrieved from https:\/\/arxiv.org\/abs\/1909.09436"},{"key":"e_1_3_2_24_2","unstructured":"Abdullah Al Ishtiaq Masum Hasan Md Mahim Anjum Haque Kazi Sajeed Mehrab Tanveer Muttaqueen Tahmid Hasan Anindya Iqbal and Rifat Shahriyar. 2021. Bert2code: Can pretrained language models be leveraged for code search? arXiv:2104.08017. Retrieved from https:\/\/arxiv.org\/abs\/2104.08017"},{"key":"e_1_3_2_25_2","first-page":"1681","volume-title":"Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing","volume":"1","author":"Iyyer Mohit","year":"2015","unstructured":"Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daum\u00e9 III. 2015. Deep unordered composition rivals syntactic methods for text classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Vol. 1, Long Papers, 1681\u20131691."},{"key":"e_1_3_2_26_2","first-page":"547","article-title":"\u00c9tude comparative de la distribution florale dans une portion des Alpes et des Jura","volume":"37","author":"Jaccard Paul","year":"1901","unstructured":"Paul Jaccard. 1901. 
\u00c9tude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Soci\u00e9t\u00e9 Vaudoise des Sciences Naturelles 37 (1901), 547\u2013579.","journal-title":"Bulletin de la Soci\u00e9t\u00e9 Vaudoise des Sciences Naturelles"},{"key":"e_1_3_2_27_2","first-page":"5954","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Jain Paras","year":"2021","unstructured":"Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph Gonzalez, and Ion Stoica. 2021. Contrastive code representation learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 5954\u20135971."},{"key":"e_1_3_2_28_2","first-page":"92","volume-title":"Proceedings of the IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)","author":"Jiang Renhe","year":"2018","unstructured":"Renhe Jiang, Zhengzhao Chen, Zejun Zhang, Yu Pei, Minxue Pan, and Tian Zhang. 2018. Semantics-based code search using input\/output examples. In Proceedings of the IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 92\u2013102."},{"key":"e_1_3_2_29_2","first-page":"4924","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Li Haochen","year":"2022","unstructured":"Haochen Li, Chunyan Miao, Cyril Leung, Yanxian Huang, Yuan Huang, Hongyu Zhang, and Yanlin Wang. 2022c. Exploring representation-level augmentation for code search. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 4924\u20134936."},{"key":"e_1_3_2_30_2","first-page":"115","volume-title":"Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME)","author":"Li Wei","year":"2020","unstructured":"Wei Li, Haozhe Qin, Shuhan Yan, Beijun Shen, and Yuting Chen. 2020. Learning code-query interaction for enhancing code searches. 
In Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 115\u2013126."},{"key":"e_1_3_2_31_2","first-page":"2898","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Li Xiaonan","year":"2022","unstructured":"Xiaonan Li, Yeyun Gong, Yelong Shen, Xipeng Qiu, Hang Zhang, Bolun Yao, Weizhen Qi, Daxin Jiang, Weizhu Chen, and Nan Duan. 2022. CodeRetriever: A large scale contrastive pre-training method for code search. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2898\u20132910."},{"key":"e_1_3_2_32_2","first-page":"118","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics (EMNLP \u201922)","author":"Li Xiaonan","year":"2022","unstructured":"Xiaonan Li, Daya Guo, Yeyun Gong, Yun Lin, Yelong Shen, Xipeng Qiu, Daxin Jiang, Weizhu Chen, and Nan Duan. 2022. Soft-labeled contrastive pre-training for function-level code representation. In Proceedings of the Findings of the Association for Computational Linguistics (EMNLP \u201922), 118\u2013129."},{"issue":"5","key":"e_1_3_2_33_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3447571","article-title":"Deep graph matching and searching for semantic code retrieval","volume":"15","author":"Ling Xiang","year":"2021","unstructured":"Xiang Ling, Lingfei Wu, Saizhuo Wang, Gaoning Pan, Tengfei Ma, Fangli Xu, Alex X Liu, Chunming Wu, and Shouling Ji. 2021. Deep graph matching and searching for semantic code retrieval. 
ACM Transactions on Knowledge Discovery from Data 15, 5 (2021), 1\u201321.","journal-title":"ACM Transactions on Knowledge Discovery from Data"},{"key":"e_1_3_2_34_2","first-page":"1642","volume-title":"Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering","author":"Liu Chao","year":"2022","unstructured":"Chao Liu, Xuanlin Bao, Xin Xia, Meng Yan, David Lo, and Ting Zhang. 2022. CodeMatcher: A tool for large-scale code search based on query semantics matching. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1642\u20131646."},{"key":"e_1_3_2_35_2","volume-title":"Proceedings of the 45th International Conference on Software Engineering","author":"Liu Shangqing","year":"2023","unstructured":"Shangqing Liu, Bozhi Wu, Xiaofei Xie, Guozhu Meng, and Yang Liu. 2023a. ContraBERT: Enhancing code pre-trained models via contrastive learning. In Proceedings of the 45th International Conference on Software Engineering."},{"issue":"4","key":"e_1_3_2_36_2","doi-asserted-by":"crossref","first-page":"2839","DOI":"10.1109\/TSE.2022.3233901","article-title":"Graphsearchnet: Enhancing gnns via capturing global dependencies for semantic code search","volume":"49","author":"Liu Shangqing","year":"2023","unstructured":"Shangqing Liu, Xiaofei Xie, Jingkai Siow, Lei Ma, Guozhu Meng, and Yang Liu. 2023b. Graphsearchnet: Enhancing gnns via capturing global dependencies for semantic code search. IEEE Transactions on Software Engineering 49, 4 (2023), 2839\u2013285.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_37_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692. 
Retrieved from https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_2_38_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Loshchilov Ilya","year":"2017","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Fixing weight decay regularization in adam. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_39_2","volume-title":"An Introduction to Information Retrieval","author":"Manning Christopher D.","year":"2009","unstructured":"Christopher D. Manning. 2009. An Introduction to Information Retrieval. Cambridge University Press."},{"key":"e_1_3_2_40_2","first-page":"2006","volume-title":"Proceedings of the 44th International Conference on Software Engineering","author":"Niu Changan","year":"2022","unstructured":"Changan Niu, Chuanyi Li, Vincent Ng, Jidong Ge, Liguo Huang, and Bin Luo. 2022. SPT-code: Sequence-to-sequence pre-training for learning source code representations. In Proceedings of the 44th International Conference on Software Engineering, 2006\u20132018."},{"key":"e_1_3_2_41_2","first-page":"1","volume-title":"Proceedings of the Swedish Artificial Intelligence Society Workshop (SAIS)","author":"Pe\u00f1a Francisco J.","year":"2022","unstructured":"Francisco J. Pe\u00f1a, Angel Luis Gonzalez, Sepideh Pashami, Ahmad Al-Shishtawy, and Amir H. Payberah. 2022. Siambert: Siamese Bert-based code search. In Proceedings of the Swedish Artificial Intelligence Society Workshop (SAIS). IEEE, 1\u20137."},{"key":"e_1_3_2_42_2","first-page":"243","volume-title":"Proceedings of the IEEE 31st International Conference on Software Engineering","author":"Reiss Steven P.","year":"2009","unstructured":"Steven P. Reiss. 2009. Semantics-based code search. In Proceedings of the IEEE 31st International Conference on Software Engineering. 
IEEE, 243\u2013253."},{"issue":"3","key":"e_1_3_2_43_2","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1002\/asi.4630270302","article-title":"Relevance weighting of search terms","volume":"27","author":"Robertson Stephen E.","year":"1976","unstructured":"Stephen E. Robertson and K. Sparck Jones. 1976. Relevance weighting of search terms. Journal of the American Society for Information Science 27, 3 (1976), 129\u2013146.","journal-title":"Journal of the American Society for Information Science"},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1145\/3211346.3211353","volume-title":"Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages","author":"Sachdev Saksham","year":"2018","unstructured":"Saksham Sachdev, Hongyu Li, Sifei Luan, Seohyun Kim, Koushik Sen, and Satish Chandra. 2018. Retrieval on source code: A neural code search. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 31\u201341."},{"key":"e_1_3_2_45_2","first-page":"29542","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Sachidananda Vin","year":"2023","unstructured":"Vin Sachidananda, Ziyi Yang, and Chenguang Zhu. 2023. Global selection of contrastive batches via optimization on sample permutations. In Proceedings of the International Conference on Machine Learning. PMLR, 29542\u201329562."},{"issue":"4","key":"e_1_3_2_46_2","doi-asserted-by":"crossref","first-page":"1804","DOI":"10.1109\/TSE.2022.3192755","article-title":"On the effectiveness of transfer learning for code search","volume":"49","author":"Salza Pasquale","year":"2023","unstructured":"Pasquale Salza, Christoph Schwizer, Jian Gu, and Harald C. Gall. 2023. On the effectiveness of transfer learning for code search. 
IEEE Transactions on Software Engineering 49, 4 (2023), 1804\u20131822.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_47_2","first-page":"1715","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics","volume":"1","author":"Sennrich Rico","year":"2016","unstructured":"Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Vol. 1, Long Papers, 1715\u20131725."},{"key":"e_1_3_2_48_2","first-page":"2198","volume-title":"Proceedings of the IEEE\/ACM 45th International Conference on Software Engineering (ICSE)","author":"Shi Ensheng","year":"2023","unstructured":"Ensheng Shi, Yanlin Wang, Wenchao Gu, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Hongbin Sun. 2023a. Cocosoda: Effective contrastive learning for code search. In Proceedings of the IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2198\u20132210."},{"key":"e_1_3_2_49_2","first-page":"94","volume-title":"Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME)","author":"Shi Zejian","year":"2022","unstructured":"Zejian Shi, Yun Xiong, Xiaolong Zhang, Yao Zhang, Shanshan Li, and Yangyong Zhu. 2022. Cross-modal contrastive learning for code search. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 94\u2013105."},{"key":"e_1_3_2_50_2","first-page":"280","volume-title":"Proceedings of the IEEE\/ACM 31st International Conference on Program Comprehension (ICPC)","author":"Shi Zejian","year":"2023","unstructured":"Zejian Shi, Yun Xiong, Yao Zhang, Zhijie Jiang, Jinjing Zhao, Lei Wang, and Shanshan Li. 2023. Improving code search with multi-modal momentum contrastive learning. 
In Proceedings of the IEEE\/ACM 31st International Conference on Program Comprehension (ICPC). IEEE, 280\u2013291."},{"key":"e_1_3_2_51_2","volume-title":"Proceedings of the 31st IEEE\/ACM International Conference on Program Comprehension","author":"Shi Zejian","year":"2023","unstructured":"Zejian Shi, Yun Xiong, Yao Zhang, Zhijie Jiang, Jinjing Zhao, Lei Wang, and Shanshan Li. 2023. Improving code search with multi-modal momentum contrastive learning. In Proceedings of the 31st IEEE\/ACM International Conference on Program Comprehension."},{"key":"e_1_3_2_52_2","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1145\/3387904.3389269","volume-title":"Proceedings of the 28th International Conference on Program Comprehension","author":"Shuai Jianhang","year":"2020","unstructured":"Jianhang Shuai, Ling Xu, Chao Liu, Meng Yan, Xin Xia, and Yan Lei. 2020. Improving code search with co-attentive representation learning. In Proceedings of the 28th International Conference on Program Comprehension, 196\u2013207."},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1007\/978-3-030-32381-3_16","volume-title":"Proceedings of the 18th China National Conference on Chinese Computational Linguistics (CCL \u201919)","author":"Sun Chi","year":"2019","unstructured":"Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune bert for text classification?. In Proceedings of the 18th China National Conference on Chinese Computational Linguistics (CCL \u201919). Springer, 194\u2013206."},{"key":"e_1_3_2_54_2","first-page":"388","volume-title":"Proceedings of the 44th International Conference on Software Engineering (ICSE \u201922)","author":"Sun Weisong","year":"2022","unstructured":"Weisong Sun, Chunrong Fang, Yuchen Chen, Guanhong Tao, Tingxu Han, and Quanjun Zhang. 2022. Code search based on context-aware code translation. In Proceedings of the 44th International Conference on Software Engineering (ICSE \u201922). 
ACM, New York, NY, 388\u2013400."},{"key":"e_1_3_2_55_2","first-page":"1","volume-title":"Proceedings of the International Joint Conference on Neural Networks (IJCNN)","author":"Tan Cong","year":"2023","unstructured":"Cong Tan and Shun Yang. 2023. Fine-grained similarity matching with a similarity filtration pyramid for code search. In Proceedings of the International Joint Conference on Neural Networks (IJCNN). IEEE, 1\u20138."},{"key":"e_1_3_2_56_2","first-page":"6000","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 6000\u20136010."},{"key":"e_1_3_2_57_2","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.neunet.2021.09.025","article-title":"Enriching query semantics for code search with reinforcement learning","volume":"145","author":"Wang Chaozheng","year":"2022","unstructured":"Chaozheng Wang, Zhenhao Nong, Cuiyun Gao, Zongjie Li, Jichuan Zeng, Zhenchang Xing, and Yang Liu. 2022. Enriching query semantics for code search with reinforcement learning. Neural Networks 145 (2022), 22\u201332.","journal-title":"Neural Networks"},{"key":"e_1_3_2_58_2","first-page":"5","volume-title":"Proceedings of the 45th International Conference on Software Engineering","author":"Wang Deze","year":"2023","unstructured":"Deze Wang, Boxing Chen, Shanshan Li, Wei Luo, Shaoliang Peng, Wei Dong, and Xiangke Liao. 2023. One adapter for all programming languages? Adapter tuning for code search and summarization. 
In Proceedings of the 45th International Conference on Software Engineering, 5\u201316."},{"key":"e_1_3_2_59_2","unstructured":"Xin Wang, Yasheng Wang, Fei Mi, Pingyi Zhou, Yao Wan, Xiao Liu, Li Li, Hao Wu, Jin Liu, and Xin Jiang. 2021b. Syncobert: Syntax-guided multi-modal contrastive pre-training for code representation. arXiv:2108.04556. Retrieved from https:\/\/arxiv.org\/abs\/2108.04556"},{"key":"e_1_3_2_60_2","doi-asserted-by":"crossref","first-page":"8696","DOI":"10.18653\/v1\/2021.emnlp-main.685","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Wang Yue","year":"2021","unstructured":"Yue Wang, Weishi Wang, Shafiq Joty, and Steven CH Hoi. 2021. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 8696\u20138708."},{"key":"e_1_3_2_61_2","first-page":"13005","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wei Jiwei","year":"2020","unstructured":"Jiwei Wei, Xing Xu, Yang Yang, Yanli Ji, Zheng Wang, and Heng Tao Shen. 2020. Universal weighting metric learning for cross-modal matching. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 13005\u201313014."},{"key":"e_1_3_2_62_2","first-page":"342","volume-title":"Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","author":"Xu Ling","year":"2021","unstructured":"Ling Xu, Huanhuan Yang, Chao Liu, Jianhang Shuai, Meng Yan, Yan Lei, and Zhou Xu. 2021. Two-stage attention-based model for code search with textual and structural features. In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 
IEEE, 342\u2013353."},{"key":"e_1_3_2_63_2","first-page":"344","volume-title":"Proceedings of the IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER)","author":"Yan Shuhan","year":"2020","unstructured":"Shuhan Yan, Hang Yu, Yuting Chen, Beijun Shen, and Lingxiao Jiang. 2020. Are the code snippets what we are searching for? A benchmark and an empirical study on code search with natural-language queries. In Proceedings of the IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 344\u2013354."},{"key":"e_1_3_2_64_2","first-page":"5065","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing","volume":"1","author":"Yan Yuanmeng","year":"2021","unstructured":"Yuanmeng Yan, Rumei Li, Sirui Wang, Fuzheng Zhang, Wei Wu, and Weiran Xu. 2021. ConSERT: A contrastive framework for self-supervised sentence representation transfer. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Vol. 1, Long Papers, 5065\u20135075."},{"key":"e_1_3_2_65_2","first-page":"348","volume-title":"Proceedings of the 29th Asia-Pacific Software Engineering Conference (APSEC)","author":"Yang Shun","year":"2022","unstructured":"Shun Yang and Bo Cai. 2022. Multi-perspective alignment mechanism for code search. In Proceedings of the 29th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 348\u2013356."},{"key":"e_1_3_2_66_2","first-page":"12310","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Zbontar Jure","year":"2021","unstructured":"Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and St\u00e9phane Deny. 2021. Barlow twins: Self-supervised learning via redundancy reduction. 
In Proceedings of the International Conference on Machine Learning. PMLR, 12310\u201312320."},{"issue":"2","key":"e_1_3_2_67_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3546066","article-title":"degraphcs: Embedding variable-based flow graph for neural code search","volume":"32","author":"Zeng Chen","year":"2023","unstructured":"Chen Zeng, Yue Yu, Shanshan Li, Xin Xia, Zhiming Wang, Mingyang Geng, Linxiao Bai, Wei Dong, and Xiangke Liao. 2023. degraphcs: Embedding variable-based flow graph for neural code search. ACM Transactions on Software Engineering and Methodology 32, 2 (2023), 1\u201327.","journal-title":"ACM Transactions on Software Engineering and Methodology"},{"key":"e_1_3_2_68_2","first-page":"1136","volume-title":"Proceedings of the 29th International Conference on Computational Linguistics","author":"Zhang Xu","year":"2022","unstructured":"Xu Zhang, Zejie Liu, Yanzheng Xiang, and Deyu Zhou. 2022. Complicate then simplify: A novel way to explore pre-trained models for text classification. In Proceedings of the 29th International Conference on Computational Linguistics, 1136\u20131145."},{"key":"e_1_3_2_69_2","doi-asserted-by":"crossref","first-page":"807","DOI":"10.3233\/IDA-230082","article-title":"I2R: Intra and inter-modal representation learning for code search","author":"Zhang Xu","year":"2024","unstructured":"Xu Zhang, Yanzheng Xiang, Zejie Liu, Xiaoyu Hu, and Deyu Zhou. 2024. I2R: Intra and inter-modal representation learning for code search. Intelligent Data Analysis (2024), 807\u2013823.","journal-title":"Intelligent Data Analysis"},{"key":"e_1_3_2_70_2","volume-title":"Proceedings of the IEEE\/ACM 46th International Conference on Software Engineering (ICSE)","author":"Zhang Yubo","year":"2024","unstructured":"Yubo Zhang, Yanfang Liu, Xinxin Fan, and Yunfeng Lu. 2024. Contrastive prompt learning-based code search based on interaction matrix. 
In Proceedings of the IEEE\/ACM 46th International Conference on Software Engineering (ICSE). IEEE."},{"key":"e_1_3_2_71_2","first-page":"60","volume-title":"Proceedings of the International Conference on Software Engineering & Knowledge Engineering (SEKE)","author":"Zhao Wei","year":"2022","unstructured":"Wei Zhao and Yan Liu. 2022. Utilizing edge attention in graph-based code search. In Proceedings of the International Conference on Software Engineering & Knowledge Engineering (SEKE), 60\u201366."}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3686151","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3686151","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:50Z","timestamp":1750295870000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3686151"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,16]]},"references-count":70,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3686151"],"URL":"https:\/\/doi.org\/10.1145\/3686151","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,16]]},"assertion":[{"value":"2024-01-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-26","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}