{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T21:30:14Z","timestamp":1774128614681,"version":"3.50.1"},"reference-count":31,"publisher":"SAGE Publications","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDA"],"published-print":{"date-parts":[[2024,5,28]]},"abstract":"<jats:p>Code search, which locates code snippets in large code repositories based on natural language queries entered by developers, has become increasingly popular in the software development process. It has the potential to improve the efficiency of software developers. Recent studies have demonstrated the effectiveness of using deep learning techniques to represent queries and codes accurately for code search. In specific, pre-trained models of programming languages have recently achieved significant progress in code searching. However, we argue that aligning programming and natural languages are crucial as there are two different modalities. Existing pre-train models based approaches for code search do not effectively consider implicit alignments of representations across modalities (inter-modal representation). Moreover, the existing methods do not take into account the consistency constraint of intra-modal representations, making the model ineffective. As a result, we propose a novel code search method that optimizes both intra-modal and inter-modal representation learning. The alignment of the representation between the two modalities is achieved by introducing contrastive learning. Furthermore, the consistency of intra-modal feature representation is constrained by KL-divergence. Our experimental results confirm the model\u2019s effectiveness on seven different test datasets. This paper proposes a code search method that significantly improves existing methods. Our source code is publicly available on GitHub.1<\/jats:p>","DOI":"10.3233\/ida-230082","type":"journal-article","created":{"date-parts":[[2023,8,13]],"date-time":"2023-08-13T19:06:46Z","timestamp":1691953606000},"page":"807-823","source":"Crossref","is-referenced-by-count":6,"title":["I2R: Intra and inter-modal representation learning for code search"],"prefix":"10.1177","volume":"28","author":[{"given":"Xu","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Yanzheng","family":"Xiang","sequence":"additional","affiliation":[]},{"given":"Zejie","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Xiaoyu","family":"Hu","sequence":"additional","affiliation":[]},{"given":"Deyu","family":"Zhou","sequence":"additional","affiliation":[]}],"member":"179","reference":[{"key":"10.3233\/IDA-230082_ref1","doi-asserted-by":"crossref","unstructured":"C. McMillan, M. Grechanik, D. Poshyvanyk, Q. Xie and C. Fu, Portfolio: finding relevant functions and their usage, in: Proceedings of the 33rd International Conference on Software Engineering, 2011, pp. 111\u2013120.","DOI":"10.1145\/1985793.1985809"},{"key":"10.3233\/IDA-230082_ref2","doi-asserted-by":"crossref","unstructured":"F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang and J. Zhao, Codehow: Effective code search based on api understanding and extended boolean model (e), in: 2015 30th IEEE\/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2015, pp. 260\u2013270.","DOI":"10.1109\/ASE.2015.42"},{"key":"10.3233\/IDA-230082_ref3","doi-asserted-by":"crossref","unstructured":"M. Lu, X. Sun, S. Wang, D. Lo and Y. 
Duan, Query expansion via wordnet for effective code search, in: 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), IEEE, 2015, pp. 545\u2013549.","DOI":"10.1109\/SANER.2015.7081874"},{"key":"10.3233\/IDA-230082_ref4","doi-asserted-by":"crossref","unstructured":"S. Yan, H. Yu, Y. Chen, B. Shen and L. Jiang, Are the code snippets what we are searching for? A benchmark and an empirical study on code search with natural-language queries, in: 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE, 2020, pp. 344\u2013354.","DOI":"10.1109\/SANER48275.2020.9054840"},{"key":"10.3233\/IDA-230082_ref5","doi-asserted-by":"crossref","unstructured":"J. Shuai, L. Xu, C. Liu, M. Yan, X. Xia and Y. Lei, Improving code search with co-attentive representation learning, in: Proceedings of the 28th International Conference on Program Comprehension, 2020, pp. 196\u2013207.","DOI":"10.1145\/3387904.3389269"},{"key":"10.3233\/IDA-230082_ref6","doi-asserted-by":"crossref","unstructured":"L. Du, X. Shi, Y. Wang, E. Shi, S. Han and D. Zhang, Is a single model enough? mucos: A multi-model ensemble learning approach for semantic code search, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 2994\u20132998.","DOI":"10.1145\/3459637.3482127"},{"key":"10.3233\/IDA-230082_ref7","doi-asserted-by":"crossref","unstructured":"L. Xu, H. Yang, C. Liu, J. Shuai, M. Yan, Y. Lei and Z. Xu, Two-stage attention-based model for code search with textual and structural features, in: 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE, 2021, pp. 342\u2013353.","DOI":"10.1109\/SANER50967.2021.00039"},{"key":"10.3233\/IDA-230082_ref8","doi-asserted-by":"crossref","unstructured":"G. Mathew and K.T. Stolee, Cross-language code search using static and dynamic analyses, in: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021, pp. 205\u2013217.","DOI":"10.1145\/3468264.3468538"},{"key":"10.3233\/IDA-230082_ref9","doi-asserted-by":"crossref","unstructured":"W. Sun, C. Fang, Y. Chen, G. Tao, T. Han and Q. Zhang, Code search based on context-aware code translation, in: Proceedings of the 44th International Conference on Software Engineering, ICSE \u201922, Association for Computing Machinery, 2022, pp. 388\u2013400. ISBN 9781450392211.","DOI":"10.1145\/3510003.3510140"},{"key":"10.3233\/IDA-230082_ref10","doi-asserted-by":"crossref","unstructured":"Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., CodeBERT: A Pre-Trained Model for Programming and Natural Languages, in: Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, pp. 1536\u20131547.","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"10.3233\/IDA-230082_ref11","unstructured":"D. Guo, S. Ren, S. Lu, Z. Feng, D. Tang, L. Shujie, L. Zhou, N. Duan, A. Svyatkovskiy, S. Fu et al., GraphCodeBERT: Pre-training Code Representations with Data Flow, in: International Conference on Learning Representations, 2020."},{"key":"10.3233\/IDA-230082_ref12","doi-asserted-by":"crossref","unstructured":"D. Guo, S. Lu, N. Duan, Y. Wang, M. Zhou and J. 
Yin, UniXcoder: Unified Cross-Modal Pre-training for Code Representation, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 7212\u20137225.","DOI":"10.18653\/v1\/2022.acl-long.499"},{"key":"10.3233\/IDA-230082_ref13","doi-asserted-by":"crossref","unstructured":"Y. Chai, H. Zhang, B. Shen and X. Gu, Cross-domain deep code search with meta learning, in: Proceedings of the 44th International Conference on Software Engineering, Association for Computing Machinery, 2022, pp. 487\u2013498. ISBN 9781450392211.","DOI":"10.1145\/3510003.3510125"},{"key":"10.3233\/IDA-230082_ref14","unstructured":"L. Wu, S. Xie, Y. Xia, Y. Fan, J.-H. Lai, T. Qin and T. Liu, Sequence generation with mixed representations, in: International Conference on Machine Learning, PMLR, 2020, pp. 10388\u201310398."},{"key":"10.3233\/IDA-230082_ref15","unstructured":"K. Zolna, D. Arpit, D. Suhubdy and Y. Bengio, Fraternal dropout, in: International Conference on Learning Representations, 2018."},{"key":"10.3233\/IDA-230082_ref16","first-page":"10890","article-title":"R-drop: Regularized dropout for neural networks","volume":"34","author":"Wu","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10.3233\/IDA-230082_ref17","doi-asserted-by":"crossref","unstructured":"Y.-C. Chen, L. Li, L. Yu, A. El\u00a0Kholy, F. Ahmed, Z. Gan, Y. Cheng and J. Liu, Uniter: Universal image-text representation learning, in: European Conference on Computer Vision, Springer, 2020, pp. 104\u2013120.","DOI":"10.1007\/978-3-030-58577-8_7"},{"key":"10.3233\/IDA-230082_ref18","doi-asserted-by":"crossref","unstructured":"W. Li, C. Gao, G. Niu, X. Xiao, H. Liu, J. Liu, H. Wu and H. Wang, UNIMO: Towards unified-modal understanding and generation via cross-modal contrastive learning, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 2592\u20132607.","DOI":"10.18653\/v1\/2021.acl-long.202"},{"key":"10.3233\/IDA-230082_ref19","unstructured":"A. Radford, J.W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., Learning transferable visual models from natural language supervision, in: International Conference on Machine Learning, PMLR, 2021, pp. 8748\u20138763."},{"key":"10.3233\/IDA-230082_ref21","doi-asserted-by":"crossref","unstructured":"T. Gao, X. Yao and D. Chen, SimCSE: Simple contrastive learning of sentence embeddings, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 6894\u20136910.","DOI":"10.18653\/v1\/2021.emnlp-main.552"},{"key":"10.3233\/IDA-230082_ref23","doi-asserted-by":"crossref","unstructured":"Y.-S. Chuang, R. Dangovski, H. Luo, Y. Zhang, S. Chang, M. Soljacic, S.-W. Li, W.-t. Yih, Y. Kim and J. Glass, DiffCSE: Difference-based contrastive learning for sentence embeddings, in: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022.","DOI":"10.18653\/v1\/2022.naacl-main.311"},{"key":"10.3233\/IDA-230082_ref24","doi-asserted-by":"crossref","unstructured":"X. Gu, H. Zhang and S. Kim, Deep code search, in: 2018 IEEE\/ACM 40th International Conference on Software Engineering (ICSE), IEEE, 2018, pp. 933\u2013944.","DOI":"10.1145\/3180155.3180167"},{"key":"10.3233\/IDA-230082_ref25","doi-asserted-by":"crossref","unstructured":"K. 
Desai and J. Johnson, Virtex: Learning visual representations from textual annotations, in: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11162\u201311173.","DOI":"10.1109\/CVPR46437.2021.01101"},{"key":"10.3233\/IDA-230082_ref27","doi-asserted-by":"crossref","unstructured":"Y. Wang, W. Wang, S. Joty and S.C. Hoi, CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 8696\u20138708.","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"10.3233\/IDA-230082_ref32","doi-asserted-by":"crossref","unstructured":"Z. Huang, Z. Zeng, Y. Huang, B. Liu, D. Fu and J. Fu, Seeing out of the box: End-to-end pre-training for vision-language representation learning, in: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12976\u201312985.","DOI":"10.1109\/CVPR46437.2021.01278"},{"key":"10.3233\/IDA-230082_ref33","unstructured":"S. Lu, D. Guo, S. Ren, J. Huang, A. Svyatkovskiy, A. Blanco, C. Clement, D. Drain, D. Jiang, D. Tang et al., CodeXGLUE: A machine learning benchmark dataset for code understanding and generation, in: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2021."},{"key":"10.3233\/IDA-230082_ref35","unstructured":"J. Devlin, M.-W. Chang, K. Lee and K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171\u20134186."},{"issue":"140","key":"10.3233\/IDA-230082_ref36","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"10.3233\/IDA-230082_ref37","doi-asserted-by":"crossref","unstructured":"C. Sun, A. Myers, C. Vondrick, K. Murphy and C. Schmid, Videobert: A joint model for video and language representation learning, in: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 2019, pp. 7464\u20137473.","DOI":"10.1109\/ICCV.2019.00756"},{"issue":"1","key":"10.3233\/IDA-230082_ref38","first-page":"1929","article-title":"Dropout: A simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"The Journal of Machine Learning Research"},{"key":"10.3233\/IDA-230082_ref39","doi-asserted-by":"crossref","unstructured":"S. Park, J. Park, S.-J. Shin and I.-C. Moon, Adversarial dropout for supervised and semi-supervised learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 
32, 2018.","DOI":"10.1609\/aaai.v32i1.11634"}],"container-title":["Intelligent Data Analysis"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDA-230082","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,10]],"date-time":"2025-03-10T17:11:50Z","timestamp":1741626710000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IDA-230082"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,28]]},"references-count":31,"journal-issue":{"issue":"3"},"URL":"https:\/\/doi.org\/10.3233\/ida-230082","relation":{},"ISSN":["1088-467X","1571-4128"],"issn-type":[{"value":"1088-467X","type":"print"},{"value":"1571-4128","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,28]]}}}