{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T15:51:53Z","timestamp":1774540313171,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2023,7,21]],"date-time":"2023-07-21T00:00:00Z","timestamp":1689897600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62272132"],"award-info":[{"award-number":["62272132"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>Software cross-modal retrieval is a popular yet challenging direction, such as bug localization and code search. Previous studies generally map natural language texts and codes into a homogeneous semantic space for similarity measurement. However, it is not easy to accurately capture their similar semantics in a homogeneous semantic space due to the semantic gap. Therefore, we propose to map the multi-modal data into heterogeneous semantic spaces to capture their unique semantics. Specifically, we propose a novel software cross-modal retrieval framework named Deep Hypothesis Testing (DeepHT). In DeepHT, to capture the unique semantics of the code\u2019s control flow structure, all control flow paths (CFPs) in the control flow graph are mapped to a CFP sample set in the sample space. Meanwhile, the text is mapped to a CFP correlation distribution in the distribution space to model its correlation with different CFPs. The matching score is calculated according to how well the sample set obeys the distribution using hypothesis testing. The experimental results on two text-to-code retrieval tasks (i.e., bug localization and code search) and two code-to-text retrieval tasks (i.e., vulnerability knowledge retrieval and historical patch retrieval) show that DeepHT outperforms the baseline methods.<\/jats:p>","DOI":"10.1145\/3591868","type":"journal-article","created":{"date-parts":[[2023,4,10]],"date-time":"2023-04-10T13:15:53Z","timestamp":1681132553000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["A Hypothesis Testing-based Framework for Software Cross-modal Retrieval in Heterogeneous Semantic Spaces"],"prefix":"10.1145","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5607-1065","authenticated-orcid":false,"given":"Hongwei","family":"Wei","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8584-0716","authenticated-orcid":false,"given":"Xiaohong","family":"Su","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4774-2434","authenticated-orcid":false,"given":"Cuiyun","family":"Gao","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology (Shenzhen), China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3668-3600","authenticated-orcid":false,"given":"Weining","family":"Zheng","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6818-5118","authenticated-orcid":false,"given":"Wenxin","family":"Tao","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,7,21]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSR.2019.00012"},{"issue":"5","key":"e_1_3_2_3_2","first-page":"1","article-title":"Control flow analysis","volume":"24","year":"1970","unstructured":"Allen and E. Frances. 1970. Control flow analysis. ACM 24, 5 (1970), 1\u201319.","journal-title":"ACM"},{"key":"e_1_3_2_4_2","first-page":"2787","volume-title":"Proceedings of the 27th Annual Conference on Neural Information Processing Systems","author":"Bordes Antoine","year":"2013","unstructured":"Antoine Bordes, Nicolas Usunier, Alberto Garc\u00eda-Dur\u00e1n, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems. 2787\u20132795. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2013\/hash\/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3340458"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-018-9672-z"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3240471"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2021.106542"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"e_1_3_2_10_2","volume-title":"Statistical Inference","author":"Casella G.","year":"2002","unstructured":"G. Casella and R. L. Berger.2002. Statistical Inference. Duxbury, Pacific Grove, CA."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2018.07.004"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME52107.2021.00049"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME52107.2021.00049"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180167"},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations","author":"Guo Daya","year":"2021","unstructured":"Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin B. Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. 2021. GraphCodeBERT: Pre-training code representations with data flow. In Proceedings of the 9th International Conference on Learning Representations. OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=jLoC4ez43PZ."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/265"},{"key":"e_1_3_2_17_2","first-page":"1606","volume-title":"Proceedings of the 25th International Joint Conference on Artificial Intelligence","author":"Huo Xuan","year":"2016","unstructured":"Xuan Huo, Ming Li, and Zhi-Hua Zhou. 2016. Learning unified features from natural and programming languages for locating buggy source code. In Proceedings of the 25th International Joint Conference on Artificial Intelligence. IJCAI\/AAAI Press, 1606\u20131612. Retrieved from http:\/\/www.ijcai.org\/Abstract\/16\/230."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5844"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2019.2920771"},{"key":"e_1_3_2_20_2","article-title":"CodeSearchNet challenge: Evaluating the state of semantic code search","volume":"1909","author":"Husain Hamel","year":"2019","unstructured":"Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet challenge: Evaluating the state of semantic code search. CoRR abs\/1909.09436 (2019).","journal-title":"CoRR"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/d18-1192"},{"key":"e_1_3_2_22_2","article-title":"A convolutional neural network for modelling sentences","author":"Kalchbrenner Nal","year":"2014","unstructured":"Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188 (2014).","journal-title":"arXiv preprint arXiv:1404.2188"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/d14-1181"},{"key":"e_1_3_2_24_2","volume-title":"Proceedings of the 3rd International Conference on Learning Representations","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations. Retrieved from http:\/\/arxiv.org\/abs\/1412.6980."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2015.73"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN48605.2020.9207101"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2922686"},{"key":"e_1_3_2_28_2","first-page":"1","article-title":"The flowing nature matters: Feature learning from the control flow graph of source code for bug localization.","author":"Ma Yi-Fan","year":"2022","unstructured":"Yi-Fan Ma and Ming Li. 2022. The flowing nature matters: Feature learning from the control flow graph of source code for bug localization. Mach. Learn. (2022), 1\u201318. Retrieved from https:\/\/search.ebscohost.com\/login.aspx?direct=true&db=edssjs&AN=edssjs.12EB9D72&lang=zh-cn&site=eds-live.","journal-title":"Mach. Learn."},{"key":"e_1_3_2_29_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten Laurens van der","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 28 (2008), 2579\u20132605.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1176325373"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3211346.3211353"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3211346.3211353"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3387904.3389269"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/2581377"},{"key":"e_1_3_2_35_2","first-page":"1057","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems","author":"Sutton Richard S.","year":"1999","unstructured":"Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. 1999. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the Conference on Advances in Neural Information Processing Systems. The MIT Press, 1057\u20131063. Retrieved from http:\/\/papers.nips.cc\/paper\/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/QRS.2016.30"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00012"},{"key":"e_1_3_2_38_2","unstructured":"X. Wang Y. Wang F. Mi P. Zhou Y. Wan X. Liu L. Li H. Wu J. Liu and X. Jiang. 2021. SynCoBERT: Syntax-guided multi-modal contrastive pre-training for code representation. arXiv preprint arXiv:2108.04556 (2021)."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2018.03.003"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3210459.3210469"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER50967.2021.00039"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER50967.2021.00039"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER48275.2020.9054840"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2635868.2635874"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884862"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2012.6227210"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416530"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/493"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-021-00755-7"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3591868","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3591868","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:45Z","timestamp":1750178265000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3591868"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,21]]},"references-count":48,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3591868"],"URL":"https:\/\/doi.org\/10.1145\/3591868","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,21]]},"assertion":[{"value":"2022-08-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-02-15","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}