{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T08:30:16Z","timestamp":1759825816583,"version":"3.37.3"},"reference-count":44,"publisher":"World Scientific Pub Co Pte Ltd","issue":"02","funder":[{"name":"National Key R&D Program of","award":["2019YFB1404802"],"award-info":[{"award-number":["2019YFB1404802"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62176231","62106218"],"award-info":[{"award-number":["62176231","62106218"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Zhejiang University Education Foundation","award":["K18-511120-004","K17-511120-017","K17-518051-02"],"award-info":[{"award-number":["K18-511120-004","K17-511120-017","K17-518051-02"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Soft. Eng. Knowl. Eng."],"published-print":{"date-parts":[[2022,2]]},"abstract":"<jats:p> Stack Overflow is one of the most popular question-answering sites for programmers. However, it faces the problem of question duplication, where newly created questions are identical to previous questions. Existing works on duplicate question detection in Stack Overflow extract a set of textual features from question pairs and use supervised learning approaches to classify duplicate question pairs. However, they do not consider the source code in the questions, even though in some cases the intention of a question is conveyed mainly by its source code. In this paper, we aim to learn the semantics of a question by combining text features and source code features. We use word embedding and convolutional neural networks to extract textual features from questions to overcome the lexical gap issue. We use tree-based convolutional neural networks to extract structural and semantic features from source code. 
In addition, we perform multi-task learning by combining the duplicate question detection task with a question tag prediction side task. We conduct extensive experiments on the Stack Overflow dataset and show that our approach can detect duplicate questions with higher recall and MRR than baseline approaches for the Python and Java programming languages. <\/jats:p>","DOI":"10.1142\/s0218194022500073","type":"journal-article","created":{"date-parts":[[2022,3,24]],"date-time":"2022-03-24T01:59:52Z","timestamp":1648087192000},"page":"227-255","source":"Crossref","is-referenced-by-count":5,"title":["Detecting Duplicate Questions in Stack Overflow via Source Code Modeling"],"prefix":"10.1142","volume":"32","author":[{"given":"Wei","family":"Gao","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University, Zhejiang, Hangzhou 310027, P.\u00a0R.\u00a0China"}]},{"given":"Jian","family":"Wu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University, Zhejiang, Hangzhou 310027, P.\u00a0R.\u00a0China"}]},{"given":"Guandong","family":"Xu","sequence":"additional","affiliation":[{"name":"Advanced Analytics Institute, University of Technology Sydney, Sydney, Australia"}]}],"member":"219","published-online":{"date-parts":[[2022,4,25]]},"reference":[{"issue":"5","key":"S0218194022500073BIB001","doi-asserted-by":"crossref","first-page":"981","DOI":"10.1007\/s11390-015-1576-4","volume":"30","author":"Zhang Y.","year":"2015","journal-title":"J. Comput. Sci. Technol."},{"key":"S0218194022500073BIB002","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1145\/2901739.2901770","volume-title":"Int. Conf. Mining Software Repositories","author":"Ahasanuzzaman M.","year":"2016"},{"key":"S0218194022500073BIB003","first-page":"1221","volume-title":"Int. Conf. World Wide Web","author":"Zhang W. 
E.","year":"2017"},{"volume-title":"Introduction to Modern Information Retrieval","year":"1984","author":"Salton G.","key":"S0218194022500073BIB004"},{"key":"S0218194022500073BIB005","first-page":"993","volume":"3","author":"Blei D. M.","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"S0218194022500073BIB006","doi-asserted-by":"publisher","DOI":"10.1561\/1500000019"},{"issue":"3","key":"S0218194022500073BIB007","first-page":"37:1","volume":"18","author":"Zhang W. E.","year":"2018","journal-title":"ACM Trans. Internet Technol."},{"key":"S0218194022500073BIB008","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1007\/978-3-319-69179-4_44","volume-title":"Int. Conf. Advanced Data Mining and Applications","volume":"10604","author":"Zhang W. E.","year":"2017"},{"key":"S0218194022500073BIB009","first-page":"572","volume-title":"Int. Conf. Software Analysis, Evolution and Reengineering","author":"da Silva R. F. G.","year":"2018"},{"key":"S0218194022500073BIB010","first-page":"230","volume-title":"Int. Conf. Mining Software Repositories","author":"Abric D.","year":"2019"},{"issue":"20","key":"S0218194022500073BIB011","doi-asserted-by":"crossref","first-page":"1635","DOI":"10.17485\/IJST\/v14i20.312","volume":"14","author":"Anishaa V. K. R.","year":"2021","journal-title":"Indian J. Sci. Technol."},{"key":"S0218194022500073BIB012","doi-asserted-by":"crossref","first-page":"1153","DOI":"10.1007\/978-3-030-01012-6","volume-title":"Int. Conf. Research & Development in Information Retrieval","author":"Zhang W. E.","year":"2018"},{"key":"S0218194022500073BIB013","doi-asserted-by":"crossref","first-page":"25964","DOI":"10.1109\/ACCESS.2020.2968391","volume":"8","author":"Wang L.","year":"2020","journal-title":"IEEE Access"},{"key":"S0218194022500073BIB014","first-page":"95","volume-title":"Int. Conf. Research and Development in Information Retrieval","author":"Liang D.","year":"2019"},{"key":"S0218194022500073BIB015","first-page":"559","volume-title":"Int. 
Conf. Research and Development in Information Retrieval","author":"Wang Z.","year":"2020"},{"key":"S0218194022500073BIB016","doi-asserted-by":"crossref","first-page":"56029","DOI":"10.1109\/ACCESS.2020.2982268","volume":"8","author":"Xu Z.","year":"2020","journal-title":"IEEE Access"},{"key":"S0218194022500073BIB017","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1016\/j.ins.2020.07.048","volume":"543","author":"Zhou Q.","year":"2021","journal-title":"Inf. Sci."},{"key":"S0218194022500073BIB018","first-page":"97","volume-title":"Int. Conf. Mining Software Repositories","author":"Pei J.","year":"2021"},{"key":"S0218194022500073BIB019","first-page":"13","volume-title":"Int. Conf. Software Engineering, New Ideas and Emerging Results","author":"Baltes S.","year":"2020"},{"key":"S0218194022500073BIB020","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.50"},{"key":"S0218194022500073BIB021","first-page":"1273","volume-title":"Int. World Wide Web Conf.","author":"Tao K.","year":"2013"},{"key":"S0218194022500073BIB022","doi-asserted-by":"publisher","DOI":"10.1145\/2970276.2970357"},{"key":"S0218194022500073BIB023","first-page":"1106","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky A.","year":"2012"},{"key":"S0218194022500073BIB024","first-page":"334","volume-title":"Int. Conf. Mining Software Repositories","author":"White M.","year":"2015"},{"key":"S0218194022500073BIB025","first-page":"87","volume-title":"Int. Conf. Automated Software Engineering","author":"White M.","year":"2016"},{"key":"S0218194022500073BIB027","first-page":"1604","volume-title":"Int. Conf. Machine Learning","volume":"37","author":"Zhu X.","year":"2015"},{"key":"S0218194022500073BIB028","first-page":"1556","volume-title":"Proc. Annual Meeting of the Association for Computational Linguistics","author":"Tai K. S.","year":"2015"},{"key":"S0218194022500073BIB029","first-page":"1287","volume-title":"Proc. AAAI Conf. 
Artificial Intelligence","author":"Mou L.","year":"2016"},{"key":"S0218194022500073BIB030","first-page":"2315","volume-title":"Proc. Conf. Empirical Methods in Natural Language Processing","author":"Mou L.","year":"2015"},{"key":"S0218194022500073BIB031","first-page":"318","volume-title":"Int. Conf. Software Quality, Reliability and Security","author":"Li J.","year":"2017"},{"key":"S0218194022500073BIB032","first-page":"2091","volume-title":"Int. Conf. Machine Learning","volume":"48","author":"Allamanis M.","year":"2016"},{"key":"S0218194022500073BIB033","first-page":"45","volume-title":"Int. Conf. Tools with Artificial Intelligence","author":"Phan A. V.","year":"2017"},{"issue":"4","key":"S0218194022500073BIB034","first-page":"81:1","volume":"51","author":"Allamanis M.","year":"2018","journal-title":"ACM Comput. Surv."},{"key":"S0218194022500073BIB035","first-page":"90","volume-title":"Int. Conf. Software Analysis, Evolution, and Reengineering","author":"Ye D.","year":"2016"},{"volume-title":"Int. Conf. Learning Representations","year":"2013","author":"Mikolov T.","key":"S0218194022500073BIB036"},{"key":"S0218194022500073BIB037","first-page":"3111","volume-title":"Advances in Neural Information Processing Systems","author":"Mikolov T.","year":"2013"},{"key":"S0218194022500073BIB038","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1145\/3196398.3196448","volume-title":"Int. Conf. Mining Software Repositories","author":"Efstathiou V.","year":"2018"},{"key":"S0218194022500073BIB039","first-page":"315","volume-title":"Int. Conf. Artificial Intelligence and Statistics","volume":"15","author":"Glorot X.","year":"2011"},{"key":"S0218194022500073BIB040","first-page":"818","volume-title":"European Conf. Computer Vision","volume":"8689","author":"Zeiler M. D.","year":"2014"},{"key":"S0218194022500073BIB041","first-page":"368","volume-title":"Int. Conf. Software Maintenance","author":"Baxter I. 
D.","year":"1998"},{"key":"S0218194022500073BIB042","volume-title":"The Art of Computer Programming, Volume I: Fundamental Algorithms","author":"Knuth D. E.","year":"1997","edition":"3"},{"volume-title":"Int. Conf. Learning Representations","year":"2015","author":"Kingma D. P.","key":"S0218194022500073BIB043"},{"issue":"1","key":"S0218194022500073BIB044","first-page":"1929","volume":"15","author":"Srivastava N.","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"S0218194022500073BIB045","first-page":"1188","volume-title":"Int. Conf. Machine Learning","volume":"32","author":"Le Q. V.","year":"2014"}],"container-title":["International Journal of Software Engineering and Knowledge Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218194022500073","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,26]],"date-time":"2022-04-26T02:42:24Z","timestamp":1650940944000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218194022500073"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2]]},"references-count":44,"journal-issue":{"issue":"02","published-print":{"date-parts":[[2022,2]]}},"alternative-id":["10.1142\/S0218194022500073"],"URL":"https:\/\/doi.org\/10.1142\/s0218194022500073","relation":{},"ISSN":["0218-1940","1793-6403"],"issn-type":[{"type":"print","value":"0218-1940"},{"type":"electronic","value":"1793-6403"}],"subject":[],"published":{"date-parts":[[2022,2]]}}}