{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T10:48:57Z","timestamp":1761648537019,"version":"3.41.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2018,4,17]],"date-time":"2018-04-17T00:00:00Z","timestamp":1523923200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Australian Research Council (ARC) Future Fellowship","award":["FT140101247"],"award-info":[{"award-number":["FT140101247"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Internet Technol."],"published-print":{"date-parts":[[2018,8,31]]},"abstract":"<jats:p>Community-based Question Answering (CQA) websites are attracting increasing numbers of users and contributors in recent years. However, duplicate questions frequently occur in CQA websites and are currently manually identified by the moderators. Automatic duplicate detection, on one hand, alleviates this laborious effort for moderators before taking close actions, and, on the other hand, helps question issuers quickly find answers. A number of studies have looked into related problems, but very limited works target Duplicate Detection in Programming CQA (PCQA), a branch of CQA that is dedicated to programmers. Existing works framed the task as a supervised learning problem on the question pairs and relied on only textual features. Moreover, the issue of selecting candidate duplicates from large volumes of historical questions is often un-addressed. To tackle these issues, we model duplicate detection as a two-stage \u201cranking-classification\u201d problem over question pairs. 
In the first stage, we rank the historical questions according to their similarities to the newly issued question and select the top ranked ones as candidates to reduce the search space. In the second stage, we develop novel features that capture both textual similarity and latent semantics on question pairs, leveraging techniques in deep learning and information retrieval literature. Experiments on real-world questions about multiple programming languages demonstrate that our method works very well; in some cases, up to 25% improvement compared to the state-of-the-art benchmarks.<\/jats:p>","DOI":"10.1145\/3169795","type":"journal-article","created":{"date-parts":[[2018,4,18]],"date-time":"2018-04-18T17:21:50Z","timestamp":1524072110000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Duplicate Detection in Programming Question Answering Communities"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0406-5974","authenticated-orcid":false,"given":"Wei Emma","family":"Zhang","sequence":"first","affiliation":[{"name":"Macquarie University, Australia"}]},{"given":"Quan Z.","family":"Sheng","sequence":"additional","affiliation":[{"name":"Macquarie University, Australia"}]},{"given":"Jey Han","family":"Lau","sequence":"additional","affiliation":[{"name":"The University of Melbourne and IBM Research Australia, Australia"}]},{"given":"Ermyas","family":"Abebe","sequence":"additional","affiliation":[{"name":"IBM Research Australia, Australia"}]},{"given":"Wenjie","family":"Ruan","sequence":"additional","affiliation":[{"name":"University of Oxford, Oxford, UK"}]}],"member":"320","published-online":{"date-parts":[[2018,4,17]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2901739.2901770"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1080\/00031305.1992.10475879","article-title":"An 
introduction to kernel and nearest-neighbor nonparametric regression","volume":"46","author":"Altman Naomi S.","year":"1992","unstructured":"Naomi S. Altman. 1992. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician 46, 3 (1992), 175--185.","journal-title":"The American Statistician"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582416"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the EMNLP","author":"Berant Jonathan","year":"2013","unstructured":"Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. Semantic parsing on freebase from question-answer pairs. In Proceedings of the EMNLP 2013. ACL, Seattle, Washington, USA, 1533--1544."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1133"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937"},{"key":"e_1_2_1_7_1","unstructured":"Leo Breiman J. H. Friedman R. A. Olshen and C. J. Stone. 1984. Classification and Regression Trees. Wadsworth."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772712"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2180868.2180869"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the COMPSTAT","author":"Chan Tony F.","year":"1982","unstructured":"Tony F. Chan, Gene Howard Golub, and Randall J. LeVeque. Updating formulae and a pairwise algorithm for computing sample variances. In Proceedings of the COMPSTAT 1982. 
Springer, Physica, Heidelberg, 30--41."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835490"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118693.1118694"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2566486.2568036"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/1248547.1248566"},{"volume-title":"WordNet: An electronic lexical database","author":"Fellbaum C.","key":"e_1_2_1_15_1","unstructured":"C. Fellbaum. 1998. WordNet: An electronic lexical database. MIT Press."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the EuroCOLT","author":"Freund Yoav","year":"1995","unstructured":"Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the EuroCOLT 1995. Springer, Barcelona, Spain, 23--37."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.239"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1181"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/5254.708428"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.709601"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the PRNI","author":"Jelinek Fred","year":"1980","unstructured":"Fred Jelinek and Robert L. Mercer. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the PRNI 1980. 
North Holland, Amsterdam, Netherlands, 381--397."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the EMNLP","author":"Ji Yangfeng","year":"2013","unstructured":"Yangfeng Ji and Jacob Eisenstein. Discriminative improvements to distributional sentence similarity. In Proceedings of the EMNLP 2013. ACL, Seattle, Washington, USA, 891--896."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-1609"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the ICML 2014","author":"Quoc","year":"2014","unstructured":"Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of the ICML 2014. JMLR.org \u00a92014, Beijing, China, 1188--1196."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2911499"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the NAACL","author":"Madnani Nitin","year":"2012","unstructured":"Nitin Madnani, Joel R. Tetreault, and Martin Chodorow. Re-examining machine translation metrics for paraphrase identification. In Proceedings of the NAACL 2012. 
ACL, Montr\u00e9al, Canada, 182--190."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the AAAI","author":"Mihalcea Rada","year":"2006","unstructured":"Rada Mihalcea, Courtney Corley, and Carlo Strapparava. Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the AAAI 2006. AAAI Press, Boston, Massachusetts, USA, 775--780."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the NIPS","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Proceedings of the NIPS 2013. Neural Information Processing Systems, Lake Tahoe, Nevada, USA, 3111--3119."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120103321337421"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1162\/0891201042544884"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/1610075.1610079"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the TREC","author":"Robertson Stephen E.","year":"1994","unstructured":"Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, and Mike Gatford. Okapi at TREC-3. 
In Proceedings of the TREC 1994. National Institute of Standards and Technology (NIST), Gaithersburg, Maryland, USA, 109--126."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2187836.2187939"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the NIPS","author":"Socher Richard","year":"2011","unstructured":"Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Proceedings of the NIPS 2011. Neural Information Processing Systems, Lake Tahoe, Nevada, United States, 801--809."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1985793.1985907"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/54.1-2.167"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1571941.1571975"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the AAAI","author":"Yang Lichun","year":"2011","unstructured":"Lichun Yang, Shenghua Bao, Qingliang Lin, Xian Wu, Dingyi Han, Zhong Su, and Yong Yu. Analyzing and predicting not-answered questions in community-based question answering services. In Proceedings of the AAAI 2011. 
AAAI Press, San Francisco, California, USA, 1273--1278."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2806416.2806542"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/984321.984322"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015332"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052701"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-015-1576-4"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the IJCAI","author":"Zhou Guangyou","year":"2013","unstructured":"Guangyou Zhou, Yang Liu, Fang Liu, Daojian Zeng, and Jun Zhao. Improving question retrieval in community question answering using world knowledge. In Proceedings of the IJCAI 2013. 
IJCAI\/AAAI, Beijing, China, 2239--2245."}],"container-title":["ACM Transactions on Internet Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3169795","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3169795","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:30:33Z","timestamp":1750217433000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3169795"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,4,17]]},"references-count":44,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2018,8,31]]}},"alternative-id":["10.1145\/3169795"],"URL":"https:\/\/doi.org\/10.1145\/3169795","relation":{},"ISSN":["1533-5399","1557-6051"],"issn-type":[{"type":"print","value":"1533-5399"},{"type":"electronic","value":"1557-6051"}],"subject":[],"published":{"date-parts":[[2018,4,17]]},"assertion":[{"value":"2017-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-04-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}