{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T01:56:02Z","timestamp":1771552562045,"version":"3.50.1"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","funder":[{"name":"National Natural Science Foundation of China","award":["62302536, 62032025"],"award-info":[{"award-number":["62302536, 62032025"]}]},{"DOI":"10.13039\/501100021171","name":"Guangdong Basic and Applied Basic Research Foundation","doi-asserted-by":"crossref","award":["2023A1515012292"],"award-info":[{"award-number":["2023A1515012292"]}],"id":[{"id":"10.13039\/501100021171","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Code snippets are widely used in technical forums to demonstrate solutions to programming problems. They can be leveraged by developers to accelerate problem-solving. However, code snippets often lack concrete types of the APIs used in them, which impedes their understanding and reuse. To enhance the description of a code snippet, a number of approaches are proposed to infer the types of APIs. Although existing approaches can achieve good performance, their performance is limited by ignoring other information outside the input code snippet (e.g., the descriptions of similar code snippets) that could potentially improve the performance.\n\nIn this paper, we propose a novel type inference approach, named CKTyper, by leveraging crowdsourcing knowledge in technical posts. The key idea is to generate a relevant context for a target code snippet from the posts containing similar code snippets and then employ the context to promote the type inference with large language models (e.g., ChatGPT). 
More specifically, we build a crowdsourcing knowledge base (CKB) by extracting code snippets from a large set of posts and index the CKB using Lucene. An API type dictionary is also built from a set of API libraries. Given a code snippet to be inferred, we first retrieve a list of similar code snippets from the indexed CKB. Then, we generate a crowdsourcing knowledge context (CKC) by extracting and summarizing useful content (e.g., API-related sentences) in the posts that contain the similar code snippets. The CKC is subsequently used to improve the type inference of ChatGPT on the input code snippet. The hallucination of ChatGPT is eliminated by employing the API type dictionary. Evaluation results on two open-source datasets demonstrate the effectiveness and efficiency of CKTyper. CKTyper achieves the optimal precision\/recall of 97.80% and 95.54% on both datasets, respectively, significantly outperforming three state-of-the-art baselines and ChatGPT.<\/jats:p>","DOI":"10.1145\/3715724","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:16:02Z","timestamp":1750346162000},"page":"176-196","source":"Crossref","is-referenced-by-count":1,"title":["CKTyper: Enhancing Type Inference for Java Code Snippets by Leveraging Crowdsourcing Knowledge in Stack Overflow"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-4881-4148","authenticated-orcid":false,"given":"Anji","family":"Li","sequence":"first","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8662-5690","authenticated-orcid":false,"given":"Neng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Central China Normal University, Wuhan, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5335-0261","authenticated-orcid":false,"given":"Ying","family":"Zou","sequence":"additional","affiliation":[{"name":"Queen's University, Kingston, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6536-5978","authenticated-orcid":false,"given":"Zhixiang","family":"Chen","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1559-9314","authenticated-orcid":false,"given":"Jian","family":"Wang","sequence":"additional","affiliation":[{"name":"Wuhan University, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7878-4330","authenticated-orcid":false,"given":"Zibin","family":"Zheng","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"2012 Third International Workshop on Recommendation Systems for Software Engineering (RSSE). 26\u201330","author":"Bacchelli Alberto","year":"2012","unstructured":"Alberto Bacchelli, Luca Ponzanelli, and Michele Lanza. 2012. Harnessing stack overflow for the ide. In 2012 Third International Workshop on Recommendation Systems for Software Engineering (RSSE). 26\u201330. https:\/\/doi.org\/10.1109\/RSSE.2012.6233404 10.1109\/RSSE.2012.6233404"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-012-9231-y"},{"key":"e_1_2_1_3_1","unstructured":"Zhixiang Chen Anji Li Neng Zhang Jianguo Chen Yuan Huang and Zibin Zheng. 2024. iJTyper: An Iterative Type Inference Framework for Java by Integrating Constraint- and Statistically-based Methods. arxiv:2402.09995. 
arxiv:2402.09995"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324916000334"},{"key":"e_1_2_1_5_1","volume-title":"2012 34th international conference on software engineering (icse). 47\u201357","author":"Dagenais Barth\u00e9l\u00e9my","year":"2012","unstructured":"Barth\u00e9l\u00e9my Dagenais and Martin P Robillard. 2012. Recovering traceability links between an API and its learning resources. In 2012 34th international conference on software engineering (icse). 47\u201357. https:\/\/doi.org\/10.1109\/ICSE.2012.6227207 10.1109\/ICSE.2012.6227207"},{"key":"e_1_2_1_6_1","volume-title":"International conference on Tools and Algorithms for the Construction and Analysis of Systems. 337\u2013340","author":"Moura Leonardo De","year":"2008","unstructured":"Leonardo De Moura and Nikolaj Bj\u00f8rner. 2008. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems. 337\u2013340. https:\/\/doi.org\/10.1007\/978-3-540-78800-3_24 10.1007\/978-3-540-78800-3_24"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 44th International Conference on Software Engineering. 1982\u20131993","author":"Dong Yiwen","year":"2022","unstructured":"Yiwen Dong, Tianxiao Gu, Yongqiang Tian, and Chengnian Sun. 2022. SnR: constraint-based type inference for incomplete Java code snippets. In Proceedings of the 44th International Conference on Software Engineering. 1982\u20131993. https:\/\/doi.org\/10.1145\/3510003.3510061 10.1145\/3510003.3510061"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 37th IEEE\/ACM International Conference on Automated Software Engineering. 1\u201312","author":"Eghbali Aryaz","year":"2022","unstructured":"Aryaz Eghbali and Michael Pradel. 2022. CrystalBLEU: precisely and efficiently measuring the similarity of code. In Proceedings of the 37th IEEE\/ACM International Conference on Automated Software Engineering. 1\u201312. 
https:\/\/doi.org\/10.1145\/3551349.3556903 10.1145\/3551349.3556903"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-96142-2_2"},{"key":"e_1_2_1_10_1","volume-title":"2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). 217\u2013227","author":"Horton Eric","year":"2018","unstructured":"Eric Horton and Chris Parnin. 2018. Gistable: Evaluating the executability of python code snippets on github. In 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). 217\u2013227. https:\/\/doi.org\/10.1109\/ICSME.2018.00031 10.1109\/ICSME.2018.00031"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 37th IEEE\/ACM International Conference on Automated Software Engineering. 1\u201313","author":"Huang Qing","year":"2022","unstructured":"Qing Huang, Zhiqiang Yuan, Zhenchang Xing, Xiwei Xu, Liming Zhu, and Qinghua Lu. 2022. Prompt-tuned code language model as a neural knowledge base for type inference in statically-typed partial code. In Proceedings of the 37th IEEE\/ACM International Conference on Automated Software Engineering. 1\u201313. https:\/\/doi.org\/10.1145\/3551349.3556912 10.1145\/3551349.3556912"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the ACM on Programming Languages, 4, OOPSLA","author":"Madsen Magnus","year":"2020","unstructured":"Magnus Madsen and Ond\u0159ej Lhot\u00e1k. 2020. Fixpoints for the masses: programming with first-class datalog constraints. Proceedings of the ACM on Programming Languages, 4, OOPSLA (2020), 1\u201328. https:\/\/doi.org\/10.1145\/3428193 10.1145\/3428193"},{"key":"e_1_2_1_13_1","volume-title":"2019 IEEE\/ACM 41st International Conference on Software Engineering (ICSE). 304\u2013315","author":"Malik Rabee Sohail","year":"2019","unstructured":"Rabee Sohail Malik, Jibesh Patra, and Michael Pradel. 2019. NL2Type: inferring JavaScript function types from natural language information. 
In 2019 IEEE\/ACM 41st International Conference on Software Engineering (ICSE). 304\u2013315. https:\/\/doi.org\/10.1109\/ICSE.2019.00045 10.1109\/ICSE.2019.00045"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 40th annual meeting of the Association for Computational Linguistics. ACL, 311\u2013318","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. ACL, 311\u2013318. https:\/\/doi.org\/10.3115\/1073083.1073135 10.3115\/1073083.1073135"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 44th International Conference on Software Engineering. 2019\u20132030","author":"Peng Yun","year":"2022","unstructured":"Yun Peng, Cuiyun Gao, Zongjie Li, Bowei Gao, David Lo, Qirun Zhang, and Michael Lyu. 2022. Static inference meets deep learning: a hybrid type inference approach for Python. In Proceedings of the 44th International Conference on Software Engineering. 2019\u20132030. https:\/\/doi.org\/10.1145\/3510003.3510038 10.1145\/3510003.3510038"},{"key":"e_1_2_1_16_1","volume-title":"2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 988\u2013999","author":"Peng Yun","year":"2023","unstructured":"Yun Peng, Chaozheng Wang, Wenxuan Wang, Cuiyun Gao, and Michael R Lyu. 2023. Generative type inference for python. In 2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 988\u2013999. https:\/\/doi.org\/10.1109\/ASE56229.2023.00031 10.1109\/ASE56229.2023.00031"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 40th International Conference on Software Engineering. 632\u2013642","author":"Phan Hung","year":"2018","unstructured":"Hung Phan, Hoan Anh Nguyen, Ngoc M Tran, Linh H Truong, Anh Tuan Nguyen, and Tien N Nguyen. 2018. 
Statistical learning of api fully qualified names in code snippets of online forums. In Proceedings of the 40th International Conference on Software Engineering. 632\u2013642. https:\/\/doi.org\/10.1145\/3180155.3180230 10.1145\/3180155.3180230"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the thirteenth ACM international conference on Information and knowledge management. 42\u201349","author":"Robertson Stephen","year":"2004","unstructured":"Stephen Robertson, Hugo Zaragoza, and Michael Taylor. 2004. Simple BM25 extension to multiple weighted fields. In Proceedings of the thirteenth ACM international conference on Information and knowledge management. 42\u201349. https:\/\/doi.org\/10.1145\/1031171.1031181 10.1145\/1031171.1031181"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-015-9379-3"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 2015 10th joint meeting on foundations of software engineering. 191\u2013201","author":"Sadowski Caitlin","year":"2015","unstructured":"Caitlin Sadowski, Kathryn T Stolee, and Sebastian Elbaum. 2015. How developers search for code: a case study. In Proceedings of the 2015 10th joint meeting on foundations of software engineering. 191\u2013201. https:\/\/doi.org\/10.1145\/2786805.2786855 10.1145\/2786805.2786855"},{"key":"e_1_2_1_21_1","volume-title":"2019 34th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 243\u2013254","author":"Khaled Saifullah CM","year":"2019","unstructured":"CM Khaled Saifullah, Muhammad Asaduzzaman, and Chanchal K Roy. 2019. Learning from examples to find fully qualified names of api elements in code snippets. In 2019 34th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 243\u2013254. https:\/\/doi.org\/10.1145\/2901739.2901767 10.1145\/2901739.2901767"},{"key":"e_1_2_1_22_1","volume-title":"2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 
1388\u20131390","author":"Shokri Ali","year":"2021","unstructured":"Ali Shokri. 2021. A program synthesis approach for adding architectural tactics to an existing code base. In 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 1388\u20131390. https:\/\/doi.org\/10.1145\/3428193 10.1145\/3428193"},{"key":"e_1_2_1_23_1","volume-title":"Depres: A tool for resolving fully qualified names and their dependencies. arXiv preprint arXiv:2108.01165, abs\/2108.01165","author":"Shokri Ali","year":"2021","unstructured":"Ali Shokri and Mehdi Mirakhorli. 2021. Depres: A tool for resolving fully qualified names and their dependencies. arXiv preprint arXiv:2108.01165, abs\/2108.01165 (2021), arXiv:2108.01165. arxiv:2108.01165"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 36th international conference on software engineering. 643\u2013652","author":"Subramanian Siddharth","year":"2014","unstructured":"Siddharth Subramanian, Laura Inozemtseva, and Reid Holmes. 2014. Live API documentation. In Proceedings of the 36th international conference on software engineering. 643\u2013652. https:\/\/doi.org\/10.1145\/2568225.2568313 10.1145\/2568225.2568313"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 25th international symposium on software testing and analysis. 118\u2013129","author":"Terragni Valerio","year":"2016","unstructured":"Valerio Terragni, Yepang Liu, and Shing-Chi Cheung. 2016. CSNIPPEX: automated synthesis of compilable code snippets from Q&A sites. In Proceedings of the 25th international symposium on software testing and analysis. 118\u2013129. https:\/\/doi.org\/10.1145\/2931037.2931058 10.1145\/2931037.2931058"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.scico.2023.102941"},{"key":"e_1_2_1_27_1","volume-title":"8th International Conference on Learning Representations, ICLR","author":"Wei Jiayi","year":"2020","unstructured":"Jiayi Wei, Maruth Goyal, Greg Durrett, and Isil Dillig. 2020. 
LambdaNet: Probabilistic Type Inference using Graph Neural Networks. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net. https:\/\/openreview.net\/forum?id=Hkx6hANtwH"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-4380-9_16"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 13th International Conference on Mining Software Repositories (MSR \u201916)","author":"Yang Di","year":"2016","unstructured":"Di Yang, Aftab Hussain, and Cristina Videira Lopes. 2016. From query to usable code: an analysis of stack overflow code snippets. In Proceedings of the 13th International Conference on Mining Software Repositories (MSR \u201916). Association for Computing Machinery, 391\u2013402. isbn:9781450341868 https:\/\/doi.org\/10.1145\/2901739.2901767 10.1145\/2901739.2901767"},{"key":"e_1_2_1_30_1","volume-title":"2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER). 1, 90\u2013101","author":"Ye Deheng","year":"2016","unstructured":"Deheng Ye, Zhenchang Xing, Chee Yong Foo, Zi Qun Ang, Jing Li, and Nachiket Kapre. 2016. Software-specific named entity recognition in software engineering social content. In 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER). 1, 90\u2013101. https:\/\/doi.org\/10.1109\/SANER.2016.10 10.1109\/SANER.2016.10"},{"key":"e_1_2_1_31_1","volume-title":"International conference on machine learning (Proceedings of Machine Learning Research","volume":"11339","author":"Zhang Jingqing","year":"2020","unstructured":"Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter Liu. 2020. Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In International conference on machine learning (Proceedings of Machine Learning Research, Vol. 119). 11328\u201311339. 
http:\/\/proceedings.mlr.press\/v119\/zhang20ae.html"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715724","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:19:54Z","timestamp":1750346394000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715724"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":31,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3715724"],"URL":"https:\/\/doi.org\/10.1145\/3715724","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}