{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T14:48:30Z","timestamp":1776782910100,"version":"3.51.2"},"reference-count":79,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","license":[{"start":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T00:00:00Z","timestamp":1720742400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["CCF-2146233"],"award-info":[{"award-number":["CCF-2146233"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000006","name":"Office of Naval Research","doi-asserted-by":"crossref","award":["N000142212111"],"award-info":[{"award-number":["N000142212111"]}],"id":[{"id":"10.13039\/100000006","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2024,7,12]]},"abstract":"<jats:p>\n                    Increasing studies have shown bugs in multi-language software as a critical loophole in modern software quality assurance, especially those induced by language interactions (i.e.,\n                    <jats:italic toggle=\"yes\">multilingual bugs<\/jats:italic>\n                    ). Yet existing tool support for bug detection\/localization remains largely limited to single-language software, despite the long-standing prevalence of multi-language systems in various real-world software domains. Extant static\/dynamic analysis and deep learning (DL) based approaches all face major challenges in addressing multilingual bugs. In this paper, we present xLoc, a DL-based technique\/tool for detecting and localizing multilingual bugs. 
Motivated by results of our bug-characteristics study on top locations of multilingual bugs, xLoc first learns the\n                    <jats:italic toggle=\"yes\">general knowledge<\/jats:italic>\n                    relevant to differentiating various multilingual control-flow structures. This is achieved by pre-training a Transformer model with\n                    <jats:italic toggle=\"yes\">customized position encoding<\/jats:italic>\n                    against\n                    <jats:italic toggle=\"yes\">novel objectives.<\/jats:italic>\n                    Then, xLoc learns\n                    <jats:italic toggle=\"yes\">task-specific knowledge<\/jats:italic>\n                    for the task of multilingual bug detection\/localization, through\n                    <jats:italic toggle=\"yes\">another new position encoding scheme<\/jats:italic>\n                    (based on cross-language API vicinity) that allows for the model to attend particularly to control-flow constructs that bear most multilingual bugs during fine-tuning. We have implemented xLoc for Python-C software and curated a dataset of 3,770 buggy and 15,884 non-buggy Python-C samples, which enabled our extensive evaluation of xLoc against two state-of-the-art baselines: fine-tuned CodeT5 and zero-shot ChatGPT. Our results show that xLoc achieved 94.98% F1 and 87.24% @Top-1 accuracy, which are significantly (up to 162.88% and 511.75%) higher than the baselines. Ablation studies further confirmed significant contributions of each of the novel design elements in xLoc. 
With respective bug-location characteristics and labeled bug datasets for fine-tuning, our design may be applied to other language combinations beyond Python-C.\n                  <\/jats:p>","DOI":"10.1145\/3660804","type":"journal-article","created":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T10:22:09Z","timestamp":1720779729000},"page":"2190-2213","source":"Crossref","is-referenced-by-count":12,"title":["Learning to Detect and Localize Multilingual Bugs"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9298-9757","authenticated-orcid":false,"given":"Haoran","family":"Yang","sequence":"first","affiliation":[{"name":"Washington State University, Pullman, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8598-5181","authenticated-orcid":false,"given":"Yu","family":"Nong","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6272-4069","authenticated-orcid":false,"given":"Tao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Macau University of Science and Technology, Macau, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9082-3208","authenticated-orcid":false,"given":"Xiapu","family":"Luo","sequence":"additional","affiliation":[{"name":"Hong Kong Polytechnic University, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5224-9970","authenticated-orcid":false,"given":"Haipeng","family":"Cai","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,7,12]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"72","article-title":"Behind the scenes: developers\u2019 perception of 
multi-language practices","author":"Abidi Mouna","year":"2019","unstructured":"Mouna Abidi, Manel Grichi, and Foutse Khomh. 2019. Behind the scenes: developers\u2019 perception of multi-language practices. In Annual International Conference on Computer Science and Software Engineering. 72\u201381.","journal-title":"In Annual International Conference on Computer Science and Software Engineering"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3432690"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2401.10716"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00038"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2018.2855650"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340571"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2635868.2635893"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMPSAC.2013.55"},{"key":"e_1_3_1_10_2","article-title":"Cython: The Cython compiler for writing C extensions for the Python language","author":"Bradshaw Robert","year":"2023","unstructured":"Robert Bradshaw, Stefan Behnel, Dag Sverre Seljebotn, et al. 2023. Cython: The Cython compiler for writing C extensions for the Python language. https:\/\/github.com\/cython\/cython GitHub repository.","journal-title":"GitHub repository"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-42637-7_7"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2005.55"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-81-322-3972-7_19"},{"key":"e_1_3_1_14_2","first-page":"1","article-title":"Programming language trends in open source development: An evaluation using data from all production phase sourceforge projects","author":"Delorey Daniel P","year":"2007","unstructured":"Daniel P Delorey, Charles D Knutson, and Christophe Giraud-Carrier. 2007. 
Programming language trends in open source development: An evaluation using data from all production phase sourceforge projects. In Second International Workshop on Public Data about Software Development (WoPDaSD\u201907). 1\u20135.","journal-title":"Second International Workshop on Public Data about Software Development (WoPDaSD\u201907)"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2021.24224"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TEST.2001.966704"},{"key":"e_1_3_1_17_2","article-title":"cpython: The Python programming language","author":"Python Software Foundation and contributors","year":"2023","unstructured":"Python Software Foundation and contributors. 2023. cpython: The Python programming language. https:\/\/github.com\/python\/cpython\/ GitHub repository.","journal-title":"GitHub repository"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3386014"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524842.3528452"},{"key":"e_1_3_1_20_2","article-title":"kitty: A cross-platform, fast, feature-rich, GPU based terminal emulator","author":"Goyal Kovid","year":"2023","unstructured":"Kovid Goyal . 2023. kitty: A cross-platform, fast, feature-rich, GPU based terminal emulator. 
https:\/\/github.com\/kovidgoyal\/kitty GitHub repository.","journal-title":"GitHub repository"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2020.3024873"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME46990.2020.00058"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3183440.3183485"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-020-2649-2"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00151"},{"key":"e_1_3_1_26_2","volume-title":"Software engineering best practices: lessons from successful projects in the top companies","author":"Jones Capers","year":"2010","unstructured":"Capers Jones . 2010. Software engineering best practices: lessons from successful projects in the top companies. McGraw-Hill Education."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER.2016.112"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00066"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2207.01780"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416558"},{"key":"e_1_3_1_31_2","first-page":"2513","article-title":"PolyCruise: A cross-language dynamic information flow analysis","author":"Li Wen","year":"2022","unstructured":"Wen Li, Ming Jiang, Xiapu Luo, and Haipeng Cai. 2022. PolyCruise: A cross-language dynamic information flow analysis. In 31st USENIX Security Symposium (USENIX Security 22). 
2513\u20132530.","journal-title":"31st USENIX Security Symposium (USENIX Security 22)"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549173"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3558925"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3631967"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-Companion52605.2021.00119"},{"key":"e_1_3_1_36_2","first-page":"1379","article-title":"PolyFuzz: Holistic greybox fuzzing of multi-language systems","author":"Li Wen","year":"2023","unstructured":"Wen Li, Jinyang Ruan, Guangbei Yi, Long Cheng, Xiapu Luo, and Haipeng Cai. 2023. PolyFuzz: Holistic greybox fuzzing of multi-language systems. In 32nd USENIX Security Symposium (USENIX Security 23). 1379\u20131396.","journal-title":"32nd USENIX Security Symposium (USENIX Security 23)"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3339068"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00067"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549137"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3360588"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3084827"},{"key":"e_1_3_1_42_2","unstructured":"Linus Eriksson . 2022. Tree-Sitter. https:\/\/github.com\/tree-sitter\/tree-sitter."},{"key":"e_1_3_1_43_2","first-page":"307","article-title":"FANS: Fuzzing Android native system services via automated interface analysis","author":"Liu Baozheng","year":"2020","unstructured":"Baozheng Liu, Chao Zhang, Guang Gong, Yishun Zeng, Haifeng Ruan, and Jianwei Zhuge. 2020. FANS: Fuzzing Android native system services via automated interface analysis. In USENIX Security Symposium. 
307\u2013323.","journal-title":"In USENIX Security Symposium"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477535"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468580"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/2745802.2745805"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40411-017-0035-z"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.5555\/553011"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510147"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/2509136.2509515"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510096"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639116"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549128"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00211"},{"key":"e_1_3_1_55_2","unstructured":"OpenAI. 2019. ChatGPT: A Large-Scale Generative Model. https:\/\/openai.com\/research\/chatgpt. Accessed: 9\/12\/2023."},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2023.3241639"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP46215.2023.10179420"},{"key":"e_1_3_1_58_2","first-page":"9343","article-title":"Integrating tree path in transformer for code representation","volume":"34","author":"Peng Han","year":"2021","unstructured":"Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, and Zhi Jin. 2021. Integrating tree path in transformer for code representation. 
Advances in Neural Information Processing Systems 34 (2021), 9343\u20139354.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2105.12655"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3126905"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1803.02155"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6430"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/2601248.2601269"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1098\/rsif.2015.0249"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-03260-3_34"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1706.03762"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-019-0686-2"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3616256"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485275"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243734.3243835"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2018.2866347"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2014.44"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3560880"},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00157"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2024.3358258"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1002\/spe.3199"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2020.106486"},{"key":"e_1_3_1_80_2","doi-asser
ted-by":"publisher","DOI":"10.1016\/j.aiopen.2021.01.001"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660804","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3660804","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3660804","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T08:00:47Z","timestamp":1770192047000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660804"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,12]]},"references-count":79,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2024,7,12]]}},"alternative-id":["10.1145\/3660804"],"URL":"https:\/\/doi.org\/10.1145\/3660804","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,12]]}}}