{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,26]],"date-time":"2026-04-26T03:51:38Z","timestamp":1777175498371,"version":"3.51.4"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,3,14]],"date-time":"2024-03-14T00:00:00Z","timestamp":1710374400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"U.S. National Science Foundation","doi-asserted-by":"crossref","award":["CCF-2146233"],"award-info":[{"award-number":["CCF-2146233"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000006","name":"U.S. Office of Naval Research","doi-asserted-by":"crossref","award":["N000142212111"],"award-info":[{"award-number":["N000142212111"]}],"id":[{"id":"10.13039\/100000006","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>\n            For many years now, modern software is known to be developed in multiple languages (hence termed as\n            <jats:italic>multilingual<\/jats:italic>\n            or\n            <jats:italic>multi-language<\/jats:italic>\n            software). Yet, to date, we still only have very limited knowledge about how multilingual software systems are constructed. For instance, it is not yet really clear how different languages are used, selected together, and why they have been so in multilingual software development. 
Given the fact that using multiple languages in a single software project has become a norm, understanding language use and selection (i.e.,\n            <jats:italic>language profile<\/jats:italic>\n            ) as a basic element of the\n            <jats:italic>multilingual construction<\/jats:italic>\n            in contemporary software engineering is an essential first step.\n          <\/jats:p>\n          <jats:p>\n            In this article, we set out to fill this gap with a large-scale characterization study on language use and selection in open-source multilingual software. We start with presenting\n            <jats:italic>an updated overview<\/jats:italic>\n            of language use in 7,113 GitHub projects spanning the 5 past years by characterizing overall statistics of language profiles, followed by\n            <jats:italic>a deeper look<\/jats:italic>\n            into the functionality relevance\/justification of language selection in these projects through association rule mining. We proceed with an evolutionary characterization of 1,000 GitHub projects for each of the 10 past years to provide\n            <jats:italic>a longitudinal view<\/jats:italic>\n            of how language use and selection have changed over the years, as well as how the association between functionality and language selection has been evolving.\n          <\/jats:p>\n          <jats:p>Among many other findings, our study revealed a growing trend of using three to five languages in one multilingual software project and the noticeable stableness of top language selections. We found a non-trivial association between language selection and certain functionality domains, which was less stable than that with individual languages over time. In a historical context, we also have observed major shifts in these characteristics of multilingual systems both in contrast to earlier peer studies and along the evolutionary timeline. 
Our findings offer essential knowledge on the multilingual construction in modern software development. Based on our results, we also provide insights and actionable suggestions for both researchers and developers of multilingual systems.<\/jats:p>","DOI":"10.1145\/3631967","type":"journal-article","created":{"date-parts":[[2023,11,6]],"date-time":"2023-11-06T11:54:55Z","timestamp":1699271695000},"page":"1-46","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["How Are Multilingual Systems Constructed: Characterizing Language Use and Selection in Open-Source Multilingual Software"],"prefix":"10.1145","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0194-2115","authenticated-orcid":false,"given":"Wen","family":"Li","sequence":"first","affiliation":[{"name":"Washington State University, Pullman, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8916-2736","authenticated-orcid":false,"given":"Austin","family":"Marino","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9298-9757","authenticated-orcid":false,"given":"Haoran","family":"Yang","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0230-5524","authenticated-orcid":false,"given":"Na","family":"Meng","sequence":"additional","affiliation":[{"name":"Virginia Tech, Blacksburg, VA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2990-1614","authenticated-orcid":false,"given":"Li","family":"Li","sequence":"additional","affiliation":[{"name":"Monash University, Clayton VIC, 
Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5224-9970","authenticated-orcid":false,"given":"Haipeng","family":"Cai","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,3,14]]},"reference":[{"key":"e_1_3_3_2_2","first-page":"72","volume-title":"29th Annual International Conference on Computer Science and Software Engineering","author":"Abidi Mouna","year":"2019","unstructured":"Mouna Abidi, Manel Grichi, and Foutse Khomh. 2019. Behind the scenes: Developers\u2019 perception of multi-language practices. In 29th Annual International Conference on Computer Science and Software Engineering. 72\u201381."},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3432690"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340571"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMPSAC.2013.55"},{"issue":"12","key":"e_1_3_3_6_2","first-page":"4733","article-title":"D \\(^2\\) ABS: A framework for dynamic dependence analysis of distributed programs","volume":"48","author":"Cai Haipeng","year":"2022","unstructured":"Haipeng Cai and Xiaoqin Fu. 2022. D \\(^2\\) ABS: A framework for dynamic dependence analysis of distributed programs. 
IEEE Transactions on Software Engineering (TSE) 48, 12 (2022), 4733\u20134761.","journal-title":"IEEE Transactions on Software Engineering (TSE)"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2017.35"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2020.2975176"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2970276.2970352"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1177\/0049124113500475"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2005.55"},{"key":"e_1_3_3_12_2","unstructured":"CORBA. 1991. Common Object Request Broker Architecture (CORBA). https:\/\/www.omg.org\/spec\/CORBA\/"},{"key":"e_1_3_3_13_2","volume-title":"Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory","author":"Corbin Juliet","year":"2014","unstructured":"Juliet Corbin and Anselm Strauss. 2014. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. Sage Publications."},{"key":"e_1_3_3_14_2","first-page":"1","volume-title":"2nd International Workshop on Public Data about Software Development (WoPDaSD\u201907)","author":"Delorey Daniel P.","year":"2007","unstructured":"Daniel P. Delorey, Charles D. Knutson, and Christophe Giraud-Carrier. 2007. Programming language trends in open source development: An evaluation using data from all production phase sourceforge projects. In 2nd International Workshop on Public Data about Software Development (WoPDaSD\u201907). 1\u20135."},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3379345"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510454.3516859"},{"key":"e_1_3_3_17_2","unstructured":"GitHub. 2023. GitHut 2.0 - GitHub Language Stats. https:\/\/madnight.github.io\/githut\/"},{"key":"e_1_3_3_18_2","unstructured":"GitHub Inc.2020. 
GitHub: A US-based global company provides hosting for software development version control using Git. https:\/\/github.com\/"},{"key":"e_1_3_3_19_2","unstructured":"GitHub Inc. 2020. GitHub Developer: Provides APIs to retrieve or query repositories in GitHub. https:\/\/developer.github.com\/v3"},{"key":"e_1_3_3_20_2","unstructured":"Google Brain Team. 2021. The TensorFlow project. https:\/\/github.com\/tensorflow\/tensorflow"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2020.3024873"},{"key":"e_1_3_3_22_2","unstructured":"gRPC. 2020. gRPC Tutorial. https:\/\/grpc.io\/docs\/"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.22159\/ajpcr.2019.v12i18.33339"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.5555\/1594755"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-015-9393-5"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/1985441.1985447"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416558"},{"key":"e_1_3_3_28_2","unstructured":"Justin Lestal. 2023. How many programming and coding languages are there? https:\/\/devskiller.com\/how-many-programming-languages\/"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549173"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3558925"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-Companion52605.2021.00119"},{"key":"e_1_3_3_32_2","first-page":"2513","volume-title":"31st USENIX Security Symposium (USENIX Security 22)","author":"Li Wen","year":"2022","unstructured":"Wen Li, Jiang Ming, Xiapu Luo, and Haipeng Cai. 2022. PolyCruise: A cross-language dynamic information flow analysis. In 31st USENIX Security Symposium (USENIX Security 22). 
2513\u20132530."},{"key":"e_1_3_3_33_2","first-page":"1379","volume-title":"32nd USENIX Security Symposium (USENIX Security 23)","author":"Li Wen","year":"2023","unstructured":"Wen Li, Jinyang Ruan, Guangbei Yi, Long Cheng, Xiapu Luo, and Haipeng Cai. 2023. PolyFuzz: Holistic greybox fuzzing of multi-language systems. In 32nd USENIX Security Symposium (USENIX Security 23). 1379\u20131396."},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/2745802.2745805"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40411-017-0035-z"},{"key":"e_1_3_3_36_2","unstructured":"Slashdot Media. 2020. SourceForge: The Complete Open-Source and Business Software Platform. https:\/\/sourceforge.net\/"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/2509136.2509515"},{"key":"e_1_3_3_38_2","volume-title":"Qualitative Data Analysis: A Methods Sourcebook","author":"Miles Matthew B.","year":"2018","unstructured":"Matthew B. Miles, A. Michael Huberman, and Johnny Salda\u00f1a. 2018. Qualitative Data Analysis: A Methods Sourcebook. Sage Publications."},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/QRS.2016.22"},{"key":"e_1_3_3_40_2","unstructured":"Adam Paszke Sam Gross Soumith Chintala and Gregory Chanan. 2021. The PyTorch project. https:\/\/github.com\/pytorch\/pytorch"},{"key":"e_1_3_3_41_2","unstructured":"Havoc Pennington. 2020. D-Bus Tutorial. https:\/\/dbus.freedesktop.org\/doc\/dbus-tutorial.html"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-44801-2_8"},{"key":"e_1_3_3_43_2","unstructured":"Sebastian Raschka. 2020. 
Mlxtend: (Machine learning extensions) a Python library of useful tools for the day-to-day data science tasks. http:\/\/rasbt.github.io\/mlxtend"},{"key":"e_1_3_3_44_2","first-page":"155","volume-title":"22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering","author":"Ray Baishakhi","year":"2014","unstructured":"Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu. 2014. A large scale study of programming languages and code quality in GitHub. In 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 155\u2013165."},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/2601248.2601269"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1098\/rsif.2015.0249"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-03260-3_34"},{"key":"e_1_3_3_48_2","unstructured":"Jason Warner. 2018. Thank you for 100 million repositories. https:\/\/github.blog\/2018-11-08-100M-repos\/"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3560880"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00157"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2019.2961897"}],"container-title":["ACM Transactions on Software Engineering and 
Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631967","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3631967","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:51:02Z","timestamp":1750287062000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631967"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,14]]},"references-count":50,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3631967"],"URL":"https:\/\/doi.org\/10.1145\/3631967","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,14]]},"assertion":[{"value":"2021-07-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-19","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}