{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T04:43:36Z","timestamp":1754109816359,"version":"3.37.3"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,8,9]],"date-time":"2023-08-09T00:00:00Z","timestamp":1691539200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,8,9]],"date-time":"2023-08-09T00:00:00Z","timestamp":1691539200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"JSPS","award":["20H04184","21K11831"],"award-info":[{"award-number":["20H04184","21K11831"]}]},{"name":"JSPS","award":["21K11833"],"award-info":[{"award-number":["21K11833"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Variable names represent a significant source of information regarding the source code, and a successful naming of variables is key to producing readable code. Programmers often use a compound variable name by concatenating two or more words to make it more informative and enhance the code readability. While each compound variable name is descriptive, a collection of them sometimes produces \u201cconfusing\u201d variable pairs if their names are highly similar, e.g., \u201cshippingHeight,\u201d vs. \u201cshippingWeight.\u201d A confusing variable pair would adversely affect the code readability because it can cause a misreading or mix-up of variables during the programming or code review activities. Toward automated support for enhancing code readability, this paper conducts a large-scale investigation of compound variable names in Java and Python programs. 
The investigation collects 116,921,127 pairs of compound-named variables from 1,876 open-source Java projects and 106,943,523 pairs of such variables from 2,427 open-source Python projects. Then, this study analyzes those variable pairs from two perspectives of name similarity: string similarity and semantic similarity. Through an evaluation study with 30 human participants, the data analyses show that both string and semantic similarity can help detect confusing variable pairs in Java and Python programs. In order to distill confusing variable pairs automatically, support tools for detecting confusing variable pairs are also developed in this study.<\/jats:p>","DOI":"10.1007\/s10664-023-10339-2","type":"journal-article","created":{"date-parts":[[2023,8,9]],"date-time":"2023-08-09T09:03:16Z","timestamp":1691571796000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["An automated detection of confusing variable pairs with highly similar compound names in Java and Python programs"],"prefix":"10.1007","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7074-5225","authenticated-orcid":false,"given":"Hirohisa","family":"Aman","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sousuke","family":"Amasaki","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tomoyuki","family":"Yokogawa","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Minoru","family":"Kawahara","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,8,9]]},"reference":[{"key":"10339_CR1","doi-asserted-by":"publisher","unstructured":"Aivaloglou E, Hermans F (2016) How kids code and how we know: An exploratory study on the scratch repository. 
In: Proceedings of the 2016 ACM Conference on International Computing Education Research, ACM, New York, NY, USA, ICER \u201916, pp 53\u201361. https:\/\/doi.org\/10.1145\/2960310.2960325, http:\/\/doi.acm.org\/10.1145\/2960310.2960325","DOI":"10.1145\/2960310.2960325"},{"key":"10339_CR2","unstructured":"Aman H, Amasaki S, Yokogawa T, Kawahara M (2019) Empirical study of fault introduction focusing on the similarity among local variable names. In: Proc. 7th Int. Workshop Quantitative Approaches to Softw. Quality, pp 3\u201311"},{"key":"10339_CR3","doi-asserted-by":"publisher","unstructured":"Aman H, Amasaki S, Yokogawa T, Kawahara M (2021a) An investigation of compound variable names toward automated detection of confusing variable pairs. In: Proc. 36th IEEE\/ACM International Conference on Automated Software Engineering Workshops, pp 133\u2013137. https:\/\/doi.org\/10.1109\/ASEW52652.2021.00036","DOI":"10.1109\/ASEW52652.2021.00036"},{"key":"10339_CR4","first-page":"489","volume-title":"Paiva ACR, Cavalli AR, Ventura Martins P","author":"H Aman","year":"2021","unstructured":"Aman H, Amasaki S, Yokogawa T, Kawahara M (2021) A large-scale investigation of local variable names in java programs: Is longer name better for broader scope variable? In: P\u00e9rez-Castillo R (ed) Paiva ACR, Cavalli AR, Ventura Martins P. Quality of Information and Communications Technology, Springer International Publishing, Cham, pp 489\u2013500"},{"issue":"1","key":"10339_CR5","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1007\/s10664-014-9350-8","volume":"21","author":"V Arnaoudova","year":"2016","unstructured":"Arnaoudova V, Di Penta M, Antoniol G (2016) Linguistic antipatterns: what they are and how developers perceive them. Empir Softw Eng 21(1):104\u2013158. 
https:\/\/doi.org\/10.1007\/s10664-014-9350-8","journal-title":"Empir Softw Eng"},{"key":"10339_CR6","doi-asserted-by":"publisher","unstructured":"Beniamini G, Gingichashvili S, Orbach AK, Feitelson DG (2017) Meaningful identifier names: The case of single-letter variables. In: 2017 IEEE\/ACM 25th International Conference on Program Comprehension (ICPC), pp 45\u201354. https:\/\/doi.org\/10.1109\/ICPC.2017.18","DOI":"10.1109\/ICPC.2017.18"},{"key":"10339_CR7","doi-asserted-by":"publisher","unstructured":"Binkley D, Lawrie D, Maex S, Morrell C (2009) Identifier length and limited programmer memory. Sci Comput Program 74(7):430\u2013445. https:\/\/doi.org\/10.1016\/j.scico.2009.02.006, www.sciencedirect.com\/science\/article\/pii\/S0167642309000343","DOI":"10.1016\/j.scico.2009.02.006"},{"issue":"2","key":"10339_CR8","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1007\/s10664-012-9201-4","volume":"18","author":"D Binkley","year":"2013","unstructured":"Binkley D, Davis M, Lawrie D, Maletic JI, Morrell C, Sharif B (2013) The impact of identifier style on effort and comprehension. Empir Softw Eng 18(2):219\u2013276. https:\/\/doi.org\/10.1007\/s10664-012-9201-4","journal-title":"Empir Softw Eng"},{"key":"10339_CR9","doi-asserted-by":"publisher","unstructured":"Caprile, Tonella (2000) Restructuring program identifier names. In: Proceedings of 2000 International Conference on Software Maintenance, pp 97\u2013107. https:\/\/doi.org\/10.1109\/ICSM.2000.883022","DOI":"10.1109\/ICSM.2000.883022"},{"issue":"4","key":"10339_CR10","doi-asserted-by":"publisher","first-page":"1040","DOI":"10.1007\/s10664-013-9248-x","volume":"19","author":"M Ceccato","year":"2014","unstructured":"Ceccato M, Di Penta M, Falcarin P, Ricca F, Torchiano M, Tonella P (2014) A family of experiments to assess the effectiveness and efficiency of source code obfuscation techniques. Empir Softw Eng 19(4):1040\u20131074. 
https:\/\/doi.org\/10.1007\/s10664-013-9248-x","journal-title":"Empir Softw Eng"},{"issue":"3","key":"10339_CR11","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1007\/s11219-006-9219-1","volume":"14","author":"F Deissenboeck","year":"2006","unstructured":"Deissenboeck F, Pizka M (2006) Concise and consistent naming. Softw Q J 14(3):261\u2013282. https:\/\/doi.org\/10.1007\/s11219-006-9219-1","journal-title":"Softw Q J"},{"key":"10339_CR12","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proc. 2019 Conf. the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol\u00a01, pp 4171\u20134186"},{"key":"10339_CR13","unstructured":"Free Software Foundation (2018) GNU coding standards. https:\/\/www.gnu.org\/prep\/standards\/standards.html"},{"key":"10339_CR14","unstructured":"Gosling J, Joy B Jr, GLS, Bracha G, Buckley A (2014) The Java Language Specification. Addison-Wesley, Boston, MA"},{"key":"10339_CR15","doi-asserted-by":"crossref","unstructured":"Gusfield D (1997) Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge","DOI":"10.1017\/CBO9780511574931"},{"issue":"24","key":"10339_CR16","doi-asserted-by":"publisher","first-page":"653","DOI":"10.21105\/joss.00653","volume":"3","author":"M Hucka","year":"2018","unstructured":"Hucka M (2018) Spiral: splitters for identifiers in source code files. J Open Source Software 3(24):653. https:\/\/doi.org\/10.21105\/joss.00653","journal-title":"J Open Source Software"},{"key":"10339_CR17","unstructured":"Kernighan BW, Pike R (1999) The Practice of Programming. Addison-Wesley, Boston, MA"},{"key":"10339_CR18","unstructured":"kernel\u00a0development community T (2016) Linux kernel coding style. 
https:\/\/www.kernel.org\/doc\/html\/v4.10\/process\/coding-style.html"},{"key":"10339_CR19","unstructured":"Knuth DE (2003) Selected Papers on Computer Languages. No. 139 in CSLI Lecture Notes, Center for the Study of Lang. & Inf., Stanford, California"},{"key":"10339_CR20","doi-asserted-by":"publisher","unstructured":"Lacomis J, Yin P, Schwartz EJ, Allamanis M, Goues CL, Neubig G, Vasilescu B (2019) DIRE: A neural approach to decompiled identifier naming. In: Proceedings of the 34th IEEE\/ACM International Conference on Automated Software Engineering, pp 628\u2013639. https:\/\/doi.org\/10.1109\/ASE.2019.00064","DOI":"10.1109\/ASE.2019.00064"},{"issue":"4","key":"10339_CR21","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11334-007-0031-2","volume":"3","author":"D Lawrie","year":"2007","unstructured":"Lawrie D, Morrell C, Feild H, Binkley D (2007) Effective identifier names for comprehension and memory. Innov Syst Softw Eng 3(4):303\u2013318. https:\/\/doi.org\/10.1007\/s11334-007-0031-2","journal-title":"Innov Syst Softw Eng"},{"key":"10339_CR22","unstructured":"Le QV, Mikolov T (2014) Distributed representations of sentences and documents. CoRR abs\/1405.4053"},{"key":"10339_CR23","unstructured":"Liblit B, Begel A, Sweetser E (2006) Cognitive perspectives on the role of naming in computer programs. In: Proc. 18th Annual Psychology of Programming Workshop, pp 53\u201367"},{"issue":"3","key":"10339_CR24","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1145\/332084.332092","volume":"4","author":"D Low","year":"1998","unstructured":"Low D (1998) Protecting java code via code obfuscation. Crossroads 4(3):21\u201323. https:\/\/doi.org\/10.1145\/332084.332092","journal-title":"Crossroads"},{"key":"10339_CR25","unstructured":"Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. 
CoRR abs\/1301.3781"},{"key":"10339_CR26","doi-asserted-by":"publisher","unstructured":"Palma SD, Di\u00a0Nucci D, Tamburri D (2021) RepoMiner: a language-agnostic python framework to mine software repositories for defect prediction. https:\/\/doi.org\/10.48550\/arXiv.2111.11807, https:\/\/arxiv.org\/abs\/2111.11807","DOI":"10.48550\/arXiv.2111.11807"},{"key":"10339_CR27","unstructured":"Pigoski TM (1996) Practical Software Maintenance: Best Practices for Managing Your Software Investment, 1st edn. Wiley Publishing, N.J"},{"key":"10339_CR28","doi-asserted-by":"publisher","unstructured":"Reimers N, Gurevych I (2019) Sentence-BERT: Sentence embeddings using siamese BERT-networks. In: Proc. 2019 Conf. Empirical Methods in Natural Language Processing and 9th Int\u2019l Joint Conf. Natural Language Processing, pp 3982\u20133992, https:\/\/doi.org\/10.18653\/v1\/D19-1410","DOI":"10.18653\/v1\/D19-1410"},{"issue":"3","key":"10339_CR29","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1109\/TSE.2019.2901468","volume":"47","author":"S Scalabrino","year":"2021","unstructured":"Scalabrino S, Bavota G, Vendome C, Linares-V\u00e1squez M, Poshyvanyk D, Oliveto R (2021) Automatically assessing code understandability. IEEE Trans Softw Eng 47(3):595\u2013613. https:\/\/doi.org\/10.1109\/TSE.2019.2901468","journal-title":"IEEE Trans Softw Eng"},{"key":"10339_CR30","doi-asserted-by":"publisher","unstructured":"Scanniello G, Risi M, Tramontana P, Romano S (2017) Fixing faults in c and java source code: Abbreviated vs. full-word identifier names. ACM Trans Softw Eng Methodol 26(2):6:1\u20136:43. https:\/\/doi.org\/10.1145\/3104029","DOI":"10.1145\/3104029"},{"key":"10339_CR31","doi-asserted-by":"publisher","unstructured":"Schankin A, Berger A, Holt DV, Hofmeister JC, Riedel T, Beigl M (2018) Descriptive compound identifier names improve source code comprehension. In: Proc. 26th Int. Conf. Program Comprehension, pp 31\u201340. 
https:\/\/doi.org\/10.1145\/3196321.3196332","DOI":"10.1145\/3196321.3196332"},{"key":"10339_CR32","doi-asserted-by":"publisher","unstructured":"Swidan A, Serebrenik A, Hermans F (2017) How do scratch programmers name variables and procedures? In: 2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp 51\u201360. https:\/\/doi.org\/10.1109\/SCAM.2017.12","DOI":"10.1109\/SCAM.2017.12"},{"key":"10339_CR33","doi-asserted-by":"publisher","unstructured":"Tashima K, Aman H, Amasaki S, Yokogawa T, Kawahara M (2018) Fault-prone java method analysis focusing on pair of local variables with confusing names. In: Proc. 44th Euromicro Conf. Softw. Eng. & Advanced App., pp 154\u2013158. https:\/\/doi.org\/10.1109\/SEAA.2018.00033","DOI":"10.1109\/SEAA.2018.00033"},{"key":"10339_CR34","doi-asserted-by":"publisher","unstructured":"Tran H, Tran N, Nguyen S, Nguyen H, Nguyen TN (2019) Recovering variable names for minified code with usage contexts. In: Proceedings of the 41st International Conference on Software Engineering, IEEE Press, ICSE \u201919, pp 1165\u20131175. 
https:\/\/doi.org\/10.1109\/ICSE.2019.00119","DOI":"10.1109\/ICSE.2019.00119"}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-023-10339-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-023-10339-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-023-10339-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,4]],"date-time":"2023-10-04T12:16:42Z","timestamp":1696421802000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-023-10339-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,9]]},"references-count":34,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["10339"],"URL":"https:\/\/doi.org\/10.1007\/s10664-023-10339-2","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"type":"print","value":"1382-3256"},{"type":"electronic","value":"1573-7616"}],"subject":[],"published":{"date-parts":[[2023,8,9]]},"assertion":[{"value":"10 May 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 August 2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Funding and\/or Conflicts of interests\/Competing interests"}}],"article-number":"108"}}