{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T11:38:06Z","timestamp":1759837086291,"version":"3.44.0"},"reference-count":66,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T00:00:00Z","timestamp":1754611200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T00:00:00Z","timestamp":1754611200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100020618","name":"Universit\u00e4t Bayreuth","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100020618","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The code base of software projects evolves essentially through inserting and removing information to and from the source code. We can measure this evolution via the elements of information\u2014tokens, words, nodes\u2014of the respective representation of the code. In this work, we approach the measurement of the information content of the source code of open-source projects from an information-theoretic standpoint. Our focus is on the entropy of two fundamental representations of code: tokens and abstract syntax tree nodes, from which we derive definitions of textual and structural entropy. We proceed with an empirical assessment where we evaluate the evolution patterns of the entropy of 95 actively maintained open source projects. 
We calculate the statistical relationships between our derived entropy metrics and classic methods of measuring code complexity and learn that entropy may capture different dimensions of complexity than classic metrics. Finally, we conduct entropy-based anomaly detection of unusual changes to demonstrate that our approach may effectively recognise unusual source code change events with over 60% precision, and lay the groundwork for improvements to information-theoretic measurement of source code evolution, thus paving the way for a new approach to statically gauging program complexity throughout its development.<\/jats:p>","DOI":"10.1007\/s10664-025-10644-y","type":"journal-article","created":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T05:32:25Z","timestamp":1754631145000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Information-theoretic detection of unusual source code changes"],"prefix":"10.1007","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3754-029X","authenticated-orcid":false,"given":"Adriano","family":"Torres","sequence":"first","affiliation":[]},{"given":"Sebastian","family":"Baltes","sequence":"additional","affiliation":[]},{"given":"Christoph","family":"Treude","sequence":"additional","affiliation":[]},{"given":"Markus","family":"Wagner","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,8,8]]},"reference":[{"key":"10644_CR1","volume-title":"Compilers, Principles, Techniques, and Tools","author":"AV Aho","year":"1986","unstructured":"Aho AV, Sethi R, Ullman JD (1986) Compilers, Principles, Techniques, and Tools. Addison-Wesley, White Plains"},{"key":"10644_CR2","doi-asserted-by":"publisher","unstructured":"Akundi A, Smith E, Tseng TL (2018) Information entropy applied to software based control flow graphs. International Journal of System Assurance Engineering and Management 9:1080\u20131091. 
https:\/\/doi.org\/10.1007\/s13198-018-0740-y","DOI":"10.1007\/s13198-018-0740-y"},{"key":"10644_CR3","doi-asserted-by":"publisher","unstructured":"Allen E, Khoshgoftaar T, Chen Y (2001) Measuring coupling and cohesion of software modules: an information-theory approach. In: Proceedings Seventh International Software Metrics Symposium, pp 124\u2013134,https:\/\/doi.org\/10.1109\/METRIC.2001.915521","DOI":"10.1109\/METRIC.2001.915521"},{"key":"10644_CR4","unstructured":"Anonymous (2024) GSM-symbolic: Understanding the limitations of mathematical reasoning in large language models. In: Submitted to The Thirteenth International Conference on Learning Representations, URL https:\/\/openreview.net\/forum?id=AjXkRZIvjB, under review"},{"issue":"4","key":"10644_CR5","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1007\/s10664-021-10072-8","volume":"27","author":"S Baltes","year":"2022","unstructured":"Baltes S, Ralph P (2022) Sampling in software engineering research: A critical review and guidelines. Empir Softw Eng 27(4):94","journal-title":"Empir Softw Eng"},{"key":"10644_CR6","doi-asserted-by":"publisher","unstructured":"Berlinger E (1980) An information theory based complexity measure. In: Proceedings of the May 19-22, 1980, National Computer Conference, Association for Computing Machinery, New York, NY, USA, AFIPS \u201980, p 773\u2013779,https:\/\/doi.org\/10.1145\/1500518.1500651","DOI":"10.1145\/1500518.1500651"},{"key":"10644_CR7","doi-asserted-by":"publisher","unstructured":"Blackburn J, Scudder G, Van Wassenhove L (1997) Improving speed and productivity of software development: A global survey of software developers. Software Engineering, IEEE Transactions on 22:875\u2013885. 
https:\/\/doi.org\/10.1109\/32.553636","DOI":"10.1109\/32.553636"},{"key":"10644_CR8","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1007\/BF02249046","volume":"1","author":"B Boehm","year":"1995","unstructured":"Boehm B, Clark B, Horowitz E, Westland C, Madachy R, Selby R (1995) Cost models for future software life cycle processes: Cocomo 2.0. Annals of software engineering 1:57\u201394","journal-title":"Annals of software engineering"},{"key":"10644_CR9","unstructured":"Boehm BW, Abts C, Brown AW, Chulani S, Clark BK, Horowitz E, Madachy R, Reifer DJ, Steece B (2009) Software cost estimation with COCOMO II. Prentice Hall Press"},{"key":"10644_CR10","unstructured":"Brewer MB, Crano WD (2000) Research design and issues of validity. Handbook of research methods in social and personality psychology pp 3\u201316"},{"key":"10644_CR11","unstructured":"Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. CoRR abs\/2005.14165, URL https:\/\/arxiv.org\/abs\/2005.14165"},{"issue":"1","key":"10644_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10664-012-9214-z","volume":"19","author":"G Canfora","year":"2014","unstructured":"Canfora G, Cerulo L, Cimitile M, Di Penta M (2014) How changes affect software entropy: an empirical study. Empir Softw Eng 19(1):1\u201338","journal-title":"Empir Softw Eng"},{"key":"10644_CR13","doi-asserted-by":"publisher","unstructured":"Chaturvedi KK, Kapur PK, Anand S, Singh VB (2014) Predicting the complexity of code changes using entropy based measures. Int J Syst Assur Eng Manag 5(2):155\u2013164. 
https:\/\/doi.org\/10.1007\/s13198-014-0226-5","DOI":"10.1007\/s13198-014-0226-5"},{"key":"10644_CR14","doi-asserted-by":"publisher","unstructured":"Chen E (1978) Program complexity and programmer productivity. IEEE Transactions on Software Engineering SE-4(3):187\u2013194, https:\/\/doi.org\/10.1109\/TSE.1978.231497","DOI":"10.1109\/TSE.1978.231497"},{"key":"10644_CR15","doi-asserted-by":"crossref","unstructured":"Christakis M, Bird C (2016) What developers want and need from program analysis: An empirical study. In: 2016 31st IEEE\/ACM International Conference on Automated Software Engineering (ASE), pp 332\u2013343","DOI":"10.1145\/2970276.2970347"},{"key":"10644_CR16","doi-asserted-by":"publisher","unstructured":"Coleman D, Ash D, Lowther B, Oman P (1994) Using metrics to evaluate software system maintainability. Computer 27(8):44\u20134. https:\/\/doi.org\/10.1109\/2.303623","DOI":"10.1109\/2.303623"},{"key":"10644_CR17","unstructured":"Cook C (1993) Information theory metric for assembly language. Software Engineering Strategies pp 52\u201360"},{"key":"10644_CR18","doi-asserted-by":"publisher","unstructured":"Csiszar I (1975) $$I$$-Divergence Geometry of Probability Distributions and Minimization Problems. Ann Probab 3(1):146\u2013158. https:\/\/doi.org\/10.1214\/aop\/1176996454","DOI":"10.1214\/aop\/1176996454"},{"key":"10644_CR19","doi-asserted-by":"crossref","unstructured":"Dabic O, Aghajani E, Bavota G (2021) Sampling projects in github for MSR studies. In: 18th IEEE\/ACM International Conference on Mining Software Repositories, MSR 2021, IEEE, pp 560\u2013564","DOI":"10.1109\/MSR52588.2021.00074"},{"key":"10644_CR20","doi-asserted-by":"publisher","unstructured":"Davis J, LeBlanc R (1988) A study of the applicability of complexity measures. IEEE Trans Software Eng 14(9):1366\u20131372. 
https:\/\/doi.org\/10.1109\/32.6179","DOI":"10.1109\/32.6179"},{"key":"10644_CR21","unstructured":"De\u00a0Win B, Piessens F, Joosen W, Verhanneman T (2002) On the importance of the separation-of-concerns principle in secure software engineering. In: Workshop on the Application of Engineering Principles to System Security Design, Citeseer, pp 1\u201310"},{"key":"10644_CR22","doi-asserted-by":"publisher","unstructured":"Dourish P, Bellotti V (1992) Awareness and coordination in shared workspaces. In: Proceedings of the 1992 ACM Conference on Computer-Supported Cooperative Work, Association for Computing Machinery, New York, NY, USA, CSCW \u201992, p 107\u2013114https:\/\/doi.org\/10.1145\/143457.143468","DOI":"10.1145\/143457.143468"},{"key":"10644_CR23","volume-title":"Refactoring","author":"M Fowler","year":"2018","unstructured":"Fowler M (2018) Refactoring. Addison-Wesley Professional, Boston"},{"key":"10644_CR24","unstructured":"Graham P (2012) Startup= growth. Paul Graham"},{"key":"10644_CR25","volume-title":"Elements of Software Science (Operating and Programming Systems Series)","author":"MH Halstead","year":"1977","unstructured":"Halstead MH (1977) Elements of Software Science (Operating and Programming Systems Series). Elsevier Science Inc., USA"},{"key":"10644_CR26","doi-asserted-by":"crossref","unstructured":"Hassan AE (2009) Predicting faults using the complexity of code changes. In: 2009 IEEE 31st international conference on software engineering, IEEE, pp 78\u201388","DOI":"10.1109\/ICSE.2009.5070510"},{"key":"10644_CR27","doi-asserted-by":"crossref","unstructured":"Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning. Springer Series in Statistics, Springer New York Inc., New York, NY, USA","DOI":"10.1007\/978-0-387-21606-5"},{"key":"10644_CR28","doi-asserted-by":"publisher","unstructured":"Hellerman L (1972) A measure of computational work. 
IEEE Transactions on Computers C-21(5):439\u2013446, https:\/\/doi.org\/10.1109\/T-C.1972.223539","DOI":"10.1109\/T-C.1972.223539"},{"issue":"5","key":"10644_CR29","doi-asserted-by":"publisher","first-page":"122","DOI":"10.1145\/2902362","volume":"59","author":"A Hindle","year":"2016","unstructured":"Hindle A, Barr ET, Gabel M, Su Z, Devanbu P (2016) On the naturalness of software. Commun ACM 59(5):122\u2013131","journal-title":"Commun ACM"},{"key":"10644_CR30","doi-asserted-by":"publisher","unstructured":"Hucka M (2018) Spiral: splitters for identifiers in source code files. Journal of Open Source Software 3(24):653,https:\/\/doi.org\/10.21105\/joss.00653","DOI":"10.21105\/joss.00653"},{"issue":"1","key":"10644_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10648-010-9150-7","volume":"23","author":"S Kalyuga","year":"2011","unstructured":"Kalyuga S (2011) Cognitive load theory: How many types of load does it really need? Educ Psychol Rev 23(1):1\u201319","journal-title":"Educ Psychol Rev"},{"key":"10644_CR32","doi-asserted-by":"publisher","DOI":"10.1016\/j.lindif.2023.102274","volume":"103","author":"E Kasneci","year":"2023","unstructured":"Kasneci E, Se\u00dfler K, K\u00fcchemann S, Bannert M, Dementieva D, Fischer F, Gasser U, Groh G, G\u00fcnnemann S, H\u00fcllermeier E et al (2023) Chatgpt for good? on opportunities and challenges of large language models for education. Learn Individ Differ 103:102274","journal-title":"Learn Individ Differ"},{"key":"10644_CR33","doi-asserted-by":"publisher","unstructured":"Kaur A, Kaur K, Chopra D (2017) An empirical study of software entropy based bug prediction using machine learning. Int J Syst Assur Eng Manag 8(2s):599\u2013616. https:\/\/doi.org\/10.1007\/s13198-016-0479-2","DOI":"10.1007\/s13198-016-0479-2"},{"key":"10644_CR34","doi-asserted-by":"crossref","unstructured":"Keenan D, Greer D, Cutting D (2022) An investigation of entropy and refactoring in software evolution. 
In: International Conference on Product-Focused Software Process Improvement, Springer, pp 282\u2013297","DOI":"10.1007\/978-3-031-21388-5_20"},{"key":"10644_CR35","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1007\/BF00213632","volume":"3","author":"TM Khoshgoftaar","year":"1994","unstructured":"Khoshgoftaar TM, Allen EB (1994) Applications of information theory to software engineering measurement. Software Qual J 3:79\u2013103","journal-title":"Software Qual J"},{"issue":"03","key":"10644_CR36","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1142\/S0218539398000224","volume":"5","author":"TM Khoshgoftaar","year":"1998","unstructured":"Khoshgoftaar TM, Allen EB (1998) An information theoretic approach to predicting software faults. Int J Reliab Qual Saf Eng 5(03):227\u2013248","journal-title":"Int J Reliab Qual Saf Eng"},{"key":"10644_CR37","doi-asserted-by":"publisher","unstructured":"Kitchenham B, Pearl Brereton O, Budgen D, Turner M, Bailey J, Linkman S (2009) Systematic literature reviews in software engineering \u2013 a systematic literature review. Information and Software Technology 51(1):7\u201315, https:\/\/doi.org\/10.1016\/j.infsof.2008.09.009, URL https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0950584908001390, special Section - Most Cited Articles in 2002 and Regular Research Papers","DOI":"10.1016\/j.infsof.2008.09.009"},{"issue":"6","key":"10644_CR38","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.69.066138","volume":"69","author":"A Kraskov","year":"2004","unstructured":"Kraskov A, St\u00f6gbauer H, Grassberger P (2004) Estimating mutual information. Physical Review E-Statistical, Nonlinear, and Soft Matter Physics 69(6):066138","journal-title":"Physical Review E-Statistical, Nonlinear, and Soft Matter Physics"},{"key":"10644_CR39","doi-asserted-by":"crossref","unstructured":"Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. 
Biometrics pp 159\u2013174","DOI":"10.2307\/2529310"},{"key":"10644_CR40","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1016\/0164-1212(79)90022-0","volume":"1","author":"M Lehman","year":"1979","unstructured":"Lehman M (1979) On understanding laws, evolution, and conservation in the large-program life cycle. J Syst Softw 1:213\u2013221","journal-title":"J Syst Softw"},{"key":"10644_CR41","doi-asserted-by":"publisher","unstructured":"Lehman M (1980) Programs, life cycles, and laws of software evolution. Proc IEEE 68(9):1060\u20131076. https:\/\/doi.org\/10.1109\/PROC.1980.11805","DOI":"10.1109\/PROC.1980.11805"},{"key":"10644_CR42","doi-asserted-by":"publisher","unstructured":"Lehman M (1984) Program evolution. Information Processing & Management 20(1):19\u20133https:\/\/doi.org\/10.1016\/0306-4573(84)90037-2, URL https:\/\/www.sciencedirect.com\/science\/article\/pii\/0306457384900372, special Issue Empirical Foundations of Information and Software Science","DOI":"10.1016\/0306-4573(84)90037-2"},{"key":"10644_CR43","unstructured":"Lewkowycz A, Andreassen AJ, Dohan D, Dyer E, Michalewski H, Ramasesh VV, Slone A, Anil C, Schlag I, Gutman-Solo T, Wu Y, Neyshabur B, Gur-Ari G, Misra V (2022) Solving quantitative reasoning problems with language models. In: Oh AH, Agarwal A, Belgrave D, Cho K (eds) Advances in Neural Information Processing Systems, URL https:\/\/openreview.net\/forum?id=IFXTZERXdM7"},{"key":"10644_CR44","doi-asserted-by":"publisher","unstructured":"McCabe T (1976) A complexity measure. IEEE Transactions on Software Engineering SE-2(4):308\u2013320, https:\/\/doi.org\/10.1109\/TSE.1976.233837","DOI":"10.1109\/TSE.1976.233837"},{"key":"10644_CR45","volume-title":"Code Complete","author":"S McConnell","year":"2004","unstructured":"McConnell S (2004) Code Complete, 2nd edn. Microsoft Press, Redmond, WA, USA","edition":"2"},{"key":"10644_CR46","unstructured":"McConnell S (2008) Managing technical debt. 
Available at http:\/\/www.construx.com\/uploadedfiles\/resources\/whitepapers\/Managing%20Technical%20Debt.pdf"},{"issue":"1","key":"10644_CR47","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1016\/0164-1212(81)90045-5","volume":"2","author":"SN Mohanty","year":"1981","unstructured":"Mohanty SN (1981) Entropy metrics for software design evaluation. J Syst Softw 2(1):39\u201346","journal-title":"J Syst Softw"},{"key":"10644_CR48","unstructured":"Nicholas\u00a0Smith FT Danny van\u00a0Bruggen (2023) Javaparser"},{"key":"10644_CR49","unstructured":"OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman FL, Almeida D, Altenschmidt J, Altman S, Anadkat S, Avila R, Babuschkin I, Balaji S, Balcom V, Baltescu P, Bao H, Bavarian M, Belgum J, Bello I, Berdine J, Bernadett-Shapiro G, Berner C, Bogdonoff L, Boiko O, Boyd M, Brakman AL, Brockman G, Brooks T, Brundage M, Button K, Cai T, Campbell R, Cann A, Carey B, Carlson C, Carmichael R, Chan B, Chang C, Chantzis F, Chen D, Chen S, Chen R, Chen J, Chen M, Chess B, Cho C, Chu C, Chung HW, Cummings D, Currier J, Dai Y, Decareaux C, Degry T, Deutsch N, Deville D, Dhar A, Dohan D, Dowling S, Dunning S, Ecoffet A, Eleti A, Eloundou T, Farhi D, Fedus L, Felix N, Fishman SP, Forte J, Fulford I, Gao L, Georges E, Gibson C, Goel V, Gogineni T, Goh G, Gontijo-Lopes R, Gordon J, Grafstein M, Gray S, Greene R, Gross J, Gu SS, Guo Y, Hallacy C, Han J, Harris J, He Y, Heaton M, Heidecke J, Hesse C, Hickey A, Hickey W, Hoeschele P, Houghton B, Hsu K, Hu S, Hu X, Huizinga J, Jain S, Jain S, Jang J, Jiang A, Jiang R, Jin H, Jin D, Jomoto S, Jonn B, Jun H, Kaftan T, \u0141ukasz Kaiser, Kamali A, Kanitscheider I, Keskar NS, Khan T, Kilpatrick L, Kim JW, Kim C, Kim Y, Kirchner JH, Kiros J, Knight M, Kokotajlo D, \u0141ukasz Kondraciuk, Kondrich A, Konstantinidis A, Kosic K, Krueger G, Kuo V, Lampe M, Lan I, Lee T, Leike J, Leung J, Levy D, Li CM, Lim R, Lin M, Lin S, Litwin M, Lopez T, Lowe R, Lue P, Makanju A, Malfacini K, 
Manning S, Markov T, Markovski Y, Martin B, Mayer K, Mayne A, McGrew B, McKinney SM, McLeavey C, McMillan P, McNeil J, Medina D, Mehta A, Menick J, Metz L, Mishchenko A, Mishkin P, Monaco V, Morikawa E, Mossing D, Mu T, Murati M, Murk O, M\u00e9ly D, Nair A, Nakano R, Nayak R, Neelakantan A, Ngo R, Noh H, Ouyang L, O\u2019Keefe C, Pachocki J, Paino A, Palermo J, Pantuliano A, Parascandolo G, Parish J, Parparita E, Passos A, Pavlov M, Peng A, Perelman A, de\u00a0Avila Belbute\u00a0Peres F, Petrov M, de\u00a0Oliveira\u00a0Pinto HP, Michael, Pokorny, Pokrass M, Pong VH, Powell T, Power A, Power B, Proehl E, Puri R, Radford A, Rae J, Ramesh A, Raymond C, Real F, Rimbach K, Ross C, Rotsted B, Roussez H, Ryder N, Saltarelli M, Sanders T, Santurkar S, Sastry G, Schmidt H, Schnurr D, Schulman J, Selsam D, Sheppard K, Sherbakov T, Shieh J, Shoker S, Shyam P, Sidor S, Sigler E, Simens M, Sitkin J, Slama K, Sohl I, Sokolowsky B, Song Y, Staudacher N, Such FP, Summers N, Sutskever I, Tang J, Tezak N, Thompson MB, Tillet P, Tootoonchian A, Tseng E, Tuggle P, Turley N, Tworek J, Uribe JFC, Vallone A, Vijayvergiya A, Voss C, Wainwright C, Wang JJ, Wang A, Wang B, Ward J, Wei J, Weinmann C, Welihinda A, Welinder P, Weng J, Weng L, Wiethoff M, Willner D, Winter C, Wolrich S, Wong H, Workman L, Wu S, Wu J, Wu M, Xiao K, Xu T, Yoo S, Yu K, Yuan Q, Zaremba W, Zellers R, Zhang C, Zhang M, Zhao S, Zheng T, Zhuang J, Zhuk W, Zoph B (2024) Gpt-4 technical report. URL https:\/\/arxiv.org\/abs\/2303.08774, 2303.08774"},{"issue":"10","key":"10644_CR50","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1145\/383845.383856","volume":"44","author":"H Ossher","year":"2001","unstructured":"Ossher H, Tarr P (2001) Using multidimensional separation of concerns to (re) shape evolving software. Commun ACM 44(10):43\u201350","journal-title":"Commun ACM"},{"key":"10644_CR51","unstructured":"Parr T (2025) Antlr grammars v4. 
Available at https:\/\/github.com\/antlr\/grammars-v4"},{"key":"10644_CR52","unstructured":"Parr T (2013) The definitive ANTLR 4 reference (2nd. ed.). Pragmatic Bookshelf"},{"key":"10644_CR53","doi-asserted-by":"publisher","unstructured":"Paulson D, Wand Y (1992) An automated approach to information systems decomposition. IEEE Trans Software Eng 18(3):174\u2013189. https:\/\/doi.org\/10.1109\/32.126767","DOI":"10.1109\/32.126767"},{"key":"10644_CR54","doi-asserted-by":"publisher","unstructured":"Ralph P, Tempero E (2018) Construct validity in software engineering research and software metrics. In: Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018, Association for Computing Machinery, New York, NY, USA, EASE \u201918, p 13\u201323,https:\/\/doi.org\/10.1145\/3210459.3210461","DOI":"10.1145\/3210459.3210461"},{"issue":"3","key":"10644_CR55","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","volume":"27","author":"CE Shannon","year":"1948","unstructured":"Shannon CE (1948) A mathematical theory of communication. The Bell system technical journal 27(3):379\u2013423","journal-title":"The Bell system technical journal"},{"key":"10644_CR56","doi-asserted-by":"publisher","unstructured":"Spadini D, Aniche M, Bacchelli A (2018) Pydriller: Python framework for mining software repositories. In: The 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC\/FSE), https:\/\/doi.org\/10.1145\/3236024.3264598","DOI":"10.1145\/3236024.3264598"},{"key":"10644_CR57","unstructured":"StackOverflow (2024) 2024 survey. Available at https:\/\/survey.stackoverflow.co\/2024\/technology"},{"key":"10644_CR58","doi-asserted-by":"crossref","unstructured":"Tarr P, Ossher H, Harrison W, Sutton\u00a0Jr SM (1999) N degrees of separation: Multi-dimensional separation of concerns. 
In: Proceedings of the 21st international conference on Software engineering, pp 107\u2013119","DOI":"10.1145\/302405.302457"},{"issue":"10","key":"10644_CR59","doi-asserted-by":"publisher","first-page":"949","DOI":"10.1016\/0895-4356(88)90031-5","volume":"41","author":"W Thompson","year":"1988","unstructured":"Thompson W, Walter SD (1988) A reappraisal of the kappa coefficient. J Clin Epidemiol 41(10):949\u2013958","journal-title":"J Clin Epidemiol"},{"key":"10644_CR60","volume-title":"Engineering A Compiler","author":"L Torczon","year":"2007","unstructured":"Torczon L, Cooper K (2007) Engineering A Compiler, 2nd edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA","edition":"2"},{"key":"10644_CR61","unstructured":"Torres A (2024) Supplementary material. Available at https:\/\/zenodo.org\/records\/11180885"},{"key":"10644_CR62","doi-asserted-by":"publisher","unstructured":"Torres A, Baltes S, Treude C, Wagner M (2023) Applying information theory to software evolution. In: 2023 IEEE\/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), pp 48\u201355, https:\/\/doi.org\/10.1109\/NLBSE59153.2023.00017","DOI":"10.1109\/NLBSE59153.2023.00017"},{"key":"10644_CR63","doi-asserted-by":"publisher","unstructured":"Treude C, Figueira\u00a0Filho F, Kulesza U (2015) Summarizing and measuring development activity. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, Association for Computing Machinery, New York, NY, USA, ESEC\/FSE 2015, p 625\u2013636, https:\/\/doi.org\/10.1145\/2786805.2786827","DOI":"10.1145\/2786805.2786827"},{"key":"10644_CR64","doi-asserted-by":"publisher","unstructured":"Treude C, Leite L, Aniche M (2018) Unusual events in github repositories. J Syst Softw 142:237\u2013247. 
https:\/\/doi.org\/10.1016\/j.jss.2018.04.063, https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0164121218300876","DOI":"10.1016\/j.jss.2018.04.063"},{"key":"10644_CR65","doi-asserted-by":"crossref","unstructured":"Tversky A, Kahneman D (1974) Judgment under uncertainty: Heuristics and biases: Biases in judgments reveal some heuristics of thinking under uncertainty. science 185(4157):1124\u20131131","DOI":"10.1126\/science.185.4157.1124"},{"key":"10644_CR66","doi-asserted-by":"crossref","unstructured":"Zapf A, Castell S, Morawietz L, Karch A (2016) Measuring inter-rater reliability for nominal data \u2013 which coefficients and confidence intervals are appropriate? BMC Medical Research Methodology 16, URL https:\/\/api.semanticscholar.org\/CorpusID:16038581","DOI":"10.1186\/s12874-016-0200-9"}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-025-10644-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-025-10644-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-025-10644-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,13]],"date-time":"2025-09-13T08:54:29Z","timestamp":1757753669000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-025-10644-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,8]]},"references-count":66,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,9]]}},"alternative-id":["10644"],"URL":"https:\/\/doi.org\/10.1007\/s10664-025-10644-y","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"type":"print","value":"1382-32
56"},{"type":"electronic","value":"1573-7616"}],"subject":[],"published":{"date-parts":[[2025,8,8]]},"assertion":[{"value":"24 March 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 August 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors confirm that this work complies with ethical standards.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Compliance with Ethical Standards"}},{"value":"The authors declare no competing interests. However, Sebastian Baltes and Christoph Treude are members of the EMSE Editorial Board, which is disclosed here for transparency.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest\/Competing Interests"}},{"value":"No research involving humans or animals was conducted in this study. All code bases analyzed in this work were publicly available as open-source projects at the time of analysis.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}},{"value":"No research involving humans or animals was conducted in this study. Therefore, no informed consent was required.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed Consent"}}],"article-number":"153"}}