{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,29]],"date-time":"2026-01-29T23:09:02Z","timestamp":1769728142809,"version":"3.49.0"},"reference-count":73,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T00:00:00Z","timestamp":1702425600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T00:00:00Z","timestamp":1702425600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2024,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Context<\/jats:title><jats:p>EMF metamodels lie at the heart of model-based approaches for a variety of tasks, notably for defining the abstract syntax of modeling languages. The language design of EMF metamodels itself is part of a design process, where the needs of its specific range of users should be satisfied. Studying how people actually use the language in the wild would enable empirical feedback for improving the design of the EMF metamodeling language.<\/jats:p><\/jats:sec><jats:sec><jats:title>Objective<\/jats:title><jats:p>Our goal is to study the language usage of EMF metamodels in public engineered projects on GitHub. We aim to reveal information about the usage of specific language constructs, whether they match the language design. Based on our findings, we plan to suggest improvements in the EMF metamodelling language.<\/jats:p><\/jats:sec><jats:sec><jats:title>Method<\/jats:title><jats:p>We adopt a sample study research strategy and collect data from the EMF metamodels on GitHub. 
After a series of preprocessing steps including filtering out non-engineered projects and deduplication, we employ an analytics workflow on top of a graph database to formulate generalizing statements about the artifacts under study. Based on the results, we also give actionable suggestions for the EMF metamodeling language design.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We have conducted various analyses on metaclass, attribute, feature\/relationship usage as well as specific parts of the language: annotations and generics. Our findings reveal that the most used metaclasses are not the main building blocks of the language, but rather auxiliary ones. Some of the metaclasses, metaclass features and relations are almost never used. There are a few attributes which are almost exclusively used with a single value or illegal values. Some of the language features such as special forms of generics are very rarely used. Based on our findings, we provide suggestions to improve the EMF language, e.g.\u00a0removing a language element, restricting its values or refining the metaclass hierarchy.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>In this paper, we present an extensive empirical study into the language usage of EMF metamodels on GitHub. 
We believe this study fills a gap in the literature of model analytics and will hopefully help future improvement of the EMF metamodeling language.<\/jats:p><\/jats:sec>","DOI":"10.1007\/s10664-023-10368-x","type":"journal-article","created":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T12:02:14Z","timestamp":1702468934000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Language usage analysis for EMF metamodels on GitHub"],"prefix":"10.1007","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1460-2825","authenticated-orcid":false,"given":"\u00d6nder","family":"Babur","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eleni","family":"Constantinou","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexander","family":"Serebrenik","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,12,13]]},"reference":[{"key":"10368_CR1","doi-asserted-by":"crossref","unstructured":"Allamanis M (2019) The adverse effects of code duplication in machine learning models of code. In Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, p 143\u2013153","DOI":"10.1145\/3359591.3359735"},{"key":"10368_CR2","doi-asserted-by":"crossref","unstructured":"Andova S, van\u00a0den Brand MGJ, Engelen LJP, Verhoeff T (2012) MDE basics with a DSL focus. In International School on Formal Methods for the Design of Computer, Communication and Software Systems, p 21\u201357. Springer","DOI":"10.1007\/978-3-642-30982-3_2"},{"key":"10368_CR3","unstructured":"Babur \u00d6 (2019) Model analytics and management. PhD thesis, Technische Universiteit Eindhoven. 
Proefschrift"},{"key":"10368_CR4","doi-asserted-by":"crossref","unstructured":"Babur \u00d6, Cleophas L (2017) Using n-grams for the automated clustering of structural models. In International Conference on Current Trends in Theory and Practice of Informatics, p 510\u2013524. Springer","DOI":"10.1007\/978-3-319-51963-0_40"},{"key":"10368_CR5","doi-asserted-by":"crossref","unstructured":"Babur \u00d6, Cleophas L, van\u00a0den Brand M (2016) Hierarchical clustering of metamodels for comparative analysis and visualization. In European Conference on Modelling Foundations and Applications, p 3\u201318. Springer","DOI":"10.1007\/978-3-319-42061-5_1"},{"key":"10368_CR6","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/j.cola.2018.12.002","volume":"51","author":"\u00d6 Babur","year":"2019","unstructured":"Babur \u00d6, Cleophas L, van den Brand M (2019) Metamodel clone detection with SAMOS. Journal of Computer Languages 51:57\u201374","journal-title":"Journal of Computer Languages"},{"key":"10368_CR7","doi-asserted-by":"publisher","DOI":"10.1016\/j.scico.2022.102877","volume":"223","author":"\u00d6 Babur","year":"2022","unstructured":"Babur \u00d6, Cleophas L, van den Brand M (2022) SAMOS - a framework for model analytics and management. Sci Comput Program 223:102877","journal-title":"Sci Comput Program"},{"key":"10368_CR8","doi-asserted-by":"crossref","unstructured":"Babur \u00d6, Cleophas L, van\u00a0den Brand M, Tekinerdogan B, Aksit M (2017) Models, more models, and then a lot more. In Federation of International Conferences on Software Technologies: Applications and Foundations, p 129\u2013135. Springer","DOI":"10.1007\/978-3-319-74730-9_10"},{"key":"10368_CR9","unstructured":"Baltes S, Ralph P (2020) Sampling in software engineering research: A critical review and guidelines. arXiv preprint. 
arXiv:2002.07764"},{"key":"10368_CR10","doi-asserted-by":"crossref","unstructured":"Basciani F,\u00a0Rocco JD,\u00a0Ruscio DD, Iovino L, Pierantonio A (2016) Automated clustering of metamodel repositories. In Advanced Information Systems Engineering: 28th International Conference, CAiSE 2016, Ljubljana, Slovenia, June 13-17, 2016. Proceedings vol 28. Springer, pp 342\u2013358","DOI":"10.1007\/978-3-319-39696-5_21"},{"key":"10368_CR11","doi-asserted-by":"crossref","unstructured":"Biber D, Douglas B, Conrad S, Reppen R (1998) Corpus linguistics: Investigating language structure and use. Cambridge University Press","DOI":"10.1017\/CBO9780511804489"},{"key":"10368_CR12","doi-asserted-by":"crossref","unstructured":"Brambilla M, Cabot J, Wimmer M (2017) Model-driven software engineering in practice, second edition. Synthesis Lectures on Software Engineering 3(1):1\u2013207","DOI":"10.2200\/S00751ED2V01Y201701SWE004"},{"key":"10368_CR13","doi-asserted-by":"crossref","unstructured":"Broy M, Kirstan S, Krcmar H, Sch\u00e4tz B (2012) What is the benefit of a model-based design of embedded software systems in the car industry? In Emerging Technologies for the Evolution and Maintenance of Software Models, p 343\u2013369. IGI Global","DOI":"10.4018\/978-1-61350-438-3.ch013"},{"key":"10368_CR14","first-page":"42","volume":"41","author":"JJ Cadavid","year":"2015","unstructured":"Cadavid JJ, Combemale B, Baudry B (2015) An analysis of metamodeling practices for MOF and OCL. Comput Lang Syst Struct 41:42\u201365","journal-title":"Comput Lang Syst Struct"},{"key":"10368_CR15","doi-asserted-by":"crossref","unstructured":"Clark T, Van\u00a0den Brand M, Combemale B, Rumpe B (2015) Conceptual model of the globalization for domain-specific languages. In Globalizing Domain-Specific Languages, p 7\u201320. 
Springer","DOI":"10.1007\/978-3-319-26172-0_2"},{"key":"10368_CR16","doi-asserted-by":"crossref","unstructured":"Combemale B, France R, J\u00e9z\u00e9quel J-M, Rumpe B, Steel J, Vojtisek D (2016) Engineering modeling languages: Turning domain knowledge into tools. CRC Press","DOI":"10.1201\/b21841"},{"issue":"10","key":"10368_CR17","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1109\/TSE.2007.1019","volume":"33","author":"G Concas","year":"2007","unstructured":"Concas G, Marchesi M, Pinna S, Serra N (2007) Power-laws in a large object-oriented software system. IEEE Trans Softw Eng 33(10):687\u2013708","journal-title":"IEEE Trans Softw Eng"},{"key":"10368_CR18","doi-asserted-by":"publisher","first-page":"7173","DOI":"10.1109\/ACCESS.2017.2682323","volume":"5","author":"V Cosentino","year":"2017","unstructured":"Cosentino V, Izquierdo JLC, Cabot J (2017) A systematic mapping study of software development with GitHub. IEEE Access 5:7173\u20137192","journal-title":"IEEE Access"},{"key":"10368_CR19","doi-asserted-by":"crossref","unstructured":"Cosentino V, Izquierdo JLC, Cabot J (2016) Findings from GitHub: methods, datasets and limitations. In 2016 IEEE\/ACM 13th Working Conference on Mining Software Repositories (MSR), p 137\u2013141. IEEE","DOI":"10.1145\/2901739.2901776"},{"key":"10368_CR20","doi-asserted-by":"crossref","unstructured":"de\u00a0F.\u00a0Farias MA, Novais R, J\u00fanior MC, da\u00a0Silva\u00a0Carvalho LP, Mendon\u00e7a M, Sp\u00ednola RO (2016) A systematic mapping study on mining software repositories. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, p 1472\u20131479","DOI":"10.1145\/2851613.2851786"},{"key":"10368_CR21","doi-asserted-by":"crossref","unstructured":"de\u00a0Mello RM, Stolee KT, Travassos GH (2015) Investigating samples representativeness for an online experiment in java code search. 
In 2015 ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), p 1\u201310","DOI":"10.1109\/ESEM.2015.7321205"},{"key":"10368_CR22","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1007\/s10270-019-00748-7","volume":"19","author":"J Di Rocco","year":"2020","unstructured":"Di Rocco J, Di Ruscio D, H\u00e4rtel J, Iovino L, L\u00e4mmel R, Pierantonio A (2020) Understanding mde projects: megamodels to the rescue for architecture recovery. Softw Syst Model 19:401\u2013423","journal-title":"Softw Syst Model"},{"key":"10368_CR23","doi-asserted-by":"crossref","unstructured":"Di\u00a0Rocco J, Di\u00a0Ruscio D, Iovino L, Pierantonio A (2014) Mining metrics for understanding metamodel characteristics. In Proceedings of the 6th International Workshop on Modeling in Software Engineering, p 55\u201360","DOI":"10.1145\/2593770.2593774"},{"key":"10368_CR24","doi-asserted-by":"crossref","unstructured":"Erdweg S, Van Der\u00a0Storm T, V\u00f6lter M, Boersma M, Bosman R, Cook WR, Gerritsen A, Hulshout A, Kelly S, Loh A et\u00a0al (2013) The state of the art in language workbenches. In International Conference on Software Language Engineering, p 197\u2013217. Springer","DOI":"10.1007\/978-3-319-02654-1_11"},{"key":"10368_CR25","doi-asserted-by":"crossref","unstructured":"Favre J-M, Gasevic D, L\u00e4mmel R, Pek E (2010) Empirical language analysis in software linguistics. In International Conference on Software Language Engineering, p 316\u2013326. Springer","DOI":"10.1007\/978-3-642-19440-5_21"},{"key":"10368_CR26","unstructured":"Gabriel P,\u00a0Goul\u00e3o M, Amaral V (2010) Do software languages engineers evaluate their languages? In\u00a0Franch JPCX,\u00a0Gimenes I (eds) XIII Congreso Iberoamericano en, p 149\u2013162. 
CIbSE2010, 04"},{"issue":"3","key":"10368_CR27","doi-asserted-by":"publisher","first-page":"1538","DOI":"10.1007\/s10664-018-9648-z","volume":"24","author":"M Gharehyazie","year":"2019","unstructured":"Gharehyazie M, Ray B, Keshani M, Zavosht MS, Heydarnoori A, Filkov V (2019) Cross-project code clones in GitHub. Empir Softw Eng 24(3):1538\u20131573","journal-title":"Empir Softw Eng"},{"key":"10368_CR28","doi-asserted-by":"crossref","unstructured":"Gousios G, Spinellis D (2012) GHTorrent: GitHub\u2019s data from a firehose. In 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), p 12\u201321. IEEE","DOI":"10.1109\/MSR.2012.6224294"},{"key":"10368_CR29","doi-asserted-by":"crossref","unstructured":"Grechanik M, McMillan C, DeFerrari L, Comi M, Crespi S, Poshyvanyk D, Fu C, Xie Q, Ghezzi C (2010) An empirical investigation into a large-scale java open source code repository. In Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, p 1\u201310","DOI":"10.1145\/1852786.1852801"},{"key":"10368_CR30","doi-asserted-by":"crossref","unstructured":"H\u00e4rtel J, Heinz M, L\u00e4mmel R (2018) Emf patterns of usage on github. In European Conference on Modelling Foundations and Applications, p 216\u2013234. Springer","DOI":"10.1007\/978-3-319-92997-2_14"},{"key":"10368_CR31","doi-asserted-by":"crossref","unstructured":"Hebig R, Quang TH, Chaudron MRV, Robles G, Fernandez MA (2016) The quest for open source projects that use UML: mining GitHub. In Proceedings of the ACM\/IEEE 19th International Conference on Model Driven Engineering Languages and Systems, p 173\u2013183","DOI":"10.1145\/2976767.2976778"},{"issue":"2","key":"10368_CR32","doi-asserted-by":"publisher","first-page":"8","DOI":"10.5381\/jot.2020.19.2.a8","volume":"19","author":"M Heinz","year":"2020","unstructured":"Heinz M, H\u00e4rtel J, L\u00e4mmel R (2020) Reproducible construction of interconnected technology models for emf code generation. 
J Object Technol 19(2):8\u20131","journal-title":"J Object Technol"},{"key":"10368_CR33","first-page":"193","volume-title":"Enterprise, Business-Process and Information Systems Modeling","author":"TS Heinze","year":"2020","unstructured":"Heinze TS, Stefanko V, Amme W (2020) Mining BPMN processes on GitHub for tool validation and development. In: Nurcan S, Reinhartz-Berger I, Soffer P, Zdravkovic J (eds) Enterprise, Business-Process and Information Systems Modeling. Springer International Publishing, Cham, pp 193\u2013208"},{"key":"10368_CR34","doi-asserted-by":"crossref","unstructured":"Herrmannsdoerfer M, Ratiu D, Koegel M (2010) Metamodel usage analysis for identifying metamodel improvements. In International Conference on Software Language Engineering, p 62\u201381. Springer","DOI":"10.1007\/978-3-642-19440-5_5"},{"key":"10368_CR35","doi-asserted-by":"crossref","unstructured":"Ho-Quang T, Hebig R, Robles G, Chaudron MRV, Fernandez MA (2017) Practices and perceptions of UML use in open source projects. In 2017 IEEE\/ACM 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP), p 203\u2013212. IEEE","DOI":"10.1109\/ICSE-SEIP.2017.28"},{"key":"10368_CR36","doi-asserted-by":"crossref","unstructured":"Hutchinson J, Whittle J, Rouncefield M, Kristoffersen S (2011) Empirical assessment of mde in industry. In Proceedings of the 33rd international conference on software engineering, p 471\u2013480","DOI":"10.1145\/1985793.1985858"},{"key":"10368_CR37","unstructured":"Information technology - Meta Object Facility (MOF) (2005) Standard, International Organization for Standardization"},{"key":"10368_CR38","doi-asserted-by":"crossref","unstructured":"Izquierdo JLC, Cosentino V, Cabot J (2017) An empirical study on the maturity of the eclipse modeling ecosystem. In 2017 ACM\/IEEE 20th International Conference on Model Driven Engineering Languages and Systems (MODELS), p 292\u2013302. 
IEEE","DOI":"10.1109\/MODELS.2017.19"},{"key":"10368_CR39","doi-asserted-by":"publisher","first-page":"19923","DOI":"10.1109\/ACCESS.2018.2821111","volume":"6","author":"S J\u00e1come","year":"2018","unstructured":"J\u00e1come S, De Lara J (2018) Controlling meta-model extensibility in model-driven engineering. IEEE Access 6:19923\u201319939","journal-title":"IEEE Access"},{"key":"10368_CR40","doi-asserted-by":"crossref","unstructured":"Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining GitHub. In Proceedings of the 11th working conference on mining software repositories, p 92\u2013101","DOI":"10.1145\/2597073.2597074"},{"issue":"5","key":"10368_CR41","doi-asserted-by":"publisher","first-page":"2035","DOI":"10.1007\/s10664-015-9393-5","volume":"21","author":"E Kalliamvakou","year":"2016","unstructured":"Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2016) An in-depth study of the promises and perils of mining GitHub. Empir Softw Eng 21(5):2035\u20132071","journal-title":"Empir Softw Eng"},{"key":"10368_CR42","unstructured":"K\u00f6gel S, Tichy M (2018) A dataset of EMF models from eclipse projects"},{"key":"10368_CR43","unstructured":"Kolovos DS, Matragkas ND, Korkontzelos I, Ananiadou S, Paige RF (2015) Assessing the use of eclipse MDE technologies in open-source software projects. In OSS4MDE@ MoDELS, p 20\u201329"},{"key":"10368_CR44","doi-asserted-by":"crossref","unstructured":"Kolovos DS, Rose LM, Matragkas N, Paige RF, Guerra E, Cuadrado JS,\u00a0Lara JD, R\u00e1th I, Varr\u00f3 D, Tisi M et\u00a0al (2013) A research roadmap towards achieving scalability in model driven engineering. 
In Proceedings of the Workshop on Scalability in Model Driven Engineering, p 1\u201310","DOI":"10.1145\/2487766.2487768"},{"key":"10368_CR45","doi-asserted-by":"publisher","first-page":"310","DOI":"10.1007\/s10664-012-9204-1","volume":"18","author":"R L\u00e4mmel","year":"2013","unstructured":"L\u00e4mmel R, Pek E (2013) Understanding privacy policies: A study in empirical analysis of language usage. Empir Softw Eng 18:310\u2013374","journal-title":"Empir Softw Eng"},{"key":"10368_CR46","doi-asserted-by":"crossref","unstructured":"Lopes CV, Maj P, Martins P, Saini V,\u00a0Yang D, Zitny J, Sajnani H, Vitek J (2017) D\u00e9j\u00e0vu: a map of code duplicates on GitHub. Proceedings of the ACM on Programming Languages 1(OOPSLA):1\u201328","DOI":"10.1145\/3133908"},{"key":"10368_CR47","doi-asserted-by":"crossref","unstructured":"L\u00f3pez JAH,\u00a0Izquierdo JLC, Cuadrado JS (2021) Modelset: a dataset for machine learning in model-driven engineering. Softw Syst Model, p 1\u201320","DOI":"10.1007\/s10270-021-00929-3"},{"key":"10368_CR48","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to information retrieval 1","author":"CD Manning","year":"2008","unstructured":"Manning CD, Raghavan P, Sch\u00fctze H et al (2008) Introduction to information retrieval 1. Cambridge University Press"},{"issue":"4","key":"10368_CR49","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1007\/s10664-006-9033-1","volume":"12","author":"H Melton","year":"2007","unstructured":"Melton H, Tempero E (2007) An empirical study of cycles among classes in java. Empir Softw Eng 12(4):389\u2013415","journal-title":"Empir Softw Eng"},{"key":"10368_CR50","unstructured":"Mengerink J, Noten J, Schiffelers R, van\u00a0den Brand M, Serebrenik A (2017) A case of industrial vs. open-source ocl: not so different after all. In ACM\/IEEE 20th International Conference on Model Driven Engineering Languages and Systems (MODELS 2017), p 472\u2013474. CEUR-WS. 
org"},{"issue":"3","key":"10368_CR51","doi-asserted-by":"publisher","first-page":"1574","DOI":"10.1007\/s10664-018-9641-6","volume":"24","author":"JGM Mengerink","year":"2019","unstructured":"Mengerink JGM, Noten J, Serebrenik A (2019) Empowering ocl research: a large-scale corpus of open-source data from github. Empir Softw Eng 24(3):1574\u20131609","journal-title":"Empir Softw Eng"},{"key":"10368_CR52","doi-asserted-by":"crossref","unstructured":"Mengerink JGM, Serebrenik A, Schiffelers RRH, van\u00a0den Brand MGJ (2017) Automated analyses of model-driven artifacts: obtaining insights into industrial application of mde. In Proceedings of the 27th International Workshop on Software Measurement and 12th International Conference on Software Process and Product Measurement, p 116\u2013121","DOI":"10.1145\/3143434.3143442"},{"key":"10368_CR53","doi-asserted-by":"crossref","unstructured":"Mohagheghi P, Dehlen V (2008) Where is the proof?-a review of experiences from applying mde in industry. In Model Driven Architecture\u2013Foundations and Applications: 4th European Conference, ECMDA-FA 2008, Berlin, Germany, June 9-13, 2008. Proceedings vol 4, pp 432\u2013443. Springer","DOI":"10.1007\/978-3-540-69100-6_31"},{"key":"10368_CR54","doi-asserted-by":"publisher","DOI":"10.1016\/j.cola.2020.100972","volume":"59","author":"MA Mohamed","year":"2020","unstructured":"Mohamed MA, Challenger M, Kardas G (2020) Applications of model-driven engineering in cyber-physical systems: a systematic mapping study. Journal of Computer Languages 59:100972","journal-title":"Journal of Computer Languages"},{"issue":"6","key":"10368_CR55","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1145\/153571.255960","volume":"36","author":"MJ Muller","year":"1993","unstructured":"Muller MJ, Kuhn S (1993) Participatory design. 
Commun ACM 36(6):24\u201328","journal-title":"Commun ACM"},{"issue":"6","key":"10368_CR56","doi-asserted-by":"publisher","first-page":"3219","DOI":"10.1007\/s10664-017-9512-6","volume":"22","author":"N Munaiah","year":"2017","unstructured":"Munaiah N, Kroh S, Cabrey C, Nagappan M (2017) Curating GitHub for engineered software projects. Empir Softw Eng 22(6):3219\u20133253","journal-title":"Empir Softw Eng"},{"key":"10368_CR57","doi-asserted-by":"crossref","unstructured":"Nagappan M, Zimmermann T, Bird C (2013) Diversity in software engineering research. In: Meyer B, Baresi L, Mezini M (eds) Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC\/FSE\u201913, Saint Petersburg, Russian Federation, August 18\u201326, 2013. ACM, pp 466\u2013476","DOI":"10.1145\/2491411.2491415"},{"key":"10368_CR58","doi-asserted-by":"crossref","unstructured":"Noten J, Mengerink JGM, Serebrenik A (2017) A data set of OCL expressions on GitHub. In 2017 IEEE\/ACM 14th International Conference on Mining Software Repositories (MSR), p 531\u2013534. IEEE","DOI":"10.1109\/MSR.2017.52"},{"key":"10368_CR59","doi-asserted-by":"crossref","unstructured":"Pag\u00e1n JE, Cuadrado JS, Molina JG (2011) Morsa: A scalable approach for persisting and accessing large models. In International Conference on Model Driven Engineering Languages and Systems, p 77\u201392. Springer","DOI":"10.1007\/978-3-642-24485-8_7"},{"issue":"10","key":"10368_CR60","doi-asserted-by":"publisher","first-page":"665","DOI":"10.1016\/S0950-5849(00)00109-9","volume":"42","author":"RF Paige","year":"2000","unstructured":"Paige RF, Ostroff JS, Brooke PJ (2000) Principles for modeling language design. 
Inf Softw Technol 42(10):665\u2013675","journal-title":"Inf Softw Technol"},{"key":"10368_CR61","doi-asserted-by":"crossref","unstructured":"Pickerill P, Jungen HJ, Ochodek M, Staron M (2020) PHANTOM: Curating GitHub for engineered software projects using time-series clustering. Empir Software Eng","DOI":"10.1007\/s10664-020-09825-8"},{"key":"10368_CR62","doi-asserted-by":"crossref","unstructured":"Pietri A, Spinellis D, Zacchiroli S (2019) The software heritage graph dataset: public software development under one roof. In 2019 IEEE\/ACM 16th International Conference on Mining Software Repositories (MSR), p 138\u2013142. IEEE","DOI":"10.1109\/MSR.2019.00030"},{"key":"10368_CR63","doi-asserted-by":"publisher","first-page":"160","DOI":"10.1016\/j.jss.2016.10.017","volume":"123","author":"D Qiu","year":"2017","unstructured":"Qiu D, Li B, Barr ET, Su Z (2017) Understanding the syntactic rule usage in java. J Syst Softw 123:160\u2013172","journal-title":"J Syst Softw"},{"key":"10368_CR64","doi-asserted-by":"crossref","unstructured":"Ray B, Posnett D, Filkov V, Devanbu P (2014) A large scale study of programming languages and code quality in GitHub. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, p 155\u2013165","DOI":"10.1145\/2635868.2635922"},{"issue":"1","key":"10368_CR65","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1016\/j.inffus.2004.04.008","volume":"6","author":"D Ruta","year":"2005","unstructured":"Ruta D, Gabrys B (2005) Classifier selection for majority voting. Information fusion 6(1):63\u201381","journal-title":"Information fusion"},{"key":"10368_CR66","doi-asserted-by":"crossref","unstructured":"Spinellis D, Kotti Z, Mockus A (2020) A dataset for GitHub repository deduplication. arXiv preprint. arXiv:2002.02314","DOI":"10.1145\/3379597.3387496"},{"key":"10368_CR67","unstructured":"Steinberg D, Budinsky F, Paternostro M,\u00a0Merks E (2008) EMF: Eclipse Modeling Framework Second Edition. 
Pearson Education"},{"issue":"3","key":"10368_CR68","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3241743","volume":"27","author":"K-J Stol","year":"2018","unstructured":"Stol K-J, Fitzgerald B (2018) The abc of software engineering research. ACM Trans Softw Eng Methodol (TOSEM) 27(3):1\u201351","journal-title":"ACM Trans Softw Eng Methodol (TOSEM)"},{"issue":"2","key":"10368_CR69","doi-asserted-by":"publisher","first-page":"889","DOI":"10.1007\/s10270-013-0352-6","volume":"14","author":"R Tairas","year":"2015","unstructured":"Tairas R, Cabot J (2015) Corpus-based analysis of domain-specific languages. Softw Syst Model 14(2):889\u2013904","journal-title":"Softw Syst Model"},{"key":"10368_CR70","doi-asserted-by":"crossref","unstructured":"Tekinerdogan B, Babur \u00d6, Cleophas L, van\u00a0den Brand M, Ak\u015fit M (2019) Introduction to model management and analytics. In Model Management and Analytics for Large Scale Systems, p 3\u201311. Academic Press","DOI":"10.1016\/B978-0-12-816649-9.00009-0"},{"key":"10368_CR71","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-43839-8","volume-title":"Design science methodology for information systems and software engineering","author":"RJ Wieringa","year":"2014","unstructured":"Wieringa RJ (2014) Design science methodology for information systems and software engineering. Springer"},{"key":"10368_CR72","unstructured":"Williams JR, Zolotas A, Matragkas ND, Rose LM, Kolovos DS, Paige RF, Polack FAC (2013) What do metamodels really look like? Eessmod@ Models 1078:55\u201360"},{"key":"10368_CR73","doi-asserted-by":"crossref","unstructured":"Wohlin C, Runeson P, H\u00f6st M, Ohlsson MC, Regnell B, Wessl\u00e9n A (2012) Experimentation in software engineering. 
Springer Science & Business Media","DOI":"10.1007\/978-3-642-29044-2"}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-023-10368-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-023-10368-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-023-10368-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,5]],"date-time":"2024-11-05T21:47:30Z","timestamp":1730843250000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-023-10368-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,13]]},"references-count":73,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1]]}},"alternative-id":["10368"],"URL":"https:\/\/doi.org\/10.1007\/s10664-023-10368-x","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"value":"1382-3256","type":"print"},{"value":"1573-7616","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,13]]},"assertion":[{"value":"13 July 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 December 2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflicts of interest to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of interest"}}],"article-number":"23"}}