{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T19:20:10Z","timestamp":1778613610643,"version":"3.51.4"},"reference-count":76,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2022,9,20]],"date-time":"2022-09-20T00:00:00Z","timestamp":1663632000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,9,20]],"date-time":"2022-09-20T00:00:00Z","timestamp":1663632000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["825480"],"award-info":[{"award-number":["825480"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["82504"],"award-info":[{"award-number":["82504"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["PZ00P2 186090"],"award-info":[{"award-number":["PZ00P2 186090"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100021856","name":"Ministero dell\u2019Universit\u00e0 e della Ricerca","doi-asserted-by":"publisher","award":["2020W3A5FY"],"award-info":[{"award-number":["2020W3A5FY"]}],"id":[{"id":"10.13039\/501100021856","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software 
Eng"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Linguistic anti-patterns are recurring poor practices concerning inconsistencies in the naming, documentation, and implementation of an entity. They impede the readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in Infrastructure-as-Code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic\/body of IaC code units and their short text names. To this end, we propose <jats:sc>FindICI<\/jats:sc>, a novel automated approach that employs word embedding and classification algorithms. We build and use the abstract syntax tree of IaC code units to create code embeddings used by machine learning techniques to detect inconsistent IaC code units. We evaluated our approach with two experiments on Ansible tasks systematically extracted from open source repositories for various word embedding models and classification algorithms. 
Classical machine learning models and novel deep learning models with different word embedding methods showed comparable and satisfactory results in detecting inconsistent Ansible tasks related to the top-10 used Ansible modules.<\/jats:p>","DOI":"10.1007\/s10664-022-10215-5","type":"journal-article","created":{"date-parts":[[2022,9,20]],"date-time":"2022-09-20T09:04:03Z","timestamp":1663664643000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in infrastructure-as-code"],"prefix":"10.1007","volume":"27","author":[{"given":"Nemania","family":"Borovits","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Indika","family":"Kumara","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3861-1902","authenticated-orcid":false,"given":"Dario","family":"Di Nucci","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Parvathy","family":"Krishnan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stefano Dalla","family":"Palma","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fabio","family":"Palomba","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Damian A.","family":"Tamburri","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Willem-Jan van 
den","family":"Heuvel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,9,20]]},"reference":[{"key":"10215_CR1","doi-asserted-by":"publisher","unstructured":"Aghajani E, Nagy C, Bavota G, Lanza M (2018) A large-scale empirical study on linguistic antipatterns affecting apis. In: 2018 IEEE International conference on software maintenance and evolution (ICSME), pp 25\u201335. https:\/\/doi.org\/10.1109\/ICSME.2018.00012","DOI":"10.1109\/ICSME.2018.00012"},{"key":"10215_CR2","doi-asserted-by":"crossref","unstructured":"Alon U, Zilberstein M, Levy O, Yahav E (2019) Code2vec: Learning distributed representations of code. Proc. ACM Program Lang, p 3","DOI":"10.1145\/3290353"},{"issue":"1","key":"10215_CR3","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1007\/s10664-014-9350-8","volume":"21","author":"V Arnaoudova","year":"2016","unstructured":"Arnaoudova V, Di Penta M, Antoniol G (2016) Linguistic antipatterns: What they are and how developers perceive them. Empir Softw Eng 21 (1):104\u2013158","journal-title":"Empir Softw Eng"},{"key":"10215_CR4","doi-asserted-by":"crossref","unstructured":"Arnaoudova V, Di Penta M, Antoniol G, Gu\u00e9h\u00e9neuc YG (2013) A new family of software anti-patterns: Linguistic anti-patterns. In: 2013 17Th european conference on software maintenance and reengineering, pp 187\u2013196. IEEE","DOI":"10.1109\/CSMR.2013.28"},{"issue":"1","key":"10215_CR5","first-page":"152","volume":"17","author":"A Benavoli","year":"2016","unstructured":"Benavoli A, Corani G, Mangili F (2016) Should we really use post-hoc tests based on mean-ranks? 
J Mach Learn Res 17(1):152\u2013161","journal-title":"J Mach Learn Res"},{"key":"10215_CR6","doi-asserted-by":"crossref","unstructured":"Borovits N, Kumara I, Krishnan P, Palma SD, Di Nucci D, Palomba F, Tamburri DA, van den Heuvel WJ (2020) Deepiac: Deep learning-based linguistic anti-pattern detection in iac. In: Proceedings of the 4th ACM SIGSOFT International workshop on machine-learning techniques for software-quality evaluation, MaLTeSQuE 2020, pp 7\u201312. Association for computing machinery","DOI":"10.1145\/3416505.3423564"},{"key":"10215_CR7","doi-asserted-by":"publisher","unstructured":"Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International conference on knowledge discovery and data mining, KDD \u201916, pp 785\u2013794. Association for computing machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/2939672.2939785","DOI":"10.1145\/2939672.2939785"},{"key":"10215_CR8","doi-asserted-by":"publisher","unstructured":"Cheng J, Dong L, Lapata M (2016) Long short-term memory-networks for machine reading. In: Proceedings of the 2016 Conference on empirical methods in natural language processing, pp 551\u2013561. Association for computational linguistics, Austin, Texas. https:\/\/doi.org\/10.18653\/v1\/D16-1053. https:\/\/www.aclweb.org\/anthology\/D16-1053","DOI":"10.18653\/v1\/D16-1053"},{"issue":"2","key":"10215_CR9","doi-asserted-by":"publisher","first-page":"751","DOI":"10.1007\/s11219-016-9347-1","volume":"26","author":"A Corazza","year":"2018","unstructured":"Corazza A, Maggio V, Scanniello G (2018) Coherence of comments and method implementations: a dataset and an empirical investigation. Software Qual J 26(2):751\u2013777. 
https:\/\/doi.org\/10.1007\/s11219-016-9347-1","journal-title":"Software Qual J"},{"issue":"3","key":"10215_CR10","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/BF00994018","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273\u2013297","journal-title":"Mach Learn"},{"key":"10215_CR11","doi-asserted-by":"crossref","unstructured":"Dai T, Karve A, Koper G, Zeng S (2020) Automatically detecting risky scripts in infrastructure code. In: Proceedings of the 11th ACM Symposium on Cloud Computing, SoCC \u201920, pp 358\u2013371. Association for computing machinery","DOI":"10.1145\/3419111.3421303"},{"key":"10215_CR12","doi-asserted-by":"publisher","first-page":"110726","DOI":"10.1016\/j.jss.2020.110726","volume":"170","author":"S Dalla Palma","year":"2020","unstructured":"Dalla Palma S, Di Nucci D, Palomba F, Tamburri DA (2020a) Toward a catalog of software quality metrics for infrastructure code. J Syst Softw 170:110726","journal-title":"J Syst Softw"},{"key":"10215_CR13","unstructured":"Dalla Palma S, Di Nucci D, Palomba F, Tamburri DA (2021) Within-project defect prediction of infrastructure-as-code using product and process metrics. IEEE Trans Softw Eng, pp 1\u20131"},{"key":"10215_CR14","doi-asserted-by":"publisher","first-page":"100633","DOI":"10.1016\/j.softx.2020.100633","volume":"12","author":"S Dalla Palma","year":"2020","unstructured":"Dalla Palma S, Di Nucci D, Tamburri DA (2020b) Ansiblemetrics: A python library for measuring infrastructure-as-code blueprints in ansible. SoftwareX 12:100633","journal-title":"SoftwareX"},{"key":"10215_CR15","first-page":"1","volume":"7","author":"J Dem\u0161ar","year":"2006","unstructured":"Dem\u0161ar J (2006) Statistical comparisons of classifiers over multiple data sets. 
J Mach Learn Res 7:1\u201330","journal-title":"J Mach Learn Res"},{"key":"10215_CR16","doi-asserted-by":"publisher","unstructured":"Di Nitto E, Gorro\u00f1ogoitia J, Kumara I, Meditskos G, Radolovi\u0107 D, Sivalingam K, Gonz\u00e1lez RS (2020) An approach to support automated deployment of applications on heterogeneous cloud-hpc infrastructures. In: 2020 22Nd international symposium on symbolic and numeric algorithms for scientific computing (SYNASC), pp 133\u2013140. https:\/\/doi.org\/10.1109\/SYNASC51798.2020.00031","DOI":"10.1109\/SYNASC51798.2020.00031"},{"key":"10215_CR17","doi-asserted-by":"publisher","unstructured":"Dudchenko A, Kopanitsa G (2019) Comparison of word embeddings for extraction from medical records. International Journal of Environmental Research and Public Health 16(22). https:\/\/doi.org\/10.3390\/ijerph16224360. https:\/\/www.mdpi.com\/1660-4601\/16\/22\/4360","DOI":"10.3390\/ijerph16224360"},{"key":"10215_CR18","doi-asserted-by":"crossref","unstructured":"Fakhoury S, Arnaoudova V, Noiseux C, Khomh F, Antoniol G (2018) Keep it simple: is deep learning good for linguistic smell detection?. In: 2018 IEEE 25Th international conference on software analysis, evolution and reengineering (SANER), pp 602\u2013611","DOI":"10.1109\/SANER.2018.8330265"},{"issue":"3","key":"10215_CR19","doi-asserted-by":"publisher","first-page":"2140","DOI":"10.1007\/s10664-019-09751-4","volume":"25","author":"S Fakhoury","year":"2020","unstructured":"Fakhoury S, Roy D, Ma Y, Arnaoudova V, Adesope O (2020) Measuring the impact of lexical and structural inconsistencies on developers\u2019 cognitive load during bug localization. Empir Softw Eng 25(3):2140\u20132178. 
https:\/\/doi.org\/10.1007\/s10664-019-09751-4","journal-title":"Empir Softw Eng"},{"key":"10215_CR20","unstructured":"Fowler M (1999) Refactoring: Improving the design of existing programs"},{"issue":"1","key":"10215_CR21","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1214\/aoms\/1177731944","volume":"11","author":"M Friedman","year":"1940","unstructured":"Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11(1):86\u201392","journal-title":"Ann Math Stat"},{"key":"10215_CR22","doi-asserted-by":"crossref","unstructured":"Fu W, Menzies T (2017) Easy over hard: a case study on deep learning. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, pp 49\u201360","DOI":"10.1145\/3106237.3106256"},{"key":"10215_CR23","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1016\/j.neucom.2013.11.045","volume":"147","author":"A Gisbrecht","year":"2015","unstructured":"Gisbrecht A, Schulz A, Hammer B (2015) Parametric nonlinear dimensionality reduction using kernel t-sne. Neurocomputing 147:71\u201382","journal-title":"Neurocomputing"},{"key":"10215_CR24","doi-asserted-by":"crossref","unstructured":"Guerriero M, Garriga M, Tamburri DA, Palomba F (2019) Adoption, support, and challenges of infrastructure-as-code: Insights from industry. In: 2019 IEEE International conference on software maintenance and evolution (ICSME), pp 580\u2013589. IEEE","DOI":"10.1109\/ICSME.2019.00092"},{"issue":"3","key":"10215_CR25","doi-asserted-by":"publisher","first-page":"641","DOI":"10.1007\/s11219-016-9318-6","volume":"25","author":"L Guerrouj","year":"2017","unstructured":"Guerrouj L, Kermansaravi Z, Arnaoudova V, Fung BCM, Khomh F, Antoniol G, Gu\u00e9h\u00e9neuc YG (2017) Investigating the relation between lexical smells and change- and fault-proneness: an empirical study. Softw Qual J 25(3):641\u2013670. 
https:\/\/doi.org\/10.1007\/s11219-016-9318-6","journal-title":"Softw Qual J"},{"issue":"6","key":"10215_CR26","doi-asserted-by":"publisher","first-page":"1276","DOI":"10.1109\/TSE.2011.103","volume":"38","author":"T Hall","year":"2011","unstructured":"Hall T, Beecham S, Bowes D, Gray D, Counsell S (2011) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276\u20131304","journal-title":"IEEE Trans Softw Eng"},{"key":"10215_CR27","doi-asserted-by":"crossref","unstructured":"Hasan MM, Bhuiyan FA, Rahman A (2020) Testing practices for infrastructure as code. In: Proceedings of the 1st ACM SIGSOFT International workshop on languages and tools for next-generation testing, LANGETI 2020, pp 7\u201312. Association for computing machinery","DOI":"10.1145\/3416504.3424334"},{"key":"10215_CR28","unstructured":"Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall PTR, USA"},{"key":"10215_CR29","doi-asserted-by":"crossref","unstructured":"Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition, vol 1, pp 278\u2013282. IEEE","DOI":"10.1109\/ICDAR.1995.598994"},{"key":"10215_CR30","unstructured":"Holm S (1979) A simple sequentially rejective multiple test procedure. Scandinavian journal of statistics, pp 65\u201370"},{"key":"10215_CR31","doi-asserted-by":"crossref","unstructured":"Islam Shamim MS, Ahamed Bhuiyan F, Rahman A (2020) Xi commandments of kubernetes security: a systematization of knowledge related to kubernetes security practices. 
In: 2020 IEEE Secure development (secdev), pp 58\u201364","DOI":"10.1109\/SecDev45635.2020.00025"},{"issue":"4","key":"10215_CR32","doi-asserted-by":"publisher","first-page":"917","DOI":"10.1007\/s10618-019-00619-1","volume":"33","author":"H Ismail Fawaz","year":"2019","unstructured":"Ismail Fawaz H, Forestier G, Weber J, Idoumghar L, Muller PA (2019) Deep learning for time series classification: a review. Data Min Knowl Disc 33(4):917\u2013963","journal-title":"Data Min Knowl Disc"},{"key":"10215_CR33","doi-asserted-by":"crossref","unstructured":"James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning, vol. 112 Springer","DOI":"10.1007\/978-1-4614-7138-7"},{"key":"10215_CR34","doi-asserted-by":"crossref","unstructured":"Jiang Y, Adams B (2015) Co-evolution of infrastructure and source code-an empirical study. In: 2015 IEEE\/ACM 12Th working conference on mining software repositories, pp 45\u201355. IEEE","DOI":"10.1109\/MSR.2015.12"},{"key":"10215_CR35","doi-asserted-by":"crossref","unstructured":"Joulin A, Grave E, Bojanowski P, Mikolov T (2016) Bag of tricks for efficient text classification. arXiv:1607.01759","DOI":"10.18653\/v1\/E17-2068"},{"key":"10215_CR36","doi-asserted-by":"crossref","unstructured":"Kokuryo S, Kondo M, Mizuno O (2020) An empirical study of utilization of imperative modules in ansible. In: 2020 IEEE 20Th international conference on software quality, reliability and security (QRS), pp 442\u2013449","DOI":"10.1109\/QRS51102.2020.00063"},{"issue":"3","key":"10215_CR37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10723-021-09572-0","volume":"19","author":"I Kumara","year":"2021","unstructured":"Kumara I, Mundt P, Tokmakov K, Radolovi\u0107 D, Maslennikov A, Gonz\u00e1lez RS, Fabeiro JF, Quattrocchi G, Meth K, Di Nitto E et al (2021) Sodalite@rt: orchestrating applications on cloud-edge infrastructures. 
J Grid Comput 19(3):1\u201323","journal-title":"J Grid Comput"},{"key":"10215_CR38","doi-asserted-by":"publisher","unstructured":"Kumara I et al (2020) Towards semantic detection of smells in cloud infrastructure code. In: Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics, WIMS 2020, pp 63\u201367. Association for computing machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/3405962.3405979","DOI":"10.1145\/3405962.3405979"},{"key":"10215_CR39","doi-asserted-by":"crossref","unstructured":"Lau JH, Baldwin T (2016) An empirical evaluation of doc2vec with practical insights into document embedding generation. arXiv:1607.05368","DOI":"10.18653\/v1\/W16-1609"},{"issue":"4","key":"10215_CR40","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11334-007-0031-2","volume":"3","author":"D Lawrie","year":"2007","unstructured":"Lawrie D, Morrell C, Feild H, Binkley D (2007) Effective identifier names for comprehension and memory. Innov Syst Softw Eng 3(4):303\u2013318","journal-title":"Innov Syst Softw Eng"},{"key":"10215_CR41","unstructured":"Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: International conference on machine learning, pp 1188\u20131196. PMLR"},{"key":"10215_CR42","doi-asserted-by":"crossref","unstructured":"Li G, Liu H, Jin J, Umer Q (2020) Deep learning based identification of suspicious return statements. In: 2020 IEEE 27Th international conference on software analysis, evolution and reengineering, pp 480\u2013491","DOI":"10.1109\/SANER48275.2020.9054826"},{"key":"10215_CR43","doi-asserted-by":"publisher","first-page":"106287","DOI":"10.1016\/j.infsof.2020.106287","volume":"122","author":"N Li","year":"2020","unstructured":"Li N, Shepperd M, Guo Y (2020) A systematic review of unsupervised learning techniques for software defect prediction. Inf Softw Technol 122:106287. https:\/\/doi.org\/10.1016\/j.infsof.2020.106287. 
https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0950584920300379","journal-title":"Inf Softw Technol"},{"key":"10215_CR44","doi-asserted-by":"crossref","unstructured":"Liu K et al (2019) Learning to spot and refactor inconsistent method names. In: 2019 IEEE\/ACM 41St international conference on software engineering (ICSE), pp 1\u201312","DOI":"10.1109\/ICSE.2019.00019"},{"issue":"5-6","key":"10215_CR45","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1016\/S0893-6080(03)00115-1","volume":"16","author":"M Matsugu","year":"2003","unstructured":"Matsugu M, Mori K, Mitari Y, Kaneda Y (2003) Subject independent facial expression recognition with robust face detection using a convolutional neural network. Neural Netw 16(5-6):555\u2013559","journal-title":"Neural Netw"},{"key":"10215_CR46","unstructured":"Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781"},{"key":"10215_CR47","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111\u20133119"},{"key":"10215_CR48","doi-asserted-by":"crossref","unstructured":"Minaee S, Kalchbrenner N, Cambria E, Nikzad N, Chenaghlu M, Gao J (2021) Deep learning based text classification: A comprehensive review","DOI":"10.1145\/3439726"},{"key":"10215_CR49","unstructured":"Moore DS, Notz WI, Fligner MA (2015) The basic practice of statistics. Macmillan Higher Education"},{"key":"10215_CR50","unstructured":"Morris K (2016) Infrastructure as code: managing servers in the cloud. \u201d O\u2019Reilly Media Inc.\u201d"},{"key":"10215_CR51","doi-asserted-by":"publisher","unstructured":"Omri S, Sinz C (2020) Deep learning for software defect prediction: A survey. In: Proceedings of the IEEE\/ACM 42nd International conference on software engineering workshops, ICSEW\u201920, pp 209\u2013214. 
Association for computing machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/3387940.3391463","DOI":"10.1145\/3387940.3391463"},{"key":"10215_CR52","doi-asserted-by":"crossref","unstructured":"Opdebeeck R, Zerouali A, Vel\u00e1zquez-rodr\u00edguez C, Roover CD (2020) Does infrastructure as code adhere to semantic versioning? an analysis of ansible role evolution. In: 2020 IEEE 20Th international working conference on source code analysis and manipulation (SCAM), pp 238\u2013248","DOI":"10.1109\/SCAM51674.2020.00032"},{"issue":"02","key":"10215_CR53","doi-asserted-by":"publisher","first-page":"1742001","DOI":"10.1142\/S0218843017420011","volume":"26","author":"F Palma","year":"2017","unstructured":"Palma F, Gonzalez-Huerta J, Founi M, Moha N, Tremblay G, Gu\u00e9h\u00e9neuc YG (2017) Semantic analysis of restful apis for the detection of linguistic patterns and antipatterns. Int J Coop Inf Syst 26(02):1742001. https:\/\/doi.org\/10.1142\/S0218843017420011","journal-title":"Int J Coop Inf Syst"},{"key":"10215_CR54","doi-asserted-by":"publisher","unstructured":"Pennington J, Socher R, Manning C (2014) GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on empirical methods in natural language processing (EMNLP), pp 1532\u20131543. Association for computational linguistics, Doha, Qatar. https:\/\/doi.org\/10.3115\/v1\/D14-1162. https:\/\/www.aclweb.org\/anthology\/D14-1162","DOI":"10.3115\/v1\/D14-1162"},{"key":"10215_CR55","doi-asserted-by":"publisher","unstructured":"Pradel M, Sen K (2018) Deepbugs: A learning approach to name-based bug detection. Proc. ACM Program. Lang., p 2. 
https:\/\/doi.org\/10.1145\/3276517","DOI":"10.1145\/3276517"},{"issue":"5","key":"10215_CR56","doi-asserted-by":"publisher","first-page":"3430","DOI":"10.1007\/s10664-020-09841-8","volume":"25","author":"A Rahman","year":"2020","unstructured":"Rahman A, Farhana E, Williams L (2020) The \u2018as code\u2019 activities: Development anti-patterns for infrastructure as code. Empir Softw Eng 25(5):3430\u20133467","journal-title":"Empir Softw Eng"},{"key":"10215_CR57","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1016\/j.infsof.2018.12.004","volume":"108","author":"A Rahman","year":"2019","unstructured":"Rahman A, Mahdavi-Hezaveh R, Williams L (2019) A systematic mapping study of infrastructure as code research. Inf Softw Technol 108:65\u201377","journal-title":"Inf Softw Technol"},{"key":"10215_CR58","doi-asserted-by":"crossref","unstructured":"Rahman A, Parnin C, Williams L (2019) The seven sins: Security smells in infrastructure as code scripts. In: Proceedings of the 41st International conference on software engineering, pp 164\u2013175","DOI":"10.1109\/ICSE.2019.00033"},{"key":"10215_CR59","doi-asserted-by":"crossref","unstructured":"Rahman A, Rahman MR, Parnin C, Williams L (2021) Security smells in ansible and chef scripts: A replication study. ACM Transactions on Software Engineering and Methodology (TOSEM) 30(1)","DOI":"10.1145\/3408897"},{"key":"10215_CR60","doi-asserted-by":"crossref","unstructured":"Rahman A, Williams L (2018) Characterizing defective configuration scripts used for continuous deployment. In: 2018 IEEE 11Th international conference on software testing, verification and validation (ICST), pp 34\u201345. IEEE","DOI":"10.1109\/ICST.2018.00014"},{"key":"10215_CR61","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1016\/j.infsof.2019.04.013","volume":"112","author":"A Rahman","year":"2019","unstructured":"Rahman A, Williams L (2019) Source code properties of defective infrastructure as code scripts. 
Inf Softw Technol 112:148\u2013163","journal-title":"Inf Softw Technol"},{"key":"10215_CR62","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1016\/j.infsof.2019.04.013","volume":"112","author":"A Rahman","year":"2019","unstructured":"Rahman A, Williams L (2019) Source code properties of defective infrastructure as code scripts. Inf Softw Technol 112:148\u2013163","journal-title":"Inf Softw Technol"},{"key":"10215_CR63","unstructured":"Roberts K (2016) Assessing the corpus size vs. similarity trade-off for word embeddings in clinical nlp. In: Proceedings of the clinical natural language processing workshop (ClinicalNLP), pp 54\u201363"},{"key":"10215_CR64","doi-asserted-by":"publisher","first-page":"17734","DOI":"10.1109\/ACCESS.2020.2966597","volume":"8","author":"J Sandobal\u00edn","year":"2020","unstructured":"Sandobal\u00edn J, Insfran E, Abrah\u00e3o S (2020) On the effectiveness of tools to support infrastructure as code: Model-driven versus code-centric. IEEE Access 8:17734\u201317761","journal-title":"IEEE Access"},{"key":"10215_CR65","doi-asserted-by":"crossref","unstructured":"Schermann G, Zumberi S, Cito J (2018) Structured information on state and evolution of dockerfiles on github. In: Proceedings of the 15th International conference on mining software repositories, MSR \u201918, pp 26\u201329. ACM","DOI":"10.1145\/3196398.3196456"},{"key":"10215_CR66","doi-asserted-by":"crossref","unstructured":"Schwarz J, Steffens A, Lichter H (2018) Code smells in infrastructure as code. In: 2018 11Th international conference on the quality of information and communications technology (QUATIC), pp 220\u2013228. IEEE","DOI":"10.1109\/QUATIC.2018.00040"},{"key":"10215_CR67","doi-asserted-by":"crossref","unstructured":"Sharma T, Fragkoulis M, Spinellis D (2016) Does your configuration code smell?. In: 2016 IEEE\/ACM 13Th working conference on mining software repositories (MSR), pp 189\u2013200. 
IEEE","DOI":"10.1145\/2901739.2901761"},{"key":"10215_CR68","doi-asserted-by":"crossref","unstructured":"Sotiropoulos T, Mitropoulos D, Spinellis D (2020) Practical fault detection in puppet programs. In: Proceedings of the ACM\/IEEE 42nd International conference on software engineering, ICSE \u201920, pp 26\u201337. Association for computing machinery","DOI":"10.1145\/3377811.3380384"},{"key":"10215_CR69","doi-asserted-by":"crossref","unstructured":"Spadini D, Aniche M, Bacchelli A (2018) Pydriller: Python framework for mining software repositories. In: Proceedings of the 2018 26th ACM Joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 908\u2013911","DOI":"10.1145\/3236024.3264598"},{"issue":"1","key":"10215_CR70","doi-asserted-by":"publisher","first-page":"996","DOI":"10.1007\/s10664-019-09775-w","volume":"25","author":"A Sulistya","year":"2020","unstructured":"Sulistya A, Prana GAA, Sharma A, Lo D, Treude C (2020) Sieve: Helping developers sift wheat from chaff via cross-platform analysis. Empir Softw Eng 25(1):996\u20131030. https:\/\/doi.org\/10.1007\/s10664-019-09775-w","journal-title":"Empir Softw Eng"},{"issue":"3","key":"10215_CR71","first-page":"143","volume":"4","author":"AA Takang","year":"1996","unstructured":"Takang AA, Grubb PA, Macredie RD (1996) The effects of comments and identifier names on program comprehensibility: an experimental investigation. J Prog Lang 4(3):143\u2013167","journal-title":"J Prog Lang"},{"issue":"1","key":"10215_CR72","first-page":"3221","volume":"15","author":"L Van Der Maaten","year":"2014","unstructured":"Van Der Maaten L (2014) Accelerating t-sne using tree-based algorithms. J Mach Learn Res 15(1):3221\u20133245","journal-title":"J Mach Learn Res"},{"key":"10215_CR73","unstructured":"Van der Maaten L, Hinton G (2008) Visualizing data using t-sne. 
Journal of machine learning research 9(11)"},{"key":"10215_CR74","doi-asserted-by":"crossref","unstructured":"Wang S, Liu T, Tan L (2016) Automatically learning semantic features for defect prediction. In: Proceedings of the 38th International conference on software engineering, ICSE \u201916, pp 297-308. Association for computing machinery, New York, NY, USA","DOI":"10.1145\/2884781.2884804"},{"issue":"10","key":"10215_CR75","doi-asserted-by":"publisher","first-page":"e2","DOI":"10.23915\/distill.00002","volume":"1","author":"M Wattenberg","year":"2016","unstructured":"Wattenberg M, Vi\u00e9gas F, Johnson I (2016) How to use t-sne effectively. Distill 1(10):e2","journal-title":"Distill"},{"key":"10215_CR76","doi-asserted-by":"crossref","unstructured":"Wilcoxon F (1992) Individual comparisons by ranking methods. In: Breakthroughs in statistics, pp 196\u2013202. Springer","DOI":"10.1007\/978-1-4612-4380-9_16"}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-022-10215-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-022-10215-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-022-10215-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,4]],"date-time":"2024-10-04T05:16:21Z","timestamp":1728018981000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-022-10215-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,20]]},"references-count":76,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["10215"],"URL":"https:\/\/doi.org\/10.1007\/
s10664-022-10215-5","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"value":"1382-3256","type":"print"},{"value":"1573-7616","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,20]]},"assertion":[{"value":"15 July 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 September 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"178"}}