{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T22:26:19Z","timestamp":1748643979809,"version":"3.40.3"},"publisher-location":"Cham","reference-count":47,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031753862"},{"type":"electronic","value":"9783031753879"}],"license":[{"start":{"date-parts":[[2024,10,26]],"date-time":"2024-10-26T00:00:00Z","timestamp":1729900800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,10,26]],"date-time":"2024-10-26T00:00:00Z","timestamp":1729900800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>\u201cIn Silico\u201d research drives the world around us, as illustrated by the way our society handles climate change, controls the COVID-19 pandemic and governs economic growth. Unfortunately, the code embedded in the underlying data processing is mostly written by scientists lacking formal training in software engineering. The resulting code is vulnerable, suffering from what is known as threats to instrument validity.<\/jats:p><jats:p>This position paper aims to understand and remedy threats to instrument validity in current \u201cin silico\u201d research. 
To achieve this goal, we specify a research agenda listing how recent software engineering achievements may improve \u201cin silico\u201d research (SE4Silico) and, conversely, how software engineering may strengthen its applicability (Silico4SE).<\/jats:p>","DOI":"10.1007\/978-3-031-75387-9_6","type":"book-chapter","created":{"date-parts":[[2024,10,25]],"date-time":"2024-10-25T08:02:14Z","timestamp":1729843334000},"page":"82-96","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Threats to\u00a0Instrument Validity Within \u201cin Silico\u201d Research: Software Engineering to\u00a0the\u00a0Rescue"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4463-2945","authenticated-orcid":false,"given":"Serge","family":"Demeyer","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1710-1268","authenticated-orcid":false,"given":"Coen","family":"De Roover","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2714-8155","authenticated-orcid":false,"given":"Mutlu","family":"Beyazit","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7461-2320","authenticated-orcid":false,"given":"Johannes","family":"H\u00e4rtel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,10,26]]},"reference":[{"issue":"2","key":"6_CR1","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1023\/A:1005474526406","volume":"43","author":"S Rahmstorf","year":"1999","unstructured":"Rahmstorf, S., Ganopolski, A.: Long-term global warming scenarios computed with an efficient coupled climate model. Clim. Change 43(2), 353\u2013367 (1999)","journal-title":"Clim. 
Change"},{"issue":"1","key":"6_CR2","first-page":"1723","volume":"12","author":"M Sharma","year":"2021","unstructured":"Sharma, M., et al.: Understanding the effectiveness of government interventions against the resurgence of Covid-19 in Europe. Nat. Commun. 12(1), 1723\u20132041 (2021)","journal-title":"Nat. Commun."},{"key":"6_CR3","doi-asserted-by":"crossref","unstructured":"Kara, Y., Boyacioglu, M.A., Baykan, \u00d6.: Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul stock exchange. Expert Systems with Appl. 38(5), 5311\u20135319 (2011)","DOI":"10.1016\/j.eswa.2010.10.027"},{"key":"6_CR4","doi-asserted-by":"crossref","unstructured":"McElreath, R.: Statistical Rethinking: A Bayesian Course with Examples in R and STAN (2nd edition). Chapman and Hall\/CRC (2020)","DOI":"10.1201\/9780429029608"},{"issue":"5807","key":"6_CR5","doi-asserted-by":"publisher","first-page":"1856","DOI":"10.1126\/science.314.5807.1856","volume":"314","author":"G Miller","year":"2006","unstructured":"Miller, G.: A scientist\u2019s nightmare: Software problem leads to five retractions. Science 314(5807), 1856\u20131857 (2006)","journal-title":"Science"},{"key":"6_CR6","doi-asserted-by":"crossref","unstructured":"Herndon, T., Ash, M., Pollin, R.: Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Cambridge J. Econom. 38, 257\u2013279 (2013)","DOI":"10.1093\/cje\/bet075"},{"key":"6_CR7","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1038\/s41586-019-1043-4","volume":"568","author":"H Whitehouse","year":"2019","unstructured":"Whitehouse, H., et al.: RETRACTED ARTICLE: complex societies precede moralizing gods throughout world history. 
Nature 568, 226\u2013229 (2019)","journal-title":"Nature"},{"key":"6_CR8","doi-asserted-by":"publisher","first-page":"E29","DOI":"10.1038\/s41586-021-03655-4","volume":"595","author":"B Beheim","year":"2021","unstructured":"Beheim, B., et al.: Treatment of missing data determined conclusions regarding moralizing gods. Nature 595, E29\u2013E34 (2021)","journal-title":"Nature"},{"key":"6_CR9","doi-asserted-by":"publisher","first-page":"320","DOI":"10.1038\/s41586-021-03656-3","volume":"595","author":"H Whitehouse","year":"2021","unstructured":"Whitehouse, H., et al.: Retraction note: complex societies precede moralizing gods throughout world history. Nature 595, 320 (2021)","journal-title":"Nature"},{"key":"6_CR10","first-page":"13","volume":"1","author":"M Yusuf","year":"2023","unstructured":"Yusuf, M.: Insights into the in-silico research: current scenario, advantages, limits, and future perspectives. Life in Silico 1, 13\u201325 (2023)","journal-title":"Life in Silico"},{"key":"6_CR11","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1145\/3575663","volume":"66","author":"K Mike","year":"2023","unstructured":"Mike, K., Hazzan, O.: What is data science? Commun. ACM 66, 12\u201313 (2023)","journal-title":"Commun. ACM"},{"volume-title":"End-User Development","year":"2006","key":"6_CR12","unstructured":"Lieberman, H., Patern\u00f2, F., Wulf, V. (eds.): End-User Development. Springer, Netherlands, Dordrecht (2006)"},{"key":"6_CR13","unstructured":"Hern, A.: Covid: how Excel may have caused loss of 16,000 test results in England. The Guardian (2020)"},{"key":"6_CR14","doi-asserted-by":"crossref","unstructured":"Roy, S., Deursen, A.V., Hermans, F.: Perceived relevance of automatic code inspection in end-user development: A study on VBA. 
In: Proceedings EASE 2019 (23rd International Conference on Evaluation and Assessment in Software Engineering), (New York, NY, USA), pp.\u00a0167\u2014176, Association for Computing Machinery (2019)","DOI":"10.1145\/3319008.3319028"},{"key":"6_CR15","doi-asserted-by":"crossref","unstructured":"Pernia, D.L., Demeyer, S., Schalm, O., Anaf, W.: A data mining approach for indoor air assessment, an alternative tool for cultural heritage conservation. In: Proceedings HERI-TECH 2018 (IOP Conference Series: Materials Science and Engineering), vol.\u00a0364 \u2013 1, p.\u00a0012045 (2018)","DOI":"10.1088\/1757-899X\/364\/1\/012045"},{"key":"6_CR16","unstructured":"Carro, G., et al.: A new approach to make indoor air quality in the accommodation of ships understandable and actionable for seafaring staff. In: Proceedings ICMT 2020 8th International Conference on Maritime Transport \u2014 Maritime Transport VIII Sept (2020)"},{"issue":"2","key":"6_CR17","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1007\/s10664-008-9102-8","volume":"14","author":"P Runeson","year":"2009","unstructured":"Runeson, P., H\u00f6st, M.: Guidelines for conducting and reporting case study research in software engineering. Empir. Softw. Eng. 14(2), 131\u2013164 (2009)","journal-title":"Empir. Softw. Eng."},{"key":"6_CR18","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1038\/d41586-018-07196-1","volume":"563","author":"J Perkel","year":"2018","unstructured":"Perkel, J.: Why jupyter is data scientists\u2019 computational notebook of choice. Nature 563, 145\u2013146 (2018)","journal-title":"Nature"},{"key":"6_CR19","volume-title":"Pattern-Oriented Software Architecture","author":"R Meunier","year":"1996","unstructured":"Meunier, R., Rohnert, H., Sommerlad, P., Stal, M., Buschmann, F.: Pattern-Oriented Software Architecture, vol. 1. 
Wiley, A System of Patterns (1996)"},{"key":"6_CR20","doi-asserted-by":"crossref","unstructured":"Kery, M.B., Radensky, M., Arya, M., John, B.E., Myers, B.A.: The story in the notebook: Exploratory data science using a literate programming tool. In: CHI 2018 Proceedings 2018 CHI Conference on Human Factors in Computing Systems, pp.\u00a01\u201311, Association for Computing Machinery, (2018)","DOI":"10.1145\/3173574.3173748"},{"issue":"2","key":"6_CR21","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1007\/s10664-021-10078-2","volume":"27","author":"J Businge","year":"2022","unstructured":"Businge, J., Openja, M., Nadi, S., Berger, T.: Reuse and maintenance practices among divergent forks in three software ecosystems. J. Emp. Softw. Eng. 27(2), 54 (2022)","journal-title":"J. Emp. Softw. Eng."},{"key":"6_CR22","doi-asserted-by":"crossref","unstructured":"Dubinsky, Y., Rubin, J., Berger, T., Duszynski, S., Becker, M., Czarnecki, K.: An exploratory study of cloning in industrial software product lines. In: Proceedings CSMR 2013 17th European Conference on Software Maintenance and Reengineering, pp.\u00a025 \u2013 34 (2013)","DOI":"10.1109\/CSMR.2013.13"},{"key":"6_CR23","doi-asserted-by":"crossref","unstructured":"Wang, J., Li, L., Zeller, A.: Better code, better sharing: on the need of analyzing jupyter notebooks. In: ICSE-NIER 2020 Proceedings ACM\/IEEE 42nd International Conference on Software Engineering: New Ideas and Emerging Results, pp.\u00a053\u201456, Association for Computing Machinery (2020)","DOI":"10.1145\/3377816.3381724"},{"key":"6_CR24","doi-asserted-by":"crossref","unstructured":"Pimentel, J.F., Murta, L., Braganholo, V., Freire, J.: A large-scale study about quality and reproducibility of jupyter notebooks. 
In: MSR 2019 Proceedings 2019 IEEE\/ACM 16th International Conference on Mining Software Repositories, pp.\u00a0507\u2013517, IEEE (2019)","DOI":"10.1109\/MSR.2019.00077"},{"issue":"4","key":"6_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10664-021-09961-9","volume":"26","author":"JF Pimentel","year":"2021","unstructured":"Pimentel, J.F., Murta, L., Braganholo, V., Freire, J.: Understanding and improving the quality and reproducibility of Jupyter notebooks. Emp. Soft. Eng. 26(4), 1\u201355 (2021). https:\/\/doi.org\/10.1007\/s10664-021-09961-9","journal-title":"Emp. Soft. Eng."},{"key":"6_CR26","doi-asserted-by":"crossref","unstructured":"Trisovic, A., Lau, M.K., Pasquier, T., Crosas, M.: A large-scale study on research code quality and execution. Sci. Data 9(60) (2022)","DOI":"10.1038\/s41597-022-01143-6"},{"key":"6_CR27","doi-asserted-by":"publisher","unstructured":"Boll, A., Vieregg, N., Kehrer, T.: Replicability of experimental tool evaluations in model-based software and systems engineering with matlab\/simulink. Innov. Syst. Softw. Eng. (2022). https:\/\/doi.org\/10.1007\/s11334-022-00442-w","DOI":"10.1007\/s11334-022-00442-w"},{"key":"6_CR28","unstructured":"Lundblad, A.: The most copied stackoverflow snippet of all time is flawed!. programming.guide. https:\/\/programming.guide\/worlds-most-copied-so-snippet.html"},{"key":"6_CR29","doi-asserted-by":"crossref","unstructured":"Demeyer, S.,\u00a0Ducasse, S., Nierstrasz, O.: Object-Oriented Reengineering Patterns. Morgan Kaufmann (2003)","DOI":"10.1016\/B978-155860639-5\/50006-7"},{"key":"6_CR30","doi-asserted-by":"crossref","unstructured":"Kapser, C., Godfrey, M.W.: \u2018Cloning considered harmful\u2019 considered harmful. 
In: Proceedings WCRE 2006 13th Working Conference on Reverse Engineering, pp.\u00a019 \u2014 28 (2006)","DOI":"10.1109\/WCRE.2006.1"},{"key":"6_CR31","doi-asserted-by":"crossref","unstructured":"Tang, Y., Khatchadourian, R., Bagherzadeh, M., Singh, R., Stewart, A., Raja, A.: An empirical study of refactorings and technical debt in machine learning systems. In: ICSE 2021 Proceedings of 2021 IEEE\/ACM 43rd International Conference on Software Engineering, pp.\u00a0238\u2013250 (2021)","DOI":"10.1109\/ICSE43902.2021.00033"},{"key":"6_CR32","doi-asserted-by":"crossref","unstructured":"Koenzen, A.P., Ernst, N.A., Storey, M.A.D.: Code duplication and reuse in jupyter notebooks. In: Proceedings VL\/HCC2020 2020 IEEE Symposium on Visual Languages and Human-Centric Computing, pp.\u00a01\u20139 (2020)","DOI":"10.1109\/VL\/HCC50065.2020.9127202"},{"key":"6_CR33","doi-asserted-by":"crossref","unstructured":"K\u00e4ll\u00e9n, M., Wrigstad, T.: Jupyter notebooks on github: characteristics and code clones. In: The Art, Science, and Engineering of Programming, vol.5, no. 3, (2021)","DOI":"10.22152\/programming-journal.org\/2021\/5\/15"},{"key":"6_CR34","doi-asserted-by":"crossref","unstructured":"De Santana, T.L., Neto, P.A.D.M.S., De Almeida, E.S., Ahmed, I.: Bug analysis in jupyter notebook projects: an empirical study. ACM Trans. Softw. Eng. Methodol. 33 (2024)","DOI":"10.1145\/3641539"},{"key":"6_CR35","doi-asserted-by":"crossref","unstructured":"Islam, M.J., Nguyen, G., Pan, R., Rajan, H.: A comprehensive study on deep learning bug characteristics. In: ESEC\/FSE 2019 Proceedings 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, New York, NY, USA, p.\u00a0510-520, Association for Computing Machinery (2019)","DOI":"10.1145\/3338906.3338955"},{"key":"6_CR36","doi-asserted-by":"crossref","unstructured":"Gelman, A., Hill, J.: Data Analysis Using Regression and Multilevel\/Hierarchical Models. 
Cambridge University Press (2006)","DOI":"10.1017\/CBO9780511790942"},{"key":"6_CR37","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1145\/3624700","volume":"67","author":"M Buyl","year":"2024","unstructured":"Buyl, M., De Bie, T.: Inherent limitations of AI fairness. Commun. ACM 67, 48\u201355 (2024)","journal-title":"Commun. ACM"},{"key":"6_CR38","doi-asserted-by":"crossref","unstructured":"Kowalczyk, E., Nair, K., Gao, Z., Silberstein, L., Long, T., Memon, A.: Modeling and ranking flaky tests at apple. In: Proceedings ICSE-SEIP 2020 42nd International Conference on Software Engineering: Software Engineering in Practice, pp.\u00a0110\u2013119 (2020)","DOI":"10.1145\/3377813.3381370"},{"key":"6_CR39","doi-asserted-by":"crossref","unstructured":"Kim, M., Sazawal, V., Notkin, D., Murphy, G.: An empirical study of code clone genealogies. In: Proceedings ESEC\/FSE 2005 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp.\u00a0187\u2013196 (2005)","DOI":"10.1145\/1081706.1081737"},{"key":"6_CR40","doi-asserted-by":"crossref","unstructured":"Krinke, J.: Is cloned code more stable than non-cloned code?. In: Proceedings SCAM 2008 2008 Eighth IEEE International Working Conference on Source Code Analysis and Manipulation, pp.\u00a057\u201366, IEEE (2008)","DOI":"10.1109\/SCAM.2008.14"},{"key":"6_CR41","doi-asserted-by":"crossref","unstructured":"Krinke, J.: A study of consistent and inconsistent changes to code clones. In: Proceedings WCRE 2007 14th Working Conference on Reverse Engineering, pp.\u00a0170\u2013178, IEEE, (2007)","DOI":"10.1109\/WCRE.2007.7"},{"key":"6_CR42","doi-asserted-by":"crossref","unstructured":"van Bladel, B.,\u00a0Demeyer, S.: A comparative study of code clone genealogies in test code and production code. 
In: Proceedings VST 2023 IEEE Workshop on Validation, Analysis and Evolution of Software Tests, pp.\u00a0913 \u2013 920, IEEE (2023)","DOI":"10.1109\/SANER56733.2023.00110"},{"key":"6_CR43","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1145\/1646353.1646374","volume":"53","author":"A Bessey","year":"2010","unstructured":"Bessey, A., et al.: A few billion lines of code later: using static analysis to find bugs in the real world. Commun. ACM 53, 66\u201375 (2010)","journal-title":"Commun. ACM"},{"key":"6_CR44","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"683","DOI":"10.1007\/978-3-319-89884-1_24","volume-title":"Programming Languages and Systems","author":"C Urban","year":"2018","unstructured":"Urban, C., M\u00fcller, P.: An abstract interpretation framework for input data usage. In: Ahmed, A. (ed.) ESOP 2018. LNCS, vol. 10801, pp. 683\u2013710. Springer, Cham (2018). https:\/\/doi.org\/10.1007\/978-3-319-89884-1_24"},{"key":"6_CR45","doi-asserted-by":"crossref","unstructured":"Suboti\u0107, P.,\u00a0Miliki\u0107, L.,\u00a0Stoji\u0107, M.: A static analysis framework for data science notebooks. In: Proceedings ICSE-SEIP 2022 44th International Conference on Software Engineering: Software Engineering in Practice, (New York, NY, USA), pp.\u00a013 \u2013 22, Association for Computing Machinery (2022)","DOI":"10.1145\/3510457.3513032"},{"key":"6_CR46","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1145\/3474385","volume":"64","author":"E Tosch","year":"2021","unstructured":"Tosch, E., Bakshy, E., Berger, E.D., Jensen, D.D., Moss, J.E.B.: PlanAlyzer: assessing threats to the validity of online experiments. Commun. ACM 64, 108\u2013116 (2021)","journal-title":"Commun. ACM"},{"key":"6_CR47","doi-asserted-by":"publisher","unstructured":"H\u00e4rtel, J., L\u00e4mmel, R.: Operationalizing validity of empirical software engineering studies. Emp. Softw. Eng. 28(6) (2023). 
https:\/\/doi.org\/10.1007\/s10664-023-10370-3","DOI":"10.1007\/s10664-023-10370-3"}],"container-title":["Lecture Notes in Computer Science","Leveraging Applications of Formal Methods, Verification and Validation. Software Engineering Methodologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-75387-9_6","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,25]],"date-time":"2024-10-25T08:06:32Z","timestamp":1729843592000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-75387-9_6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,26]]},"ISBN":["9783031753862","9783031753879"],"references-count":47,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-75387-9_6","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2024,10,26]]},"assertion":[{"value":"26 October 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ISoLA","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Symposium on Leveraging Applications of Formal Methods","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Crete","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Greece","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference 
Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"27 October 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"31 October 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"12","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"isola2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/isola-conference.org\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}