{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T23:23:19Z","timestamp":1768087399320,"version":"3.49.0"},"reference-count":102,"publisher":"Association for Computing Machinery (ACM)","issue":"8","license":[{"start":{"date-parts":[[2024,11,21]],"date-time":"2024-11-21T00:00:00Z","timestamp":1732147200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2024,11,30]]},"abstract":"<jats:p>Studies in empirical software engineering are often most useful if they make causal claims because this allows practitioners to identify how they can purposefully influence (rather than only predict) outcomes of interest. Unfortunately, many non-experimental studies suffer from potential endogeneity, for example, through omitted confounding variables, which precludes claims of causality. In this conceptual tutorial, we aim to transfer the proven solution of instrumental variables and two-stage models as a means to account for endogeneity from econometrics to the field of empirical software engineering. To this end, we discuss causality and causal inference, provide a definition of endogeneity, explain its causes, and lay out the conceptual idea behind instrumental variable approaches and two-stage models. We also provide an extensive illustration with simulated data and a brief illustration with real data to demonstrate the approach, offering Stata and R code to allow researchers to replicate our analyses and apply the techniques to their own research projects. 
We close with concrete recommendations and a guide for researchers on how to deal with endogeneity.<\/jats:p>","DOI":"10.1145\/3674730","type":"journal-article","created":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T16:53:09Z","timestamp":1719593589000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Cleaning Up Confounding: Accounting for Endogeneity Using Instrumental Variables and Two-Stage Models"],"prefix":"10.1145","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0545-6643","authenticated-orcid":false,"given":"Lorenz","family":"Graf-Vlachy","sequence":"first","affiliation":[{"name":"TU Dortmund University, Dortmund, Germany and University of Stuttgart, Institute of Software Engineering, Stuttgart, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5256-8429","authenticated-orcid":false,"given":"Stefan","family":"Wagner","sequence":"additional","affiliation":[{"name":"Technical University of Munich, TUM School of Computation, Information and Technology, Heilbronn, Germany"}]}],"member":"320","published-online":{"date-parts":[[2024,11,21]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCTA.2012.6523564"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2007.29"},{"issue":"3","key":"e_1_3_2_4_2","first-page":"313","article-title":"Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records","volume":"80","author":"Angrist Joshua D.","year":"1990","unstructured":"Joshua D. Angrist. 1990. Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records. 
The American Economic Review 80, 3 (1990), 313\u2013336.","journal-title":"The American Economic Review"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1515\/9781400829828"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.leaqua.2010.10.010"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1093\/oxfordhb\/9780199755615.013.007"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1177\/1476127008094339"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2022.3222119"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbankfin.2007.09.016"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1162\/003355304772839588"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/1985441.1985472"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1177\/0022343310373032"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyg.2019.01481"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1995.10476536"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511813085"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10940-007-9024-4"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.2307\/1935959"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1037\/h0040950"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER48275.2020.9054818"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1002\/smj.2475"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1177\/1094428115619013"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/2993259.2993261"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CSMR.2012.31"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.12987\/9780300255881"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\
/QSIC.2010.58"},{"key":"e_1_3_2_27_2","first-page":"250","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics","author":"Danescu-Niculescu-Mizil Cristian","year":"2013","unstructured":"Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. 250\u2013259."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.73"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.2307\/1401917"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/32.935855"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3467895"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510121"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3479497"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00097"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1177\/0049124100029002001"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.3102\/0162373713493129"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1093\/jeg\/lby025"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1509\/jm.14.0244"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2002.1033226"},{"key":"e_1_3_2_40_2","volume-title":"Forecasting Economic Time Series","author":"Granger C. W. J.","year":"1977","unstructured":"C. W. J. Granger and Paul Newbold. 1977. Forecasting Economic Time Series. 
Academic Press, New York, NY, USA and London."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.245"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.5465\/amj.2016.1155"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/2629648"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1177\/1476127003001001218"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.2307\/1913827"},{"key":"e_1_3_2_46_2","volume-title":"Econometrics","author":"Hayashi Fumio","year":"2000","unstructured":"Fumio Hayashi. 2000. Econometrics. Princeton University Press, Princeton."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.2307\/1912352"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.0081-1750.2006.00164.x"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-021-10092-4"},{"key":"e_1_3_2_50_2","first-page":"392","volume-title":"Proceedings of the 35th International Conference on Software Engineering (ICSE \u201913)","author":"Herzig Kim","year":"2013","unstructured":"Kim Herzig, Sascha Just, and Andreas Zeller. 2013. It\u2019s not a bug, it\u2019s a feature: How misclassification impacts bug prediction. In Proceedings of the 35th International Conference on Software Engineering (ICSE \u201913). IEEE, 392\u2013401."},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1177\/0149206320960533"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1986.10478354"},{"key":"e_1_3_2_53_2","volume-title":"Correlation and Causality","author":"Kenny David A.","year":"1979","unstructured":"David A. Kenny. 1979. Correlation and Causality. 
John Wiley & Sons, New York, NY, USA."},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/1137983.1138027"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1080\/00461520.2016.1207177"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jacceco.2009.11.004"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.2307\/2025310"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jce.2007.09.001"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380272"},{"key":"e_1_3_2_60_2","volume-title":"A System of Logic, Ratiocinative and Inductive","author":"Mill John Stuart","year":"1843","unstructured":"John Stuart Mill. 1843. A System of Logic, Ratiocinative and Inductive, Vol. 1. John W. Parker, London."},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524842.3528528"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1257\/jep.20.4.111"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/1134285.1134349"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.2307\/1909352"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1086\/296497"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.2307\/2938359"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3273934.3273943"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180183"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-53469-5_18"},{"key":"e_1_3_2_70_2","volume-title":"Causality: Models, Reasoning, and Inference","author":"Pearl Judea","year":"2000","unstructured":"Judea Pearl. 2000. Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press, Cambridge.","edition":"2"},{"key":"e_1_3_2_71_2","unstructured":"Judea Pearl. 2016. The Three Layer Causal Hierarchy. 
Retrieved from https:\/\/web.cs.ucla.edu\/kaoru\/3-layer-causal-hierarchy.pdf"},{"key":"e_1_3_2_72_2","volume-title":"The Book of Why: The New Science of Cause and Effect","author":"Pearl Judea","year":"2018","unstructured":"Judea Pearl and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books, New York, NY, USA."},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSR.2019.00024"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/1287624.1287643"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/1985793.1985830"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/70.1.41"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1287\/mksc.2014.0860"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1037\/h0037350"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1561\/1400000049"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1002\/smj.2136"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1162\/rest.1997.79.2.348"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2023.107198"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1214\/ss\/1177012031"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.2307\/2171753"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511614491.006"},{"key":"e_1_3_2_86_2","volume-title":"Introduction to Econometrics","author":"Stock James H.","year":"2019","unstructured":"James H. Stock and Mark W. Watson. 2019. Introduction to Econometrics (4 ed.). 
Pearson, New York, NY, USA.","edition":"4"},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1198\/073500102288618658"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510080"},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472306.3478336"},{"key":"e_1_3_2_90_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-42089-9_44"},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSR.2019.00036"},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.1109\/ESEM.2017.59"},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.1007\/s42001-020-00068-7"},{"key":"e_1_3_2_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSR.2007.13"},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME46990.2020.00011"},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.1002\/smj.2995"},{"key":"e_1_3_2_97_2","volume-title":"Econometric Analysis of Cross Section and Panel Data","author":"Wooldridge Jeffrey M.","year":"2010","unstructured":"Jeffrey M. Wooldridge. 2010. Econometric Analysis of Cross Section and Panel Data (2 ed.). MIT Press, Cambridge and London.","edition":"2"},{"key":"e_1_3_2_98_2","volume-title":"Introductory Econometrics: A Modern Approach","author":"Wooldridge Jeffrey M.","year":"2020","unstructured":"Jeffrey M. Wooldridge. 2020. Introductory Econometrics: A Modern Approach (7 ed.). 
Cengage, Boston.","edition":"7"},{"key":"e_1_3_2_99_2","doi-asserted-by":"publisher","DOI":"10.2307\/1914093"},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2013.6606654"},{"key":"e_1_3_2_101_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549103"},{"key":"e_1_3_2_102_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2017.8115619"},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2019.00011"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674730","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3674730","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:57:50Z","timestamp":1750294670000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674730"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,21]]},"references-count":102,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,11,30]]}},"alternative-id":["10.1145\/3674730"],"URL":"https:\/\/doi.org\/10.1145\/3674730","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,21]]},"assertion":[{"value":"2023-09-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}