{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:31:11Z","timestamp":1777455071603,"version":"3.51.4"},"reference-count":68,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2021,1,1]],"date-time":"2021-01-01T00:00:00Z","timestamp":1609459200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Big Data & Society"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>Machine learning (ML) systems have shown great potential for performing or supporting inferential reasoning through analyzing large data sets, thereby potentially facilitating more informed decision-making. However, a hindrance to such use of ML systems is that the predictive models created through ML are often complex, opaque, and poorly understood, even if the programs \u201clearning\u201d the models are simple, transparent, and well understood. ML models become difficult to trust, since lay-people, specialists, and even researchers have difficulties gauging the reasonableness, correctness, and reliability of the inferences performed. In this article, we argue that bridging this gap in the understanding of ML models and their reasonableness requires a focus on developing an improved methodology for their creation. This process has been likened to \u201calchemy\u201d and criticized for involving a large degree of \u201cblack art,\u201d owing to its reliance on poorly understood \u201cbest practices\u201d. We soften this critique and argue that the seeming arbitrariness often is the result of a lack of explicit hypothesizing stemming from an empiricist and myopic focus on optimizing for predictive performance rather than from an occult or mystical process. 
We present some of the problems resulting from the excessive focus on optimizing generalization performance at the cost of hypothesizing about the selection of data and biases. We suggest embedding ML in a general logic of scientific discovery similar to the one presented by Charles Sanders Peirce, and present a recontextualized version of Peirce\u2019s scientific hypothesis adjusted to ML.<\/jats:p>","DOI":"10.1177\/20539517211020775","type":"journal-article","created":{"date-parts":[[2021,6,4]],"date-time":"2021-06-04T06:52:42Z","timestamp":1622789562000},"update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":6,"title":["Turning biases into hypotheses through method: A logic of scientific discovery for machine learning"],"prefix":"10.1177","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1544-4371","authenticated-orcid":false,"given":"Simon Aagaard","family":"Enni","sequence":"first","affiliation":[{"name":"Department of Computer Science, Aarhus University, Aarhus, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4412-9896","authenticated-orcid":false,"given":"Maja Bak","family":"Herrie","sequence":"additional","affiliation":[{"name":"Department of Art History, Aesthetics & Culture and Museology, Aarhus University, Aarhus, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2021,5,30]]},"reference":[{"key":"bibr1-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1002\/bies.1125"},{"issue":"7","key":"bibr2-20539517211020775","volume":"16","author":"Anderson C","year":"2008","journal-title":"Wired"},{"key":"bibr4-20539517211020775","volume-title":"Pattern Recognition and Machine Learning","author":"Bishop 
CM","year":"2006"},{"key":"bibr5-20539517211020775","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/6352.001.0001"},{"key":"bibr6-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1080\/1369118X.2012.678878"},{"key":"bibr7-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1177\/2053951715622512"},{"key":"bibr8-20539517211020775","doi-asserted-by":"publisher","DOI":"10.17351\/ests2020.277"},{"key":"bibr9-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/s11229-009-9709-3"},{"key":"bibr10-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-09823-4_45"},{"key":"bibr11-20539517211020775","doi-asserted-by":"crossref","unstructured":"Chu X, Ilyas IF, Krishnan S, et\u00a0al. (2016) Data cleaning: Overview and emerging challenges. In:\n                      Proceedings of the 2016 international conference on management of data.\n                      New York, NY: Association for Computing Machinery, pp. 2201\u20132206.","DOI":"10.1145\/2882903.2912574"},{"key":"bibr12-20539517211020775","first-page":"1","volume":"89","author":"Citron DK","year":"2014","journal-title":"Washington Law Review"},{"key":"bibr13-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1145\/2347736.2347755"},{"key":"bibr14-20539517211020775","volume-title":"Stanford Encyclopedia of Philosophy","author":"Douven I","year":"2017"},{"key":"bibr15-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-010-3163-9"},{"key":"bibr16-20539517211020775","unstructured":"Fazel M, Freund Y, Jordan M, et\u00a0al. (2017) Computational challenges and the future of ML. Simons Institute, UC Berkeley. 
Available at: https:\/\/youtu.be\/uyZOcUDhIbY?t=1012 (accessed 10 March 2021)."},{"key":"bibr17-20539517211020775","first-page":"2962","volume":"28","author":"Feurer M","year":"2015","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr18-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1145\/230538.230561"},{"key":"bibr19-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/BF00344251"},{"key":"bibr20-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-017-1733-5"},{"key":"bibr21-20539517211020775","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/9302.003.0002"},{"key":"bibr22-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/BF00993472"},{"key":"bibr23-20539517211020775","unstructured":"Green B, Hu L (2018) The myth in the methodology: Towards a recontextualization of fairness in machine learning. In:\n                      Proceedings of the machine learning: The Debates workshop at the 35th International Conference on Machine Learning\n                      , Stockholm, Sweden."},{"key":"bibr24-20539517211020775","doi-asserted-by":"crossref","unstructured":"Guidotti R, Monreale A, Ruggieri S, et\u00a0al. (2018) A survey of methods for explaining black box models. In:\n                      ACM Computing Surveys (CSUR).\n                      New York, NY: Association for Computing Machinery, pp. 1\u201342.","DOI":"10.1145\/3236009"},{"key":"bibr25-20539517211020775","unstructured":"Gunning D (2017) Explainable artificial intelligence (XAI).\n                      Defense Advanced Research Projects Agency (DARPA),\n                      pp. 
1\u201336."},{"key":"bibr26-20539517211020775","doi-asserted-by":"publisher","DOI":"10.2307\/2983326"},{"key":"bibr27-20539517211020775","volume-title":"Against Prediction Profiling, Policing, and Punishing in an Actuarial Age","author":"Harcourt BE","year":"2007"},{"key":"bibr28-20539517211020775","doi-asserted-by":"publisher","DOI":"10.2307\/2183532"},{"key":"bibr29-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7"},{"key":"bibr31-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"bibr32-20539517211020775","doi-asserted-by":"crossref","unstructured":"Kanter JM, Veeramachaneni K (2015) Deep feature synthesis: Towards automating data science endeavors. In:\n                      IEEE international conference on data science and advanced analytics (DSAA)\n                      , Paris, France. Oct 19\u201321, 2015, pp. 1\u201310.","DOI":"10.1109\/DSAA.2015.7344858"},{"key":"bibr33-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1023\/A:1021564703268"},{"key":"bibr34-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1177\/2053951714528481"},{"key":"bibr35-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1177\/2053951716631130"},{"key":"bibr36-20539517211020775","first-page":"1097","volume":"1","author":"Krizhevsky A","year":"2012","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"3","key":"bibr37-20539517211020775","first-page":"633","volume":"165","author":"Kroll JA","year":"2017","journal-title":"University of Pennsylvania Law Review"},{"key":"bibr38-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/BF00115008"},{"key":"bibr39-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-011-5242-y"},{"key":"bibr40-20539517211020775","first-page":"143","volume":"19","author":"LeCun Y","year":"1989","journal-title":"Connectionism in Perspective"},{"key":"bibr41-20539517211020775","unstructured":"LeCun Y 
(2017) My take on Ali Rahimi\u2019s \u2018Test of Time\u2019 award talk at NIPS.\n                      Facebook."},{"key":"bibr42-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1989.1.4.541"},{"key":"bibr43-20539517211020775","unstructured":"Marcus G (2018) Deep learning: A critical appraisal.\n                      arXiv\n                      ."},{"key":"bibr44-20539517211020775","unstructured":"Marr B (2014) Big data: The 5 vs everyone must know.\n                      LinkedIn Pulse."},{"key":"bibr45-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2018.07.007"},{"key":"bibr46-20539517211020775","volume-title":"The Need for Biases in Learning Generalizations","author":"Mitchell TM","year":"1980"},{"key":"bibr47-20539517211020775","volume-title":"Machine Learning","author":"Mitchell TM","year":"1997"},{"key":"bibr48-20539517211020775","doi-asserted-by":"crossref","unstructured":"Mittelstadt B, Russell C, Wachter S (2019) Explaining explanations in AI. In:\n                      Proceedings of the conference on fairness, accountability, and transparency.\n                      Atlanta, GA: Association for Computing Machinery, pp. 279\u2013288.","DOI":"10.1145\/3287560.3287574"},{"key":"bibr49-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-017-0606-3_12"},{"key":"bibr50-20539517211020775","doi-asserted-by":"crossref","unstructured":"Moss E, Sch\u00fc\u00fcr F (2018) How modes of myth\u2010making affect the particulars of DS\/ML adoption in industry. In:\n                      Ethnographic praxis in industry conference proceedings.\n                      Wiley Online Library, pp. 
264\u2013280.","DOI":"10.1111\/1559-8918.2018.01207"},{"key":"bibr51-20539517211020775","doi-asserted-by":"publisher","DOI":"10.4159\/harvard.9780674736061"},{"key":"bibr52-20539517211020775","doi-asserted-by":"crossref","unstructured":"Passi S, Jackson S (2017) Data vision: Learning to see through algorithmic abstraction. In:\n                      Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing.\n                      Portland, OR: Association for Computing Machinery, pp. 2436\u20132447.","DOI":"10.1145\/2998181.2998331"},{"key":"bibr53-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1145\/3274405"},{"key":"bibr54-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1177\/2053951720939605"},{"key":"bibr55-20539517211020775","volume-title":"Collected Papers of Charles Sanders Peirce","author":"Peirce CS","year":"1966"},{"key":"bibr56-20539517211020775","volume-title":"Reasoning and the Logic of Things: The Cambridge Conferences Lectures of 1898","author":"Peirce CS","year":"1992"},{"key":"bibr57-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/BF00993474"},{"key":"bibr58-20539517211020775","first-page":"3567","author":"Ratner AJ","year":"2016","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr59-20539517211020775","first-page":"135","volume-title":"Law, Human Agency and Autonomic Computing","author":"Rouvroy A","year":"2011"},{"key":"bibr60-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2014.09.003"},{"key":"bibr61-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1007\/s11229-007-9223-4"},{"key":"bibr62-20539517211020775","unstructured":"Sculley D, Snoek J, Wiltschko A, et\u00a0al. (2018) Winner's curse? On pace, progress, and empirical rigor. In:\n                      Proceedings of the international conference on learning representations,\n                      Vancouver, Canada. 
April 30\u2013May 3, 2018."},{"key":"bibr63-20539517211020775","doi-asserted-by":"crossref","unstructured":"Selbst AD, Boyd D, Friedler SA, et\u00a0al. (2019) Fairness and abstraction in sociotechnical systems. In:\n                      Proceedings of the conference on fairness, accountability, and transparency\n                      , Atlanta, GA, USA. January 29\u201331, 2019, pp. 59\u201368.","DOI":"10.1145\/3287560.3287598"},{"key":"bibr64-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781107298019"},{"key":"bibr65-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1177\/2053951717743530"},{"issue":"2","key":"bibr66-20539517211020775","first-page":"1","author":"Wachter S","year":"2019","journal-title":"Business Law Review"},{"key":"bibr67-20539517211020775","unstructured":"Whittaker M, Crawford K, Dobbe R, et\u00a0al. (2018)\n                      AI now report 2018\n                      . New York University, New York, USA."},{"key":"bibr68-20539517211020775","doi-asserted-by":"crossref","unstructured":"Wieringa M (2020) What to account for when accounting for algorithms: A systematic literature review on algorithmic accountability. In:\n                      Proceedings of the 2020 conference on fairness, accountability, and transparency\n                      , Barcelona, Spain. January 27\u201330, 2020, pp. 1\u201318.","DOI":"10.1145\/3351095.3372833"},{"key":"bibr69-20539517211020775","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1996.8.7.1341"},{"key":"bibr70-20539517211020775","doi-asserted-by":"crossref","unstructured":"Wong SC, Gatt A, Stamatescu V, et\u00a0al. (2016) Understanding data augmentation for classification: when to warp?\n                      The International Conference on Digital Image Computing: Techniques and Applications (DICTA).\n                      Gold Coast: Institute of Electrical and Electronics Engineers (IEEE), pp. 
1\u20136.","DOI":"10.1109\/DICTA.2016.7797091"}],"container-title":["Big Data & Society"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/20539517211020775","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/20539517211020775","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/20539517211020775","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T12:58:13Z","timestamp":1777381093000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/20539517211020775"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":68,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1177\/20539517211020775"],"URL":"https:\/\/doi.org\/10.1177\/20539517211020775","relation":{},"ISSN":["2053-9517","2053-9517"],"issn-type":[{"value":"2053-9517","type":"print"},{"value":"2053-9517","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1]]},"article-number":"20539517211020775"}}