{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T02:14:47Z","timestamp":1772590487170,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":226,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,7,26]],"date-time":"2022-07-26T00:00:00Z","timestamp":1658793600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["IIS-1930642"],"award-info":[{"award-number":["IIS-1930642"]}]},{"name":"National Science Foundation","award":["IIS-1763642"],"award-info":[{"award-number":["IIS-1763642"]}]},{"name":"Office of Naval Research"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,7,26]]},"DOI":"10.1145\/3514094.3534196","type":"proceedings-article","created":{"date-parts":[[2022,7,27]],"date-time":"2022-07-27T22:25:13Z","timestamp":1658960713000},"page":"335-348","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["The Worst of Both Worlds: A Comparative Analysis of Errors in Learning from Data in Psychology and Machine Learning"],"prefix":"10.1145","author":[{"given":"Jessica","family":"Hullman","sequence":"first","affiliation":[{"name":"Northwestern University, Evanston, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sayash","family":"Kapoor","sequence":"additional","affiliation":[{"name":"Princeton University, Princeton, NJ, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Priyanka","family":"Nanayakkara","sequence":"additional","affiliation":[{"name":"Northwestern University, Evanston, IL, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew","family":"Gelman","sequence":"additional","affiliation":[{"name":"Columbia University, New York, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Arvind","family":"Narayanan","sequence":"additional","affiliation":[{"name":"Princeton University, Princeton, NJ, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,7,27]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611975673.90"},{"key":"e_1_3_2_1_2_1","volume-title":"Blind retrospection: Why shark attacks are bad for democracy","author":"Achen Christopher H","year":"2012","unstructured":"Christopher H Achen and Larry M Bartels . 2012. Blind retrospection: Why shark attacks are bad for democracy . Center for the Study of Democratic Institutions, Vanderbilt University . Working Paper ( 2012 ). Christopher H Achen and Larry M Bartels. 2012. Blind retrospection: Why shark attacks are bad for democracy. Center for the Study of Democratic Institutions, Vanderbilt University. Working Paper (2012)."},{"key":"e_1_3_2_1_3_1","volume-title":"Aaron C Courville, and Marc Bellemare.","author":"Agarwal Rishabh","year":"2021","unstructured":"Rishabh Agarwal , Max Schwarzer , Pablo Samuel Castro , Aaron C Courville, and Marc Bellemare. 2021 . Deep reinforcement learning at the edge of the statistical precipice. NeurIPS 34 (2021). Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron C Courville, and Marc Bellemare. 2021. Deep reinforcement learning at the edge of the statistical precipice. NeurIPS 34 (2021)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1136\/bmj.311.7003.485"},{"key":"e_1_3_2_1_5_1","volume-title":"Inferential statistics as descriptive statistics: There is no replication crisis if we don't expect replication. 
American Statistician 73, sup1","author":"Amrhein Valentin","year":"2019","unstructured":"Valentin Amrhein , David Trafimow , and Sander Greenland . 2019. Inferential statistics as descriptive statistics: There is no replication crisis if we don't expect replication. American Statistician 73, sup1 ( 2019 ), 262--270. Valentin Amrhein, David Trafimow, and Sander Greenland. 2019. Inferential statistics as descriptive statistics: There is no replication crisis if we don't expect replication. American Statistician 73, sup1 (2019), 262--270."},{"key":"e_1_3_2_1_6_1","unstructured":"Marcin Andrychowicz Anton Raichuk Piotr Stanczyk Manu Orsini Sertan Girgin Rapha\u00ebl Marinier Leonard Hussenot Matthieu Geist Olivier Pietquin Marcin Michalski Sylvain Gelly and Olivier Bachem. 2020. What matters for on-policy deep actor-critic methods? A large-scale study. In ICLR. Marcin Andrychowicz Anton Raichuk Piotr Stanczyk Manu Orsini Sertan Girgin Rapha\u00ebl Marinier Leonard Hussenot Matthieu Geist Olivier Pietquin Marcin Michalski Sylvain Gelly and Olivier Bachem. 2020. What matters for on-policy deep actor-critic methods? A large-scale study. In ICLR."},{"key":"e_1_3_2_1_7_1","volume-title":"Mostly Harmless Econometrics","author":"Angrist Joshua D","unstructured":"Joshua D Angrist and J\u00f6rn-Steffen Pischke . 2008. Mostly Harmless Econometrics . Princeton university press . Joshua D Angrist and J\u00f6rn-Steffen Pischke. 2008. Mostly Harmless Econometrics. Princeton university press."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_3_2_1_9_1","volume-title":"Invariant risk minimization. arXiv:1907.02893","author":"Arjovsky Martin","year":"2019","unstructured":"Martin Arjovsky , L\u00e9on Bottou , Ishaan Gulrajani , and David Lopez-Paz . 2019. Invariant risk minimization. arXiv:1907.02893 ( 2019 ). Martin Arjovsky, L\u00e9on Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2019. Invariant risk minimization. 
arXiv:1907.02893 (2019)."},{"key":"e_1_3_2_1_10_1","volume-title":"International Conference on Machine Learning. PMLR, 233--242","author":"Arpit Devansh","year":"2017","unstructured":"Devansh Arpit , Stanislaw Jastrzebski , Nicolas Ballas , David Krueger , Emmanuel Bengio , Maxinder S Kanwal , Tegan Maharaj , Asja Fischer , Aaron Courville , Yoshua Bengio , 2017 . A closer look at memorization in deep networks . In International Conference on Machine Learning. PMLR, 233--242 . Devansh Arpit, Stanislaw Jastrzebski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, et al. 2017. A closer look at memorization in deep networks. In International Conference on Machine Learning. PMLR, 233--242."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1257\/jep.11.1.109"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0020412"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1037\/0022-3514.71.2.230"},{"key":"e_1_3_2_1_14_1","first-page":"671","article-title":"Big data's disparate impact","volume":"104","author":"Barocas Solon","year":"2016","unstructured":"Solon Barocas and Andrew D Selbst . 2016 . Big data's disparate impact . California Law Review 104 (2016), 671 . Solon Barocas and Andrew D Selbst. 2016. Big data's disparate impact. California Law Review 104 (2016), 671.","journal-title":"California Law Review"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1745-6916.2007.00051.x"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01270-0_28"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1903070116"},{"key":"e_1_3_2_1_18_1","volume-title":"Overfitting or perfect fitting? risk bounds for classification and regression rules that interpolate. NeurIPS 31","author":"Belkin Mikhail","year":"2018","unstructured":"Mikhail Belkin , Daniel J Hsu , and Partha Mitra . 
2018. Overfitting or perfect fitting? risk bounds for classification and regression rules that interpolate. NeurIPS 31 ( 2018 ). Mikhail Belkin, Daniel J Hsu, and Partha Mitra. 2018. Overfitting or perfect fitting? risk bounds for classification and regression rules that interpolate. NeurIPS 31 (2018)."},{"key":"e_1_3_2_1_19_1","volume-title":"Perspectives on Machine Learning from Psychology's Reproducibility Crisis. arXiv:2104.08878","author":"Bell Samuel J","year":"2021","unstructured":"Samuel J Bell and Onno P Kampman . 2021. Perspectives on Machine Learning from Psychology's Reproducibility Crisis. arXiv:2104.08878 ( 2021 ). Samuel J Bell and Onno P Kampman. 2021. Perspectives on Machine Learning from Psychology's Reproducibility Crisis. arXiv:2104.08878 (2021)."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_3_2_1_21_1","volume-title":"The consciousness prior. arXiv:1709.08568","author":"Bengio Yoshua","year":"2017","unstructured":"Yoshua Bengio . 2017. The consciousness prior. arXiv:1709.08568 ( 2017 ). Yoshua Bengio. 2017. The consciousness prior. arXiv:1709.08568 (2017)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.50"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416609"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"James O Berger and Robert L Wolpert. 1988. The Likelihood Principle. IMS. James O Berger and Robert L Wolpert. 1988. The Likelihood Principle. IMS.","DOI":"10.1214\/lnms\/1215466210"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2011.12.028"},{"key":"e_1_3_2_1_26_1","volume-title":"Smith","author":"Bernardo Jose M.","year":"1994","unstructured":"Jose M. Bernardo and Adrian F. M . Smith . 1994 . Bayesian Theory. Wiley . Jose M. Bernardo and Adrian F. M. Smith. 1994. Bayesian Theory. Wiley."},{"key":"e_1_3_2_1_27_1","unstructured":"Ryan Bernstein. 2021. 
Drawing maps of model space with modular Stan. (2021). https:\/\/statmodeling.stat.columbia.edu\/2021\/11\/19\/drawing-maps-ofmodel-space-with-modular-stan\/ Ryan Bernstein. 2021. Drawing maps of model space with modular Stan. (2021). https:\/\/statmodeling.stat.columbia.edu\/2021\/11\/19\/drawing-maps-ofmodel-space-with-modular-stan\/"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290607.3310432"},{"key":"e_1_3_2_1_29_1","article-title":"Discriminative learning under covariate shift","volume":"10","author":"Bickel Steffen","year":"2009","unstructured":"Steffen Bickel , Michael Br\u00fcckner , and Tobias Scheffer . 2009 . Discriminative learning under covariate shift . J. of Machine Learning Research 10 , 9 (2009). Steffen Bickel, Michael Br\u00fcckner, and Tobias Scheffer. 2009. Discriminative learning under covariate shift. J. of Machine Learning Research 10, 9 (2009).","journal-title":"J. of Machine Learning Research"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyg.2018.02558"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1037\/xge0000558"},{"key":"e_1_3_2_1_32_1","unstructured":"Rishi Bommasani Drew A Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill etal 2021. On the opportunities and risks of foundation models. arXiv:2108.07258 (2021). Rishi Bommasani Drew A Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill et al. 2021. On the opportunities and risks of foundation models. arXiv:2108.07258 (2021)."},{"key":"e_1_3_2_1_33_1","volume-title":"Applying machine learning to facilitate autism diagnostics: pitfalls and promises. J. of autism and developmental disorders 45, 5","author":"Bone Daniel","year":"2015","unstructured":"Daniel Bone , Matthew S Goodwin , Matthew P Black , Chi-Chun Lee , Kartik Audhkhasi , and Shrikanth Narayanan . 2015. 
Applying machine learning to facilitate autism diagnostics: pitfalls and promises. J. of autism and developmental disorders 45, 5 ( 2015 ), 1121--1136. Daniel Bone, Matthew S Goodwin, Matthew P Black, Chi-Chun Lee, Kartik Audhkhasi, and Shrikanth Narayanan. 2015. Applying machine learning to facilitate autism diagnostics: pitfalls and promises. J. of autism and developmental disorders 45, 5 (2015), 1121--1136."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1198\/tas.2011.10129"},{"key":"e_1_3_2_1_35_1","volume-title":"Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Ga\u00ebl Varoquaux, and Pascal Vincent.","author":"Bouthillier Xavier","year":"2021","unstructured":"Xavier Bouthillier , Pierre Delaunay , Mirko Bronzi , Assya Trofimov , Brennan Nichyporuk , Justin Szeto , Naz Sepah , Edward Raff , Kanika Madan , Vikram Voleti , Samira Ebrahimi Kahou , Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Ga\u00ebl Varoquaux, and Pascal Vincent. 2021 . Accounting for variance in machine learning Bbenchmarks. In Machine Learning and Systems (MLSys) . Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Ga\u00ebl Varoquaux, and Pascal Vincent. 2021. Accounting for variance in machine learning Bbenchmarks. In Machine Learning and Systems (MLSys)."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.516"},{"key":"e_1_3_2_1_38_1","volume-title":"A large annotated corpus for learning natural language inference. arXiv:1508.05326","author":"Bowman Samuel R","year":"2015","unstructured":"Samuel R Bowman , Gabor Angeli , Christopher Potts , and Christopher D Manning . 2015. A large annotated corpus for learning natural language inference. arXiv:1508.05326 ( 2015 ). 
Samuel R Bowman, Gabor Angeli, Christopher Potts, and Christopher D Manning. 2015. A large annotated corpus for learning natural language inference. arXiv:1508.05326 (2015)."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1214\/ss\/1009213726"},{"key":"e_1_3_2_1_40_1","volume-title":"Wavelets and Statistics","author":"Buckheit Jonathan B","unstructured":"Jonathan B Buckheit and David L Donoho . 1995. Wavelab and reproducible research . In Wavelets and Statistics . Springer , 55--81. Jonathan B Buckheit and David L Donoho. 1995. Wavelab and reproducible research. In Wavelets and Statistics. Springer, 55--81."},{"key":"e_1_3_2_1_41_1","volume-title":"Conference on Fairness, Accountability and Transparency. PMLR, 77--91","author":"Buolamwini Joy","year":"2018","unstructured":"Joy Buolamwini and Timnit Gebru . 2018 . Gender shades: Intersectional accuracy disparities in commercial gender classification . In Conference on Fairness, Accountability and Transparency. PMLR, 77--91 . Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency. PMLR, 77--91."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1038\/nrn3475"},{"key":"e_1_3_2_1_43_1","volume-title":"With little power comes great responsibility. arXiv:2010.06595","author":"Card Dallas","year":"2020","unstructured":"Dallas Card , Peter Henderson , Urvashi Khandelwal , Robin Jia , Kyle Mahowald , and Dan Jurafsky . 2020. With little power comes great responsibility. arXiv:2010.06595 ( 2020 ). Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, and Dan Jurafsky. 2020. With little power comes great responsibility. 
arXiv:2010.06595 (2020)."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2017.49"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.5555\/1756006.1859921"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13428-013-0365-7"},{"key":"e_1_3_2_1_47_1","volume-title":"On empirical comparisons of optimizers for deep learning. arXiv:1910.05446","author":"Choi Dami","year":"2019","unstructured":"Dami Choi , Christopher J Shallue , Zachary Nado , Jaehoon Lee , Chris J Maddison , and George E Dahl . 2019. On empirical comparisons of optimizers for deep learning. arXiv:1910.05446 ( 2019 ). Dami Choi, Christopher J Shallue, Zachary Nado, Jaehoon Lee, Chris J Maddison, and George E Dahl. 2019. On empirical comparisons of optimizers for deep learning. arXiv:1910.05446 (2019)."},{"key":"e_1_3_2_1_48_1","volume-title":"G\u00e9rard Ben Arous, and Yann LeCun","author":"Choromanska Anna","year":"2015","unstructured":"Anna Choromanska , Mikael Henaff , Michael Mathieu , G\u00e9rard Ben Arous, and Yann LeCun . 2015 . The loss surfaces of multilayer networks. In Artificial Intelligence and Statistics. PMLR , 192--204. Anna Choromanska, Mikael Henaff, Michael Mathieu, G\u00e9rard Ben Arous, and Yann LeCun. 2015. The loss surfaces of multilayer networks. In Artificial Intelligence and Statistics. PMLR, 192--204."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-5371(73)80014-3"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1111\/1467-8721.ep10768783"},{"key":"e_1_3_2_1_51_1","volume-title":"Alan E Hubbard, and Mark J van der Laan.","author":"Coyle Jeremy R","year":"2020","unstructured":"Jeremy R Coyle , Nima S Hejazi , Ivana Malenica , Rachael V Phillips , Benjamin F Arnold , Andrew Mertens , Jade Benjamin-Chung , Weixin Cai , Sonali Dayal , John M Colford Jr , Alan E Hubbard, and Mark J van der Laan. 2020 . Targeting learning: Robust statistics for reproducible research. 
arXiv:2006.07333 (2020). Jeremy R Coyle, Nima S Hejazi, Ivana Malenica, Rachael V Phillips, Benjamin F Arnold, Andrew Mertens, Jade Benjamin-Chung, Weixin Cai, Sonali Dayal, John M Colford Jr, Alan E Hubbard, and Mark J van der Laan. 2020. Targeting learning: Robust statistics for reproducible research. arXiv:2006.07333 (2020)."},{"key":"e_1_3_2_1_52_1","volume-title":"The trouble with bias. (2017). https:\/\/www.youtube.com\/watch?v=fMym_BKWQzk NIPS","author":"Crawford Kate","year":"2017","unstructured":"Kate Crawford . 2017. The trouble with bias. (2017). https:\/\/www.youtube.com\/watch?v=fMym_BKWQzk NIPS 2017 . Kate Crawford. 2017. The trouble with bias. (2017). https:\/\/www.youtube.com\/watch?v=fMym_BKWQzk NIPS 2017."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3434185"},{"key":"e_1_3_2_1_54_1","unstructured":"Alexander D'Amour Katherine Heller Dan Moldovan Ben Adlam Babak Alipanahi Alex Beutel Christina Chen Jonathan Deaton Jacob Eisenstein Matthew D Hoffman etal 2020. Underspecification presents challenges for credibility in modern machine learning. arXiv:2011.03395 (2020). Alexander D'Amour Katherine Heller Dan Moldovan Ben Adlam Babak Alipanahi Alex Beutel Christina Chen Jonathan Deaton Jacob Eisenstein Matthew D Hoffman et al. 2020. Underspecification presents challenges for credibility in modern machine learning. arXiv:2011.03395 (2020)."},{"key":"e_1_3_2_1_55_1","volume-title":"A farewell to the bias-variance tradeoff? an overview of the theory of overparameterized machine learning. arXiv:2109.02355","author":"Dar Yehuda","year":"2021","unstructured":"Yehuda Dar , Vidya Muthukumar , and Richard G Baraniuk . 2021. A farewell to the bias-variance tradeoff? an overview of the theory of overparameterized machine learning. arXiv:2109.02355 ( 2021 ). Yehuda Dar, Vidya Muthukumar, and Richard G Baraniuk. 2021. A farewell to the bias-variance tradeoff? an overview of the theory of overparameterized machine learning. 
arXiv:2109.02355 (2021)."},{"key":"e_1_3_2_1_56_1","volume-title":"Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. NeurIPS 27","author":"Dauphin Yann N","year":"2014","unstructured":"Yann N Dauphin , Razvan Pascanu , Caglar Gulcehre , Kyunghyun Cho , Surya Ganguli , and Yoshua Bengio . 2014. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. NeurIPS 27 ( 2014 ). Yann N Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. 2014. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. NeurIPS 27 (2014)."},{"key":"e_1_3_2_1_57_1","first-page":"92","article-title":"Dealing with disagreements: Looking beyond the majority vote in subjective annotations","volume":"10","author":"Davani Aida Mostafazadeh","year":"2022","unstructured":"Aida Mostafazadeh Davani , Mark D\u00edaz , and Vinodkumar Prabhakaran . 2022 . Dealing with disagreements: Looking beyond the majority vote in subjective annotations . Transactions of the ACL 10 (2022), 92 -- 110 . Aida Mostafazadeh Davani, Mark D\u00edaz, and Vinodkumar Prabhakaran. 2022. Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Transactions of the ACL 10 (2022), 92--110.","journal-title":"Transactions of the ACL"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1177\/014616702236869"},{"key":"e_1_3_2_1_59_1","volume-title":"The benchmark lottery. arXiv:2107.07002","author":"Dehghani Mostafa","year":"2021","unstructured":"Mostafa Dehghani , Yi Tay , Alexey A Gritsenko , Zhe Zhao , Neil Houlsby , Fernando Diaz , Donald Metzler , and Oriol Vinyals . 2021. The benchmark lottery. arXiv:2107.07002 ( 2021 ). Mostafa Dehghani, Yi Tay, Alexey A Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, and Oriol Vinyals. 2021. The benchmark lottery. 
arXiv:2107.07002 (2021)."},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1817706116"},{"key":"e_1_3_2_1_61_1","volume-title":"Imagenet: A large-scale hierarchical image database. In CVPR. Ieee, 248--255.","author":"Deng Jia","year":"2009","unstructured":"Jia Deng , Wei Dong , Richard Socher , Li-Jia Li , Kai Li , and Li Fei-Fei . 2009 . Imagenet: A large-scale hierarchical image database. In CVPR. Ieee, 248--255. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In CVPR. Ieee, 248--255."},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1098\/rsos.200805"},{"key":"e_1_3_2_1_63_1","volume-title":"Show your work: Improved reporting of experimental results. arXiv:1909.03004","author":"Dodge Jesse","year":"2019","unstructured":"Jesse Dodge , Suchin Gururangan , Dallas Card , Roy Schwartz , and Noah A Smith . 2019. Show your work: Improved reporting of experimental results. arXiv:1909.03004 ( 2019 ). Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, and Noah A Smith. 2019. Show your work: Improved reporting of experimental results. arXiv:1909.03004 (2019)."},{"key":"e_1_3_2_1_64_1","volume-title":"Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping. arXiv:2002.06305","author":"Dodge Jesse","year":"2020","unstructured":"Jesse Dodge , Gabriel Ilharco , Roy Schwartz , Ali Farhadi , Hannaneh Hajishirzi , and Noah Smith . 2020. Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping. arXiv:2002.06305 ( 2020 ). Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, and Noah Smith. 2020. Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping. 
arXiv:2002.06305 (2020)."},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/2347736.2347755"},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1080\/10618600.2017.1384734"},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1516179112"},{"key":"e_1_3_2_1_68_1","first-page":"471","article-title":"Replicability analysis for natural language processing: Testing significance with multiple datasets","volume":"5","author":"Dror Rotem","year":"2017","unstructured":"Rotem Dror , Gili Baumer , Marina Bogomolov , and Roi Reichart . 2017 . Replicability analysis for natural language processing: Testing significance with multiple datasets . Transactions of the ACL 5 (2017), 471 -- 486 . Rotem Dror, Gili Baumer, Marina Bogomolov, and Roi Reichart. 2017. Replicability analysis for natural language processing: Testing significance with multiple datasets. Transactions of the ACL 5 (2017), 471--486.","journal-title":"Transactions of the ACL"},{"key":"e_1_3_2_1_69_1","volume-title":"AAAI Workshop on Evaluation Methods for Machine Learning. 1--5.","author":"Drummond Chris","year":"2006","unstructured":"Chris Drummond . 2006 . Machine learning as an experimental science (revisited) . In AAAI Workshop on Evaluation Methods for Machine Learning. 1--5. Chris Drummond. 2006. Machine learning as an experimental science (revisited). In AAAI Workshop on Evaluation Methods for Machine Learning. 1--5."},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1177\/0956797612466416"},{"key":"e_1_3_2_1_71_1","unstructured":"Peter Eckersley Yomna Nasser etal 2017. EFF AI progress measurement project. Retreived from: https:\/\/eff. org\/ai\/metrics accessed on (2017) 09--09. Peter Eckersley Yomna Nasser et al. 2017. EFF AI progress measurement project. Retreived from: https:\/\/eff. 
org\/ai\/metrics accessed on (2017) 09--09."},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1198\/016214504000000692"},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1111\/insr.12409"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0010068"},{"key":"e_1_3_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1177\/1745691612459520"},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13423-012-0322-y"},{"key":"e_1_3_2_1_77_1","volume-title":"Competency problems: On finding and removing artifacts in language data. arXiv:2104.08646","author":"Gardner Matt","year":"2021","unstructured":"Matt Gardner , William Merrill , Jesse Dodge , Matthew E Peters , Alexis Ross , Sameer Singh , and Noah Smith . 2021. Competency problems: On finding and removing artifacts in language data. arXiv:2104.08646 ( 2021 ). Matt Gardner, William Merrill, Jesse Dodge, Matthew E Peters, Alexis Ross, Sameer Singh, and Noah Smith. 2021. Competency problems: On finding and removing artifacts in language data. arXiv:2104.08646 (2021)."},{"key":"e_1_3_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458723"},{"key":"e_1_3_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-020-00257-z"},{"key":"e_1_3_2_1_80_1","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1080\/09332480.2012.752294","article-title":"Ethics and statistics: Ethics and the statistical use of prior information","volume":"25","author":"Gelman Andrew","year":"2012","unstructured":"Andrew Gelman . 2012 . Ethics and statistics: Ethics and the statistical use of prior information . Chance 25 , 4 (2012), 52 -- 54 . Andrew Gelman. 2012. Ethics and statistics: Ethics and the statistical use of prior information. 
Chance 25, 4 (2012), 52--54.","journal-title":"Chance"},{"key":"e_1_3_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1097\/EDE.0b013e31827886f7"},{"key":"e_1_3_2_1_82_1","first-page":"632","article-title":"The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective","volume":"41","author":"Gelman Andrew","year":"2015","unstructured":"Andrew Gelman . 2015 . The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective . J. of Management 41 , 2 (2015), 632 -- 643 . Andrew Gelman. 2015. The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective. J. of Management 41, 2 (2015), 632--643.","journal-title":"J. of Management"},{"key":"e_1_3_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1080\/09332480.2017.1302720"},{"key":"e_1_3_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1177\/0146167217729162"},{"key":"e_1_3_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1177\/1745691614551642"},{"key":"e_1_3_2_1_86_1","unstructured":"Andrew Gelman and Eric Loken. 2013. The garden of forking paths: Why multiple comparisons can be a problem even when there is no \"fishing expedition\" or \"p-hacking\" and the research hypothesis was posited ahead of time. Department of Statistics Columbia University 348 (2013). Andrew Gelman and Eric Loken. 2013. The garden of forking paths: Why multiple comparisons can be a problem even when there is no \"fishing expedition\" or \"p-hacking\" and the research hypothesis was posited ahead of time. 
Department of Statistics Columbia University 348 (2013)."},{"key":"e_1_3_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1080\/09332480.2014.890872"},{"key":"e_1_3_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1511\/2014.111.460"},{"key":"e_1_3_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.3390\/e19100555"},{"key":"e_1_3_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1198\/000313006X152649"},{"key":"e_1_3_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1511\/2009.79.310"},{"key":"e_1_3_2_1_92_1","volume-title":"We need to think more about how we conduct research. Behavioral and Brain Sciences 45","author":"Gigerenzer Gerd","year":"2022","unstructured":"Gerd Gigerenzer . 2022. We need to think more about how we conduct research. Behavioral and Brain Sciences 45 ( 2022 ). Gerd Gigerenzer. 2022. We need to think more about how we conduct research. Behavioral and Brain Sciences 45 (2022)."},{"key":"e_1_3_2_1_93_1","first-page":"421","article-title":"Surrogate science: The idol of a universal method for scientific inference","volume":"41","author":"Gigerenzer Gerd","year":"2015","unstructured":"Gerd Gigerenzer and Julian N Marewski . 2015 . Surrogate science: The idol of a universal method for scientific inference . J. of Management 41 , 2 (2015), 421 -- 440 . Gerd Gigerenzer and Julian N Marewski. 2015. Surrogate science: The idol of a universal method for scientific inference. J. of Management 41, 2 (2015), 421--440.","journal-title":"J. of Management"},{"key":"e_1_3_2_1_94_1","unstructured":"Justin Gilmer Behrooz Ghorbani Ankush Garg Sneha Kudugunta Behnam Neyshabur David Cardoze George Dahl Zack Nado and Orhan Firat. 2021. A loss curvature perspective on training instabilities of deep learning models. In ICLR. Justin Gilmer Behrooz Ghorbani Ankush Garg Sneha Kudugunta Behnam Neyshabur David Cardoze George Dahl Zack Nado and Orhan Firat. 2021. A loss curvature perspective on training instabilities of deep learning models. 
In ICLR."},{"key":"e_1_3_2_1_95_1","unstructured":"Tom Goldstein. 2022. My recent talk at the NSF town hall focused on the history of the AI winters how the ML community became \"anti-science \" and whether the rejection of science will cause a winter for ML theory. I'll summarize these issues below... http:\/\/archive.today\/ryryU Tom Goldstein. 2022. My recent talk at the NSF town hall focused on the history of the AI winters how the ML community became \"anti-science \" and whether the rejection of science will cause a winter for ML theory. I'll summarize these issues below... http:\/\/archive.today\/ryryU"},{"key":"e_1_3_2_1_96_1","volume-title":"Explaining and harnessing adversarial examples. arXiv:1412.6572","author":"Goodfellow Ian J","year":"2014","unstructured":"Ian J Goodfellow , Jonathon Shlens , and Christian Szegedy . 2014. Explaining and harnessing adversarial examples. arXiv:1412.6572 ( 2014 ). Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv:1412.6572 (2014)."},{"key":"e_1_3_2_1_97_1","doi-asserted-by":"publisher","DOI":"10.1093\/oxfordjournals.aje.a116700"},{"key":"e_1_3_2_1_98_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445423"},{"key":"e_1_3_2_1_99_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1267"},{"key":"e_1_3_2_1_100_1","volume-title":"Inductive biases for deep learning of higher-level cognition. arXiv:2011.15091","author":"Goyal Anirudh","year":"2020","unstructured":"Anirudh Goyal and Yoshua Bengio . 2020. Inductive biases for deep learning of higher-level cognition. arXiv:2011.15091 ( 2020 ). Anirudh Goyal and Yoshua Bengio. 2020. Inductive biases for deep learning of higher-level cognition. 
arXiv:2011.15091 (2020)."},{"key":"e_1_3_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.670"},{"key":"e_1_3_2_1_102_1","volume-title":"Valid p-values behave exactly as they should: Some misleading criticisms of p-values and their resolution with s-values. American Statistician 73, sup1","author":"Greenland Sander","year":"2019","unstructured":"Sander Greenland . 2019. Valid p-values behave exactly as they should: Some misleading criticisms of p-values and their resolution with s-values. American Statistician 73, sup1 ( 2019 ), 106--114. Sander Greenland. 2019. Valid p-values behave exactly as they should: Some misleading criticisms of p-values and their resolution with s-values. American Statistician 73, sup1 (2019), 106--114."},{"key":"e_1_3_2_1_103_1","volume-title":"To aid scientific inference, emphasize unconditional descriptions of statistics. arXiv:1909.08583","author":"Greenland Sander","year":"2019","unstructured":"Sander Greenland and Zad Rafi . 2019. To aid scientific inference, emphasize unconditional descriptions of statistics. arXiv:1909.08583 ( 2019 ). Sander Greenland and Zad Rafi. 2019. To aid scientific inference, emphasize unconditional descriptions of statistics. arXiv:1909.08583 (2019)."},{"key":"e_1_3_2_1_104_1","volume-title":"Don't stop pretraining: adapt language models to domains and tasks. arXiv:2004.10964","author":"Gururangan Suchin","year":"2020","unstructured":"Suchin Gururangan , Ana Marasovic , Swabha Swayamdipta , Kyle Lo , Iz Beltagy , Doug Downey , and Noah A Smith . 2020. Don't stop pretraining: adapt language models to domains and tasks. arXiv:2004.10964 ( 2020 ). Suchin Gururangan, Ana Marasovic, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A Smith. 2020. Don't stop pretraining: adapt language models to domains and tasks. 
arXiv:2004.10964 (2020)."},{"key":"e_1_3_2_1_105_1","doi-asserted-by":"publisher","DOI":"10.3102\/0013189X032005019"},{"key":"e_1_3_2_1_106_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02294587"},{"key":"e_1_3_2_1_107_1","volume-title":"Ahmed Hosny, Farnoosh Khodakarami, Levi Waldron, Bo Wang, Chris McIntosh, Anna Goldenberg, Anshul Kundaje, Casey S Greene, et al.","author":"Haibe-Kains Benjamin","year":"2020","unstructured":"Benjamin Haibe-Kains , George Alexandru Adam , Ahmed Hosny, Farnoosh Khodakarami, Levi Waldron, Bo Wang, Chris McIntosh, Anna Goldenberg, Anshul Kundaje, Casey S Greene, et al. 2020 . Transparency and reproducibility in artificial intelligence. Nature 586, 7829 (2020), E14--E16. Benjamin Haibe-Kains, George Alexandru Adam, Ahmed Hosny, Farnoosh Khodakarami, Levi Waldron, Bo Wang, Chris McIntosh, Anna Goldenberg, Anshul Kundaje, Casey S Greene, et al. 2020. Transparency and reproducibility in artificial intelligence. Nature 586, 7829 (2020), E14--E16."},{"key":"e_1_3_2_1_108_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2009.36"},{"key":"e_1_3_2_1_109_1","volume-title":"The Elements of Statistical Learning: Data Mining, Inference, and Prediction","author":"Hastie Trevor","unstructured":"Trevor Hastie , Robert Tibshirani , and Jerome H Friedman . 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction . Vol. 2 . Springer . Trevor Hastie, Robert Tibshirani, and Jerome H Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Vol. 2. Springer."},{"key":"e_1_3_2_1_110_1","volume-title":"AI is wrestling with a replication crisis. MIT Technology Review","author":"Heaven Will Douglas","year":"2020","unstructured":"Will Douglas Heaven . 2020. AI is wrestling with a replication crisis. MIT Technology Review ( 2020 ). Will Douglas Heaven. 2020. AI is wrestling with a replication crisis. 
MIT Technology Review (2020)."},{"key":"e_1_3_2_1_111_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"e_1_3_2_1_112_1","volume-title":"The weirdest people in the world? Behavioral and Brain Sciences 33, 2--3","author":"Henrich Joseph","year":"2010","unstructured":"Joseph Henrich , Steven J Heine , and Ara Norenzayan . 2010. The weirdest people in the world? Behavioral and Brain Sciences 33, 2--3 ( 2010 ), 61--83. Joseph Henrich, Steven J Heine, and Ara Norenzayan. 2010. The weirdest people in the world? Behavioral and Brain Sciences 33, 2--3 (2010), 61--83."},{"key":"e_1_3_2_1_113_1","doi-asserted-by":"publisher","DOI":"10.15195\/v3.a26"},{"key":"e_1_3_2_1_114_1","volume-title":"Prediction and explanation in social systems. Science 355, 6324","author":"Hofman Jake M","year":"2017","unstructured":"Jake M Hofman , Amit Sharma , and Duncan J Watts . 2017. Prediction and explanation in social systems. Science 355, 6324 ( 2017 ), 486--488. Jake M Hofman, Amit Sharma, and Duncan J Watts. 2017. Prediction and explanation in social systems. Science 355, 6324 (2017), 486--488."},{"key":"e_1_3_2_1_115_1","doi-asserted-by":"crossref","unstructured":"Jake M Hofman Duncan J Watts Susan Athey Filiz Garip Thomas L Griffiths Jon Kleinberg Helen Margetts Sendhil Mullainathan Matthew J Salganik Simine Vazire et al. 2021. Integrating explanation and prediction in computational social science. Nature 595 7866 (2021) 181--188. Jake M Hofman Duncan J Watts Susan Athey Filiz Garip Thomas L Griffiths Jon Kleinberg Helen Margetts Sendhil Mullainathan Matthew J Salganik Simine Vazire et al. 2021. Integrating explanation and prediction in computational social science. 
Nature 595 7866 (2021) 181--188.","DOI":"10.1038\/s41586-021-03659-0"},{"key":"e_1_3_2_1_116_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neubiorev.2020.09.036"},{"key":"e_1_3_2_1_117_1","doi-asserted-by":"publisher","DOI":"10.1177\/2515245917751886"},{"key":"e_1_3_2_1_118_1","volume-title":"Open Graph Benchmark: Datasets for Machine Learning on Graphs. arXiv:2005.00687 (Feb","author":"Hu Weihua","year":"2021","unstructured":"Weihua Hu , Matthias Fey , Marinka Zitnik , Yuxiao Dong , Hongyu Ren , Bowen Liu , Michele Catasta , and Jure Leskovec . 2021. Open Graph Benchmark: Datasets for Machine Learning on Graphs. arXiv:2005.00687 (Feb . 2021 ). http:\/\/arxiv.org\/abs\/2005.00687 arXiv: 2005.00687. Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2021. Open Graph Benchmark: Datasets for Machine Learning on Graphs. arXiv:2005.00687 (Feb. 2021). http:\/\/arxiv.org\/abs\/2005.00687 arXiv: 2005.00687."},{"key":"e_1_3_2_1_119_1","volume-title":"International Conference on Machine Learning. PMLR, 2891--2900","author":"Huang Chen","year":"2019","unstructured":"Chen Huang , Shuangfei Zhai , Walter Talbott , Miguel Bautista Martin , Shih-Yu Sun , Carlos Guestrin , and Josh Susskind . 2019 . Addressing the loss-metric mismatch with adaptive loss alignment . In International Conference on Machine Learning. PMLR, 2891--2900 . Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Bautista Martin, Shih-Yu Sun, Carlos Guestrin, and Josh Susskind. 2019. Addressing the loss-metric mismatch with adaptive loss alignment. In International Conference on Machine Learning. PMLR, 2891--2900."},{"key":"e_1_3_2_1_120_1","volume-title":"P values are not error probabilities","author":"Hubbard Raymond","year":"2003","unstructured":"Raymond Hubbard and MJ Bayarri . 2003. P values are not error probabilities . Institute of Stat. and Dec. Sci., Working Paper 03--26 ( 2003 ), 27708--0251. Raymond Hubbard and MJ Bayarri. 2003. 
P values are not error probabilities. Institute of Stat. and Dec. Sci., Working Paper 03--26 (2003), 27708--0251."},{"key":"e_1_3_2_1_121_1","doi-asserted-by":"publisher","DOI":"10.1198\/0003130031856"},{"key":"e_1_3_2_1_122_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445918"},{"key":"e_1_3_2_1_123_1","volume-title":"Has artificial intelligence become alchemy? Science","author":"Hutson Matthew","year":"2018","unstructured":"Matthew Hutson . 2018. Has artificial intelligence become alchemy? Science ( 2018 ). Matthew Hutson. 2018. Has artificial intelligence become alchemy? Science (2018)."},{"key":"e_1_3_2_1_124_1","volume-title":"Adversarial examples are not bugs, they are features. NeurIPS 32","author":"Ilyas Andrew","year":"2019","unstructured":"Andrew Ilyas , Shibani Santurkar , Dimitris Tsipras , Logan Engstrom , Brandon Tran , and Aleksander Madry . 2019. Adversarial examples are not bugs, they are features. NeurIPS 32 ( 2019 ). Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. 2019. Adversarial examples are not bugs, they are features. NeurIPS 32 (2019)."},{"key":"e_1_3_2_1_125_1","doi-asserted-by":"publisher","DOI":"10.1097\/EDE.0b013e31818131e7"},{"key":"e_1_3_2_1_126_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445901"},{"key":"e_1_3_2_1_127_1","volume-title":"Fantastic generalization measures and where to find them. arXiv:1912.02178","author":"Jiang Yiding","year":"2019","unstructured":"Yiding Jiang , Behnam Neyshabur , Hossein Mobahi , Dilip Krishnan , and Samy Bengio . 2019. Fantastic generalization measures and where to find them. arXiv:1912.02178 ( 2019 ). Yiding Jiang, Behnam Neyshabur, Hossein Mobahi, Dilip Krishnan, and Samy Bengio. 2019. Fantastic generalization measures and where to find them. 
arXiv:1912.02178 (2019)."},{"key":"e_1_3_2_1_128_1","doi-asserted-by":"publisher","DOI":"10.1145\/3351095.3372829"},{"key":"e_1_3_2_1_129_1","volume-title":"Measuring the tendency of cnns to learn surface statistical regularities. arXiv:1711.11561","author":"Jo Jason","year":"2017","unstructured":"Jason Jo and Yoshua Bengio . 2017. Measuring the tendency of cnns to learn surface statistical regularities. arXiv:1711.11561 ( 2017 ). Jason Jo and Yoshua Bengio. 2017. Measuring the tendency of cnns to learn surface statistical regularities. arXiv:1711.11561 (2017)."},{"key":"e_1_3_2_1_130_1","volume-title":"SGD on neural networks learns functions of increasing complexity. NeurIPS 32","author":"Kalimeris Dimitris","year":"2019","unstructured":"Dimitris Kalimeris , Gal Kaplun , Preetum Nakkiran , Benjamin Edelman , Tristan Yang , Boaz Barak , and Haofeng Zhang . 2019. SGD on neural networks learns functions of increasing complexity. NeurIPS 32 ( 2019 ). Dimitris Kalimeris, Gal Kaplun, Preetum Nakkiran, Benjamin Edelman, Tristan Yang, Boaz Barak, and Haofeng Zhang. 2019. SGD on neural networks learns functions of increasing complexity. NeurIPS 32 (2019)."},{"key":"e_1_3_2_1_131_1","volume-title":"How much reading does reading comprehension require? a critical investigation of popular benchmarks. arXiv:1808.04926","author":"Kaushik Divyansh","year":"2018","unstructured":"Divyansh Kaushik and Zachary C Lipton . 2018. How much reading does reading comprehension require? a critical investigation of popular benchmarks. arXiv:1808.04926 ( 2018 ). Divyansh Kaushik and Zachary C Lipton. 2018. How much reading does reading comprehension require? a critical investigation of popular benchmarks. 
arXiv:1808.04926 (2018)."},{"key":"e_1_3_2_1_132_1","doi-asserted-by":"publisher","DOI":"10.1145\/2702123.2702520"},{"key":"e_1_3_2_1_133_1","volume-title":"2nd Reproducibility in ML Workshop (ICML)","author":"Khetarpal Khimya","year":"2018","unstructured":"Khimya Khetarpal , Zafarali Ahmed , Andre Cianflone , Riashat Islam , and Joelle Pineau . 2018 . RE-EVALUATE: Reproducibility in evaluating reinforcement learning algorithms . 2nd Reproducibility in ML Workshop (ICML) (2018). Khimya Khetarpal, Zafarali Ahmed, Andre Cianflone, Riashat Islam, and Joelle Pineau. 2018. RE-EVALUATE: Reproducibility in evaluating reinforcement learning algorithms. 2nd Reproducibility in ML Workshop (ICML) (2018)."},{"key":"e_1_3_2_1_134_1","doi-asserted-by":"publisher","DOI":"10.1257\/pandp.20181018"},{"key":"e_1_3_2_1_135_1","unstructured":"Alex Krizhevsky Geoffrey Hinton et al. 2009. Learning multiple layers of features from tiny images. (2009). Alex Krizhevsky Geoffrey Hinton et al. 2009. Learning multiple layers of features from tiny images. (2009)."},{"key":"e_1_3_2_1_136_1","volume-title":"Bayesian Cognitive Modeling: A Practical Course","author":"Lee Michael D","unstructured":"Michael D Lee and Eric-Jan Wagenmakers . 2014. Bayesian Cognitive Modeling: A Practical Course . Cambridge University Press . Michael D Lee and Eric-Jan Wagenmakers. 2014. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press."},{"key":"e_1_3_2_1_137_1","volume-title":"ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning.","author":"Liao Thomas","year":"2020","unstructured":"Thomas Liao , Benjamin Recht , and Ludwig Schmidt . 2020 . In a forward direction: Analyzing distribution shifts in machine translation test sets over time . ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning. Thomas Liao, Benjamin Recht, and Ludwig Schmidt. 2020. In a forward direction: Analyzing distribution shifts in machine translation test sets over time. 
ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning."},{"key":"e_1_3_2_1_138_1","volume-title":"Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).","author":"Liao Thomas","year":"2021","unstructured":"Thomas Liao , Rohan Taori , Inioluwa Deborah Raji , and Ludwig Schmidt . 2021 . Are we learning yet? A meta review of evaluation failures across machine learning . In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). Thomas Liao, Rohan Taori, Inioluwa Deborah Raji, and Ludwig Schmidt. 2021. Are we learning yet? A meta review of evaluation failures across machine learning. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)."},{"key":"e_1_3_2_1_139_1","volume-title":"Significant improvements over the state of the art? A case study of the MS MARCO Document Ranking Leaderboard. (Feb","author":"Lin Jimmy","year":"2021","unstructured":"Jimmy Lin , Daniel Campos , Nick Craswell , Bhaskar Mitra , and Emine Yilmaz . 2021. Significant improvements over the state of the art? A case study of the MS MARCO Document Ranking Leaderboard. (Feb . 2021 ). https:\/\/arxiv.org\/abs\/2102.12887v1 Jimmy Lin, Daniel Campos, Nick Craswell, Bhaskar Mitra, and Emine Yilmaz. 2021. Significant improvements over the state of the art? A case study of the MS MARCO Document Ranking Leaderboard. (Feb. 2021). https:\/\/arxiv.org\/abs\/2102.12887v1"},{"key":"e_1_3_2_1_140_1","doi-asserted-by":"publisher","DOI":"10.1145\/3316774"},{"key":"e_1_3_2_1_141_1","volume-title":"Measurement error and the replication crisis. Science 355, 6325","author":"Loken Eric","year":"2017","unstructured":"Eric Loken and Andrew Gelman . 2017. Measurement error and the replication crisis. Science 355, 6325 ( 2017 ), 584--585. Eric Loken and Andrew Gelman. 2017. Measurement error and the replication crisis. 
Science 355, 6325 (2017), 584--585."},{"key":"e_1_3_2_1_142_1","volume-title":"Are GANs created equal? A large-scale study. NeurIPS 31","author":"Lucic Mario","year":"2018","unstructured":"Mario Lucic , Karol Kurach , Marcin Michalski , Sylvain Gelly , and Olivier Bousquet . 2018. Are GANs created equal? A large-scale study. NeurIPS 31 ( 2018 ). Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet. 2018. Are GANs created equal? A large-scale study. NeurIPS 31 (2018)."},{"key":"e_1_3_2_1_143_1","volume-title":"Time waits for no one! Analysis and challenges of temporal misalignment. arXiv:2111.07408","author":"Luu Kelvin","year":"2021","unstructured":"Kelvin Luu , Daniel Khashabi , Suchin Gururangan , Karishma Mandyam , and Noah A Smith . 2021. Time waits for no one! Analysis and challenges of temporal misalignment. arXiv:2111.07408 ( 2021 ). Kelvin Luu, Daniel Khashabi, Suchin Gururangan, Karishma Mandyam, and Noah A Smith. 2021. Time waits for no one! Analysis and challenges of temporal misalignment. arXiv:2111.07408 (2021)."},{"key":"e_1_3_2_1_144_1","doi-asserted-by":"publisher","DOI":"10.1086\/208919"},{"key":"e_1_3_2_1_145_1","volume-title":"AI Adoption in the Enterprise","author":"Magoulas Roger","unstructured":"Roger Magoulas and Steve Swoyer . 2020. AI Adoption in the Enterprise . Beijing : O'Reilly . Retrieved from http:\/\/www. oreilly. com\/data\/free\/ai . . . . Roger Magoulas and Steve Swoyer. 2020. AI Adoption in the Enterprise. Beijing: O'Reilly. Retrieved from http:\/\/www. oreilly. com\/data\/free\/ai . . . ."},{"key":"e_1_3_2_1_146_1","volume-title":"A hierarchy of limitations in machine learning. arXiv:2002.05193","author":"Malik Momin M","year":"2020","unstructured":"Momin M Malik . 2020. A hierarchy of limitations in machine learning. arXiv:2002.05193 ( 2020 ). Momin M Malik. 2020. A hierarchy of limitations in machine learning. 
arXiv:2002.05193 (2020)."},{"key":"e_1_3_2_1_147_1","volume-title":"Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. arXiv:1902.01007","author":"McCoy R Thomas","year":"2019","unstructured":"R Thomas McCoy , Ellie Pavlick , and Tal Linzen . 2019. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. arXiv:1902.01007 ( 2019 ). R Thomas McCoy, Ellie Pavlick, and Tal Linzen. 2019. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. arXiv:1902.01007 (2019)."},{"key":"e_1_3_2_1_148_1","doi-asserted-by":"publisher","DOI":"10.1086\/288135"},{"key":"e_1_3_2_1_149_1","doi-asserted-by":"publisher","DOI":"10.2466\/pr0.1990.66.1.195"},{"key":"e_1_3_2_1_150_1","volume-title":"On the state of the art of evaluation in neural language models. arXiv:1707.05589","author":"Melis G\u00e1bor","year":"2017","unstructured":"G\u00e1bor Melis , Chris Dyer , and Phil Blunsom . 2017. On the state of the art of evaluation in neural language models. arXiv:1707.05589 ( 2017 ). G\u00e1bor Melis, Chris Dyer, and Phil Blunsom. 2017. On the state of the art of evaluation in neural language models. arXiv:1707.05589 (2017)."},{"key":"e_1_3_2_1_151_1","doi-asserted-by":"publisher","DOI":"10.1214\/18-AOAS1161SF"},{"key":"e_1_3_2_1_152_1","doi-asserted-by":"publisher","DOI":"10.1145\/3287560.3287596"},{"key":"e_1_3_2_1_153_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-016-0021"},{"key":"e_1_3_2_1_154_1","doi-asserted-by":"publisher","DOI":"10.1198\/000313008X332421"},{"key":"e_1_3_2_1_155_1","volume-title":"Deterministic implementations for reproducibility in deep reinforcement learning. arXiv:1809.05676","author":"Nagarajan Prabhat","year":"2018","unstructured":"Prabhat Nagarajan , Garrett Warnell, and Peter Stone . 2018. Deterministic implementations for reproducibility in deep reinforcement learning. arXiv:1809.05676 ( 2018 ). 
Prabhat Nagarajan, Garrett Warnell, and Peter Stone. 2018. Deterministic implementations for reproducibility in deep reinforcement learning. arXiv:1809.05676 (2018)."},{"key":"e_1_3_2_1_156_1","doi-asserted-by":"crossref","unstructured":"Danielle Navarro. 2020. Paths in strange spaces: A comment on preregistration. (2020). Danielle Navarro. 2020. Paths in strange spaces: A comment on preregistration. (2020).","DOI":"10.31234\/osf.io\/wxn58"},{"key":"e_1_3_2_1_157_1","doi-asserted-by":"publisher","DOI":"10.1017\/pan.2018.39"},{"key":"e_1_3_2_1_158_1","unstructured":"Behnam Neyshabur Ryota Tomioka and Nathan Srebro. 2014. In search of the real inductive bias: On the role of implicit regularization in deep learning. arXiv:1412.6614 (2014). Behnam Neyshabur Ryota Tomioka and Nathan Srebro. 2014. In search of the real inductive bias: On the role of implicit regularization in deep learning. arXiv:1412.6614 (2014)."},{"key":"e_1_3_2_1_159_1","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyg.2016.00934"},{"key":"e_1_3_2_1_160_1","volume-title":"Pervasive label errors in test sets destabilize machine learning benchmarks. arXiv:2103.14749","author":"Northcutt Curtis G","year":"2021","unstructured":"Curtis G Northcutt , Anish Athalye , and Jonas Mueller . 2021. Pervasive label errors in test sets destabilize machine learning benchmarks. arXiv:2103.14749 ( 2021 ). Curtis G Northcutt, Anish Athalye, and Jonas Mueller. 2021. Pervasive label errors in test sets destabilize machine learning benchmarks. arXiv:2103.14749 (2021)."},{"key":"e_1_3_2_1_161_1","volume-title":"Nosek et al","author":"Brian","year":"2015","unstructured":"Brian A. Nosek et al . 2015 . Estimating the reproducibility of psychological science. Science 349 (2015), aac4716. Brian A. Nosek et al. 2015. Estimating the reproducibility of psychological science. 
Science 349 (2015), aac4716."},{"key":"e_1_3_2_1_162_1","doi-asserted-by":"publisher","DOI":"10.1177\/2515245920917961"},{"key":"e_1_3_2_1_163_1","volume-title":"Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift. NeurIPS 32","author":"Ovadia Yaniv","year":"2019","unstructured":"Yaniv Ovadia , Emily Fertig , Jie Ren , Zachary Nado , David Sculley , Sebastian Nowozin , Joshua Dillon , Balaji Lakshminarayanan , and Jasper Snoek . 2019. Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift. NeurIPS 32 ( 2019 ). Yaniv Ovadia, Emily Fertig, Jie Ren, Zachary Nado, David Sculley, Sebastian Nowozin, Joshua Dillon, Balaji Lakshminarayanan, and Jasper Snoek. 2019. Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift. NeurIPS 32 (2019)."},{"key":"e_1_3_2_1_164_1","volume-title":"Reducing gender bias in abusive language detection. arXiv:1808.07231","author":"Park Ji Ho","year":"2018","unstructured":"Ji Ho Park , Jamin Shin , and Pascale Fung . 2018. Reducing gender bias in abusive language detection. arXiv:1808.07231 ( 2018 ). Ji Ho Park, Jamin Shin, and Pascale Fung. 2018. Reducing gender bias in abusive language detection. arXiv:1808.07231 (2018)."},{"key":"e_1_3_2_1_165_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2021.100336"},{"key":"e_1_3_2_1_166_1","volume-title":"The sceptical Bayes factor for the assessment of replication success. arXiv:2009.01520","author":"Pawel Samuel","year":"2020","unstructured":"Samuel Pawel and Leonhard Held . 2020. The sceptical Bayes factor for the assessment of replication success. arXiv:2009.01520 ( 2020 ). Samuel Pawel and Leonhard Held. 2020. The sceptical Bayes factor for the assessment of replication success. arXiv:2009.01520 (2020)."},{"key":"e_1_3_2_1_167_1","volume-title":"International Conference on Machine Learning. 
PMLR, 7599--7609","author":"Perdomo Juan","year":"2020","unstructured":"Juan Perdomo , Tijana Zrnic , Celestine Mendler-D\u00fcnner , and Moritz Hardt . 2020 . Performative prediction . In International Conference on Machine Learning. PMLR, 7599--7609 . Juan Perdomo, Tijana Zrnic, Celestine Mendler-D\u00fcnner, and Moritz Hardt. 2020. Performative prediction. In International Conference on Machine Learning. PMLR, 7599--7609."},{"key":"e_1_3_2_1_168_1","volume-title":"manual_seed (3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision. arXiv:2109.08203","author":"Picard David","year":"2021","unstructured":"David Picard . 2021. Torch. manual_seed (3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision. arXiv:2109.08203 ( 2021 ). David Picard. 2021. Torch. manual_seed (3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision. arXiv:2109.08203 (2021)."},{"key":"e_1_3_2_1_169_1","volume-title":"Improving reproducibility in machine learning research: a report from the NeurIPS 2019 reproducibility program. J. of Machine Learning Research 22","author":"Pineau Joelle","year":"2021","unstructured":"Joelle Pineau , Philippe Vincent-Lamarre , Koustuv Sinha , Vincent Larivi\u00e8re , Alina Beygelzimer , Florence d' Alch\u00e9 Buc , Emily Fox , and Hugo Larochelle . 2021. Improving reproducibility in machine learning research: a report from the NeurIPS 2019 reproducibility program. J. of Machine Learning Research 22 ( 2021 ). Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivi\u00e8re, Alina Beygelzimer, Florence d'Alch\u00e9 Buc, Emily Fox, and Hugo Larochelle. 2021. Improving reproducibility in machine learning research: a report from the NeurIPS 2019 reproducibility program. J. 
of Machine Learning Research 22 (2021)."},{"key":"e_1_3_2_1_170_1","volume-title":"Dataset shift in machine learning","author":"Qui\u00f1onero-Candela Joaquin","unstructured":"Joaquin Qui\u00f1onero-Candela , Masashi Sugiyama , Anton Schwaighofer , and Neil D Lawrence . 2008. Dataset shift in machine learning . MIT Press . Joaquin Qui\u00f1onero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D Lawrence. 2008. Dataset shift in machine learning. MIT Press."},{"key":"e_1_3_2_1_171_1","doi-asserted-by":"publisher","DOI":"10.1186\/s12874-020-01105-9"},{"key":"e_1_3_2_1_172_1","volume-title":"AI and the everything in the whole wide world benchmark. arXiv:2111.15366","author":"Raji Inioluwa Deborah","year":"2021","unstructured":"Inioluwa Deborah Raji , Emily M Bender , Amandalynne Paullada , Emily Denton , and Alex Hanna . 2021. AI and the everything in the whole wide world benchmark. arXiv:2111.15366 ( 2021 ). Inioluwa Deborah Raji, Emily M Bender, Amandalynne Paullada, Emily Denton, and Alex Hanna. 2021. AI and the everything in the whole wide world benchmark. arXiv:2111.15366 (2021)."},{"key":"e_1_3_2_1_173_1","volume-title":"Model evaluation, model selection, and algorithm selection in machine learning. arXiv:1811.12808","author":"Raschka Sebastian","year":"2018","unstructured":"Sebastian Raschka . 2018. Model evaluation, model selection, and algorithm selection in machine learning. arXiv:1811.12808 ( 2018 ). Sebastian Raschka. 2018. Model evaluation, model selection, and algorithm selection in machine learning. arXiv:1811.12808 (2018)."},{"key":"e_1_3_2_1_174_1","volume-title":"Do CIFAR-10 classifiers generalize to CIFAR-10? arXiv:1806.00451","author":"Recht Benjamin","year":"2018","unstructured":"Benjamin Recht , Rebecca Roelofs , Ludwig Schmidt , and Vaishaal Shankar . 2018. Do CIFAR-10 classifiers generalize to CIFAR-10? arXiv:1806.00451 ( 2018 ). Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. 2018. 
Do CIFAR-10 classifiers generalize to CIFAR-10? arXiv:1806.00451 (2018)."},{"key":"e_1_3_2_1_175_1","unstructured":"B Recht R Roelofs L Schmidt and V Shankar. 2019. Unbiased look at dataset bias. ICML. B Recht R Roelofs L Schmidt and V Shankar. 2019. Unbiased look at dataset bias. ICML."},{"key":"e_1_3_2_1_176_1","doi-asserted-by":"publisher","DOI":"10.1142\/S270507852050006X"},{"key":"e_1_3_2_1_177_1","doi-asserted-by":"publisher","DOI":"10.1177\/25152459211026864"},{"key":"e_1_3_2_1_178_1","volume-title":"The Cultural Nature of Human Development","author":"Rogoff Barbara","unstructured":"Barbara Rogoff . 2003. The Cultural Nature of Human Development . Oxford University Press . Barbara Rogoff. 2003. The Cultural Nature of Human Development. Oxford University Press."},{"key":"e_1_3_2_1_179_1","volume-title":"The elephant in the room. arXiv:1808.03305","author":"Rosenfeld Amir","year":"2018","unstructured":"Amir Rosenfeld , Richard Zemel , and John K Tsotsos . 2018. The elephant in the room. arXiv:1808.03305 ( 2018 ). Amir Rosenfeld, Richard Zemel, and John K Tsotsos. 2018. The elephant in the room. arXiv:1808.03305 (2018)."},{"key":"e_1_3_2_1_180_1","volume-title":"Workshop on Transparent and Interpretable Machine Learning in Safety Critical Environments, 31st Conference on Neural Information Processing Systems","volume":"4","author":"Ross Andrew","year":"2017","unstructured":"Andrew Ross , Isaac Lage , and Finale Doshi-Velez . 2017 . The neural lasso: Local linear sparsity for interpretable explanations . In Workshop on Transparent and Interpretable Machine Learning in Safety Critical Environments, 31st Conference on Neural Information Processing Systems , Vol. 4 . Andrew Ross, Isaac Lage, and Finale Doshi-Velez. 2017. The neural lasso: Local linear sparsity for interpretable explanations. In Workshop on Transparent and Interpretable Machine Learning in Safety Critical Environments, 31st Conference on Neural Information Processing Systems, Vol. 
4."},{"key":"e_1_3_2_1_181_1","volume-title":"Artificial Intelligence: A Modern Approach.","author":"Russell Stuart J","year":"2003","unstructured":"Stuart J Russell and Peter Norvig . 2003 . Artificial Intelligence: A Modern Approach. Stuart J Russell and Peter Norvig. 2003. Artificial Intelligence: A Modern Approach."},{"key":"e_1_3_2_1_182_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1003285"},{"key":"e_1_3_2_1_183_1","doi-asserted-by":"publisher","DOI":"10.1177\/2515245919838781"},{"key":"e_1_3_2_1_184_1","volume-title":"Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120","author":"Saxe Andrew M","year":"2013","unstructured":"Andrew M Saxe , James L McClelland , and Surya Ganguli . 2013. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120 ( 2013 ). Andrew M Saxe, James L McClelland, and Surya Ganguli. 2013. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120 (2013)."},{"key":"e_1_3_2_1_185_1","volume-title":"Publication bias (the \"file-drawer problem\") in scientific inference. physics\/9909033","author":"Scargle Jeffrey D","year":"1999","unstructured":"Jeffrey D Scargle . 1999. Publication bias (the \"file-drawer problem\") in scientific inference. physics\/9909033 ( 1999 ). Jeffrey D Scargle. 1999. Publication bias (the \"file-drawer problem\") in scientific inference. physics\/9909033 (1999)."},{"key":"e_1_3_2_1_186_1","first-page":"1","article-title":"Do datasets have politics? Disciplinary values in computer vision dataset development","volume":"5","author":"Scheuerman Morgan Klaus","year":"2021","unstructured":"Morgan Klaus Scheuerman , Alex Hanna , and Emily Denton . 2021 . Do datasets have politics? Disciplinary values in computer vision dataset development . Proc. of CSCW 5 (2021), 1 -- 37 . Morgan Klaus Scheuerman, Alex Hanna, and Emily Denton. 2021. Do datasets have politics? 
Disciplinary values in computer vision dataset development. Proc. of CSCW 5 (2021), 1--37.","journal-title":"Proc. of CSCW"},{"key":"e_1_3_2_1_187_1","volume-title":"International Conference on Machine Learning. PMLR, 9367--9376","author":"Schmidt Robin M","year":"2021","unstructured":"Robin M Schmidt , Frank Schneider , and Philipp Hennig . 2021 . Descending through a crowded valley-benchmarking deep learning optimizers . In International Conference on Machine Learning. PMLR, 9367--9376 . Robin M Schmidt, Frank Schneider, and Philipp Hennig. 2021. Descending through a crowded valley-benchmarking deep learning optimizers. In International Conference on Machine Learning. PMLR, 9367--9376."},{"key":"e_1_3_2_1_188_1","volume-title":"Winner's curse? On pace, progress, and empirical rigor. ICLR","author":"Sculley David","year":"2018","unstructured":"David Sculley , Jasper Snoek , Alex Wiltschko , and Ali Rahimi . 2018. Winner's curse? On pace, progress, and empirical rigor. ICLR ( 2018 ). David Sculley, Jasper Snoek, Alex Wiltschko, and Ali Rahimi. 2018. Winner's curse? On pace, progress, and empirical rigor. ICLR (2018)."},{"key":"e_1_3_2_1_189_1","doi-asserted-by":"publisher","DOI":"10.1080\/135952201753172953"},{"key":"e_1_3_2_1_190_1","first-page":"9573","article-title":"The pitfalls of simplicity bias in neural networks","volume":"33","author":"Shah Harshay","year":"2020","unstructured":"Harshay Shah , Kaustav Tamuly , Aditi Raghunathan , Prateek Jain , and Praneeth Netrapalli . 2020 . The pitfalls of simplicity bias in neural networks . NeurIPS 33 (2020), 9573 -- 9585 . Harshay Shah, Kaustav Tamuly, Aditi Raghunathan, Prateek Jain, and Praneeth Netrapalli. 2020. The pitfalls of simplicity bias in neural networks. 
NeurIPS 33 (2020), 9573--9585.","journal-title":"NeurIPS"},{"key":"e_1_3_2_1_191_1","doi-asserted-by":"publisher","DOI":"10.1214\/10-STS330"},{"key":"e_1_3_2_1_192_1","doi-asserted-by":"publisher","DOI":"10.1177\/0956797611417632"},{"key":"e_1_3_2_1_193_1","doi-asserted-by":"publisher","DOI":"10.1002\/jcpy.1207"},{"key":"e_1_3_2_1_194_1","doi-asserted-by":"publisher","DOI":"10.1177\/1745691617708630"},{"key":"e_1_3_2_1_195_1","doi-asserted-by":"publisher","DOI":"10.5555\/3291125.3309632"},{"key":"e_1_3_2_1_196_1","volume-title":"A causal replication framework for designing and assessing replication efforts. Zeitschrift f\u00fcr Psychologie","author":"Steiner Peter M","year":"2019","unstructured":"Peter M Steiner, Vivian C Wong, and Kylie Anglin. 2019. A causal replication framework for designing and assessing replication efforts. Zeitschrift f\u00fcr Psychologie (2019)."},{"key":"e_1_3_2_1_197_1","unstructured":"Victoria Stodden and Sheila Miguez. 2014. Provisioning Reproducible Computational Science. (2014)."},{"key":"e_1_3_2_1_198_1","first-page":"3","article-title":"When training and test sets are different: Characterizing learning transfer","volume":"30","author":"Storkey Amos","year":"2009","unstructured":"Amos Storkey. 2009. When training and test sets are different: Characterizing learning transfer.
Dataset Shift in Machine Learning 30 (2009), 3--28.","journal-title":"Dataset Shift in Machine Learning"},{"key":"e_1_3_2_1_199_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i09.7123"},{"key":"e_1_3_2_1_200_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.97"},{"key":"e_1_3_2_1_201_1","doi-asserted-by":"crossref","unstructured":"Harini Suresh and John Guttag. 2021. A framework for understanding sources of harm throughout the machine learning life cycle. In Equity and Access in Algorithms Mechanisms and Optimization. 1--9.","DOI":"10.1145\/3465416.3483305"},{"key":"e_1_3_2_1_202_1","volume-title":"Intriguing properties of neural networks. arXiv:1312.6199","author":"Szegedy Christian","year":"2013","unstructured":"Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv:1312.6199 (2013)."},{"key":"e_1_3_2_1_203_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2019.11.009"},{"key":"e_1_3_2_1_204_1","doi-asserted-by":"publisher","DOI":"10.3389\/fnhum.2017.00390"},{"key":"e_1_3_2_1_205_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-021-00981-0"},{"key":"e_1_3_2_1_206_1","volume-title":"Optimizer benchmarking needs to account for hyperparameter tuning. arXiv e-prints","author":"Sivaprasad Prabhu Teja","year":"2019","unstructured":"
Prabhu Teja Sivaprasad, Florian Mai, Thijs Vogels, Martin Jaggi, and Fran\u00e7ois Fleuret. 2019. Optimizer benchmarking needs to account for hyperparameter tuning. arXiv e-prints (2019), arXiv--1910."},{"key":"e_1_3_2_1_207_1","first-page":"407","article-title":"On the value of out-of-distribution testing: An example of Goodhart's law","volume":"33","author":"Teney Damien","year":"2020","unstructured":"Damien Teney, Ehsan Abbasnejad, Kushal Kafle, Robik Shrestha, Christopher Kanan, and Anton Van Den Hengel. 2020. On the value of out-of-distribution testing: An example of Goodhart's law. NeurIPS 33 (2020), 407--417.","journal-title":"NeurIPS"},{"key":"e_1_3_2_1_208_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995347"},{"key":"e_1_3_2_1_209_1","volume-title":"The piranha problem: Large effects swimming in a small pond. arXiv:2105.13445","author":"Tosh Christopher","year":"2021","unstructured":"Christopher Tosh, Philip Greengard, Ben Goodrich, Andrew Gelman, Aki Vehtari, and Daniel Hsu. 2021. The piranha problem: Large effects swimming in a small pond. arXiv:2105.13445 (2021)."},{"key":"e_1_3_2_1_210_1","doi-asserted-by":"publisher","DOI":"10.1145\/1968.1972"},{"key":"e_1_3_2_1_211_1","doi-asserted-by":"publisher","DOI":"10.1093\/aje\/kwr458"},{"key":"e_1_3_2_1_212_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3264-1"},{"key":"e_1_3_2_1_213_1","volume-title":"Misspecification and unreliable interpretations in psychology and social science.
Psychological Methods","author":"Vowels Matthew J","year":"2021","unstructured":"Matthew J Vowels. 2021. Misspecification and unreliable interpretations in psychology and social science. Psychological Methods (2021)."},{"key":"e_1_3_2_1_214_1","doi-asserted-by":"publisher","DOI":"10.3758\/BF03194105"},{"key":"e_1_3_2_1_215_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13423-017-1343-3"},{"key":"e_1_3_2_1_216_1","volume-title":"Proc. of the EMNLP Workshop BlackboxNLP. ACL","author":"Singh Amanpreet","year":"2018","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2018. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In Proc. of the EMNLP Workshop BlackboxNLP. ACL, Brussels, Belgium, 353--355. https:\/\/doi.org\/10.18653\/v1\/W18-5446"},{"key":"e_1_3_2_1_217_1","volume-title":"All of Statistics","author":"Wasserman Larry","unstructured":"Larry Wasserman. 2004. Bayesian inference. In All of Statistics. Springer, 175--192."},{"key":"e_1_3_2_1_218_1","doi-asserted-by":"publisher","DOI":"10.1177\/01461672992512005"},{"key":"e_1_3_2_1_219_1","volume-title":"Protecting against evaluation overfitting in empirical reinforcement learning","author":"Whiteson Shimon","unstructured":"Shimon Whiteson, Brian Tanner, Matthew E Taylor, and Peter Stone. 2011. Protecting against evaluation overfitting in empirical reinforcement learning. In ADPRL. IEEE, 120--127."},
{"key":"e_1_3_2_1_220_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018046501280"},{"key":"e_1_3_2_1_221_1","volume-title":"Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, and Ludwig Schmidt.","author":"Wortsman Mitchell","year":"2021","unstructured":"Mitchell Wortsman, Gabriel Ilharco, Mike Li, Jong Wook Kim, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, and Ludwig Schmidt. 2021. Robust fine-tuning of zero-shot models. arXiv:2109.01903 (2021)."},{"key":"e_1_3_2_1_222_1","volume-title":"Cold case: The lost MNIST digits. NeurIPS 32","author":"Yadav Chhavi","year":"2019","unstructured":"Chhavi Yadav and L\u00e9on Bottou. 2019. Cold case: The lost MNIST digits. NeurIPS 32 (2019)."},{"key":"e_1_3_2_1_223_1","volume-title":"The generalizability crisis. Behavioral and Brain Sciences 45","author":"Yarkoni Tal","year":"2022","unstructured":"Tal Yarkoni. 2022. The generalizability crisis. Behavioral and Brain Sciences 45 (2022)."},{"key":"e_1_3_2_1_224_1","doi-asserted-by":"publisher","DOI":"10.1177\/1745691617693393"},{"key":"e_1_3_2_1_225_1","volume-title":"A failed replication draws a scathing personal attack from a psychology professor. Discover","author":"Yong Ed","year":"2012","unstructured":"Ed Yong. 2012. A failed replication draws a scathing personal attack from a psychology professor. Discover (2012). https:\/\/web.archive.org\/web\/20120313012842\/http:\/\/blogs.discovermagazine.com\/notrocketscience\/2012\/03\/10\/failed-replication-bargh-psychology-study-doyen\/"},
{"key":"e_1_3_2_1_226_1","doi-asserted-by":"publisher","DOI":"10.1145\/3446776"},{"key":"e_1_3_2_1_227_1","volume-title":"Men also like shopping: Reducing gender bias amplification using corpus-level constraints. arXiv:1707.09457","author":"Zhao Jieyu","year":"2017","unstructured":"Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2017. Men also like shopping: Reducing gender bias amplification using corpus-level constraints.
arXiv:1707.09457 (2017)."}],"event":{"name":"AIES '22: AAAI\/ACM Conference on AI, Ethics, and Society","location":"Oxford United Kingdom","acronym":"AIES '22","sponsor":["SIGAI ACM Special Interest Group on Artificial Intelligence","AAAI"]},"container-title":["Proceedings of the 2022 AAAI\/ACM Conference on AI, Ethics, and Society"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3514094.3534196","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3514094.3534196","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3514094.3534196","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:37Z","timestamp":1750186957000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3514094.3534196"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,26]]},"references-count":226,"alternative-id":["10.1145\/3514094.3534196","10.1145\/3514094"],"URL":"https:\/\/doi.org\/10.1145\/3514094.3534196","relation":{},"subject":[],"published":{"date-parts":[[2022,7,26]]},"assertion":[{"value":"2022-07-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}