{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:01:00Z","timestamp":1773792060521,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":46,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T00:00:00Z","timestamp":1687824000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,27]]},"DOI":"10.1145\/3589806.3600044","type":"proceedings-article","created":{"date-parts":[[2023,6,28]],"date-time":"2023-06-28T20:09:22Z","timestamp":1687982962000},"page":"37-61","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["On Reporting Robust and Trustworthy Conclusions from Model Comparison Studies Involving Neural Networks and Randomness"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9754-5941","authenticated-orcid":false,"given":"Odd Erik","family":"Gundersen","sequence":"first","affiliation":[{"name":"Department of Computer Science, Norwegian University of Science and Technology, Norway and Aneo AI Research, Aneo AS, Norway"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9444-7185","authenticated-orcid":false,"given":"Saeid","family":"Shamsaliei","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Norwegian University of Science and Technology, Norway and Aneo AI Research, Aneo AS, Norway"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5441-6038","authenticated-orcid":false,"given":"H\u00e5kon Sletten","family":"Kj\u00e6rnli","sequence":"additional","affiliation":[{"name":"Aneo AS, 
Norway"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6324-6284","authenticated-orcid":false,"given":"Helge","family":"Langseth","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Norwegian University of Science and Technology, Norway and Aneo AI Research, Aneo AS, Norway"}]}],"member":"320","published-online":{"date-parts":[[2023,6,28]]},"reference":[{"key":"e_1_3_2_3_1_1","first-page":"1","article-title":"GluonTS: Probabilistic and Neural Time Series Modeling in Python","volume":"21","author":"Alexandrov Alexander","year":"2020","unstructured":"Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle\u00a0C. Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali\u00a0Caner T\u00fcrkmen, and Yuyang Wang. 2020. GluonTS: Probabilistic and Neural Time Series Modeling in Python. Journal of Machine Learning Research 21, 116 (2020), 1\u20136. http:\/\/jmlr.org\/papers\/v21\/19-820.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_3_2_1","doi-asserted-by":"publisher","DOI":"10.1038\/533452a"},{"key":"e_1_3_2_3_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448250"},{"key":"e_1_3_2_3_4_1","volume-title":"Online learning and stochastic approximations. On-line learning in neural networks 17, 9","author":"L\u00e9on Bottou","year":"1998","unstructured":"L\u00e9on Bottou 1998. Online learning and stochastic approximations. On-line learning in neural networks 17, 9 (1998), 142."},{"key":"e_1_3_2_3_5_1","first-page":"747","article-title":"Accounting for variance in machine learning benchmarks","volume":"3","author":"Bouthillier Xavier","year":"2021","unstructured":"Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Nazanin Mohammadi\u00a0Sepahvand, Edward Raff, Kanika Madan, Vikram Voleti, 2021. Accounting for variance in machine learning benchmarks. 
Proceedings of Machine Learning and Systems 3 (2021), 747\u2013769.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_3_6_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol.\u00a097)","author":"Bouthillier Xavier","year":"2019","unstructured":"Xavier Bouthillier, C\u00e9sar Laurent, and Pascal Vincent. 2019. Unreproducible Research is Reproducible. In Proceedings of the 36th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol.\u00a097). PMLR, 725\u2013734."},{"key":"e_1_3_2_3_7_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U"},{"key":"e_1_3_2_3_8_1","volume-title":"Proceedings of the 28th international conference on machine learning (ICML-11)","author":"Cuturi Marco","year":"2011","unstructured":"Marco Cuturi. 2011. Fast global alignment kernels. In Proceedings of the 28th international conference on machine learning (ICML-11). 929\u2013936."},{"key":"e_1_3_2_3_9_1","unstructured":"Dheeru Dua and Casey Graff. 2019. UCI Machine Learning Repository. http:\/\/archive.ics.uci.edu\/ml"},{"key":"e_1_3_2_3_10_1","volume-title":"Trainable Neural Networks. In 7th International Conference on Learning Representations, ICLR","author":"Frankle Jonathan","year":"2019","unstructured":"Jonathan Frankle and Michael Carbin. 2019. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. In 7th International Conference on Learning Representations, ICLR 2019. OpenReview.net, 42\u00a0pages. https:\/\/openreview.net\/forum?id=rJl-b3RcF7"},{"key":"e_1_3_2_3_11_1","volume-title":"Deep Learning","author":"Goodfellow Ian","unstructured":"Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. 
http:\/\/www.deeplearningbook.org."},{"key":"e_1_3_2_3_12_1","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v41i3.5318"},{"key":"e_1_3_2_3_13_1","unstructured":"Odd\u00a0Erik Gundersen Kevin Coakley Christine Kirkpatrick and Yolanda Gil. 2023. Sources of Irreproducibility in Machine Learning: A Review. arxiv:2204.07610\u00a0[cs.LG]"},{"key":"e_1_3_2_3_14_1","volume-title":"Do machine learning platforms provide out-of-the-box reproducibility? Future Generation Computer Systems 126","author":"Gundersen Odd\u00a0Erik","year":"2022","unstructured":"Odd\u00a0Erik Gundersen, Saeid Shamsaliei, and Richard\u00a0Juul Isdahl. 2022. Do machine learning platforms provide out-of-the-box reproducibility? Future Generation Computer Systems 126 (2022), 34\u201347."},{"key":"e_1_3_2_3_15_1","doi-asserted-by":"publisher","DOI":"10.6339\/JDS.2005.03(1).181"},{"key":"e_1_3_2_3_16_1","volume-title":"Transparency and reproducibility in artificial intelligence. Nature 586, 7829","author":"Haibe-Kains Benjamin","year":"2020","unstructured":"Benjamin Haibe-Kains, George\u00a0Alexandru Adam, Ahmed Hosny, Farnoosh Khodakarami, Levi Waldron, Bo Wang, Chris McIntosh, Anna Goldenberg, Anshul Kundaje, Casey\u00a0S Greene, 2020. Transparency and reproducibility in artificial intelligence. Nature 586, 7829 (2020), E14\u2013E16."},{"key":"e_1_3_2_3_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_3_18_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"e_1_3_2_3_19_1","volume-title":"Long short-term memory. Neural computation 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735\u20131780."},{"key":"e_1_3_2_3_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_2_3_21_1","volume-title":"Artificial intelligence faces reproducibility crisis. 
Science","author":"Hutson Matthew","year":"2018","unstructured":"Matthew Hutson. 2018. Artificial intelligence faces reproducibility crisis. Science (2018)."},{"key":"e_1_3_2_3_22_1","volume-title":"Why most published research findings are false. PLoS medicine 2, 8","author":"Ioannidis PA","year":"2005","unstructured":"John\u00a0PA Ioannidis. 2005. Why most published research findings are false. PLoS medicine 2, 8 (2005), e124."},{"key":"e_1_3_2_3_23_1","unstructured":"Alex Krizhevsky. 2009. Learning multiple layers of features from tiny images. Technical report CIFAR."},{"key":"e_1_3_2_3_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210006"},{"key":"e_1_3_2_3_25_1","volume-title":"Advances in Neural Information Processing Systems, I.\u00a0Guyon, U.\u00a0V. Luxburg, S.\u00a0Bengio, H.\u00a0Wallach, R.\u00a0Fergus, S.\u00a0Vishwanathan, and R.\u00a0Garnett (Eds.). Vol.\u00a030. Curran Associates","author":"Lakshminarayanan Balaji","year":"2017","unstructured":"Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. 2017. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In Advances in Neural Information Processing Systems, I.\u00a0Guyon, U.\u00a0V. Luxburg, S.\u00a0Bengio, H.\u00a0Wallach, R.\u00a0Fergus, S.\u00a0Vishwanathan, and R.\u00a0Garnett (Eds.). Vol.\u00a030. Curran Associates, Inc.https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/9ef2ed4b7fd2c810847ffa5fa85bce38-Paper.pdf"},{"key":"e_1_3_2_3_26_1","unstructured":"Mario Lucic Karol Kurach Marcin Michalski Sylvain Gelly and Olivier Bousquet. 2018. Are GANs created equal? A large-scale study. In Advances in neural information processing systems. 700\u2013709."},{"key":"e_1_3_2_3_27_1","volume-title":"On the State of the Art of Evaluation in Neural Language Models. In International Conference on Learning Representations.","author":"Melis G\u00e1bor","year":"2018","unstructured":"G\u00e1bor Melis, Chris Dyer, and Phil Blunsom. 2018. 
On the State of the Art of Evaluation in Neural Language Models. In International Conference on Learning Representations."},{"key":"e_1_3_2_3_28_1","doi-asserted-by":"publisher","DOI":"10.1177\/0022022117744892"},{"key":"e_1_3_2_3_29_1","unstructured":"J.N. Miller and J.C. Miller. 2018. Statistics and Chemometrics for Analytical Chemistry. Pearson Education London England."},{"key":"e_1_3_2_3_30_1","volume-title":"Estimating the reproducibility of psychological science. Science 349, 6251","author":"Collaboration Open Science","year":"2015","unstructured":"Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science 349, 6251 (2015), aac4716."},{"key":"e_1_3_2_3_31_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=r1ecqn4YwB","author":"Oreshkin N.","year":"2020","unstructured":"Boris\u00a0N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. 2020. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=r1ecqn4YwB"},{"key":"e_1_3_2_3_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2021.116918"},{"key":"e_1_3_2_3_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416545"},{"key":"e_1_3_2_3_34_1","volume-title":"The machine learning reproducibility checklist. URL: https:\/\/www.cs.mcgill.ca\/jpineau\/ReproducibilityChecklist.pdf","author":"Pineau Joelle","year":"2020","unstructured":"Joelle Pineau. 2020. The machine learning reproducibility checklist. 
URL: https:\/\/www.cs.mcgill.ca\/jpineau\/ReproducibilityChecklist.pdf (2020)."},{"key":"e_1_3_2_3_35_1","volume-title":"Emily Fox, and Hugo Larochelle.","author":"Pineau Joelle","year":"2021","unstructured":"Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivi\u00e8re, Alina Beygelzimer, Florence d\u2019Alch\u00e9 Buc, Emily Fox, and Hugo Larochelle. 2021. Improving reproducibility in machine learning research: a report from the NeurIPS 2019 reproducibility program. Journal of Machine Learning Research 22 (2021)."},{"key":"e_1_3_2_3_36_1","volume-title":"Believe it or not: how much can we rely on published data on potential drug targets? Nature reviews Drug discovery 10, 9","author":"Prinz Florian","year":"2011","unstructured":"Florian Prinz, Thomas Schlange, and Khusru Asadullah. 2011. Believe it or not: how much can we rely on published data on potential drug targets? Nature reviews Drug discovery 10, 9 (2011), 712\u2013712."},{"key":"e_1_3_2_3_37_1","volume-title":"Advances in Neural Information Processing Systems 31, S.\u00a0Bengio, H.\u00a0Wallach, H.\u00a0Larochelle, K.\u00a0Grauman, N.\u00a0Cesa-Bianchi, and R.\u00a0Garnett (Eds.). Curran Associates","author":"Rangapuram Syama\u00a0Sundar","unstructured":"Syama\u00a0Sundar Rangapuram, Matthias\u00a0W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. 2018. Deep State Space Models for Time Series Forecasting. In Advances in Neural Information Processing Systems 31, S.\u00a0Bengio, H.\u00a0Wallach, H.\u00a0Larochelle, K.\u00a0Grauman, N.\u00a0Cesa-Bianchi, and R.\u00a0Garnett (Eds.). Curran Associates, Inc., 7785\u20137794. http:\/\/papers.nips.cc\/paper\/8004-deep-state-space-models-for-time-series-forecasting.pdf"},{"key":"e_1_3_2_3_38_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1035"},{"key":"e_1_3_2_3_39_1","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems. 
6827\u20136837","author":"Salinas David","year":"2019","unstructured":"David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, and Jan Gasthaus. 2019. High-dimensional multivariate forecasting with low-rank Gaussian copula processes. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. 6827\u20136837."},{"key":"e_1_3_2_3_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijforecast.2019.07.001"},{"key":"e_1_3_2_3_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_2_3_42_1","volume-title":"ICLR 2018 Workshop Track.","author":"Sculley David","year":"2018","unstructured":"David Sculley, Jasper Snoek, Alex Wiltschko, and Ali Rahimi. 2018. Winner\u2019s curse? On pace, progress, and empirical rigor. In ICLR 2018 Workshop Track."},{"key":"e_1_3_2_3_43_1","volume-title":"Replicates and repeats?what is the difference and is it significant? A brief discussion of statistics and experimental design. EMBO reports 13, 4","author":"Vaux L","year":"2012","unstructured":"David\u00a0L Vaux, Fiona Fidler, and Geoff Cumming. 2012. Replicates and repeats?what is the difference and is it significant? A brief discussion of statistics and experimental design. EMBO reports 13, 4 (2012), 291\u2013296."},{"key":"e_1_3_2_3_44_1","volume-title":"Article arXiv:1905.12417 (May","author":"Wang Yuyang","year":"2019","unstructured":"Yuyang Wang, Alex Smola, Danielle\u00a0C. Maddix, Jan Gasthaus, Dean Foster, and Tim Januschowski. 2019. Deep Factors for Forecasting. arXiv e-prints, Article arXiv:1905.12417 (May 2019), arXiv:1905.12417\u00a0pages. arxiv:1905.12417\u00a0[stat.ML]"},{"key":"e_1_3_2_3_45_1","volume-title":"Advances in Neural Information Processing Systems, H.\u00a0Larochelle, M.\u00a0Ranzato, R.\u00a0Hadsell, M.\u00a0F. Balcan, and H.\u00a0Lin (Eds.). Vol.\u00a033. 
Curran Associates","author":"Wenzel Florian","year":"2020","unstructured":"Florian Wenzel, Jasper Snoek, Dustin Tran, and Rodolphe Jenatton. 2020. Hyperparameter Ensembles for Robustness and Uncertainty Quantification. In Advances in Neural Information Processing Systems, H.\u00a0Larochelle, M.\u00a0Ranzato, R.\u00a0Hadsell, M.\u00a0F. Balcan, and H.\u00a0Lin (Eds.). Vol.\u00a033. Curran Associates, Inc., 6514\u20136527. https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/481fbfa59da2581098e841b7afc122f1-Paper.pdf"},{"key":"e_1_3_2_3_46_1","first-page":"316","article-title":"Randomness in neural network training: Characterizing the impact of tooling","volume":"4","author":"Zhuang Donglin","year":"2022","unstructured":"Donglin Zhuang, Xingyao Zhang, Shuaiwen Song, and Sara Hooker. 2022. Randomness in neural network training: Characterizing the impact of tooling. Proceedings of Machine Learning and Systems 4 (2022), 316\u2013336.","journal-title":"Proceedings of Machine Learning and Systems"}],"event":{"name":"ACM REP '23: 2023 ACM Conference on Reproducibility and Replicability","location":"Santa Cruz CA USA","acronym":"ACM REP '23","sponsor":["EIGREP Emerging Interest Group on Reproducibility and Replicability"]},"container-title":["Proceedings of the 2023 ACM Conference on Reproducibility and 
Replicability"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589806.3600044","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3589806.3600044","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:22Z","timestamp":1750182562000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589806.3600044"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,27]]},"references-count":46,"alternative-id":["10.1145\/3589806.3600044","10.1145\/3589806"],"URL":"https:\/\/doi.org\/10.1145\/3589806.3600044","relation":{},"subject":[],"published":{"date-parts":[[2023,6,27]]},"assertion":[{"value":"2023-06-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}