{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T21:07:19Z","timestamp":1762981639581,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":59,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T00:00:00Z","timestamp":1687824000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,27]]},"DOI":"10.1145\/3589806.3600042","type":"proceedings-article","created":{"date-parts":[[2023,6,28]],"date-time":"2023-06-28T20:09:22Z","timestamp":1687982962000},"page":"115-120","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["A Siren Song of Open Source Reproducibility, Examples from Machine Learning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9900-1972","authenticated-orcid":false,"given":"Edward","family":"Raff","sequence":"first","affiliation":[{"name":"Booz Allen Hamilton, USA and Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2023-9636","authenticated-orcid":false,"given":"Andrew L.","family":"Farris","sequence":"additional","affiliation":[{"name":"Booz Allen Hamilton, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,6,28]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467v2 (March","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg\u00a0S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467v2 (March 2016), 19. http:\/\/arxiv.org\/abs\/1603.04467 arXiv:1603.04467."},{"key":"e_1_3_2_1_2_1","unstructured":"Kwangjun Ahn Prateek Jain Ziwei Ji Satyen Kale Praneeth Netrapalli and Gil\u00a0I. Shamir. 2022. Reproducibility in Optimization: Theoretical Framework and Limits. (2022) 1\u201351. http:\/\/arxiv.org\/abs\/2202.04598 arXiv:2202.04598."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976699300016007"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.3390\/jimaging6060041"},{"key":"e_1_3_2_1_5_1","volume-title":"Should We Really Use Post-Hoc Tests Based on Mean-Ranks?Journal of Machine Learning Research 17, 5","author":"Benavoli Alessio","year":"2016","unstructured":"Alessio Benavoli, Giorgio Corani, and Francesca Mangili. 2016. Should We Really Use Post-Hoc Tests Based on Mean-Ranks?Journal of Machine Learning Research 17, 5 (2016), 1\u201310. http:\/\/jmlr.org\/papers\/v17\/benavoli16a.html"},{"key":"e_1_3_2_1_6_1","unstructured":"Siddharth Bhat. 2019. Everything you know about word2vec is wrong. https:\/\/bollu.github.io\/everything-you-know-about-word2vec-is-wrong.html"},{"key":"e_1_3_2_1_7_1","unstructured":"Xavier Bouthillier Pierre Delaunay Mirko Bronzi Assya Trofimov Brennan Nichyporuk Justin Szeto Naz Sepah Edward Raff Kanika Madan Vikram Voleti Samira\u00a0Ebrahimi Kahou Vincent Michalski Dmitriy Serdyuk Tal Arbel Chris Pal Ga\u00ebl Varoquaux and Pascal Vincent. 2021. Accounting for Variance in Machine Learning Benchmarks. In Machine Learning and Systems (MLSys). http:\/\/arxiv.org\/abs\/2103.03098 arXiv:2103.03098."},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning","author":"Bouthillier Xavier","year":"2019","unstructured":"Xavier Bouthillier, C\u00e9sar Laurent, and Pascal Vincent. 2019. Unreproducible Research is Reproducible. In Proceedings of the 36th International Conference on Machine Learning, Vol.\u00a097. PMLR, 725\u2013734. http:\/\/proceedings.mlr.press\/v97\/bouthillier19a.html Series Title: Proceedings of Machine Learning Research."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(96)00142-2"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2007.77"},{"key":"e_1_3_2_1_11_1","volume-title":"On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation. Journal of Machine Learning Research 11 (Aug","author":"Cawley C","year":"2010","unstructured":"Gavin\u00a0C Cawley and Nicola L\u00a0C Talbot. 2010. On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation. Journal of Machine Learning Research 11 (Aug. 2010), 2079\u20132107. http:\/\/dl.acm.org\/citation.cfm?id=1756006.1859921 Publisher: JMLR.org."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1190\/1.1822162"},{"key":"e_1_3_2_1_13_1","volume-title":"Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research 7 (Dec","author":"Dem\u0161ar Janez","year":"2006","unstructured":"Janez Dem\u0161ar. 2006. Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research 7 (Dec. 2006), 1\u201330. http:\/\/dl.acm.org\/citation.cfm?id=1248547.1248548 Publisher: JMLR.org."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2347736.2347755"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the Evaluation Methods for Machine Learning Workshop at the 26th ICML, Montreal, Canada,2009. Series Title: Evaluation Methods for Machine Learning Workshop, the 26th ICML","author":"Drummond Chris","year":"2009","unstructured":"Chris Drummond. 2009. Replicability is not reproducibility: nor is it good science. In Proceedings of the Evaluation Methods for Machine Learning Workshop at the 26th ICML, Montreal, Canada,2009. Series Title: Evaluation Methods for Machine Learning Workshop, the 26th ICML, June 14-18, 2009, Montreal, Canada."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/3524938.3525212"},{"key":"e_1_3_2_1_19_1","volume-title":"Reproducibility in ML Workshop, ICML\u201918","author":"Forde Jessica","year":"2018","unstructured":"Jessica Forde, Tim Head, Chris Holdgraf, Yuvi Panda, Fernando Perez, Gladys Nalvarte, Benjamin Ragan-kelley, and Erik Sundell. 2018. Reproducible Research Environments with repo2docker. In Reproducibility in ML Workshop, ICML\u201918."},{"key":"e_1_3_2_1_20_1","unstructured":"Jessica\u00a0Zosa Forde Matthias Bussonnier F\u00e9lix-Antoine Fortin Brian\u00a0E Granger Timothy\u00a0Daniel Head Chris Holdgraf Paul Ivanov Kyle Kelley Michael\u00a0D Pacer Yuvi Panda Fernando P\u00e9rez Gladys Nalvarte Benjamin Ragan-Kelley Zachary\u00a0R Sailer Steven Silvester Erik Sundell and Carol Willing. 2018. Reproducing Machine Learning Research on Binder. In Machine Learning Open Source Software 2018: Sustainable communities."},{"key":"e_1_3_2_1_21_1","volume-title":"Reproducibility in ML Workshop, ICML\u201918","author":"Gardner Josh","year":"2018","unstructured":"Josh Gardner, Christopher Brooks, and Ryan\u00a0S Baker. 2018. Enabling End-To-End Machine Learning Replicability : A Case Study in Educational Data Mining. In Reproducibility in ML Workshop, ICML\u201918."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458723"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/qss_a_00144"},{"key":"e_1_3_2_1_24_1","volume-title":"word2vec Explained: Deriving Mikolov et al.\u2019s Negative-Sampling Word-Embedding Method. arXiv preprint arXiv:1402.3722","author":"Goldberg Yoav","year":"2014","unstructured":"Yoav Goldberg and Omer Levy. 2014. word2vec Explained: Deriving Mikolov et al.\u2019s Negative-Sampling Word-Embedding Method. arXiv preprint arXiv:1402.3722 (2014). http:\/\/arxiv.org\/abs\/1402.3722 arXiv:1402.3722v1."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-011-9181-9"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2009.36"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-020-2649-2"},{"key":"e_1_3_2_1_28_1","volume-title":"The quality and reliability of scientific software. Transactions on Information and Communications Technologies 4","author":"Hatton L.","year":"1993","unstructured":"L. Hatton. 1993. The quality and reliability of scientific software. Transactions on Information and Communications Technologies 4 (1993)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/99.609829"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","unstructured":"L. Hatton and A. Roberts. 1994. How accurate is scientific software?IEEE Transactions on Software Engineering 20 10 (1994) 785\u2013797. https:\/\/doi.org\/10.1109\/32.328993","DOI":"10.1109\/32.328993"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.58906"},{"key":"e_1_3_2_1_32_1","volume-title":"Adam: A Method for Stochastic Optimization. In International Conference On Learning Representations. arXiv:1412","author":"Kingma P","year":"2015","unstructured":"Diederik\u00a0P Kingma and Jimmy\u00a0Lei Ba. 2015. Adam: A Method for Stochastic Optimization. In International Conference On Learning Representations. arXiv:1412.6980v2."},{"volume-title":"Positioning and Power in Academic Publishing: Players","author":"Kluyver Thomas","key":"e_1_3_2_1_33_1","unstructured":"Thomas Kluyver, Benjamin Ragan-Kelley, Fernando P\u00e9rez, Brian Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica Hamrick, Jason Grout, Sylvain Corlay, Paul Ivanov, Dami\u00e1n Avila, Safia Abdalla, Carol Willing, and Jupyter development team. 2016. Jupyter Notebooks - a publishing format for reproducible computational workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas, Fernando Loizides and Birgit Scmidt (Eds.). IOS Press, 87\u201390. https:\/\/eprints.soton.ac.uk\/403913\/"},{"key":"e_1_3_2_1_34_1","volume-title":"Proceedings of the 31st International Conference on Machine Learning, Eric\u00a0P Xing and Tony Jebara (Eds.).","author":"Le Quoc","year":"2014","unstructured":"Quoc Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In Proceedings of the 31st International Conference on Machine Learning, Eric\u00a0P Xing and Tony Jebara (Eds.). Vol.\u00a032. PMLR, Bejing, China, 1188\u20131196. https:\/\/proceedings.mlr.press\/v32\/le14.html Series Title: Proceedings of Machine Learning Research Issue: 2."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/0164-1212(87)90032-X"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3317287.3328534"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01589116"},{"key":"e_1_3_2_1_38_1","volume-title":"Decoupled Weight Decay Regularization. In International Conference on Learning Representations (ICLR). https:\/\/github.com\/loshchil\/AdamW-and-SGDW","author":"Loshchilov Ilya","year":"2019","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In International Conference on Learning Representations (ICLR). https:\/\/github.com\/loshchil\/AdamW-and-SGDW"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244303322533223"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3287560.3287596"},{"key":"e_1_3_2_1_41_1","volume-title":"Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints. In IJCAI-21 1st International Workshop on Adaptive Cyber Defense. http:\/\/arxiv.org\/abs\/2108","author":"Nguyen T.","year":"2021","unstructured":"Andre\u00a0T. Nguyen, Edward Raff, Charles Nicholas, and James Holt. 2021. Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints. In IJCAI-21 1st International Workshop on Adaptive Cyber Defense. http:\/\/arxiv.org\/abs\/2108.04081 arXiv:2108.04081."},{"key":"e_1_3_2_1_42_1","volume-title":"Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks. In 35th Conference on Neural Information Processing Systems (NeurIPS 2021)","author":"Northcutt G.","year":"2021","unstructured":"Curtis\u00a0G. Northcutt, Anish Athalye, and Jonas Mueller. 2021. Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks. In 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks. http:\/\/arxiv.org\/abs\/2103.14749 arXiv:2103.14749."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/FIE.2003.1263332"},{"key":"e_1_3_2_1_44_1","volume-title":"dagger: A Python Framework for Reproducible Machine Learning Experiment Orchestration. arXiv","author":"Paganini Michela","year":"2020","unstructured":"Michela Paganini and Jessica\u00a0Zosa Forde. 2020. dagger: A Python Framework for Reproducible Machine Learning Experiment Orchestration. arXiv (2020). http:\/\/arxiv.org\/abs\/2006.07484 arXiv:2006.07484."},{"volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","key":"e_1_3_2_1_45_1","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H\u00a0Wallach, H\u00a0Larochelle, A\u00a0Beygelzimer, F\u00a0d\\textquotesingle Alch\u00e9-Buc, E\u00a0Fox, and R\u00a0Garnett (Eds.). Curran Associates, Inc., 8024\u20138035."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.3102\/00346543058003303"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416545"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.3389\/fninf.2017.00076"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2018.11.030"},{"key":"e_1_3_2_1_50_1","unstructured":"Edward Raff. 2019. A Step Toward Quantifying Independently Reproducible Machine Learning Research. In NeurIPS. http:\/\/arxiv.org\/abs\/1909.06674 arXiv:1909.06674."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i1.16124"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2204.03829"},{"key":"e_1_3_2_1_53_1","volume-title":"Do CIFAR-10 Classifiers Generalize to CIFAR-10?arXiv","author":"Recht Benjamin","year":"2018","unstructured":"Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. 2018. Do CIFAR-10 Classifiers Generalize to CIFAR-10?arXiv (2018), 1\u201325. http:\/\/arxiv.org\/abs\/1806.00451 arXiv:1806.00451."},{"key":"e_1_3_2_1_54_1","volume-title":"Do ImageNet Classifiers Generalize to ImageNet?arXiv (Feb","author":"Recht Benjamin","year":"2019","unstructured":"Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. 2019. Do ImageNet Classifiers Generalize to ImageNet?arXiv (Feb. 2019). http:\/\/arxiv.org\/abs\/1902.10811 arXiv:1902.10811."},{"key":"e_1_3_2_1_56_1","volume-title":"An Open Review of OpenReview: A Critical Analysis of the Machine Learning Conference Review Process. arXiv (2020","author":"Tran David","year":"2020","unstructured":"David Tran, Alex Valtchanov, Keshav Ganapathy, Raymond Feng, Eric Slud, Micah Goldblum, and Tom Goldstein. 2020. An Open Review of OpenReview: A Critical Analysis of the Machine Learning Conference Review Process. arXiv (2020). http:\/\/arxiv.org\/abs\/2010.05137 arXiv:2010.05137."},{"key":"e_1_3_2_1_57_1","unstructured":"Wei Yuan and Kai-xin Gao. [n. d.]. EAdam Optimizer: How $\\epsilon$ Impact Adam. arXiv ([n. d.]). arXiv:2011.02150v1."},{"key":"e_1_3_2_1_58_1","first-page":"39","article-title":"Accelerating the Machine Learning Lifecycle with MLflow","volume":"41","author":"Zaharia A","year":"2018","unstructured":"Matei\u00a0A Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue\u00a0Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. 2018. Accelerating the Machine Learning Lifecycle with MLflow. IEEE Data Eng. Bull. 41 (2018), 39\u201345.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_3_2_1_59_1","volume-title":"Randomness In Neural Network Training: Characterizing The Impact of Tooling. arXiv","author":"Zhuang Donglin","year":"2021","unstructured":"Donglin Zhuang, Xingyao Zhang, Shuaiwen\u00a0Leon Song, and Sara Hooker. 2021. Randomness In Neural Network Training: Characterizing The Impact of Tooling. arXiv (2021). http:\/\/arxiv.org\/abs\/2106.11872 arXiv:2106.11872."},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"}],"event":{"name":"ACM REP '23: 2023 ACM Conference on Reproducibility and Replicability","sponsor":["EIGREP Emerging Interest Group on Reproducibility and Replicability"],"location":"Santa Cruz CA USA","acronym":"ACM REP '23"},"container-title":["Proceedings of the 2023 ACM Conference on Reproducibility and Replicability"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589806.3600042","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3589806.3600042","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:22Z","timestamp":1750182562000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589806.3600042"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,27]]},"references-count":59,"alternative-id":["10.1145\/3589806.3600042","10.1145\/3589806"],"URL":"https:\/\/doi.org\/10.1145\/3589806.3600042","relation":{},"subject":[],"published":{"date-parts":[[2023,6,27]]},"assertion":[{"value":"2023-06-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}