{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T07:59:58Z","timestamp":1776931198141,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":22,"publisher":"ACM","license":[{"start":{"date-parts":[[2026,10,21]],"date-time":"2026-10-21T00:00:00Z","timestamp":1792540800000},"content-version":"vor","delay-in-days":449,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["2226408"],"award-info":[{"award-number":["2226408"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,7,29]]},"DOI":"10.1145\/3736731.3746153","type":"proceedings-article","created":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T11:58:19Z","timestamp":1761047899000},"page":"224-228","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Learning from Irreproducibility: Introducing Data Leakage Case Studies for Machine Learning Education"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9897-9282","authenticated-orcid":false,"given":"Fraida","family":"Fund","sequence":"first","affiliation":[{"name":"NYU Tandon School of Engineering, Brooklyn, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8535-8136","authenticated-orcid":false,"given":"Mohamed","family":"Saeed","sequence":"additional","affiliation":[{"name":"Microsoft, Cairo, Egypt"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8797-4046","authenticated-orcid":false,"given":"Shaivi","family":"Malik","sequence":"additional","affiliation":[{"name":"Guru Gobind Singh Indraprastha University, Delhi, India"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0627-1901","authenticated-orcid":false,"given":"Kyrillos","family":"Ishak","sequence":"additional","affiliation":[{"name":"TU Darmstadt, Darmstadt, Germany"}]}],"member":"320","published-online":{"date-parts":[[2025,10,21]]},"reference":[{"key":"e_1_3_3_2_2_2","doi-asserted-by":"publisher","unstructured":"Kumar Abhishek Aditi Jain and Ghassan Hamarneh. 2025. Investigating the quality of DermaMNIST and Fitzpatrick17k dermatological image datasets. Scientific Data 12 1 (2025) 196. 10.1038\/s41597-025-04382-5","DOI":"10.1038\/s41597-025-04382-5"},{"key":"e_1_3_3_2_3_2","volume-title":"Introduction to machine learning (fourth ed.)","author":"Alpayd\u0131n Ethem","year":"2020","unstructured":"Ethem Alpayd\u0131n. 2020. Introduction to machine learning (fourth ed.). MIT Press."},{"key":"e_1_3_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58799-4_39"},{"key":"e_1_3_3_2_5_2","volume-title":"Pattern recognition and machine learning","author":"Bishop Christopher\u00a0M","year":"2006","unstructured":"Christopher\u00a0M Bishop. 2006. Pattern recognition and machine learning. Springer."},{"key":"e_1_3_3_2_6_2","volume-title":"OpenIntro statistics (4 ed.)","author":"Diez David\u00a0M","year":"2012","unstructured":"David\u00a0M Diez, Christopher\u00a0D Barr, and Mine Cetinkaya-Rundel. 2012. OpenIntro statistics (4 ed.). OpenIntro."},{"key":"e_1_3_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589806.3600033"},{"key":"e_1_3_3_2_8_2","doi-asserted-by":"publisher","unstructured":"Beatriz Garcia Santa Cruz Mat\u00edas\u00a0Nicol\u00e1s Bossa Jan S\u00f6lter and Andreas\u00a0Dominik Husch. 2021. Public Covid-19 X-ray datasets and their impact on model bias \u2013 A systematic review of a significant problem. Medical Image Analysis 74 (2021) 102225. 10.1016\/j.media.2021.102225","DOI":"10.1016\/j.media.2021.102225"},{"key":"e_1_3_3_2_9_2","volume-title":"Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems","author":"G\u00e9ron Aur\u00e9lien","year":"2022","unstructured":"Aur\u00e9lien G\u00e9ron. 2022. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O\u2019Reilly Media, Inc."},{"key":"e_1_3_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7"},{"key":"e_1_3_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-38747-0"},{"key":"e_1_3_3_2_12_2","doi-asserted-by":"publisher","unstructured":"Sayash Kapoor and Arvind Narayanan. 2023. Leakage and the reproducibility crisis in machine-learning-based science. Patterns (Aug. 2023). 10.1016\/j.patter.2023.100804Publisher: Elsevier.","DOI":"10.1016\/j.patter.2023.100804"},{"key":"e_1_3_3_2_13_2","volume-title":"Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC \u201920)","author":"Keahey Kate","year":"2020","unstructured":"Kate Keahey, Jason Anderson, Zhuo Zhen, Pierre Riteau, Paul Ruth, Dan Stanzione, Mert Cevik, Jacob Colleran, Haryadi\u00a0S. Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes, Fran\u00e7ois Halbach, Alex Rocha, and Joe Stubbs. 2020. Lessons Learned from the Chameleon Testbed. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC \u201920). USENIX Association. https:\/\/www.usenix.org\/conference\/atc20\/presentation\/keahey"},{"key":"e_1_3_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/IEMCON.2019.8936292"},{"key":"e_1_3_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3736731.3746149"},{"key":"e_1_3_3_2_16_2","volume-title":"Probabilistic machine learning: an introduction","author":"Murphy Kevin\u00a0P","year":"2022","unstructured":"Kevin\u00a0P Murphy. 2022. Probabilistic machine learning: an introduction. MIT Press."},{"key":"e_1_3_3_2_17_2","doi-asserted-by":"publisher","unstructured":"Md\u00a0Mamunur Rahaman Chen Li Yudong Yao Frank Kulwa Mohammad\u00a0Asadur Rahman Qian Wang Shouliang Qi Fanjie Kong Xuemin Zhu and Xin Zhao. 2020. Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches. Journal of X-ray Science and Technology 28 5 (2020) 821\u2013839. 10.3233\/XST-200715","DOI":"10.3233\/XST-200715"},{"key":"e_1_3_3_2_18_2","volume-title":"Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python","author":"Raschka Sebastian","year":"2022","unstructured":"Sebastian Raschka, Yuxi\u00a0Hayden Liu, and Vahid Mirjalili. 2022. Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python. Packt Publishing Ltd."},{"key":"e_1_3_3_2_19_2","doi-asserted-by":"publisher","unstructured":"Michael Roberts Derek Driggs Matthew Thorpe Julian Gilbey Michael Yeung Stephan Ursprung Angelica\u00a0I Aviles-Rivero Christian Etmann Cathal McCague Lucian Beer et\u00a0al. 2021. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nature Machine Intelligence 3 3 (2021) 199\u2013217. 10.1038\/s42256-021-00307-0","DOI":"10.1038\/s42256-021-00307-0"},{"key":"e_1_3_3_2_20_2","doi-asserted-by":"publisher","unstructured":"Philipp Tschandl Cliff Rosendahl and Harald Kittler. 2018. The HAM10000 dataset a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific data 5 1 (2018) 1\u20139. 10.1038\/sdata.2018.161","DOI":"10.1038\/sdata.2018.161"},{"key":"e_1_3_3_2_21_2","volume-title":"Python data science handbook: Essential tools for working with data","author":"VanderPlas Jake","year":"2016","unstructured":"Jake VanderPlas. 2016. Python data science handbook: Essential tools for working with data. O\u2019Reilly Media, Inc."},{"key":"e_1_3_3_2_22_2","doi-asserted-by":"publisher","unstructured":"Gilles Vandewiele Isabelle Dehaene Gy\u00f6rgy Kov\u00e1cs Lucas Sterckx Olivier Janssens Femke Ongenae Femke De\u00a0Backere Filip De\u00a0Turck Kristien Roelens Johan Decruyenaere et\u00a0al. 2021. Overly optimistic prediction results on imbalanced data: a case study of flaws and benefits when applying over-sampling. Artificial Intelligence in Medicine 111 (2021) 101987. 10.1016\/j.artmed.2020.101987","DOI":"10.1016\/j.artmed.2020.101987"},{"key":"e_1_3_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.18260\/p.24575"}],"event":{"name":"ACM REP '25: ACM Conference on Reproducibility and Replicability","location":"Vancouver Canada","acronym":"ACM REP '25","sponsor":["EIGREP Emerging Interest Group on Reproducibility and Replicability"]},"container-title":["Proceedings of the 3rd ACM Conference on Reproducibility and Replicability"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3736731.3746153","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3736731.3746153","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T18:04:03Z","timestamp":1767981843000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3736731.3746153"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,29]]},"references-count":22,"alternative-id":["10.1145\/3736731.3746153","10.1145\/3736731"],"URL":"https:\/\/doi.org\/10.1145\/3736731.3746153","relation":{},"subject":[],"published":{"date-parts":[[2025,7,29]]},"assertion":[{"value":"2025-10-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}