{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T18:52:03Z","timestamp":1777661523764,"version":"3.51.4"},"reference-count":20,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T00:00:00Z","timestamp":1718841600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"University of W\u00fcrzburg"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>This study delves into the multifaceted nature of cross-validation (CV) techniques in machine learning model evaluation and selection, underscoring the challenge of choosing the most appropriate method due to the plethora of available variants. It aims to clarify and standardize terminology such as sets, groups, folds, and samples pivotal in the CV domain, and introduces an exhaustive compilation of advanced CV methods like leave-one-out, leave-p-out, Monte Carlo, grouped, stratified, and time-split CV within a hold-out CV framework. Through graphical representations, the paper enhances the comprehension of these methodologies, facilitating more informed decision making for practitioners. It further explores the synergy between different CV strategies and advocates for a unified approach to reporting model performance by consolidating essential metrics. The paper culminates in a comprehensive overview of the CV techniques discussed, illustrated with practical examples, offering valuable insights for both novice and experienced researchers in the field.<\/jats:p>","DOI":"10.3390\/make6020065","type":"journal-article","created":{"date-parts":[[2024,6,21]],"date-time":"2024-06-21T12:27:33Z","timestamp":1718972853000},"page":"1378-1388","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":132,"title":["Cross-Validation Visualized: A Narrative Guide to Advanced Methods"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9051-2004","authenticated-orcid":false,"given":"Johannes","family":"Allgaier","sequence":"first","affiliation":[{"name":"Institute of Medical Data Science, University Hospital W\u00fcrzburg, 97080 W\u00fcrzburg, Germany"},{"name":"Institute of Clinical Epidemiology and Biometry, Julius-Maximilians-Universit\u00e4t W\u00fcrzburg, 97080 W\u00fcrzburg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1522-785X","authenticated-orcid":false,"given":"R\u00fcdiger","family":"Pryss","sequence":"additional","affiliation":[{"name":"Institute of Medical Data Science, University Hospital W\u00fcrzburg, 97080 W\u00fcrzburg, Germany"},{"name":"Institute of Clinical Epidemiology and Biometry, Julius-Maximilians-Universit\u00e4t W\u00fcrzburg, 97080 W\u00fcrzburg, Germany"}]}],"member":"1968","published-online":{"date-parts":[[2024,6,20]]},"reference":[{"key":"ref_1","unstructured":"Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., and Wirth, R. (2000). CRISP-DM 1.0: Step-by-Step Data Mining Guide, SPSS Inc."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning, Springer.","DOI":"10.1007\/978-0-387-84858-7"},{"key":"ref_3","unstructured":"Hart, P.E., Stork, D.G., and Duda, R.O. (2000). Pattern Classification, Wiley."},{"key":"ref_4","first-page":"542","article-title":"Cross-Validation","volume":"1","author":"Berrar","year":"2019","journal-title":"Encycl. Bioinform. Comput. Biol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1021\/ci0342472","article-title":"The problem of overfitting","volume":"44","author":"Hawkins","year":"2004","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"022022","DOI":"10.1088\/1742-6596\/1168\/2\/022022","article-title":"An overview of overfitting and its solutions","volume":"1168","author":"Ying","year":"2019","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"15849","DOI":"10.1073\/pnas.1903070116","article-title":"Reconciling modern machine-learning practice and the classical bias\u2013variance trade-off","volume":"116","author":"Belkin","year":"2019","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"913","DOI":"10.1111\/ecog.02881","article-title":"Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure","volume":"40","author":"Roberts","year":"2017","journal-title":"Ecography"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1111\/j.2517-6161.1974.tb00994.x","article-title":"Cross-Validatory Choice and Assessment of Statistical Predictions","volume":"36","author":"Stone","year":"1974","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1037\/h0072400","article-title":"The shrinkage of the coefficient of multiple correlation","volume":"22","author":"Larson","year":"1931","journal-title":"J. Educ. Psychol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2350","DOI":"10.1016\/j.csda.2007.10.002","article-title":"Nonparametric density estimation by exact leave-p-out cross-validation","volume":"52","author":"Celisse","year":"2008","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0169-7439(00)00122-2","article-title":"Monte Carlo cross validation","volume":"56","author":"Xu","year":"2001","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1080\/01621459.1984.10478083","article-title":"Cross-validation of regression models","volume":"79","author":"Picard","year":"1984","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Dubitzky, W., Granzow, M., and Berrar, D.P. (2007). Fundamentals of Data Mining in Genomics and Proteomics, Springer Science & Business Media.","DOI":"10.1007\/978-0-387-47509-7"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1016\/j.ins.2011.12.028","article-title":"On the use of cross-validation for time series predictor evaluation","volume":"191","author":"Bergmeir","year":"2012","journal-title":"Inf. Sci."},{"key":"ref_16","first-page":"95","article-title":"Model selection and forward validation","volume":"9","author":"Hjorth","year":"1982","journal-title":"Scand. J. Stat."},{"key":"ref_17","unstructured":"Hjorth, J.U. (1993). Computer Intensive Statistical Methods: Validation, Model Selection, and Bootstrap, CRC Press."},{"key":"ref_18","first-page":"2346","article-title":"Learning under concept drift: A review","volume":"31","author":"Lu","year":"2018","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_19","unstructured":"Baier, L., J\u00f6hren, F., and Seebacher, S. (2019, January 8\u201314). Challenges in the Deployment and Operation of Machine Learning in Practice. Proceedings of the Twenty-Seventh European Conference on Information Systems (ECIS2019), Stockholm-Uppsala, Sweden."},{"key":"ref_20","first-page":"114","article-title":"Challenges in deploying machine learning: A survey of case studies","volume":"55","author":"Paleyes","year":"2022","journal-title":"ACM Comput. Surv."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/2\/65\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:01:50Z","timestamp":1760108510000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/2\/65"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,20]]},"references-count":20,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["make6020065"],"URL":"https:\/\/doi.org\/10.3390\/make6020065","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,20]]}}}