{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,12]],"date-time":"2026-06-12T15:20:01Z","timestamp":1781277601791,"version":"3.54.1"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2016,10,3]],"date-time":"2016-10-03T00:00:00Z","timestamp":1475452800000},"content-version":"vor","delay-in-days":845,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context.<\/jats:p><jats:p>Methods: We develop and implement a systematic approach to \u2018cross-study validation\u2019, to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We computed the C-index for all pairwise combinations of training and validation datasets. 
We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation.<\/jats:p><jats:p>Results: Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets, suggest that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation.<\/jats:p><jats:p>Availability: The survHD: Survival in High Dimensions package (http:\/\/www.bitbucket.org\/lwaldron\/survhd) will be made available through Bioconductor.<\/jats:p><jats:p>Contact: \u00a0levi.waldron@hunter.cuny.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu279","type":"journal-article","created":{"date-parts":[[2014,6,16]],"date-time":"2014-06-16T21:55:09Z","timestamp":1402955709000},"page":"i105-i112","source":"Crossref","is-referenced-by-count":78,"title":["Cross-study validation for the assessment of prediction algorithms"],"prefix":"10.1093","volume":"30","author":[{"given":"Christoph","family":"Bernau","sequence":"first","affiliation":[{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New York School of Public Health, Hunter College, New York, USA"},{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New York School of 
Public Health, Hunter College, New York, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Markus","family":"Riester","sequence":"additional","affiliation":[{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New York School of Public Health, Hunter College, New York, USA"},{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New York School of Public Health, Hunter College, New York, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anne-Laure","family":"Boulesteix","sequence":"additional","affiliation":[{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New York School of Public Health, Hunter College, New York, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Giovanni","family":"Parmigiani","sequence":"additional","affiliation":[{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New York School of Public Health, Hunter College, New York, USA"},{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New 
York School of Public Health, Hunter College, New York, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Curtis","family":"Huttenhower","sequence":"additional","affiliation":[{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New York School of Public Health, Hunter College, New York, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Levi","family":"Waldron","sequence":"additional","affiliation":[{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New York School of Public Health, Hunter College, New York, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lorenzo","family":"Trippa","sequence":"additional","affiliation":[{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New York School of Public Health, Hunter College, New York, USA"},{"name":"1 Leibniz Supercomputing Center, Garching, 2Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, 3Dana-Farber Cancer Institute, Boston, 4Harvard School of Public Health, Boston, USA and 5City University of New York School of Public Health, Hunter College, New York, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2014,6,11]]},"reference":[{"key":"2023012711074495600_btu279-B1","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1093\/bib\/bbp016","article-title":"Development of biomarker classifiers from high-dimensional data","volume":"10","author":"Baek","year":"2009","journal-title":"Brief. Bioinform."},{"key":"2023012711074495600_btu279-B2","doi-asserted-by":"crossref","first-page":"1186","DOI":"10.1200\/JCO.2007.15.1951","article-title":"Run batch effects potentially compromise the usefulness of genomic signatures for ovarian cancer","volume":"26","author":"Baggerly","year":"2008","journal-title":"J. Clin. Oncol."},{"key":"2023012711074495600_btu279-B3","doi-asserted-by":"crossref","first-page":"1713","DOI":"10.1002\/sim.2059","article-title":"Generating survival times to simulate Cox proportional hazards models","volume":"24","author":"Bender","year":"2005","journal-title":"Stat. 
Med."},{"key":"2023012711074495600_btu279-B4","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1186\/1471-2105-9-14","article-title":"Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models","volume":"9","author":"Binder","year":"2008","journal-title":"BMC Bioinform."},{"key":"2023012711074495600_btu279-B5","first-page":"511","article-title":"Semi-supervised methods to predict patient survival from gene expression data","volume":"2","author":"Bair","year":"2004","journal-title":"PLoS Biol."},{"key":"2023012711074495600_btu279-B6","doi-asserted-by":"crossref","first-page":"2664","DOI":"10.1093\/bioinformatics\/btt458","article-title":"On representative and illustrative comparisons with real data in bioinformatics: response to the letter to the editor by Smith et al.","volume":"29","author":"Boulesteix","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012711074495600_btu279-B7","doi-asserted-by":"crossref","first-page":"2080","DOI":"10.1093\/bioinformatics\/btm305","article-title":"Predicting survival from microarray data\u2013a comparative study","volume":"23","author":"B\u00f8velstad","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012711074495600_btu279-B8","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1093\/bib\/bbq073","article-title":"An empirical assessment of validation practices for molecular classifiers","volume":"12","author":"Castaldi","year":"2011","journal-title":"Brief. 
Bioinform."},{"key":"2023012711074495600_btu279-B9","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1016\/j.ccr.2006.10.009","article-title":"Genomic and transcriptional aberrations linked to breast cancer pathophysiologies","volume":"10","author":"Chin","year":"2006","journal-title":"Cancer Cell"},{"key":"2023012711074495600_btu279-B10","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","author":"Dem\u0161ar","year":"2006","journal-title":"J. Mach. Learn. Res."},{"key":"2023012711074495600_btu279-B11","doi-asserted-by":"crossref","first-page":"3207","DOI":"10.1158\/1078-0432.CCR-06-2765","article-title":"Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series","volume":"13","author":"Desmedt","year":"2007","journal-title":"Clin. Cancer Res."},{"key":"2023012711074495600_btu279-B12","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4899-4541-9","volume-title":"An Introduction to the Bootstrap","author":"Efron","year":"1993"},{"key":"2023012711074495600_btu279-B13","doi-asserted-by":"crossref","first-page":"1665","DOI":"10.1200\/JCO.2005.03.9115","article-title":"Multicenter validation of a gene expression-based prognostic signature in lymph node-negative primary breast cancer","volume":"24","author":"Foekens","year":"2006","journal-title":"J. Clin. 
Oncol."},{"key":"2023012711074495600_btu279-B14","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bat013","article-title":"curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome","author":"Ganzfried","year":"2013","journal-title":"Database"},{"key":"2023012711074495600_btu279-B15","doi-asserted-by":"crossref","first-page":"R80","DOI":"10.1186\/gb-2004-5-10-r80","article-title":"Bioconductor: open software development for computational biology and bioinformatics","volume":"5","author":"Gentleman","year":"2004","journal-title":"Genome Biol."},{"key":"2023012711074495600_btu279-B16","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1002\/bimj.200900028","article-title":"L1 penalized estimation in the Cox proportional hazards model","volume":"52","author":"Goeman","year":"2010","journal-title":"Biometr. J."},{"key":"2023012711074495600_btu279-B17","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1093\/biomet\/92.4.965","article-title":"Concordance probability and discriminatory power in proportional hazards regression","volume":"92","author":"G\u00f6nen","year":"2005","journal-title":"Biometrika"},{"key":"2023012711074495600_btu279-B18","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1093\/jnci\/djr545","article-title":"A three-gene model to robustly identify breast cancer molecular subtypes","volume":"104","author":"Haibe-Kains","year":"2012","journal-title":"J. Natl Cancer Inst."},{"key":"2023012711074495600_btu279-B19","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1002\/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4","article-title":"Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors","volume":"15","author":"Harrell","year":"1996","journal-title":"Stat. 
Med."},{"key":"2023012711074495600_btu279-B20","doi-asserted-by":"crossref","first-page":"411","DOI":"10.2307\/2529429","article-title":"A \u2018Super-Population viewpoint\u2019 for finite population sampling","volume":"31","author":"Hartley","year":"1975","journal-title":"Biometrics"},{"key":"2023012711074495600_btu279-B21","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nrg2825","article-title":"Tackling the widespread and critical impact of batch effects in high-throughput data","volume":"11","author":"Leek","year":"2010","journal-title":"Nat. Rev. Genet."},{"key":"2023012711074495600_btu279-B22","doi-asserted-by":"crossref","DOI":"10.17226\/13297","volume-title":"Evolution of Translational Omics: Lessons Learned and the Path Forward","author":"Micheel","year":"2012"},{"key":"2023012711074495600_btu279-B23","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1186\/1471-2105-12-322","article-title":"Strategies for aggregating gene expression data: the collapserows R function","volume":"12","author":"Miller","year":"2011","journal-title":"BMC Bioinform."},{"key":"2023012711074495600_btu279-B24","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1038\/nature03799","article-title":"Genes that mediate breast cancer metastasis to lung","volume":"436","author":"Minn","year":"2005","journal-title":"Nature"},{"key":"2023012711074495600_btu279-B25","doi-asserted-by":"crossref","first-page":"6740","DOI":"10.1073\/pnas.0701138104","article-title":"Lung metastasis genes couple breast tumor size and metastatic spread","volume":"104","author":"Minn","year":"2007","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012711074495600_btu279-B26","doi-asserted-by":"crossref","first-page":"1962","DOI":"10.1001\/jama.1995.03530240072044","article-title":"Meta-analysis of randomized controlled trials: A concern for standards","volume":"274","author":"Moher","year":"1995","journal-title":"JAMA"},{"key":"2023012711074495600_btu279-B27","doi-asserted-by":"crossref","first-page":"3301","DOI":"10.1093\/bioinformatics\/bti499","article-title":"Prediction error estimation: a comparison of resampling methods","volume":"21","author":"Molinaro","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012711074495600_btu279-B28","doi-asserted-by":"crossref","DOI":"10.1093\/jnci\/dju048","article-title":"Risk prediction for Late-Stage ovarian cancer by meta-analysis of 1525 patient samples","author":"Riester","year":"2014","journal-title":"JNCI J Natl Cancer Inst."},{"key":"2023012711074495600_btu279-B29","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1016\/0197-2456(96)00075-X","article-title":"A note on quantifying follow-up in studies of failure time","volume":"17","author":"Schemper","year":"1996","journal-title":"Clinical Trials"},{"key":"2023012711074495600_btu279-B30","doi-asserted-by":"crossref","first-page":"5405","DOI":"10.1158\/0008-5472.CAN-07-5206","article-title":"The humoral immune system has a key prognostic impact in node-negative breast cancer","volume":"68","author":"Schmidt","year":"2008","journal-title":"Cancer Res."},{"key":"2023012711074495600_btu279-B31","doi-asserted-by":"crossref","first-page":"1446","DOI":"10.1093\/jnci\/djp335","article-title":"Use of archived specimens in evaluation of prognostic and predictive biomarkers","volume":"101","author":"Simon","year":"2009","journal-title":"J. 
Natl Cancer Inst."},{"key":"2023012711074495600_btu279-B32","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1093\/bib\/bbr001","article-title":"Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data","volume":"12","author":"Simon","year":"2011","journal-title":"Brief. Bioinform."},{"key":"2023012711074495600_btu279-B33","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1093\/jnci\/djj052","article-title":"Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis","volume":"98","author":"Sotiriou","year":"2006","journal-title":"J. Natl Cancer Inst."},{"key":"2023012711074495600_btu279-B34","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1093\/jnci\/djq025","article-title":"Gene expression-based prognostic signatures in lung cancer: ready for clinical use?","volume":"102","author":"Subramanian","year":"2010","journal-title":"J. Natl Cancer Inst."},{"key":"2023012711074495600_btu279-B35","doi-asserted-by":"crossref","first-page":"4111","DOI":"10.1200\/JCO.2010.28.4273","article-title":"Genomic index of sensitivity to endocrine therapy for breast cancer","volume":"28","author":"Symmans","year":"2010","journal-title":"J. Clin. 
Oncol."},{"key":"2023012711074495600_btu279-B36","doi-asserted-by":"crossref","first-page":"3204","DOI":"10.1093\/bioinformatics\/btr529","article-title":"inSilicoDb: an R\/Bioconductor package for accessing human affymetrix expert-curated datasets from GEO","volume":"27","author":"Taminau","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012711074495600_btu279-B37","volume-title":"uniCox: Univarate shrinkage prediction in the Cox model","author":"Tibshirani","year":"2009"},{"key":"2023012711074495600_btu279-B38","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1186\/1471-2105-7-91","article-title":"Bias in error estimation when using cross-validation for model selection","volume":"7","author":"Varma","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012711074495600_btu279-B39","doi-asserted-by":"crossref","first-page":"3399","DOI":"10.1093\/bioinformatics\/btr591","article-title":"Optimized application of penalized regression methods to diverse genomic data","volume":"27","author":"Waldron","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012711074495600_btu279-B40","doi-asserted-by":"crossref","DOI":"10.1093\/jnci\/dju049","article-title":"Comparative meta-analysis of prognostic gene signatures for Late-Stage ovarian cancer","author":"Waldron","year":"2014","journal-title":"JNCI J Natl Cancer Inst."},{"key":"2023012711074495600_btu279-B41","article-title":"Mas-o-menos: a simple sign averaging method for discrimination in genomic data 
analysis","author":"Zhao","year":"2013"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/12\/i105\/48926888\/bioinformatics_30_12_i105.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/12\/i105\/48926888\/bioinformatics_30_12_i105.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,27]],"date-time":"2024-05-27T19:36:44Z","timestamp":1716838604000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/12\/i105\/388164"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,6,11]]},"references-count":41,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2014,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu279","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,6,15]]},"published":{"date-parts":[[2014,6,11]]}}}