{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T23:35:03Z","timestamp":1775777703284,"version":"3.50.1"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2022,7,12]],"date-time":"2022-07-12T00:00:00Z","timestamp":1657584000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"German Federal Ministry of Education and Research"},{"DOI":"10.13039\/501100002347","name":"BMBF","doi-asserted-by":"publisher","award":["01IS18036A"],"award-info":[{"award-number":["01IS18036A"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>In this article, we consider how to evaluate survival distribution predictions with measures of discrimination. This is non-trivial as discrimination measures are the most commonly used in survival analysis and yet there is no clear method to derive a risk prediction from a distribution prediction. We survey methods proposed in literature and software and consider their respective advantages and disadvantages.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Whilst distributions are frequently evaluated by discrimination measures, we find that the method for doing so is rarely described in the literature and often leads to unfair comparisons or \u2018C-hacking\u2019. We demonstrate by example how simple it can be to manipulate results and use this to argue for better reporting guidelines and transparency in the literature. 
We recommend that machine learning survival analysis software implements clear transformations between distribution and risk predictions in order to allow more transparent and accessible model evaluation.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The code used in the final experiment is available at https:\/\/github.com\/RaphaelS1\/distribution_discrimination.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac451","type":"journal-article","created":{"date-parts":[[2022,7,12]],"date-time":"2022-07-12T09:50:03Z","timestamp":1657619403000},"page":"4178-4184","source":"Crossref","is-referenced-by-count":11,"title":["Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9225-4654","authenticated-orcid":false,"given":"Raphael","family":"Sonabend","sequence":"first","affiliation":[{"name":"Department of Computer Science, Technische Universit\u00e4t Kaiserslautern , 67663 Kaiserslautern, Germany"},{"name":"Engineering Department, University of Cambridge , CB2 1PZ Cambridge, UK"},{"name":"MRC Centre for Global Infectious Disease Analysis, Jameel Institute, Imperial College London, School of Public Health , W2 1PG London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5628-8611","authenticated-orcid":false,"given":"Andreas","family":"Bender","sequence":"additional","affiliation":[{"name":"Department of Statistics, LMU Munich , 80539 Bavaria, Germany"}]},{"given":"Sebastian","family":"Vollmer","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Technische Universit\u00e4t Kaiserslautern , 67663 Kaiserslautern, Germany"},{"name":"Data Science and its Application, Deutsches Forschungszentrum f\u00fcr K\u00fcnstliche Intelligenz (DFKI) , 67663 Kaiserslautern, Germany"},{"name":"Mathematics Institute, University of Warwick , CV4 7AL Coventry, 
UK"}]}],"member":"286","published-online":{"date-parts":[[2022,7,12]]},"reference":[{"key":"2023041408361790400_","doi-asserted-by":"crossref","DOI":"10.1002\/0471249688","volume-title":"Categorical Data Analysis","author":"Agresti","year":"2002"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"14058","DOI":"10.1038\/s41598-021-92944-z","article-title":"A comparison of time to event analysis methods, using weight status and breast cancer as a case study","volume":"11","author":"Aivaliotis","year":"2021","journal-title":"Sci. Rep"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"3927","DOI":"10.1002\/sim.2427","article-title":"A time-dependent discrimination index for survival data","volume":"24","author":"Antolini","year":"2005","journal-title":"Stat. Med"},{"key":"2023041408361790400_"},{"key":"2023041408361790400_","first-page":"1","article-title":"Mlr: machine learning in R","volume":"17","author":"Bischl","year":"2016","journal-title":"J. Mach. Learn. Res"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1093\/biostatistics\/kxy006","article-title":"The c-index is not proper for the evaluation of t-year predicted risks","volume":"20","author":"Blanche","year":"2019","journal-title":"Biostatistics"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1186\/1471-2288-14-40","article-title":"External validation of multivariable prediction models: a systematic review of methodological conduct and reporting","volume":"14","author":"Collins","year":"2014","journal-title":"BMC Med. Res. Methodol"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1111\/j.2517-6161.1972.tb00899.x","article-title":"Regression models and life-tables","volume":"34","author":"Cox","year":"1972","journal-title":"J. R. Stat. Soc. Series B Stat. 
Methodol"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"958","DOI":"10.1200\/CCI.21.00062","article-title":"Implementing a machine learning strategy to predict pathologic response in patients with soft tissue sarcomas treated with neoadjuvant chemotherapy","volume":"5","author":"Cromb\u00e9","year":"2021","journal-title":"JCO Clin. Cancer Inform"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"1317","DOI":"10.21105\/joss.01317","article-title":"Lifelines: survival analysis in python","volume":"4","author":"Davidson-Pilon","year":"2019","journal-title":"JOSS"},{"key":"2023041408361790400_","article-title":"Gaussian processes for survival analysis","author":"Fern\u00e1ndez","year":"2016","journal-title":"Neural Inf. Process. Syst"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"e6257","DOI":"10.7717\/peerj.6257","article-title":"A scalable discrete-time survival model for neural networks","volume":"7","author":"Gensheimer","year":"2019","journal-title":"PeerJ"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1093\/biomet\/92.4.965","article-title":"Concordance probability and discriminatory power in proportional hazards regression","volume":"92","author":"G\u00f6nen","year":"2005","journal-title":"Biometrika"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1016\/j.jjcc.2021.11.006","article-title":"Machine learning-based prediction of 1-year mortality for acute coronary syndrome","volume":"79","author":"Hadanny","year":"2022","journal-title":"J. Cardiol"},{"key":"2023041408361790400_","first-page":"1","article-title":"Effective ways to build and evaluate individual survival distributions","volume":"21","author":"Haider","year":"2020","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"2543","DOI":"10.1001\/jama.1982.03320430047030","article-title":"Evaluating the yield of medical tests","volume":"247","author":"Harrell","year":"1982","journal-title":"JAMA"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1002\/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4","article-title":"Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors","volume":"15","author":"Harrell","year":"1996","journal-title":"Stat. Med"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"e1002106","DOI":"10.1371\/journal.pbio.1002106","article-title":"The extent and consequences of p-hacking in science","volume":"13","author":"Head","year":"2015","journal-title":"PLoS Biol"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1111\/j.0006-341X.2005.030814.x","article-title":"Survival model predictive accuracy and ROC curves","volume":"61","author":"Heagerty","year":"2005","journal-title":"Biometrics"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1111\/j.0006-341X.2000.00337.x","article-title":"Time-dependent ROC curves for censored survival data and a diagnostic marker","volume":"56","author":"Heagerty","year":"2000","journal-title":"Biometrics"},{"key":"2023041408361790400_","author":"Herrmann","year":"2021"},{"key":"2023041408361790400_","volume-title":"Applied Survival Analysis: Regression Modeling of Time-to-Event Data","author":"Hosmer","year":"2011"},{"key":"2023041408361790400_","author":"Hothorn","year":"2020"},{"key":"2023041408361790400_","first-page":"841","article-title":"Random survival forests","volume":"2","author":"Ishwaran","year":"2008","journal-title":"Ann. 
Stat"},{"key":"2023041408361790400_","author":"Ishwaran","year":"2022"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"3145","DOI":"10.1007\/s10554-021-02294-0","article-title":"Role of artificial intelligence in cardiovascular risk prediction and outcomes: comparison of machine-learning and conventional statistical approaches for the analysis of carotid ultrasound features and intra-plaque neovascularization","volume":"37","author":"Johri","year":"2021","journal-title":"Int. J. Cardiovasc. Imaging"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1186\/s12874-020-01153-1","article-title":"Survival prediction models since liver transplantation - comparisons between cox models and machine learning techniques","volume":"20","author":"Kantidakis","year":"2020","journal-title":"BMC Med. Res. Methodol"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1002\/sim.4780090503","article-title":"Measures of explained variation for survival data","volume":"9","author":"Korn","year":"1990","journal-title":"Stat. Med"},{"key":"2023041408361790400_","author":"Kvamme","year":"2021"},{"key":"2023041408361790400_","first-page":"1","article-title":"Time-to-event prediction with neural networks and cox regression","volume":"20","author":"Kvamme","year":"2019","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023041408361790400_","author":"Lee","year":"2018"},{"key":"2023041408361790400_","author":"Loureiro","year":"2021"},{"key":"2023041408361790400_","first-page":"3863","article-title":"Mantel-Haenszel analyses of litter-matched time-to-Response data, with modifications for recovery of interlitter information","volume":"37","author":"Mantel","year":"1977","journal-title":"Cancer Res"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"e84483","DOI":"10.1371\/journal.pone.0084483","article-title":"Boosting the concordance index for survival data\u2014a unified framework to derive and evaluate biomarker combinations","volume":"9","author":"Mayr","year":"2014","journal-title":"PLoS One"},{"key":"2023041408361790400_","author":"Mogensen","year":"2012"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"1017","DOI":"10.1186\/s12885-020-07492-y","article-title":"Improved personalized survival prediction of patients with diffuse large B-cell lymphoma using gene expression profiling","volume":"20","author":"Mosquera Orgueira","year":"2020","journal-title":"BMC Cancer"},{"key":"2023041408361790400_","first-page":"1","article-title":"Scikit-survival: a library for time-to-event analysis built on top of scikit-learn","volume":"21","author":"P\u00f6lsterl","year":"2020","journal-title":"J. Mach. Learn. Res"},{"key":"2023041408361790400_","author":"Potapov","year":"2012"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12874-017-0336-2","article-title":"Review and evaluation of performance measures for survival prediction models in external validation settings","volume":"17","author":"Rahman","year":"2017","journal-title":"BMC Med. Res. 
Methodol"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1002\/(SICI)1097-0258(20000229)19:4<541::AID-SIM355>3.0.CO;2-V","article-title":"On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology","volume":"19","author":"Schwarzer","year":"2000","journal-title":"Stat. Med"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"2789","DOI":"10.1093\/bioinformatics\/btab039","article-title":"mlr3proba: an R package for machine learning in survival analysis","volume":"37","author":"Sonabend","year":"2021","journal-title":"Bioinformatics"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"20410","DOI":"10.1038\/s41598-020-77220-w","article-title":"A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction","volume":"10","author":"Spooner","year":"2020","journal-title":"Sci. Rep"},{"key":"2023041408361790400_","author":"Therneau","year":"2022"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1002\/sim.4154","article-title":"On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data","volume":"30","author":"Uno","year":"2011","journal-title":"Stat. Med"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/j.artmed.2011.06.006","article-title":"Support vector methods for survival analysis: a comparison between ranking and regression approaches","volume":"53","author":"Van Belle","year":"2011","journal-title":"Artif. Intell. Med"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"3401","DOI":"10.1002\/1097-0258(20001230)19:24<3401::AID-SIM554>3.0.CO;2-2","article-title":"Validation, calibration, revision and combination of prognostic survival models","volume":"19","author":"Van Houwelingen","year":"2000","journal-title":"Statist. 
Med"},{"key":"2023041408361790400_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v077.i01","article-title":"Ranger: a fast implementation of random forests for high dimensional data in C++ and R","volume":"77","author":"Wright","year":"2017","journal-title":"J. Stat. Soft"},{"key":"2023041408361790400_","author":"Zhang","year":"2021"},{"key":"2023041408361790400_","author":"Zhao","year":"2020"},{"key":"2023041408361790400_","author":"Zhong","year":"2019"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac451\/45052366\/btac451.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/17\/4178\/49889652\/btac451.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/17\/4178\/49889652\/btac451.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,28]],"date-time":"2024-09-28T23:17:20Z","timestamp":1727565440000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/17\/4178\/6640155"}},"subtitle":[],"editor":[{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,7,12]]},"references-count":50,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2022,9,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac451","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9,1]]},"published":{"date-parts":[[2022,7,12]]}}}