{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T15:36:24Z","timestamp":1777476984317,"version":"3.51.4"},"reference-count":63,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T00:00:00Z","timestamp":1676937600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T00:00:00Z","timestamp":1676937600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000272","name":"DH | National Institute for Health Research","doi-asserted-by":"publisher","award":["NIHR202639"],"award-info":[{"award-number":["NIHR202639"]}],"id":[{"id":"10.13039\/501100000272","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000308","name":"British Council","doi-asserted-by":"publisher","award":["UCL-NMU-SEU International Collaboration"],"award-info":[{"award-number":["UCL-NMU-SEU International Collaboration"]}],"id":[{"id":"10.13039\/501100000308","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000848","name":"University of Edinburgh","doi-asserted-by":"publisher","award":["The Advanced Care Research Centre Programme"],"award-info":[{"award-number":["The Advanced Care Research Centre Programme"]}],"id":[{"id":"10.13039\/501100000848","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000265","name":"RCUK | Medical Research Council","doi-asserted-by":"publisher","award":["National Text Analytics Project"],"award-info":[{"award-number":["National Text Analytics Project"]}],"id":[{"id":"10.13039\/501100000265","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100012338","name":"Alan Turing 
Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100012338","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    In supervised learning model development, domain experts are often used to provide the class labels (annotations). Annotation inconsistencies commonly occur when even highly experienced clinical experts annotate the same phenomenon (e.g., medical image, diagnostics, or prognostic status), due to inherent expert bias, judgments, and slips, among other factors. While their existence is relatively well-known, the implications of such inconsistencies are largely understudied in real-world settings, when supervised learning is applied to such \u2018noisy\u2019 labelled data. To shed light on these issues, we conducted extensive experiments and analyses on three real-world Intensive Care Unit (ICU) datasets. Specifically, individual models were built from a common dataset, annotated independently by 11 Glasgow Queen Elizabeth University Hospital ICU consultants, and model performance estimates were compared through internal validation (Fleiss\u2019\n                    <jats:italic>\u03ba<\/jats:italic>\n                    \u2009=\u20090.383, i.e., fair agreement). Further, broad external validation (on both static and time series datasets) of these 11 classifiers was carried out on a HiRID external dataset, where the models\u2019 classifications were found to have low pairwise agreements (average Cohen\u2019s\n                    <jats:italic>\u03ba<\/jats:italic>\n                    \u2009=\u20090.255, i.e., minimal agreement). 
Moreover, they tend to disagree more on making discharge decisions (Fleiss\u2019\n                    <jats:italic>\u03ba<\/jats:italic>\n                    \u2009=\u2009\n                    <jats:italic>0.174<\/jats:italic>\n                    ) than predicting mortality (Fleiss\u2019\n                    <jats:italic>\u03ba<\/jats:italic>\n                    \u2009=\u2009\n                    <jats:italic>0.267<\/jats:italic>\n                    ). Given these inconsistencies, further analyses were conducted to evaluate the current best practices in obtaining gold-standard models and determining consensus. The results suggest that: (a) there may not always be a \u201csuper expert\u201d in acute clinical settings (using internal and external validation model performances as a proxy); and (b) standard consensus seeking (such as majority vote) consistently leads to suboptimal models. Further analysis, however, suggests that assessing annotation learnability and using only \u2018learnable\u2019 annotated datasets for determining consensus achieves optimal models in most cases.\n                  <\/jats:p>","DOI":"10.1038\/s41746-023-00773-3","type":"journal-article","created":{"date-parts":[[2023,2,24]],"date-time":"2023-02-24T10:43:38Z","timestamp":1677235418000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":78,"title":["The impact of inconsistent human annotations on AI driven clinical decision 
making"],"prefix":"10.1038","volume":"6","author":[{"given":"Aneeta","family":"Sylolypavan","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7609-4371","authenticated-orcid":false,"given":"Derek","family":"Sleeman","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0213-5668","authenticated-orcid":false,"given":"Honghan","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Malcolm","family":"Sim","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,21]]},"reference":[{"key":"773_CR1","unstructured":"Bootkrajang, J. & Kab\u00e1n, A. Multi-class Classification in the Presence of Labelling Errors. Proceedings of the 2011 European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2011), 345\u2013350 (2011)."},{"key":"773_CR2","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1007\/978-3-319-90503-7_10","volume":"28","author":"F Cabitza","year":"2019","unstructured":"Cabitza, F., Ciucci, D. & Rasoini, R. A Giant with Feet of Clay: On the Validity of the Data that Feed Machine Learning in Medicine. Organ. Digital World 28, 121\u2013136 (2019).","journal-title":"Organ. Digital World"},{"key":"773_CR3","doi-asserted-by":"publisher","unstructured":"Mahato, D., Dudhal, D., Revagade, D. & Bhargava, Y. A Method to Detect Inconsistent Annotations in a Medical Document using UMLS. Proceedings of the 11th Forum for Information Retrieval Evaluation, 47\u201351, https:\/\/doi.org\/10.1145\/3368567.3368577 (2019).","DOI":"10.1145\/3368567.3368577"},{"key":"773_CR4","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1016\/j.neucom.2014.10.085","volume":"160","author":"LPF Garcia","year":"2015","unstructured":"Garcia, L. P. F., De Carvalho, A. C. & Lorena, A. C. Effect of label noise in the complexity of classification problems. 
Neurocomputing 160, 108\u2013119 (2015).","journal-title":"Neurocomputing"},{"key":"773_CR5","doi-asserted-by":"publisher","first-page":"66","DOI":"10.5220\/0008922000660076","volume":"5","author":"D Sleeman","year":"2020","unstructured":"Sleeman, D., Kostadinov, K., Moss, L. & Sim, M. Resolving Differences of Opinion between Medical Experts: A Case Study with the IS-DELPHI System. Proc. 13th Int. Jt. Conf. Biomed. Eng. Syst. Technol. 5, 66\u201376 (2020).","journal-title":"Proc. 13th Int. Jt. Conf. Biomed. Eng. Syst. Technol."},{"key":"773_CR6","first-page":"953","volume":"34","author":"LM Bachmann","year":"2005","unstructured":"Bachmann, L. M. et al. Consequences of different diagnostic \u201cgold standards\u201d in test accuracy research: Carpal Tunnel Syndrome as an example. J. Clin. Epidemiol. 34, 953\u2013955 (2005).","journal-title":"J. Clin. Epidemiol."},{"key":"773_CR7","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1016\/j.artmed.2012.03.001","volume":"55","author":"D Sleeman","year":"2012","unstructured":"Sleeman, D. et al. Detecting and resolving inconsistencies between domain experts\u2019 different perspectives on (classification) tasks. Artif. Intell. Med. 55, 71\u201386 (2012).","journal-title":"Artif. Intell. Med."},{"key":"773_CR8","doi-asserted-by":"publisher","first-page":"843","DOI":"10.1109\/JBHI.2013.2252182","volume":"17","author":"S Rogers","year":"2013","unstructured":"Rogers, S., Sleeman, D. & Kinsella, J. Investigating the disagreement between clinicians\u2019 ratings of patients in ICUs. IEEE J. Biomed. Health Inform. 17, 843\u2013852 (2013).","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"773_CR9","unstructured":"Kahneman, D., Sibony, O. & Sunstein, C. R. Noise: A Flaw in Human Judgment. 124\u2013127 (William Collins, London, First Edition, 
2021)."},{"key":"773_CR10","doi-asserted-by":"publisher","first-page":"845","DOI":"10.1109\/TNNLS.2013.2292894","volume":"25","author":"B Fr\u00e9nay","year":"2014","unstructured":"Fr\u00e9nay, B. & Verleysen, M. Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25, 845\u2013869 (2014).","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"773_CR11","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1007\/s10462-004-0751-8","volume":"22","author":"X Zhu","year":"2004","unstructured":"Zhu, X. & Wu, X. Class noise vs. attribute noise: a quantitative study of their impacts. Artif. Intell. Rev. 22, 177\u2013210 (2004).","journal-title":"Artif. Intell. Rev."},{"key":"773_CR12","unstructured":"Fr\u00e9nay, B. & Kab\u00e1n, A. A Comprehensive Introduction to Label Noise. Proceedings of the 2014 European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014) (2014)."},{"key":"773_CR13","doi-asserted-by":"crossref","unstructured":"Yin, H. & Dong, H. The problem of noise in classification: Past, current and future work. 2011 IEEE 3rd International Conference on Communication Software and Networks (ICCSN), 412\u2013416 (2011).","DOI":"10.1109\/ICCSN.2011.6014597"},{"key":"773_CR14","doi-asserted-by":"crossref","unstructured":"Indrayan, A. & Holt, M. P. Concise Encyclopedia of Biostatistics for Medical Professionals. 44 (CRC Press, 2017).","DOI":"10.1201\/9781315372891"},{"key":"773_CR15","doi-asserted-by":"crossref","unstructured":"Sun, D. Q. et al. Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution. 
Proceedings of the 28th International Conference on Computational Linguistics, 3547\u20133557 (2020).","DOI":"10.18653\/v1\/2020.coling-main.316"},{"key":"773_CR16","doi-asserted-by":"publisher","first-page":"517","DOI":"10.1001\/jama.2017.7797","volume":"318","author":"F Cabitza","year":"2017","unstructured":"Cabitza, F., Rasoini, R. & Gensini, G. F. Unintended Consequences of Machine Learning in Medicine. JAMA 318, 517\u2013518 (2017).","journal-title":"JAMA"},{"key":"773_CR17","doi-asserted-by":"publisher","first-page":"448","DOI":"10.1109\/21.31052","volume":"19","author":"B Fischhoff","year":"1989","unstructured":"Fischhoff, B. Eliciting knowledge for analytical representation. IEEE Trans. Syst., Man, Cybern. 19, 448\u2013461 (1989).","journal-title":"IEEE Trans. Syst., Man, Cybern."},{"key":"773_CR18","doi-asserted-by":"publisher","first-page":"917","DOI":"10.1038\/modpathol.2011.66","volume":"24","author":"RK Jain","year":"2011","unstructured":"Jain, R. K. et al. Atypical ductal hyperplasia: interobserver and intraobserver variability. Mod. Pathol. 24, 917\u2013923 (2011).","journal-title":"Mod. Pathol."},{"key":"773_CR19","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1176\/appi.ajp.2012.12070999","volume":"170","author":"DA Regier","year":"2013","unstructured":"Regier, D. A. et al. DSM-5 field trials in the United States and Canada, Part II: test-retest reliability of selected categorical diagnoses. Am. J. Psychiatry 170, 59\u201370 (2013).","journal-title":"Am. J. Psychiatry"},{"key":"773_CR20","doi-asserted-by":"publisher","first-page":"e5","DOI":"10.1192\/bjpo.bp.115.000786","volume":"1","author":"S Lieblich","year":"2015","unstructured":"Lieblich, S. et al. High heterogeneity and low reliability in the diagnosis of major depression will impair the development of new drugs. Br. J. Psychiatry Open 1, e5\u2013e7 (2015).","journal-title":"Br. J. 
Psychiatry Open"},{"key":"773_CR21","doi-asserted-by":"publisher","first-page":"1661","DOI":"10.1016\/j.clinph.2014.11.008","volume":"126","author":"JJ Halford","year":"2015","unstructured":"Halford, J. J. Inter-rater agreement on identification of electrographic seizures and periodic discharges in ICU EEG recording. Clin. Neurophysiol. 126, 1661\u20131669 (2015).","journal-title":"Clin. Neurophysiol."},{"key":"773_CR22","doi-asserted-by":"publisher","unstructured":"Moor, M., Rieck, B., Horn, M., Jutzeler, C. R. & Borgwardt, K. Early Prediction of Sepsis in the ICU Using Machine Learning: A Systematic Review. Sec. Infectious Diseases \u2013 Surveillance, Prevention and Treatment, Front. Med. https:\/\/doi.org\/10.3389\/fmed.2021.607952 (2021).","DOI":"10.3389\/fmed.2021.607952"},{"key":"773_CR23","doi-asserted-by":"publisher","first-page":"481","DOI":"10.2147\/OAEM.S376419","volume":"14","author":"W Zhang","year":"2022","unstructured":"Zhang, W., Wong, L. Y., Liu, J. & Sarkar, S. MONitoring Knockbacks in EmergencY (MONKEY) \u2013 An Audit of Disposition Outcomes in Emergency Patients with Rejected Admission Requests. Open Access Emerg. Med. 14, 481\u2013490 (2022).","journal-title":"Open Access Emerg. Med."},{"key":"773_CR24","unstructured":"Xia, F. & Yetisgen-Yildiz, M. Clinical Corpus Annotation: Challenges and Strategies. Proceedings of the third workshop on building and evaluating resources for biomedical text mining (BioTxtM\u20192012) in conjunction with the international conference on language resources and evaluation (LREC) (2012)."},{"key":"773_CR25","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1007\/BF00116251","volume":"1","author":"JR Quinlan","year":"1986","unstructured":"Quinlan, J. R. Induction of Decision Trees. Mach. Learn. 1, 81\u2013106 (1986).","journal-title":"Mach. Learn."},{"key":"773_CR26","unstructured":"Quinlan, J. R. Learning from noisy data. 
Proceedings of the Second International Machine Learning Workshop 58\u201364 (1983)."},{"key":"773_CR27","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1007\/s10462-010-9156-z","volume":"33","author":"DF Nettleton","year":"2010","unstructured":"Nettleton, D. F., Orriols-Puig, A. & Fornells, A. A study of the effect of different types of noise on the precision of supervised learning techniques. Artif. Intell. Rev. 33, 275\u2013306 (2010).","journal-title":"Artif. Intell. Rev."},{"key":"773_CR28","doi-asserted-by":"publisher","unstructured":"Svensson, C. M., Hubler, R. & Figge, M. T. Automated Classification of Circulating Tumor Cells and the Impact of Interobsever Variability on Classifier Training and Performance. J. Immunol. Res. https:\/\/doi.org\/10.1155\/2015\/573165 (2015).","DOI":"10.1155\/2015\/573165"},{"key":"773_CR29","doi-asserted-by":"crossref","unstructured":"Johnson, J. M. & Khoshgoftaar, T. M. A Survey on Classifying Big Data with Label Noise. J. Data Inf. Qual. 14, 1\u201343 (2022).","DOI":"10.1145\/3492546"},{"key":"773_CR30","doi-asserted-by":"publisher","first-page":"101759","DOI":"10.1016\/j.media.2020.101759","volume":"65","author":"D Karimi","year":"2019","unstructured":"Karimi, D., Dou, H., Warfield, S. K. & Gholipour, A. Deep learning with noisy labels: exploring techniques and remedies in medical image analysis. Med. Image Anal. 65, 101759 (2019).","journal-title":"Med. Image Anal."},{"key":"773_CR31","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1177\/001316446002000104","volume":"20","author":"J Cohen","year":"1960","unstructured":"Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 37\u201346 (1960).","journal-title":"Educ. Psychol. Meas."},{"key":"773_CR32","doi-asserted-by":"publisher","first-page":"276","DOI":"10.11613\/BM.2012.031","volume":"22","author":"ML McHugh","year":"2012","unstructured":"McHugh, M. L. Interrater reliability: The kappa statistic. Biochemia Med. 
22, 276\u2013282 (2012).","journal-title":"Biochemia Med."},{"key":"773_CR33","doi-asserted-by":"crossref","unstructured":"Fleiss, J. L., Levin, B. & Paik, M. C. Statistical methods for rates and proportions (John Wiley & Sons, Inc., 2003).","DOI":"10.1002\/0471445428"},{"key":"773_CR34","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","volume":"33","author":"JR Landis","year":"1977","unstructured":"Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159\u2013174 (1977).","journal-title":"Biometrics"},{"key":"773_CR35","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman, L. Random Forests. Mach. Learn. 45, 5\u201332 (2001).","journal-title":"Mach. Learn."},{"key":"773_CR36","unstructured":"Sylolypavan, A. The Impact of Inconsistent Annotations on Machine-Learning Driven Clinical Decision-Making (University College London, 2021)."},{"key":"773_CR37","unstructured":"Raschka, S. & Mirjalili, V. Python Machine Learning (Packt Publishing Ltd, Third Edition, 2019)."},{"key":"773_CR38","doi-asserted-by":"crossref","unstructured":"Sheng, V. S., Provost, F. & Ipeirotis, P. G. Get another label? Improving data quality and data mining using multiple, noisy labelers. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 614\u2013622 (2008).","DOI":"10.1145\/1401890.1401965"},{"key":"773_CR39","doi-asserted-by":"crossref","unstructured":"Snow, R., O\u2019Connor, B., Jurafsky, D. & Ng, A. Y. Cheap and Fast \u2014 But is it Good? Evaluating non-expert annotations for natural language tasks. Proceedings of the 2008 conference on empirical methods in natural language processing (EMNLP 2008). 254\u2013263 (2008).","DOI":"10.3115\/1613715.1613751"},{"key":"773_CR40","doi-asserted-by":"crossref","unstructured":"Yang, H., Mityagin, A., Svore, K. M. & Markov, S. 
Collecting high quality overlapping labels at low cost. Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2010). 459\u2013466 (2010).","DOI":"10.1145\/1835449.1835526"},{"key":"773_CR41","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1016\/S1386-5056(01)00173-3","volume":"63","author":"DF Nettleton","year":"2001","unstructured":"Nettleton, D. F. & Mu\u00f1iz, J. Processing and representation of meta-data for sleep apnea diagnosis with an artificial intelligence approach. Int. J. Med. Inform. 63, 77\u201389 (2001).","journal-title":"Int. J. Med. Inform."},{"key":"773_CR42","first-page":"2424","volume":"2","author":"P Welinder","year":"2010","unstructured":"Welinder, P., Branson, S., Perona, P. & Belongie, S. The Multidimensional Wisdom of Crowds. Proc. 23rd Int. Conf. Neural Inf. Process. Syst. 2, 2424\u20132432 (2010).","journal-title":"Proc. 23rd Int. Conf. Neural Inf. Process. Syst."},{"key":"773_CR43","unstructured":"Nettleton, D. F. & Hern\u00e1ndez, L. In Proc. Workshop: Intelligent Data Analysis in Medicine and Pharmacology, IDAMAP. 91\u2013102."},{"key":"773_CR44","doi-asserted-by":"crossref","unstructured":"Ferrucci, D. et al. Building Watson: An Overview of the DeepQA Project. AI Magazine 31, 59\u201379 (2010).","DOI":"10.1609\/aimag.v31i3.2303"},{"key":"773_CR45","unstructured":"Craw, S. & Sleeman, D. Automating the refinement of knowledge-based systems. Proceedings of ECCAI-90, 167\u2013172 (1990)."},{"key":"773_CR46","unstructured":"Sim, M. The development and application of novel intelligent scoring systems in critical illness (University of Glasgow, 2015)."},{"key":"773_CR47","doi-asserted-by":"publisher","first-page":"707","DOI":"10.1007\/BF01709751","volume":"22","author":"JL Vincent","year":"1996","unstructured":"Vincent, J. L. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction\/failure. 
On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med. 22, 707\u2013710 (1996).","journal-title":"Intensive Care Med."},{"key":"773_CR48","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1002\/bjs.9736","volume":"102","author":"GS Collins","year":"2015","unstructured":"Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Br. J. Surg. 102, 148\u2013158 (2015).","journal-title":"Br. J. Surg."},{"key":"773_CR49","doi-asserted-by":"publisher","first-page":"1925","DOI":"10.1093\/eurheartj\/ehu207","volume":"35","author":"EW Steyerberg","year":"2014","unstructured":"Steyerberg, E. W. & Vergouwe, Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur. Heart J. 35, 1925\u20131931 (2014).","journal-title":"Eur. Heart J."},{"key":"773_CR50","doi-asserted-by":"publisher","first-page":"1351","DOI":"10.1038\/s41591-020-1037-7","volume":"26","author":"SC Rivera","year":"2020","unstructured":"Rivera, S. C., Liu, X., Chan, A., Denniston, A. K. & Calvert, M. J. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI Extension. Nat. Med. 26, 1351\u20131363 (2020).","journal-title":"Nat. Med."},{"key":"773_CR51","doi-asserted-by":"publisher","first-page":"323","DOI":"10.2196\/jmir.5870","volume":"18","author":"W Luo","year":"2016","unstructured":"Luo, W. et al. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View. J. Med. Internet Res. 18, 323 (2016).","journal-title":"J. Med. Internet Res."},{"key":"773_CR52","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1016\/j.jclinepi.2015.04.005","volume":"69","author":"EW Steyerberg","year":"2016","unstructured":"Steyerberg, E. W. 
& Harrell, F. E. Jr Prediction models need appropriate internal, internal-external, and external validation. J. Clin. Epidemiol. 69, 245\u2013247 (2016).","journal-title":"J. Clin. Epidemiol."},{"key":"773_CR53","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1002\/(SICI)1097-0258(20000229)19:4<453::AID-SIM350>3.0.CO;2-5","volume":"19","author":"DG Altman","year":"2000","unstructured":"Altman, D. G. & Royston, P. What do we mean by validating a prognostic model? Stat. Med. 19, 453\u2013473 (2000).","journal-title":"Stat. Med."},{"key":"773_CR54","doi-asserted-by":"publisher","first-page":"826","DOI":"10.1016\/S0895-4356(03)00207-5","volume":"56","author":"SE Bleeker","year":"2003","unstructured":"Bleeker, S. E. et al. External validation is necessary in prediction research: A clinical example. J. Clin. Epidemiol. 56, 826\u2013832 (2003).","journal-title":"J. Clin. Epidemiol."},{"key":"773_CR55","doi-asserted-by":"publisher","unstructured":"Collins, G. S. et al. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Med. Res. Methodol. 14 https:\/\/doi.org\/10.1186\/1471-2288-14-40 (2014).","DOI":"10.1186\/1471-2288-14-40"},{"key":"773_CR56","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1016\/j.jclinepi.2014.09.007","volume":"68","author":"GC Siontis","year":"2015","unstructured":"Siontis, G. C. et al. External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination. J. Clin. Epidemiol. 68, 25\u201334 (2015).","journal-title":"J. Clin. Epidemiol."},{"key":"773_CR57","doi-asserted-by":"publisher","unstructured":"Faltys, M. et al. HiRID, a high time-resolution ICU dataset (version 1.1.1). PhysioNet. 
https:\/\/doi.org\/10.13026\/nkwc-js72 (2021).","DOI":"10.13026\/nkwc-js72"},{"key":"773_CR58","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1161\/01.CIR.101.23.e215","volume":"101","author":"A Goldberger","year":"2000","unstructured":"Goldberger, A. et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101, 215\u2013220 (2000).","journal-title":"Circulation"},{"key":"773_CR59","doi-asserted-by":"publisher","unstructured":"Johnson, A. E. W. et al. MIMIC-III (v.1.4), a freely accessible critical care database. Scientific Data https:\/\/doi.org\/10.1038\/sdata.2016.35 (2016).","DOI":"10.1038\/sdata.2016.35"},{"key":"773_CR60","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825\u20132830 (2011).","journal-title":"J. Mach. Learn. Res."},{"key":"773_CR61","doi-asserted-by":"crossref","unstructured":"Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with python. 9th Python in Science Conference (2010).","DOI":"10.25080\/Majora-92bf1922-011"},{"key":"773_CR62","unstructured":"Perry, T. SimpleDorff - Calculate Krippendorff\u2019s Alpha on a DataFrame, <https:\/\/pypi.org\/project\/simpledorff\/> (2020)."},{"key":"773_CR63","doi-asserted-by":"publisher","unstructured":"Zapf, A., Castell, S., Morawietz, L. & Karch, A. Measuring inter-rater reliability for nominal data \u2013 which coefficients and confidence intervals are appropriate? BMC Med. Res. Methodol. 
16 https:\/\/doi.org\/10.1186\/s12874-016-0200-9 (2016).","DOI":"10.1186\/s12874-016-0200-9"}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00773-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00773-3","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00773-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,24]],"date-time":"2023-02-24T10:43:54Z","timestamp":1677235434000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00773-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,21]]},"references-count":63,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["773"],"URL":"https:\/\/doi.org\/10.1038\/s41746-023-00773-3","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-1937575\/v1","asserted-by":"object"}]},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,21]]},"assertion":[{"value":"7 August 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 February 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"value":"The methods were performed in accordance with relevant 
guidelines and regulations and approved by the University College London Research Ethics Committee. Permission was granted by the data controllers to use the (thoroughly anonymised) QEUH ICU, MIMIC-III and HiRID datasets. No personal data was processed in this study. The consultants who annotated the QEUH datasets were identified using anonymous code names.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}}],"article-number":"26"}}