{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T15:19:44Z","timestamp":1778080784374,"version":"3.51.4"},"reference-count":64,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2023,3,2]],"date-time":"2023-03-02T00:00:00Z","timestamp":1677715200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,3,2]],"date-time":"2023-03-02T00:00:00Z","timestamp":1677715200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Helmholtz Imaging"},{"name":"National Center for Tumor Diseases"},{"name":"Helmholtz Imaging,National Center for Tumor Diseases"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Mach Intell"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Biomedical image analysis algorithm validation depends on high-quality annotation of reference datasets, for which labelling instructions are key. Despite their importance, their optimization remains largely unexplored. Here we present a systematic study of labelling instructions and their impact on annotation quality in the field. Through comprehensive examination of professional practice and international competitions registered at the Medical Image Computing and Computer Assisted Intervention Society, the largest international society in the biomedical imaging field, we uncovered a discrepancy between annotators\u2019 needs for labelling instructions and their current quality and availability. On the basis of an analysis of 14,040 images annotated by 156 annotators from four professional annotation companies and 708 Amazon Mechanical Turk crowdworkers using instructions with different information density levels, we further found that including exemplary images substantially boosts annotation performance compared with text-only descriptions, while solely extending text descriptions does not. Finally, professional annotators constantly outperform Amazon Mechanical Turk crowdworkers. Our study raises awareness for the need of quality standards in biomedical image analysis labelling instructions.<\/jats:p>","DOI":"10.1038\/s42256-023-00625-5","type":"journal-article","created":{"date-parts":[[2023,3,2]],"date-time":"2023-03-02T17:03:13Z","timestamp":1677776593000},"page":"273-283","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":73,"title":["Labelling instructions matter in biomedical image analysis"],"prefix":"10.1038","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3518-0315","authenticated-orcid":false,"given":"Tim","family":"R\u00e4dsch","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4363-1876","authenticated-orcid":false,"given":"Annika","family":"Reinke","sequence":"additional","affiliation":[]},{"given":"Vivienn","family":"Weru","sequence":"additional","affiliation":[]},{"given":"Minu D.","family":"Tizabi","sequence":"additional","affiliation":[]},{"given":"Nicholas","family":"Schreck","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9328-8140","authenticated-orcid":false,"given":"A. Emre","family":"Kavur","sequence":"additional","affiliation":[]},{"given":"B\u00fcnyamin","family":"Pekdemir","sequence":"additional","affiliation":[]},{"given":"Tobias","family":"Ro\u00df","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1810-0267","authenticated-orcid":false,"given":"Annette","family":"Kopp-Schneider","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4910-9368","authenticated-orcid":false,"given":"Lena","family":"Maier-Hein","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,3,2]]},"reference":[{"key":"625_CR1","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1038\/s41746-020-00324-0","volume":"3","author":"S Benjamens","year":"2020","unstructured":"Benjamens, S., Dhunnoo, P. & Mesk\u00f3, B. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. NPJ Digit. Med. 3, 118 (2020).","journal-title":"NPJ Digit. Med."},{"key":"625_CR2","doi-asserted-by":"publisher","first-page":"929","DOI":"10.1038\/s42256-021-00399-8","volume":"3","author":"R Shad","year":"2021","unstructured":"Shad, R., Cunningham, J. P., Ashley, E. A., Langlotz, C. P. & Hiesinger, W. Designing clinically translatable artificial intelligence systems for high-dimensional medical imaging. Nat. Mach. Intell. 3, 929\u2013935 (2021).","journal-title":"Nat. Mach. Intell."},{"key":"625_CR3","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1038\/s42256-020-0181-6","volume":"2","author":"N Peiffer-Smadja","year":"2020","unstructured":"Peiffer-Smadja, N. et al. Machine learning for COVID-19 needs global collaboration and data-sharing. Nat. Mach. Intell. 2, 293\u2013294 (2020).","journal-title":"Nat. Mach. Intell."},{"key":"625_CR4","doi-asserted-by":"publisher","first-page":"298","DOI":"10.1038\/s42256-020-0185-2","volume":"2","author":"Y Hu","year":"2020","unstructured":"Hu, Y. et al. The challenges of deploying artificial intelligence models in a rapidly evolving pandemic. Nat. Mach. Intell. 2, 298\u2013300 (2020).","journal-title":"Nat. Mach. Intell."},{"key":"625_CR5","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1148\/radiol.2020192224","volume":"295","author":"MJ Willemink","year":"2020","unstructured":"Willemink, M. J. et al. Preparing medical imaging data for machine learning. Radiology 295, 4\u201315 (2020).","journal-title":"Radiology"},{"key":"625_CR6","unstructured":"Northcutt, C. G., Athalye, A. & Mueller, J. Pervasive label errors in test sets destabilize machine learning benchmarks. In Proc. 35th Conference on Neural Information Processing Systems Track on Datasets and Benchmarks (eds Vanschoren, J. & Yeung, S.) (NeurIPS, 2021)."},{"key":"625_CR7","doi-asserted-by":"crossref","unstructured":"R\u00e4dsch, T. et al. What your radiologist might be missing: using machine learning to identify mislabeled instances of X-ray images. In Proc. 54th Hawaii International Conference on System Sciences (HICSS) (ed. Bui, T. X.) (HICSS, 2021).","DOI":"10.24251\/HICSS.2021.157"},{"key":"625_CR8","doi-asserted-by":"publisher","first-page":"100336","DOI":"10.1016\/j.patter.2021.100336","volume":"2","author":"A Paullada","year":"2021","unstructured":"Paullada, A., Raji, I. D., Bender, E. M., Denton, E. & Hanna, A. Data and its (dis)contents: a survey of dataset development and use in machine learning research. Patterns 2, 100336 (2021).","journal-title":"Patterns"},{"key":"625_CR9","unstructured":"Peng, K., Mathur, A. & Narayanan, A. Mitigating dataset harms requires stewardship: lessons from 1000 papers. In Proc. 35th Conference on Neural Information Processing Systems Track on Datasets and Benchmarks (eds Vanschoren, J. & Yeung, S.) (NeurIPS, 2021)."},{"key":"625_CR10","doi-asserted-by":"crossref","unstructured":"The rise and fall (and rise) of datasets. Nat. Mach. Intell. 4, 1\u20132 (2022).","DOI":"10.1038\/s42256-022-00442-2"},{"key":"625_CR11","doi-asserted-by":"publisher","first-page":"102306","DOI":"10.1016\/j.media.2021.102306","volume":"76","author":"L Maier-Hein","year":"2022","unstructured":"Maier-Hein, L. et al. Surgical data science\u2014from concepts toward clinical translation. Med. Image Anal. 76, 102306 (2022).","journal-title":"Med. Image Anal."},{"key":"625_CR12","doi-asserted-by":"publisher","first-page":"1391","DOI":"10.1007\/s00330-018-5695-5","volume":"29","author":"L Joskowicz","year":"2019","unstructured":"Joskowicz, L., Cohen, D., Caplan, N. & Sosna, J. Inter-observer variability of manual contour delineation of structures in CT. Eur. Radiol. 29, 1391\u20131399 (2019).","journal-title":"Eur. Radiol."},{"key":"625_CR13","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1609\/hcomp.v9i1.18940","volume":"9","author":"B Freeman","year":"2021","unstructured":"Freeman, B. et al. Iterative quality control strategies for expert medical image labeling. Proc. AAAI Conference on Human Computation and Crowdsourcing 9, 60\u201371 (2021).","journal-title":"Proc. AAAI Conference on Human Computation and Crowdsourcing"},{"key":"625_CR14","doi-asserted-by":"publisher","first-page":"392","DOI":"10.1007\/s10278-017-9976-3","volume":"30","author":"MD Kohli","year":"2017","unstructured":"Kohli, M. D., Summers, R. M. & Geis, J. R. Medical image data and datasets in the era of machine learning\u2014whitepaper from the 2016 C-MIMI meeting dataset session. J. Digit. Imaging 30, 392\u2013399 (2017).","journal-title":"J. Digit. Imaging"},{"key":"625_CR15","doi-asserted-by":"publisher","first-page":"102195","DOI":"10.1016\/j.artmed.2021.102195","volume":"121","author":"A Balagopal","year":"2021","unstructured":"Balagopal, A. et al. PSA-Net: deep learning-based physician style-aware segmentation network for postoperative prostate cancer clinical target volumes. Artif. Intell. Med. 121, 102195 (2021).","journal-title":"Artif. Intell. Med."},{"key":"625_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.15346\/hc.v7i1.1","volume":"7","author":"SN \u00d8rting","year":"2020","unstructured":"\u00d8rting, S. N. et al. A survey of crowdsourcing in medical image analysis. Hum. Comput. 7, 1\u201326 (2020).","journal-title":"Hum. Comput."},{"key":"625_CR17","doi-asserted-by":"publisher","first-page":"e187","DOI":"10.2196\/jmir.9330","volume":"20","author":"P Cr\u00e9quit","year":"2018","unstructured":"Cr\u00e9quit, P., Mansouri, G., Benchoufi, M., Vivot, A. & Ravaud, P. Mapping of crowdsourcing in health: systematic review. J. Med. Internet Res. 20, e187 (2018).","journal-title":"J. Med. Internet Res."},{"key":"625_CR18","unstructured":"Amazon Mechanical Turk (Amazon Mechanical Turk, 2022); https:\/\/www.mturk.com\/"},{"key":"625_CR19","doi-asserted-by":"crossref","unstructured":"Budd, S. et al. in Domain Adaptation and Representation Transfer, and Affordable Healthcare and AI for Resource Diverse Global Health (eds Albarqouni, S. et al.) 251\u2013262 (Springer, 2021).","DOI":"10.1007\/978-3-030-87722-4_23"},{"key":"625_CR20","doi-asserted-by":"publisher","first-page":"034002","DOI":"10.1117\/1.JMI.5.3.034002","volume":"5","author":"E Heim","year":"2018","unstructured":"Heim, E. et al. Large-scale medical image annotation with crowd-powered algorithms. J. Med. Imaging 5, 034002 (2018).","journal-title":"J. Med. Imaging"},{"key":"625_CR21","doi-asserted-by":"crossref","unstructured":"Cheplygina, V., Perez-Rovira, A., Kuo, W., Tiddens, H. A. W. M. & de Bruijne, M. in Deep Learning and Data Labeling for Medical Applications (Carneiro, G. et al.) 209\u2013218 (Springer, 2016).","DOI":"10.1007\/978-3-319-46976-8_22"},{"key":"625_CR22","doi-asserted-by":"crossref","unstructured":"Maier-Hein, L. et al. Can masses of non-experts train highly accurate image classifiers? In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Golland, P. et al.) 438\u2013445 (Springer, 2014).","DOI":"10.1007\/978-3-319-10470-6_55"},{"key":"625_CR23","doi-asserted-by":"publisher","first-page":"519","DOI":"10.3758\/s13428-014-0483-x","volume":"47","author":"L Litman","year":"2015","unstructured":"Litman, L., Robinson, J. & Rosenzweig, C. The relationship between motivation, monetary compensation, and data quality among US- and India-based workers on Mechanical Turk. Behav. Res. Methods 47, 519\u2013528 (2015).","journal-title":"Behav. Res. Methods"},{"key":"625_CR24","unstructured":"Denton, E., D\u00edaz, M., Kivlichan, I., Prabhakaran, V. & Rosen, R. Whose ground truth? Accounting for individual and collective identities underlying dataset annotation. NeurIPS Data-Centric AI Workshop (NeurIPS, 2021)."},{"key":"625_CR25","doi-asserted-by":"publisher","first-page":"614","DOI":"10.1017\/psrm.2020.6","volume":"8","author":"R Kennedy","year":"2020","unstructured":"Kennedy, R. et al. The shape of and solutions to the MTurk quality crisis. Polit. Sci. Res. Methods 8, 614\u2013629 (2020).","journal-title":"Polit. Sci. Res. Methods"},{"key":"625_CR26","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1109\/MC.2014.245","volume":"47","author":"T Hossfeld","year":"2014","unstructured":"Hossfeld, T., Keimel, C. & Timmerer, C. Crowdsourcing quality-of-experience assessments. Computer 47, 98\u2013102 (2014).","journal-title":"Computer"},{"key":"625_CR27","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1109\/MIC.2012.66","volume":"16","author":"O Tokarchuk","year":"2012","unstructured":"Tokarchuk, O., Cuel, R. & Zamarian, M. Analyzing crowd labor and designing incentives for humans in the loop. IEEE Internet Comput. 16, 45\u201351 (2012).","journal-title":"IEEE Internet Comput."},{"key":"625_CR28","unstructured":"Clark, H. H. & Brennan, S. E. in Perspectives on Socially Shared Cognition (eds Resnick, L. et al.) 127\u2013149 (American Psychological Association, 1991)."},{"key":"625_CR29","doi-asserted-by":"publisher","first-page":"820","DOI":"10.1038\/nbt.4225","volume":"36","author":"DP Sullivan","year":"2018","unstructured":"Sullivan, D. P. et al. Deep learning is combined with massive-scale citizen science to improve large-scale image classification. Nat. Biotechnol. 36, 820\u2013828 (2018).","journal-title":"Nat. Biotechnol."},{"key":"625_CR30","doi-asserted-by":"crossref","unstructured":"Albarqouni, S., Matl, S., Baust, M., Navab, N. & Demirci, S. in Deep Learning and Data Labeling for Medical Applications (eds Carneiro, G. et al.) 269\u2013277 (Springer, 2016).","DOI":"10.1007\/978-3-319-46976-8_28"},{"key":"625_CR31","doi-asserted-by":"publisher","first-page":"e37245","DOI":"10.1371\/journal.pone.0037245","volume":"7","author":"S Mavandadi","year":"2012","unstructured":"Mavandadi, S. et al. Distributed medical image analysis and diagnosis through crowd-sourced games: a malaria case study. PLoS ONE 7, e37245 (2012).","journal-title":"PLoS ONE"},{"key":"625_CR32","doi-asserted-by":"publisher","first-page":"e2338","DOI":"10.2196\/jmir.2338","volume":"14","author":"MA Luengo-Oroz","year":"2012","unstructured":"Luengo-Oroz, M. A., Arranz, A. & Frean, J. Crowdsourcing malaria parasite quantification: an online game for analyzing images of infected thick blood smears. J. Med. Internet Res. 14, e2338 (2012).","journal-title":"J. Med. Internet Res."},{"key":"625_CR33","doi-asserted-by":"crossref","unstructured":"Ning, Q. et al. Easy, reproducible and quality-controlled data collection with CROWDAQ. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (eds Liu, Q. & Schlangen, D.) 127\u2013134 (ACL, 2020).","DOI":"10.18653\/v1\/2020.emnlp-demos.17"},{"key":"625_CR34","doi-asserted-by":"crossref","unstructured":"Chaithanya Manam, V. K., Jampani, D., Zaim, M., Wu, M.-H. & J. Quinn, A. TaskMate: a mechanism to improve the quality of instructions in crowdsourcing. In Companion Proc. 2019 World Wide Web Conference (Liu, L. & White, R.) 1121\u20131130 (ACM, 2019).","DOI":"10.1145\/3308560.3317081"},{"key":"625_CR35","doi-asserted-by":"crossref","unstructured":"Bragg, J., Mausam & Weld, D. S. Sprout: crowd-powered task design for crowdsourcing. In Proc. 31st Annual ACM Symposium on User Interface Software and Technology (eds Baudisch, P. et al.) 165\u2013176 (ACM, 2018).","DOI":"10.1145\/3242587.3242598"},{"key":"625_CR36","doi-asserted-by":"crossref","unstructured":"Manam, V. C. & Quinn, A. Wingit: efficient refinement of unclear task instructions. Proc. AAAI Conference on Human Computation and Crowdsourcing 6, 108\u2013116 (2018).","DOI":"10.1609\/hcomp.v6i1.13338"},{"key":"625_CR37","doi-asserted-by":"crossref","unstructured":"Chang, J. C., Amershi, S. & Kamar, E. Revolt: collaborative crowdsourcing for labeling machine learning datasets. In Proc. 2017 CHI Conference on Human Factors in Computing Systems (eds Mark, G. et al.) 2334\u20132346 (ACM, 2017).","DOI":"10.1145\/3025453.3026044"},{"key":"625_CR38","first-page":"86","volume":"64","author":"T Gebru","year":"2021","unstructured":"Gebru, T. et al. Datasheets for datasets. Commun. Assoc. Comput. Mach. 64, 86\u201392 (2021).","journal-title":"Commun. Assoc. Comput. Mach."},{"key":"625_CR39","doi-asserted-by":"publisher","first-page":"101796","DOI":"10.1016\/j.media.2020.101796","volume":"66","author":"L Maier-Hein","year":"2020","unstructured":"Maier-Hein, L. et al. BIAS: transparent reporting of biomedical image analysis challenges. Med. Image Anal. 66, 101796 (2020).","journal-title":"Med. Image Anal."},{"key":"625_CR40","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-018-07619-7","volume":"9","author":"L Maier-Hein","year":"2018","unstructured":"Maier-Hein, L. et al. Why rankings of biomedical image analysis competitions should be interpreted with care. Nat. Commun. 9, 5217 (2018).","journal-title":"Nat. Commun."},{"key":"625_CR41","unstructured":"Call for challenges. The Medical Image Computing and Computer Assisted Intervention Society http:\/\/www.miccai.org\/news\/2021\/10\/25\/call-for-challenges (2021)."},{"key":"625_CR42","doi-asserted-by":"crossref","unstructured":"Reinke, A. et al. How to exploit weaknesses in biomedical challenge design and organization. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Frangi, A. F. et al.) 388\u2013395 (Springer, 2018).","DOI":"10.1007\/978-3-030-00937-3_45"},{"key":"625_CR43","unstructured":"Page, M. J. et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int. J. Surg. 88, 105906 (2021)."},{"key":"625_CR44","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-021-00882-2","volume":"8","author":"L Maier-Hein","year":"2021","unstructured":"Maier-Hein, L. et al. Heidelberg colorectal data set for surgical data science in the sensor operating room. Sci. Data 8, 101 (2021).","journal-title":"Sci. Data"},{"key":"625_CR45","doi-asserted-by":"publisher","first-page":"101920","DOI":"10.1016\/j.media.2020.101920","volume":"70","author":"T Ro\u00df","year":"2021","unstructured":"Ro\u00df, T. et al. Comparative validation of multi-instance instrument segmentation in endoscopy: results of the ROBUST-MIS 2019 challenge. Med. Image Anal. 70, 101920 (2021).","journal-title":"Med. Image Anal."},{"key":"625_CR46","doi-asserted-by":"publisher","first-page":"297","DOI":"10.2307\/1932409","volume":"26","author":"LR Dice","year":"1945","unstructured":"Dice, L. R. Measures of the amount of ecologic association between species. Ecology 26, 297\u2013302 (1945).","journal-title":"Ecology"},{"key":"625_CR47","unstructured":"MICCAI special interest group for biomedical image analysis challenges. The Medical Image Computing and Computer Assisted Intervention Society https:\/\/miccai.org\/index.php\/special-interest-groups\/challenges\/ (2022)."},{"key":"625_CR48","unstructured":"Shankar, V. et al. Evaluating machine accuracy on ImageNet. In Proc. 37th International Conference on Machine Learning (eds Daum\u00e9 III, H. and Singh, A.) 8634\u20138644 (PMLR, 2020)."},{"key":"625_CR49","doi-asserted-by":"publisher","first-page":"2557","DOI":"10.1109\/TIP.2016.2544703","volume":"25","author":"TA Lampert","year":"2016","unstructured":"Lampert, T. A., Stumpf, A. & Gan\u00e7arski, P. An empirical study into annotator agreement, ground truth estimation, and algorithm evaluation. IEEE Trans. Image Process. 25, 2557\u20132572 (2016).","journal-title":"IEEE Trans. Image Process."},{"key":"625_CR50","doi-asserted-by":"publisher","first-page":"1086","DOI":"10.1001\/jamasurg.2015.2405","volume":"150","author":"TS Lendvay","year":"2015","unstructured":"Lendvay, T. S., White, L. & Kowalewski, T. Crowdsourcing to assess surgical skill. JAMA Surg. 150, 1086\u20131087 (2015).","journal-title":"JAMA Surg."},{"key":"625_CR51","doi-asserted-by":"crossref","unstructured":"Nowak, S. & R\u00fcger, S. How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation. In Proc. International Conference on Multimedia Information Retrieval (eds Wang, J. Z. et al.) 557\u2013566 (ACM 2010).","DOI":"10.1145\/1743384.1743478"},{"key":"625_CR52","doi-asserted-by":"crossref","unstructured":"Sambasivan, N. et al. \u201cEveryone wants to do the model work, not the data work\u201d: data cascades in high-stakes AI. In Proc. 2021 CHI Conference on Human Factors in Computing Systems (eds Kitamura, Y. et al.) 1\u201315 (ACM, 2021).","DOI":"10.1145\/3411764.3445518"},{"key":"625_CR53","doi-asserted-by":"publisher","first-page":"101759","DOI":"10.1016\/j.media.2020.101759","volume":"65","author":"D Karimi","year":"2020","unstructured":"Karimi, D., Dou, H., Warfield, S. K. & Gholipour, A. Deep learning with noisy labels: exploring techniques and remedies in medical image analysis. Med. Image Anal. 65, 101759 (2020).","journal-title":"Med. Image Anal."},{"key":"625_CR54","unstructured":"Maier-Hein, L. et al. Metrics reloaded: pitfalls and recommendations for image analysis validation. Preprint at https:\/\/arxiv.org\/abs\/2206.01653 (2022)."},{"key":"625_CR55","unstructured":"Reinke, A. et al. Common limitations of image processing metrics: a picture story. Preprint at https:\/\/arxiv.org\/abs\/2104.05642 (2021)."},{"key":"625_CR56","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/3065386","volume":"60","author":"A Krizhevsky","year":"2017","unstructured":"Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84\u201390 (2017).","journal-title":"Commun. ACM"},{"key":"625_CR57","doi-asserted-by":"publisher","first-page":"867","DOI":"10.1016\/j.jesp.2009.03.009","volume":"45","author":"DM Oppenheimer","year":"2009","unstructured":"Oppenheimer, D. M., Meyvis, T. & Davidenko, N. Instructional manipulation checks: detecting satisficing to increase statistical power. J. Exp. Soc. Psychol. 45, 867\u2013872 (2009).","journal-title":"J. Exp. Soc. Psychol."},{"key":"625_CR58","doi-asserted-by":"publisher","first-page":"2728","DOI":"10.1109\/TMI.2022.3170077","volume":"41","author":"D Zimmerer","year":"2022","unstructured":"Zimmerer, D. et al. MOOD 2020: A public benchmark for out-of-distribution detection and localization on medical images. IEEE Trans. Med. Imaging 41, 2728\u20132738 (2022).","journal-title":"IEEE Trans. Med. Imaging"},{"key":"625_CR59","unstructured":"Ro\u00df, T. et al. How can we learn (more) from challenges? A statistical approach to driving future algorithm development. Preprint at https:\/\/arxiv.org\/abs\/2106.09302 (2021)."},{"key":"625_CR60","doi-asserted-by":"publisher","first-page":"2611","DOI":"10.1093\/bioinformatics\/btw308","volume":"32","author":"EZ Chen","year":"2016","unstructured":"Chen, E. Z. & Li, H. A two-part mixed-effects model for analyzing longitudinal microbiome compositional data. Bioinformatics 32, 2611\u20132617 (2016).","journal-title":"Bioinformatics"},{"key":"625_CR61","unstructured":"R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2020)."},{"key":"625_CR62","unstructured":"MICCAI registered challenges. The Medical Image Computing and Computer Assisted Intervention Society https:\/\/miccai.org\/index.php\/special-interest-groups\/challenges\/miccai-registered-challenges\/ (2021)."},{"key":"625_CR63","unstructured":"Ro\u00df, T. & Reinke, A. Robust Medical Instrument Segmentation (ROBUST-MIS) Challenge 2019 - syn18779624 - Wiki. SYNAPSE https:\/\/www.synapse.org\/#!Synapse:syn18779624\/wiki\/592660 (2019)."},{"key":"625_CR64","unstructured":"R\u00e4dsch, T. Labeling instructions matter code repository. GitHub https:\/\/github.com\/IMSY-DKFZ\/labeling_instructions_matter (2023)."}],"container-title":["Nature Machine Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00625-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00625-5","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00625-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,23]],"date-time":"2023-03-23T00:06:28Z","timestamp":1679529988000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00625-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,2]]},"references-count":64,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["625"],"URL":"https:\/\/doi.org\/10.1038\/s42256-023-00625-5","relation":{},"ISSN":["2522-5839"],"issn-type":[{"value":"2522-5839","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,2]]},"assertion":[{"value":"19 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 February 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 March 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Ti.R. was an employee of the company understand.ai, which sponsored the creation of the annotations. After his research, To.R. was employed by Quality Match GmbH. The other authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}