{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T08:22:30Z","timestamp":1765354950700,"version":"3.37.3"},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2023,6,2]],"date-time":"2023-06-02T00:00:00Z","timestamp":1685664000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,6,2]],"date-time":"2023-06-02T00:00:00Z","timestamp":1685664000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Deutsches Krebsforschungszentrum (DKFZ)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J CARS"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Purpose<\/jats:title>\n                <jats:p>Validation metrics are a key prerequisite for the reliable tracking of scientific progress and for deciding on the potential clinical translation of methods. While recent initiatives aim to develop comprehensive theoretical frameworks for understanding metric-related pitfalls in image analysis problems, there is a lack of experimental evidence on the concrete effects of common and rare pitfalls on specific applications. We address this gap in the literature in the context of colon cancer screening.\n<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Methods<\/jats:title>\n                <jats:p>Our contribution is twofold. Firstly, we present the winning solution of the Endoscopy Computer Vision Challenge on colon cancer detection, conducted in conjunction with the IEEE International Symposium on Biomedical Imaging 2022. Secondly, we demonstrate the sensitivity of commonly used metrics to a range of hyperparameters as well as the consequences of poor metric choices.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>Based on comprehensive validation studies performed with patient data from six clinical centers, we found all commonly applied object detection metrics to be subject to high inter-center variability. Furthermore, our results clearly demonstrate that the adaptation of standard hyperparameters used in the computer vision community does not generally lead to the clinically most plausible results. Finally, we present localization criteria that correspond well to clinical relevance.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>We conclude from our study that (1) performance results in polyp detection are highly sensitive to various design choices, (2) common metric configurations do not reflect the clinical need and rely on suboptimal hyperparameters and (3) comparison of performance across datasets can be largely misleading. Our work could be a first step towards reconsidering common validation strategies in deep learning-based colonoscopy and beyond.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1007\/s11548-023-02936-9","type":"journal-article","created":{"date-parts":[[2023,6,2]],"date-time":"2023-06-02T12:41:05Z","timestamp":1685709665000},"page":"1311-1322","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Sources of performance variability in deep learning-based polyp detection"],"prefix":"10.1007","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1727-0037","authenticated-orcid":false,"given":"T. N.","family":"Tran","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"T. J.","family":"Adler","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"A.","family":"Yamlahi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"E.","family":"Christodoulou","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"P.","family":"Godau","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"A.","family":"Reinke","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"M. D.","family":"Tizabi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"P.","family":"Sauer","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"T.","family":"Persicke","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J. G.","family":"Albert","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"L.","family":"Maier-Hein","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,6,2]]},"reference":[{"key":"2936_CR1","doi-asserted-by":"publisher","DOI":"10.1055\/s-0029-1242458","author":"FA Haggar","year":"2009","unstructured":"Haggar FA, Boushey RP (2009) Colorectal cancer epidemiology: incidence, mortality, survival, and risk factors. Clin Colon Rect Surg. https:\/\/doi.org\/10.1055\/s-0029-1242458","journal-title":"Clin Colon Rect Surg"},{"key":"2936_CR2","doi-asserted-by":"publisher","DOI":"10.1080\/00365521.2022.2085059","author":"D Fitting","year":"2022","unstructured":"Fitting D, Krenzer A, Troya J, Banck M, Sudarevic B, Brand M, B\u00f6ck W, Zoller WG, R\u00f6sch T, Puppe F et al (2022) A video based benchmark data set (endotest) to evaluate computer-aided polyp detection systems. Scand J Gastroentero. https:\/\/doi.org\/10.1080\/00365521.2022.2085059","journal-title":"Scand J Gastroentero"},{"key":"2936_CR3","doi-asserted-by":"publisher","unstructured":"Ali S, Ghatwary N, Jha D, Isik-Polat E, Polat G, Yang C, Li W, Galdran A, Ballester M-\u00c1G, Thambawita V, Hicks S, Poudel S, Lee S-W, Jin Z, Gan T, Yu C, Yan J, Yeo D, Lee H, Tomar NK, Haithmi M, Ahmed A, Riegler MA, Daul C, Halvorsen P, Rittscher J, Salem OE, Lamarque D, Cannizzaro R, Realdon S, de Lange T, East JE (2022) Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge. https:\/\/doi.org\/10.48550\/arXiv.2202.12031","DOI":"10.48550\/arXiv.2202.12031"},{"key":"2936_CR4","unstructured":"Sharib\u00a0Ali NG (2022) Endoscopic computer vision challenges 2.0. https:\/\/endocv2022.grand-challenge.org\/. Accessed 14 Nov 2022"},{"key":"2936_CR5","unstructured":"Bernal J, Histace A (2022) Gastrointestinal image analysis (GIANA) (2021). https:\/\/giana.grand-challenge.org\/. Accessed 15 Nov 2021"},{"key":"2936_CR6","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-64340-9_21","author":"J Bernal","year":"2021","unstructured":"Bernal J, Tudela Y, Riera M, S\u00e1nchez FJ (2021) Polyp detection in colonoscopy videos. Comput-Aided Anal Gastrointest Videos. https:\/\/doi.org\/10.1007\/978-3-030-64340-9_21","journal-title":"Comput-Aided Anal Gastrointest Videos"},{"key":"2936_CR7","doi-asserted-by":"publisher","unstructured":"Reinke A, Tizabi MD, Sudre CH, Eisenmann M, R\u00e4dsch T, Baumgartner M, Acion L, Antonelli M, Arbel T, Bakas S, et al (2021) Common limitations of image processing metrics: a picture story. https:\/\/doi.org\/10.48550\/arXiv.2104.05642","DOI":"10.48550\/arXiv.2104.05642"},{"key":"2936_CR8","unstructured":"Yamlahi A, Godau P, Tran TN, M\u00fcller L-R, Adler T, Tizabi MD, Baumgartner M, J\u00e4ger P, Maier-Hein L (2022) Heterogeneous model ensemble for polyp detection and tracking in colonoscopy. EndoCV@ISBI"},{"key":"2936_CR9","doi-asserted-by":"publisher","DOI":"10.1016\/j.compmedimag.2015.02.007","author":"J Bernal","year":"2015","unstructured":"Bernal J, S\u00e1nchez FJ, Fern\u00e1ndez-Esparrach G, Gil D, Rodr\u00edguez C, Vilari\u00f1o F (2015) Wm-dova maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians. Comput Med Imag Grap. https:\/\/doi.org\/10.1016\/j.compmedimag.2015.02.007","journal-title":"Comput Med Imag Grap"},{"key":"2936_CR10","doi-asserted-by":"publisher","DOI":"10.1007\/s11548-013-0926-3","author":"J Silva","year":"2014","unstructured":"Silva J, Histace A, Romain O, Dray X, Granado B (2014) Toward embedded detection of polyps in wce images for early diagnosis of colorectal cancer. Int J Comput Ass Rad. https:\/\/doi.org\/10.1007\/s11548-013-0926-3","journal-title":"Int J Comput Ass Rad"},{"key":"2936_CR11","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-67543-5_3","author":"Q Angermann","year":"2017","unstructured":"Angermann Q, Bernal J, S\u00e1nchez-Montes C, Hammami M, Fern\u00e1ndez-Esparrach G, Dray X, Romain O, S\u00e1nchez FJ, Histace A (2017) Towards real-time polyp detection in colonoscopy videos: adapting still frame-based methodologies for video sequences analysis. Comput Assist Robot Endosc Clin Image-based Proced. https:\/\/doi.org\/10.1007\/978-3-319-67543-5_3","journal-title":"Comput. Assist. Robot. Endosc. Clin. Image-based Proced."},{"key":"2936_CR12","doi-asserted-by":"publisher","unstructured":"Ali S, Jha D, Ghatwary N, Realdon S, Cannizzaro R, Salem OE, Lamarque D, Daul C, Riegler MA, Anonsen KV, et al (2021) Polypgen: a multi-center polyp detection and segmentation dataset for generalisability assessment. https:\/\/doi.org\/10.48550\/arXiv.2106.04463","DOI":"10.48550\/arXiv.2106.04463"},{"key":"2936_CR13","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1911.08287","author":"Z Zheng","year":"2020","unstructured":"Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2020) Distance-iou loss: faster and better learning for bounding box regression. AAAI. https:\/\/doi.org\/10.48550\/arXiv.1911.08287","journal-title":"AAAI"},{"key":"2936_CR14","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2021.104117","author":"R Solovyev","year":"2021","unstructured":"Solovyev R, Wang W, Gabruseva T (2021) Weighted boxes fusion: ensembling boxes from different object detection models. Image Vis Comput. https:\/\/doi.org\/10.1016\/j.imavis.2021.104117","journal-title":"Image Vis Comput"},{"key":"2936_CR15","doi-asserted-by":"publisher","first-page":"26","DOI":"10.48550\/ARXIV.2206.01653","volume":"1","author":"L Maier-Hein","year":"2022","unstructured":"Maier-Hein L, Menze B et al (2022) Metrics reloaded: pitfalls and recommendations for image analysis validation. arXiv 1:26. https:\/\/doi.org\/10.48550\/ARXIV.2206.01653","journal-title":"arXiv"},{"key":"2936_CR16","doi-asserted-by":"publisher","DOI":"10.1109\/TMI.2017.2664042","author":"J Bernal","year":"2017","unstructured":"Bernal J, Tajkbaksh N, Sanchez FJ, Matuszewski BJ, Chen H, Yu L, Angermann Q, Romain O, Rustad B, Balasingham I et al (2017) Comparative validation of polyp detection methods in video colonoscopy: results from the miccai 2015 endoscopic vision challenge. IEEE T Med Imaging. https:\/\/doi.org\/10.1109\/TMI.2017.2664042","journal-title":"IEEE T Med Imaging"},{"key":"2936_CR17","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft COCO: common objects in context. https:\/\/cocodataset.org\/#detection-eval. Accessed 31 Jan 2023","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"2936_CR18","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48","author":"T-Y Lin","year":"2014","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft coco: common objects in context. ECCV. https:\/\/doi.org\/10.1007\/978-3-319-10602-1_48","journal-title":"ECCV"},{"key":"2936_CR19","unstructured":"Polat G, I\u015f\u0131k\u00a0Polat E, Kayabay K, Temizel A (2021) Polyp detection in colonoscopy images using deep learning and bootstrap aggregation. EndoCV@ISBI"},{"key":"2936_CR20","doi-asserted-by":"publisher","unstructured":"Wang C-Y, Bochkovskiy A, Liao H-YM (2022) Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. https:\/\/doi.org\/10.48550\/arXiv.2207.02696","DOI":"10.48550\/arXiv.2207.02696"},{"key":"2936_CR21","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-33-4676-5_10","author":"R Ismail","year":"2021","unstructured":"Ismail R, Nagy S (2021) On metrics used in colonoscopy image processing for detection of colorectal polyps. NAMSP. https:\/\/doi.org\/10.1007\/978-981-33-4676-5_10","journal-title":"NAMSP"},{"key":"2936_CR22","doi-asserted-by":"publisher","unstructured":"Kofler F, Ezhov I, Isensee F, Balsiger F, Berger C, Koerner M, Paetzold J, Li H, Shit S, McKinley R, et al (2021) Are we using appropriate segmentation metrics? Identifying correlates of human expert perception for cnn training beyond rolling the dice coefficient. https:\/\/doi.org\/10.48550\/arXiv.2103.06205","DOI":"10.48550\/arXiv.2103.06205"},{"key":"2936_CR23","doi-asserted-by":"publisher","unstructured":"Gooding MJ, Smith AJ, Tariq M, Aljabar P, Peressutti D, van der Stoep J, Reymen B, Emans D, Hattu D, van Loon J et al (2018) Comparative evaluation of autocontouring in clinical practice: a practical method using the turing test. Med Phys. https:\/\/doi.org\/10.1002\/mp.13200","DOI":"10.1002\/mp.13200"}],"container-title":["International Journal of Computer Assisted Radiology and Surgery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11548-023-02936-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11548-023-02936-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11548-023-02936-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,8]],"date-time":"2023-07-08T16:10:52Z","timestamp":1688832652000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11548-023-02936-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,2]]},"references-count":23,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2023,7]]}},"alternative-id":["2936"],"URL":"https:\/\/doi.org\/10.1007\/s11548-023-02936-9","relation":{},"ISSN":["1861-6429"],"issn-type":[{"type":"electronic","value":"1861-6429"}],"subject":[],"published":{"date-parts":[[2023,6,2]]},"assertion":[{"value":"16 March 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 April 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 June 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This work was conducted using public datasets of human subject data made available by [].","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}]}}