{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T16:52:32Z","timestamp":1776876752129,"version":"3.51.2"},"reference-count":38,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2023,9,18]],"date-time":"2023-09-18T00:00:00Z","timestamp":1694995200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006769","name":"Russian Science Foundation","doi-asserted-by":"publisher","award":["20-71-10134"],"award-info":[{"award-number":["20-71-10134"]}],"id":[{"id":"10.13039\/501100006769","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Deep learning models perform unreliably when the data come from a distribution different from the training one. In critical applications such as medical imaging, out-of-distribution (OOD) detection methods help to identify such data samples, preventing erroneous predictions. In this paper, we further investigate OOD detection effectiveness when applied to 3D medical image segmentation. We designed several OOD challenges representing clinically occurring cases and found that none of the methods achieved acceptable performance. Methods not dedicated to segmentation severely failed to perform in the designed setups; the best mean false-positive rate at a 95% true-positive rate (FPR) was 0.59. Segmentation-dedicated methods still achieved suboptimal performance, with the best mean FPR being 0.31 (lower is better). To indicate this suboptimality, we developed a simple method called Intensity Histogram Features (IHF), which performed comparably or better in the same challenges, with a mean FPR of 0.25. 
Our findings highlight the limitations of the existing OOD detection methods with 3D medical images and present a promising avenue for improving them. To facilitate research in this area, we release the designed challenges as a publicly available benchmark and formulate practical criteria to test the generalization of OOD detection beyond the suggested benchmark. We also propose IHF as a solid baseline to contest emerging methods.<\/jats:p>","DOI":"10.3390\/jimaging9090191","type":"journal-article","created":{"date-parts":[[2023,9,19]],"date-time":"2023-09-19T02:26:17Z","timestamp":1695090377000},"page":"191","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Limitations of Out-of-Distribution Detection in 3D Medical Image Segmentation"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5672-273X","authenticated-orcid":false,"given":"Anton","family":"Vasiliuk","sequence":"first","affiliation":[{"name":"Moscow Institute of Physics and Technology, Moscow 141701, Russia"},{"name":"Artificial Intelligence Research Institute (AIRI), Moscow 105064, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8368-9015","authenticated-orcid":false,"given":"Daria","family":"Frolova","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Research Institute (AIRI), Moscow 105064, Russia"},{"name":"Skolkovo Institute of Science and Technology, Moscow 121205, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2776-5316","authenticated-orcid":false,"given":"Mikhail","family":"Belyaev","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Research Institute (AIRI), Moscow 105064, Russia"},{"name":"Skolkovo Institute of Science and Technology, Moscow 121205, 
Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2901-5789","authenticated-orcid":false,"given":"Boris","family":"Shirokikh","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Research Institute (AIRI), Moscow 105064, Russia"},{"name":"Skolkovo Institute of Science and Technology, Moscow 121205, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,9,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.neucom.2018.05.083","article-title":"Deep Visual Domain Adaptation: A Survey","volume":"312","author":"Wang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1038\/s41746-020-00367-3","article-title":"Second opinion needed: Communicating uncertainty in medical machine learning","volume":"4","author":"Kompa","year":"2021","journal-title":"NPJ Digit. Med."},{"key":"ref_3","unstructured":"Yang, J., Zhou, K., Li, Y., and Liu, Z. (2021). Generalized out-of-distribution detection: A survey. arXiv."},{"key":"ref_4","unstructured":"Hendrycks, D., and Gimpel, K. (2016). A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv."},{"key":"ref_5","unstructured":"Hendrycks, D., Basart, S., Mazeika, M., Mostajabi, M., Steinhardt, J., and Song, D. (2019). Scaling out-of-distribution detection for real-world settings. arXiv."},{"key":"ref_6","unstructured":"Mahmood, A., Oliva, J., and Styner, M. (2020). Multiscale score matching for out-of-distribution detection. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Pacheco, A.G., Sastry, C.S., Trappenberg, T., Oore, S., and Krohling, R.A. (2020, January 14\u201319). On out-of-distribution detection algorithms with deep neural skin cancer classifiers. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00374"},{"key":"ref_8","unstructured":"Berger, C., Paschali, M., Glocker, B., and Kamnitsas, K. (2021). Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and Perinatal Imaging, Placental and Preterm Image Analysis, Springer."},{"key":"ref_9","unstructured":"Cao, T., Huang, C.W., Hui, D.Y.T., and Cohen, J.P. (2020). A benchmark of medical out of distribution detection. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.media.2017.07.005","article-title":"A survey on deep learning in medical image analysis","volume":"42","author":"Litjens","year":"2017","journal-title":"Med. Image Anal."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"383","DOI":"10.1109\/TAI.2022.3159510","article-title":"Improving Calibration and Out-of-Distribution Detection in Deep Models for Medical Image Segmentation","volume":"4","author":"Karimi","year":"2022","journal-title":"IEEE Trans. Artif. Intell."},{"key":"ref_12","unstructured":"Zimmerer, D., Petersen, J., K\u00f6hler, G., J\u00e4ger, P., Full, P., Maier-Hein, K., Ro\u00df, T., Adler, T., Reinke, A., and Maier-Hein, L. (2022, January 18\u201322). Medical Out-of-Distribution Analysis Challenge 2022. Proceedings of the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022), Singapore."},{"key":"ref_13","unstructured":"Lambert, B., Forbes, F., Doyle, S., Tucholka, A., and Dojat, M. (2022). Improving Uncertainty-based Out-of-Distribution Detection for Medical Image Segmentation. 
arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"915","DOI":"10.1118\/1.3528204","article-title":"The lung image database consortium (LIDC) and image database resource initiative (IDRI): A completed reference database of lung nodules on CT scans","volume":"38","author":"McLennan","year":"2011","journal-title":"Med. Phys."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Morozov, S., Gombolevskiy, V., Elizarov, A., Gusev, M., Novik, V., Prokudaylo, S., Bardin, A., Popov, E., Ledikhova, N., and Chernina, V. (2021). A simplified cluster model and a tool adapted for collaborative labeling of lung cancer CT scans. Comput. Methods Programs Biomed., 206.","DOI":"10.1016\/j.cmpb.2021.106111"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"E204","DOI":"10.1148\/radiol.2021203957","article-title":"The RSNA International COVID-19 Open Radiology Database (RICORD)","volume":"299","author":"Tsai","year":"2021","journal-title":"Radiology"},{"key":"ref_17","unstructured":"Bilic, P., Christ, P.F., Vorontsov, E., Chlebus, G., Chen, H., Dou, Q., Fu, C.W., Han, X., Heng, P.A., and Hesser, J. (2019). The liver tumor segmentation benchmark (lits). arXiv."},{"key":"ref_18","first-page":"14","article-title":"Computed tomography images for intracranial hemorrhage detection and segmentation","volume":"5","author":"Hssayeni","year":"2020","journal-title":"Intracranial Hemorrhage Segm. Using A Deep. Convolutional Model Data"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41597-021-01064-w","article-title":"Segmentation of vestibular schwannoma from MRI, an open annotated dataset and baseline algorithm","volume":"8","author":"Shapey","year":"2021","journal-title":"Sci. Data"},{"key":"ref_20","unstructured":"Dorent, R., Kujawa, A., Cornelissen, S., Langenhuizen, P., Shapey, J., and Vercauteren, T. (2023, August 06). Cross-Modality Domain Adaptation Challenge 2022 (CrossMoDA). 
Available online: https:\/\/zenodo.org\/record\/6504722."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"107191","DOI":"10.1016\/j.dib.2021.107191","article-title":"The Erasmus Glioma Database (EGD): Structural MRI scans, WHO 2016 subtypes, and segmentations of 774 patients with glioma","volume":"37","author":"Incekara","year":"2021","journal-title":"Data Brief"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"482","DOI":"10.1016\/j.neuroimage.2017.08.021","article-title":"An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement","volume":"170","author":"Souza","year":"2018","journal-title":"NeuroImage"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"P\u00e9rez-Garc\u00eda, F., Sparks, R., and Ourselin, S. (2021). TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. Comput. Methods Programs Biomed., 208.","DOI":"10.1016\/j.cmpb.2021.106236"},{"key":"ref_24","unstructured":"Yang, J., Wang, P., Zou, D., Zhou, Z., Ding, K., Peng, W., Wang, H., Chen, G., Li, B., and Sun, Y. (2022). OpenOOD: Benchmarking Generalized Out-of-Distribution Detection. arXiv."},{"key":"ref_25","unstructured":"Jungo, A., and Reyes, M. Assessing reliability and challenges of uncertainty estimations for medical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2728","DOI":"10.1109\/TMI.2022.3170077","article-title":"MOOD 2020: A public Benchmark for Out-of-Distribution Detection and Localization on medical Images","volume":"41","author":"Zimmerer","year":"2022","journal-title":"IEEE Trans. Med. 
Imaging"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3868","DOI":"10.1109\/TMI.2020.3006437","article-title":"Confidence Calibration and Predictive Uncertainty Estimation for Deep Medical Image Segmentation","volume":"39","author":"Mehrtash","year":"2020","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_28","unstructured":"Smith, L., and Gal, Y. (2018). Understanding Measures of Uncertainty for Adversarial Example Detection. arXiv."},{"key":"ref_29","first-page":"6402","article-title":"Simple and scalable predictive uncertainty estimation using deep ensembles","volume":"30","author":"Lakshminarayanan","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_30","unstructured":"Gal, Y., and Ghahramani, Z. (2016, January 20\u201322). Dropout as a bayesian approximation: Representing model uncertainty in deep learning. Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Hsu, Y.C., Shen, Y., Jin, H., and Kira, Z. (2020, January 14\u201319). Generalized odin: Detecting out-of-distribution image without learning from out-of-distribution data. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01096"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Liang, Y., Zhang, J., Zhao, S., Wu, R., Liu, Y., and Pan, S. (2022). Omni-frequency Channel-selection Representations for Unsupervised Anomaly Detection. arXiv.","DOI":"10.1109\/TIP.2023.3293772"},{"key":"ref_33","unstructured":"Meissen, F., Wiestler, B., Kaissis, G., and Rueckert, D. (2022). On the Pitfalls of Using the Residual Error as Anomaly Score. arXiv."},{"key":"ref_34","unstructured":"Cho, J., Kang, I., and Park, J. Self-supervised 3D Out-of-Distribution Detection via Pseudoanomaly Generation. 
Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention."},{"key":"ref_35","unstructured":"Zakazov, I., Shirokikh, B., Chernyavskiy, A., and Belyaev, M. Anatomy of Domain Shift Impact on U-Net Layers in MRI Segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention."},{"key":"ref_36","first-page":"7167","article-title":"A simple unified framework for detecting out-of-distribution samples and adversarial attacks","volume":"31","author":"Lee","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_37","unstructured":"Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., and Maier-Hein, K.H. No new-net. Proceedings of the International MICCAI Brainlesion Workshop."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Abraham, N., and Khan, N.M. (2019, January 8\u201311). A novel focal tversky loss function with improved attention u-net for lesion segmentation. Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy.","DOI":"10.1109\/ISBI.2019.8759329"}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/9\/9\/191\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:53:17Z","timestamp":1760129597000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/9\/9\/191"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,18]]},"references-count":38,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2023,9]]}},"alternative-id":["jimaging9090191"],"URL":"https:\/\/doi.org\/10.3390\/jimaging9090191","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,18]]}}}