{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T01:21:12Z","timestamp":1775697672950,"version":"3.50.1"},"reference-count":45,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,2,26]],"date-time":"2024-02-26T00:00:00Z","timestamp":1708905600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["758985"],"award-info":[{"award-number":["758985"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100021130","name":"Bundesministerium f\u00fcr Wirtschaft und Klimaschutz","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100021130","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Convolutional Neural Networks (CNNs) are frequently and successfully used in medical prediction tasks. They are often used in combination with transfer learning, leading to improved performance when training data for the task are scarce. The resulting models are highly complex and typically do not provide any insight into their predictive mechanisms, motivating the field of \u201cexplainable\u201d artificial intelligence (XAI). However, previous studies have rarely quantitatively evaluated the \u201cexplanation performance\u201d of XAI methods against ground-truth data, and transfer learning and its influence on objective measures of explanation performance has not been investigated. Here, we propose a benchmark dataset that allows for quantifying explanation performance in a realistic magnetic resonance imaging (MRI) classification task. We employ this benchmark to understand the influence of transfer learning on the quality of explanations. Experimental results show that popular XAI methods applied to the same underlying model differ vastly in performance, even when considering only correctly classified examples. We further observe that explanation performance strongly depends on the task used for pre-training and the number of CNN layers pre-trained. These results hold after correcting for a substantial correlation between explanation and classification performance.<\/jats:p>","DOI":"10.3389\/frai.2024.1330919","type":"journal-article","created":{"date-parts":[[2024,2,26]],"date-time":"2024-02-26T04:47:16Z","timestamp":1708922836000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Benchmarking the influence of pre-training on explanation performance in MR image classification"],"prefix":"10.3389","volume":"7","author":[{"given":"Marta","family":"Oliveira","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rick","family":"Wilming","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Benedict","family":"Clark","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"C\u00e9line","family":"Budding","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fabian","family":"Eitel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kerstin","family":"Ritter","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stefan","family":"Haufe","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2024,2,26]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2206.11104","article-title":"Openxai: towards a transparent evaluation of model explanations","author":"Agarwal","year":"2022","journal-title":"arXiv"},{"key":"B2","doi-asserted-by":"publisher","first-page":"780405","DOI":"10.3389\/frai.2022.780405","article-title":"Transfer learning approaches for neuroimaging analysis: a scoping review","volume":"5","author":"Ardalan","year":"2022","journal-title":"Front. Artif. Intell"},{"key":"B3","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1016\/j.inffus.2021.11.008","article-title":"Clevr-xai: a benchmark dataset for the ground truth evaluation of neural network explanations","volume":"81","author":"Arras","year":"2022","journal-title":"Inf. Fus"},{"key":"B4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0130140","article-title":"On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation","volume":"10","author":"Bach","year":"2015","journal-title":"PLoS ONE"},{"key":"B5","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1007\/s10278-016-9929-2","article-title":"Transfer learning with convolutional neural networks for classification of abdominal ultrasound images","volume":"30","author":"Cheng","year":"2017","journal-title":"J. Digit. Imaging"},{"key":"B6","first-page":"1","volume-title":"Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images","author":"Cherti","year":"2021"},{"key":"B7","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2306.12816","article-title":"Xai-tris: non-linear benchmarks to quantify ml explanation performance","author":"Clark","year":"2023","journal-title":"arXiv"},{"key":"B8","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1177\/001316447403400105","article-title":"A revised definition for suppressor variables: a guide to their identification and interpretation","volume":"34","author":"Conger","year":"1974","journal-title":"Educ. Psychol. Meas"},{"key":"B9","doi-asserted-by":"publisher","first-page":"663","DOI":"10.1002\/mrm.28148","article-title":"A transfer-learning approach for accelerated MRI using deep neural networks","volume":"84","author":"Dar","year":"2020","journal-title":"Magn. Reson. Med"},{"key":"B10","doi-asserted-by":"publisher","first-page":"fcz041","DOI":"10.1093\/braincomms\/fcz041","article-title":"White matter hyperintensities are common in midlife and already associated with cognitive decline","volume":"1","author":"d'Arbeloff","year":"2019","journal-title":"Brain Commun"},{"key":"B11","article-title":"Opportunities and challenges in explainable artificial intelligence (XAI): A survey","author":"Das","year":"2020","journal-title":"arXiv [Preprint]"},{"key":"B12","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1109\/CVPR.2009.5206848","article-title":"\u201cImagenet: a large-scale hierarchical image database,\u201d","author":"Deng","year":"2009","journal-title":"2009 IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"B13","year":"2018","journal-title":"2018 Reform of EU Data Protection Rules"},{"key":"B14","doi-asserted-by":"publisher","first-page":"774","DOI":"10.1016\/j.neuroimage.2012.01.021","article-title":"FreeSurfer","volume":"62","author":"Fischl","year":"2012","journal-title":"Neuroimage"},{"key":"B15","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1198\/000313005X41337","article-title":"Graphical views of suppression and multicollinearity in multiple linear regression","volume":"59","author":"Friedman","year":"2005","journal-title":"Am. Stat"},{"key":"B16","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/j.neuroimage.2013.04.127","article-title":"The minimal preprocessing pipelines for the human connectome project","volume":"80","author":"Glasser","year":"2013","journal-title":"Neuroimage"},{"key":"B17","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1016\/j.neuroimage.2013.10.067","article-title":"On the interpretation of weight vectors of linear models in multivariate neuroimaging","volume":"87","author":"Haufe","year":"2014","journal-title":"Neuroimage"},{"key":"B18","doi-asserted-by":"publisher","first-page":"119504","DOI":"10.1016\/j.neuroimage.2022.119504","article-title":"Towards the interpretability of deep learning models for multi-modal neuroimaging: finding structural changes of the ageing brain","volume":"261","author":"Hofmann","year":"2022","journal-title":"Neuroimage"},{"key":"B19","first-page":"6441","article-title":"Benchmarking deep learning interpretability in time series predictions","volume":"33","author":"Ismail","year":"2020","journal-title":"Adv. Neural. Inf. Process. Syst."},{"key":"B20","first-page":"10814","author":"Ismail","year":"2019","journal-title":"Input-Cell Attention Reduces Vanishing Saliency of Recurrent Neural Networks"},{"key":"B21","doi-asserted-by":"publisher","first-page":"825","DOI":"10.1006\/nimg.2002.1132","article-title":"Improved optimization for the robust and accurate linear registration and motion correction of brain images","volume":"17","author":"Jenkinson","year":"2002","journal-title":"Neuroimage"},{"key":"B22","doi-asserted-by":"publisher","first-page":"782","DOI":"10.1016\/j.neuroimage.2011.09.015","article-title":"FSL","volume":"62","author":"Jenkinson","year":"2012","journal-title":"Neuroimage"},{"key":"B23","first-page":"2668","article-title":"\u201cInterpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV),\u201d","author":"Kim","year":"2018","journal-title":"International Conference on Machine Learning"},{"key":"B24","first-page":"1097","article-title":"\u201cImageNet classification with deep convolutional neural networks,\u201d","author":"Krizhevsky","year":"2012","journal-title":"Advances in Neural Information Processing Systems 25"},{"key":"B25","first-page":"4768","article-title":"\u201cA unified approach to interpreting model predictions,\u201d","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Lundberg","year":"2017"},{"key":"B26","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1007\/s12021-012-9160-3","article-title":"Obscuring surface anatomy in volumetric imaging data","volume":"11","author":"Milchenko","year":"2013","journal-title":"Neuroinformatics"},{"key":"B27","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1109\/TSMC.1979.4310076","article-title":"A threshold selection method from gray-level histograms","volume":"9","author":"Otsu","year":"1979","journal-title":"IEEE Trans. Syst. Man Cybern"},{"key":"B28","doi-asserted-by":"publisher","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans. Knowl. Data Eng"},{"key":"B29","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1038\/s42256-019-0048-x","article-title":"Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead","volume":"1","author":"Rudin","year":"2019","journal-title":"Nat. Mach. Intell"},{"key":"B30","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1007\/978-3-030-60548-3_12","article-title":"\u201cFirst U-Net layers contain more domain specific information than the last ones,\u201d","volume-title":"Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning","author":"Shirokikh","year":"2020"},{"key":"B31","first-page":"3145","article-title":"\u201cLearning important features through propagating activation differences,\u201d","author":"Shrikumar","year":"2017","journal-title":"International Conference on Machine Learning"},{"key":"B32","article-title":"Deep inside convolutional networks: visualising image classification models and saliency maps","author":"Simonyan","year":"2013","journal-title":"CoRR"},{"key":"B33","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1409.1556","article-title":"Very deep convolutional networks for large-scale IMAGE recognition","author":"Simonyan","year":"2014","journal-title":"arXiv"},{"key":"B34","article-title":"\u201cStriving for simplicity: the all convolutional net,\u201d","author":"Springenberg","year":"2015","journal-title":"ICLR (Workshop Track)"},{"key":"B35","first-page":"3319","article-title":"\u201cAxiomatic attribution for deep networks,\u201d","author":"Sundararajan","year":"2017","journal-title":"Proceedings of the 34th International Conference on Machine Learning, Vol. 70"},{"key":"B36","doi-asserted-by":"crossref","first-page":"858","DOI":"10.1109\/TAI.2022.3228834","article-title":"Quantifying explainability of saliency methods in deep neural networks with a synthetic dataset","volume":"4","author":"Tjoa","year":"2020","journal-title":"IEEE Trans. Artif. Intell."},{"key":"B37","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1038\/s41591-018-0300-7","article-title":"High-performance medicine: the convergence of human and artificial intelligence","volume":"25","author":"Topol","year":"2019","journal-title":"Nat. Med"},{"key":"B38","doi-asserted-by":"publisher","first-page":"66","DOI":"10.3390\/jimaging7040066","article-title":"Transfer learning in magnetic resonance brain imaging: a systematic review","volume":"7","author":"Valverde","year":"2021","journal-title":"J. Imaging"},{"key":"B39","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1016\/j.neuroimage.2013.05.041","article-title":"The WU-Minn human connectome project: an overview","volume":"80","author":"Van Essen","year":"2013","journal-title":"Neuroimage"},{"key":"B40","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1111\/bpa.12219","article-title":"Age-associated white matter lesions: the MRC cognitive function and ageing study","volume":"25","author":"Wharton","year":"2015","journal-title":"Brain Pathol"},{"key":"B41","doi-asserted-by":"publisher","first-page":"1903","DOI":"10.1007\/s10994-022-06167-y","article-title":"Scrutinizing XAI using linear ground-truth data with suppressor variables","volume":"111","author":"Wilming","year":"2022","journal-title":"Mach. Learn"},{"key":"B42","first-page":"37091","article-title":"\u201cTheoretical behavior of xai methods in the presence of suppressor variables,\u201d","author":"Wilming","year":"2023","journal-title":"Proceedings of the 40th International Conference on Machine Learning (ICML), Vol. 202"},{"key":"B43","article-title":"Benchmarking attribution methods with relative feature importance","author":"Yang","year":"2019","journal-title":"arXiv [Preprint]"},{"key":"B44","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1007\/978-3-319-10590-1_53","article-title":"\u201cVisualizing and understanding convolutional networks,\u201d","volume-title":"Computer Vision-ECCV 2014","author":"Zeiler","year":"2014"},{"key":"B45","doi-asserted-by":"crossref","first-page":"1740","DOI":"10.1109\/BIBM.2018.8621359","article-title":"\u201cExplainable sentiment analysis with applications in medicine,\u201d","author":"Zucco","year":"2018","journal-title":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1330919\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,26]],"date-time":"2024-02-26T04:47:31Z","timestamp":1708922851000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1330919\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,26]]},"references-count":45,"alternative-id":["10.3389\/frai.2024.1330919"],"URL":"https:\/\/doi.org\/10.3389\/frai.2024.1330919","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,26]]},"article-number":"1330919"}}