{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,31]],"date-time":"2025-05-31T04:09:23Z","timestamp":1748664563161,"version":"3.41.0"},"publisher-location":"Cham","reference-count":37,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783031926471","type":"print"},{"value":"9783031926488","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,12]],"date-time":"2025-05-12T00:00:00Z","timestamp":1747008000000},"content-version":"vor","delay-in-days":131,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>A major challenge in developing data-driven algorithms for medical imaging is the limited size of available datasets. Furthermore, these datasets often suffer from inter-site heterogeneity caused by the use of different scanners and scanning protocols. These factors may contribute to overfitting,\u00a0which undermines the generalization ability and robustness of\u00a0deep learning classification models in the medical domain, leading\u00a0to inadequate performance in real-world applications. To address\u00a0these challenges and mitigate overfitting, we propose a framework\u00a0which incorporates explanation supervision during training of Vision Transformer (ViT) models for image classification. Our approach leverages foreground masks of the class object during training\u00a0to regularize attribution maps extracted from ViT, encouraging\u00a0the model to focus on relevant image regions and make predictions\u00a0based on pertinent features. We introduce a new method for generating explanatory attribution maps from ViT-based models and construct\u00a0a dual-loss function that combines a conventional classification\u00a0loss with a term that regularizes attribution maps. Our approach demonstrates superior performance over existing methods on\u00a0two challenging medical imaging datasets, highlighting its effectiveness in the medical domain and its potential for application in\u00a0other fields. Source code is available at: <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/sagibe\/LGMViT\" ext-link-type=\"uri\">https:\/\/github.com\/sagibe\/LGMViT<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/978-3-031-92648-8_8","type":"book-chapter","created":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T16:28:05Z","timestamp":1748622485000},"page":"118-133","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Localization-Guided Supervision for\u00a0Robust Medical Image Classification by\u00a0Vision Transformers"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-2683-9875","authenticated-orcid":false,"given":"Sagi","family":"Ben Itzhak","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1436-2275","authenticated-orcid":false,"given":"Nahum","family":"Kiryati","sequence":"additional","affiliation":[]},{"given":"Orith","family":"Portnoy","sequence":"additional","affiliation":[]},{"given":"Arnaldo","family":"Mayer","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,12]]},"reference":[{"key":"8_CR1","doi-asserted-by":"crossref","unstructured":"Abnar, S., Zuidema, W.: Quantifying attention flow in transformers. arXiv preprint arXiv:2005.00928 (2020)","DOI":"10.18653\/v1\/2020.acl-main.385"},{"key":"8_CR2","unstructured":"Achtibat, R., et al.: AttnLRP: attention-aware layer-wise relevance propagation for transformers. arXiv preprint arXiv:2402.05602 (2024)"},{"key":"8_CR3","unstructured":"Ali, A., Schnake, T., Eberle, O., Montavon, G., M\u00fcller, K.R., Wolf, L.: XAI for transformers: Better explanations through conservative propagation. In: International Conference on Machine Learning, pp. 435\u2013451. PMLR (2022)"},{"key":"8_CR4","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1016\/j.inffus.2021.11.008","volume":"81","author":"L Arras","year":"2022","unstructured":"Arras, L., Osman, A., Samek, W.: CLEVR-XAI: a benchmark dataset for the ground truth evaluation of neural network explanations. Inf. Fusion 81, 14\u201340 (2022)","journal-title":"Inf. Fusion"},{"issue":"7","key":"8_CR5","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0130140","volume":"10","author":"S Bach","year":"2015","unstructured":"Bach, S., Binder, A., Montavon, G., Klauschen, F., M\u00fcller, K.R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10(7), e0130140 (2015)","journal-title":"PLoS ONE"},{"issue":"1","key":"8_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/sdata.2017.117","volume":"4","author":"S Bakas","year":"2017","unstructured":"Bakas, S., et al.: Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data 4(1), 1\u201313 (2017)","journal-title":"Sci. Data"},{"key":"8_CR7","unstructured":"Bakas, S., et\u00a0al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629 (2018)"},{"key":"8_CR8","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2022.102680","volume":"84","author":"P Bilic","year":"2023","unstructured":"Bilic, P., et al.: The liver tumor segmentation benchmark (LiTS). Med. Image Anal. 84, 102680 (2023)","journal-title":"Med. Image Anal."},{"key":"8_CR9","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1613\/jair.1.12228","volume":"70","author":"N Burkart","year":"2021","unstructured":"Burkart, N., Huber, M.F.: A survey on the explainability of supervised machine learning. J. Artif. Intell. Res. 70, 245\u2013317 (2021)","journal-title":"J. Artif. Intell. Res."},{"key":"8_CR10","doi-asserted-by":"crossref","unstructured":"Chefer, H., Gur, S., Wolf, L.: Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 397\u2013406 (2021)","DOI":"10.1109\/ICCV48922.2021.00045"},{"key":"8_CR11","doi-asserted-by":"crossref","unstructured":"Chefer, H., Gur, S., Wolf, L.: Transformer interpretability beyond attention visualization. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 782\u2013791 (2021)","DOI":"10.1109\/CVPR46437.2021.00084"},{"key":"8_CR12","first-page":"33618","volume":"35","author":"H Chefer","year":"2022","unstructured":"Chefer, H., Schwartz, I., Wolf, L.: Optimizing relevance maps of vision transformers improves robustness. Adv. Neural. Inf. Process. Syst. 35, 33618\u201333632 (2022)","journal-title":"Adv. Neural. Inf. Process. Syst."},{"issue":"7","key":"8_CR13","doi-asserted-by":"publisher","first-page":"610","DOI":"10.1038\/s42256-021-00338-7","volume":"3","author":"AJ DeGrave","year":"2021","unstructured":"DeGrave, A.J., Janizek, J.D., Lee, S.I.: AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 3(7), 610\u2013619 (2021)","journal-title":"Nat. Mach. Intell."},{"key":"8_CR14","unstructured":"Dosovitskiy, A., et\u00a0al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)"},{"issue":"9","key":"8_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3561048","volume":"55","author":"R Dwivedi","year":"2023","unstructured":"Dwivedi, R., et al.: Explainable AI (XAI): core ideas, techniques, and solutions. ACM Comput. Surv. 55(9), 1\u201333 (2023)","journal-title":"ACM Comput. Surv."},{"key":"8_CR16","doi-asserted-by":"crossref","unstructured":"Gao, Y., Sun, T.S., Bai, G., Gu, S., Hong, S.R., Liang, Z.: Res: a robust framework for guiding visual explanation. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 432\u2013442 (2022)","DOI":"10.1145\/3534678.3539419"},{"issue":"5","key":"8_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3236009","volume":"51","author":"R Guidotti","year":"2018","unstructured":"Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 51(5), 1\u201342 (2018)","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"8_CR18","doi-asserted-by":"crossref","unstructured":"Kashefi, R., Barekatain, L., Sabokrou, M., Aghaeipoor, F.: Explainability of vision transformers: a comprehensive review and new perspectives. arXiv preprint arXiv:2311.06786 (2023)","DOI":"10.2139\/ssrn.5055345"},{"key":"8_CR19","unstructured":"Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)"},{"issue":"8","key":"8_CR20","doi-asserted-by":"publisher","first-page":"155","DOI":"10.3390\/jimaging7080155","volume":"7","author":"N Kiryati","year":"2021","unstructured":"Kiryati, N., Landau, Y.: Dataset growth in medical image analysis research. J. Imaging 7(8), 155 (2021)","journal-title":"J. Imaging"},{"key":"8_CR21","doi-asserted-by":"crossref","unstructured":"Komorowski, P., Baniecki, H., Biecek, P.: Towards evaluating explanations of vision transformers for medical imaging. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 3726\u20133732 (2023)","DOI":"10.1109\/CVPRW59228.2023.00383"},{"issue":"1","key":"8_CR22","doi-asserted-by":"publisher","first-page":"18","DOI":"10.3390\/e23010018","volume":"23","author":"P Linardatos","year":"2020","unstructured":"Linardatos, P., Papastefanopoulos, V., Kotsiantis, S.: Explainable AI: a review of machine learning interpretability methods. Entropy 23(1), 18 (2020)","journal-title":"Entropy"},{"issue":"10","key":"8_CR23","doi-asserted-by":"publisher","first-page":"1993","DOI":"10.1109\/TMI.2014.2377694","volume":"34","author":"BH Menze","year":"2014","unstructured":"Menze, B.H., et al.: The multimodal brain tumor image segmentation benchmark (BraTS). IEEE Trans. Med. Imaging 34(10), 1993\u20132024 (2014)","journal-title":"IEEE Trans. Med. Imaging"},{"key":"8_CR24","series-title":"Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence)","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/978-3-030-28954-6_10","volume-title":"Explainable AI: Interpreting, Explaining and Visualizing Deep Learning","author":"G Montavon","year":"2019","unstructured":"Montavon, G., Binder, A., Lapuschkin, S., Samek, W., M\u00fcller, K.-R.: Layer-wise relevance propagation: an overview. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., M\u00fcller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 193\u2013209. Springer, Cham (2019). https:\/\/doi.org\/10.1007\/978-3-030-28954-6_10"},{"issue":"6","key":"8_CR25","doi-asserted-by":"publisher","first-page":"3947","DOI":"10.1007\/s10462-019-09784-7","volume":"53","author":"R Moradi","year":"2020","unstructured":"Moradi, R., Berangi, R., Minaei, B.: A survey of regularization strategies for deep models. Artif. Intell. Rev. 53(6), 3947\u20133986 (2020)","journal-title":"Artif. Intell. Rev."},{"issue":"4","key":"8_CR26","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1109\/MSP.2022.3142719","volume":"39","author":"IE Nielsen","year":"2022","unstructured":"Nielsen, I.E., Dera, D., Rasool, G., Ramachandran, R.P., Bouaynaya, N.C.: Robust explainability: a tutorial on gradient-based attribution methods for deep neural networks. IEEE Signal Process. Mag. 39(4), 73\u201384 (2022)","journal-title":"IEEE Signal Process. Mag."},{"key":"8_CR27","doi-asserted-by":"crossref","unstructured":"Ross, A.S., Hughes, M.C., Doshi-Velez, F.: Right for the right reasons: training differentiable models by constraining their explanations. arXiv preprint arXiv:1703.03717 (2017)","DOI":"10.24963\/ijcai.2017\/371"},{"issue":"3","key":"8_CR28","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1109\/JPROC.2021.3060483","volume":"109","author":"W Samek","year":"2021","unstructured":"Samek, W., Montavon, G., Lapuschkin, S., Anders, C.J., M\u00fcller, K.R.: Explaining deep neural networks and beyond: a review of methods and applications. Proc. IEEE 109(3), 247\u2013278 (2021)","journal-title":"Proc. IEEE"},{"key":"8_CR29","doi-asserted-by":"publisher","unstructured":"Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., M\u00fcller, K.-R. (eds.): Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700. Springer, Cham (2019). https:\/\/doi.org\/10.1007\/978-3-030-28954-6","DOI":"10.1007\/978-3-030-28954-6"},{"issue":"10","key":"8_CR30","doi-asserted-by":"publisher","first-page":"867","DOI":"10.1038\/s42256-022-00536-x","volume":"4","author":"A Saporta","year":"2022","unstructured":"Saporta, A., et al.: Benchmarking saliency methods for chest x-ray interpretation. Nat. Mach. Intell. 4(10), 867\u2013878 (2022)","journal-title":"Nat. Mach. Intell."},{"key":"8_CR31","doi-asserted-by":"crossref","unstructured":"Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618\u2013626 (2017)","DOI":"10.1109\/ICCV.2017.74"},{"key":"8_CR32","unstructured":"Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)"},{"key":"8_CR33","unstructured":"Simpson, B., Dutil, F., Bengio, Y., Cohen, J.P.: GradMask: reduce overfitting by regularizing saliency. arXiv preprint arXiv:1904.07478 (2019)"},{"key":"8_CR34","unstructured":"Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)"},{"key":"8_CR35","unstructured":"Viviano, J.D., Simpson, B., Dutil, F., Bengio, Y., Cohen, J.P.: Saliency is a possible red herring when diagnosing poor generalization. arXiv preprint arXiv:1910.00199 (2019)"},{"key":"8_CR36","doi-asserted-by":"crossref","unstructured":"Watson, M., Hasan, B.A.S., Al\u00a0Moubayed, N.: Agree to disagree: when deep learning models with identical architectures produce distinct explanations. In: Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, pp. 875\u2013884 (2022)","DOI":"10.1109\/WACV51458.2022.00159"},{"key":"8_CR37","unstructured":"Zech, J.R., Badgeley, M.A., Liu, M., Costa, A.B., Titano, J.J., Oermann, E.K.: Confounding variables can degrade generalization performance of radiological deep learning models. arXiv preprint arXiv:1807.00431 (2018)"}],"container-title":["Lecture Notes in Computer Science","Computer Vision \u2013 ECCV 2024 Workshops"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-92648-8_8","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T16:28:15Z","timestamp":1748622495000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-92648-8_8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"ISBN":["9783031926471","9783031926488"],"references-count":37,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-92648-8_8","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025]]},"assertion":[{"value":"12 May 2025","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ECCV","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"European Conference on Computer Vision","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Milan","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Italy","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"29 September 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"4 October 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"18","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"eccv2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/eccv2024.ecva.net\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}