{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T17:13:32Z","timestamp":1773249212925,"version":"3.50.1"},"reference-count":72,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T00:00:00Z","timestamp":1753833600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T00:00:00Z","timestamp":1753833600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2025,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Explainability in artificial intelligence (XAI) remains a crucial aspect for fostering trust and understanding in machine learning models. Current visual explanation techniques, such as gradient-based or class-activation-based methods, often exhibit a strong dependence on specific model architectures. Conversely, perturbation-based methods, despite being model-agnostic, are computationally expensive as they require evaluating models on a large number of forward passes. We introduce Foveation-based Explanations (FovEx), a novel XAI method inspired by human vision, which combines biologically inspired foveation-based transformations with gradient-driven overt attention to iteratively select locations of interest. These locations are selected to maximize the performance of the model to be explained with respect to the downstream task and then combined to generate an attribution map. We provide a thorough evaluation with qualitative and quantitative assessments on established benchmarks. Our method achieves state-of-the-art performance on both transformers (on 4 out of 5 metrics) and convolutional models (on 3 out of 5 metrics), demonstrating its versatility among various architectures. Furthermore, we show the alignment between the explanation map produced by FovEx and human gaze patterns (+14% in NSS compared to RISE, +203% in NSS compared to GradCAM). This comparison enhances our confidence in FovEx\u2019s ability to close the interpretation gap between humans and machines.<\/jats:p>","DOI":"10.1007\/s11263-025-02543-y","type":"journal-article","created":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T16:49:29Z","timestamp":1753894169000},"page":"7437-7459","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["FovEx: Human-Inspired Explanations for Vision Transformers and Convolutional Neural Networks"],"prefix":"10.1007","volume":"133","author":[{"given":"Mahadev Prasad","family":"Panda","sequence":"first","affiliation":[]},{"given":"Matteo","family":"Tiezzi","sequence":"additional","affiliation":[]},{"given":"Martina","family":"Vilas","sequence":"additional","affiliation":[]},{"given":"Gemma","family":"Roig","sequence":"additional","affiliation":[]},{"given":"Bjoern M.","family":"Eskofier","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5886-0597","authenticated-orcid":false,"given":"Dario","family":"Zanca","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,30]]},"reference":[{"key":"2543_CR1","doi-asserted-by":"crossref","unstructured":"Abnar, S., & Zuidema, W. (2020). Quantifying attention flow in transformers. arXiv preprint arXiv:2005.00928.","DOI":"10.18653\/v1\/2020.acl-main.385"},{"issue":"9","key":"2543_CR2","doi-asserted-by":"publisher","first-page":"1006","DOI":"10.1038\/s42256-023-00711-8","volume":"5","author":"R Achtibat","year":"2023","unstructured":"Achtibat, R., Dreyer, M., Eisenbraun, I., Bosse, S., Wiegand, T., Samek, W., & Lapuschkin, S. (2023). From attribution maps to human-understandable explanations through concept relevance propagation. Nature Machine Intelligence, 5(9), 1006\u20131019.","journal-title":"Nature Machine Intelligence"},{"key":"2543_CR3","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.101805","volume":"99","author":"S Ali","year":"2023","unstructured":"Ali, S., Abuhmed, T., El-Sappagh, S., Muhammad, K., Alonso-Moral, J. M., Confalonieri, R., Guidotti, R., Del Ser, J., D\u00edaz-Rodr\u00edguez, N., & Herrera, F. (2023). Explainable artificial intelligence (xai): What we know and what is left to attain trustworthy artificial intelligence. Information Fusion, 99, Article 101805.","journal-title":"Information Fusion"},{"issue":"7","key":"2543_CR4","doi-asserted-by":"publisher","first-page":"0130140","DOI":"10.1371\/journal.pone.0130140","volume":"10","author":"S Bach","year":"2015","unstructured":"Bach, S., Binder, A., Montavon, G., Klauschen, F., M\u00fcller, K.-R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS One, 10(7), 0130140.","journal-title":"PloS One"},{"key":"2543_CR5","unstructured":"Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., & Brunskill, E., et al. (2021). On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258."},{"key":"2543_CR6","volume-title":"The Fovea: Structure, function, development, and tractional disorders","author":"A Bringmann","year":"2021","unstructured":"Bringmann, A., & Wiedemann, P. (2021). The Fovea: Structure, function, development, and tractional disorders. Academic Press."},{"key":"2543_CR7","doi-asserted-by":"crossref","unstructured":"Chattopadhay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2018). Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 839\u2013847).","DOI":"10.1109\/WACV.2018.00097"},{"key":"2543_CR8","doi-asserted-by":"crossref","unstructured":"Chattopadhay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2018). Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839\u2013847 . IEEE.","DOI":"10.1109\/WACV.2018.00097"},{"key":"2543_CR9","doi-asserted-by":"crossref","unstructured":"Chefer, H., Gur, S., & Wolf, L. (2021). Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), (pp. 397\u2013406).","DOI":"10.1109\/ICCV48922.2021.00045"},{"key":"2543_CR10","doi-asserted-by":"crossref","unstructured":"Chefer, H., Gur, S., & Wolf, L. (2021a). Transformer interpretability beyond attention visualization. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 782\u2013791).","DOI":"10.1109\/CVPR46437.2021.00084"},{"key":"2543_CR11","doi-asserted-by":"crossref","unstructured":"Chefer, H., Gur, S., & Wolf, L. (2021b). Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (pp. 397\u2013406).","DOI":"10.1109\/ICCV48922.2021.00045"},{"key":"2543_CR12","doi-asserted-by":"crossref","unstructured":"Choi, H., Jin, S., & Han, K. (2023). Adversarial normalization: I can visualize everything (ice). In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 12115\u201312124).","DOI":"10.1109\/CVPR52729.2023.01166"},{"key":"2543_CR13","unstructured":"Darcet, T., Oquab, M., Mairal, J., & Bojanowski, P. (2023). Vision transformers need registers. arXiv preprint arXiv:2309.16588."},{"key":"2543_CR14","unstructured":"Deza, A., & Konkle, T. (2020). Emergent properties of foveated perceptual systems. arXiv preprint arXiv:2006.07991."},{"key":"2543_CR15","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., & Gelly, S., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929."},{"key":"2543_CR16","doi-asserted-by":"crossref","unstructured":"Fong, R. C., & Vedaldi, A. (2017). Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE International Conference on Computer Vision (pp. 3429\u20133437).","DOI":"10.1109\/ICCV.2017.371"},{"key":"2543_CR17","doi-asserted-by":"crossref","unstructured":"Fong, R., Patrick, M., & Vedaldi, A. (2019). Understanding deep networks via extremal perturbations and smooth masks. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (pp. 2950\u20132958).","DOI":"10.1109\/ICCV.2019.00304"},{"issue":"4","key":"2543_CR18","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1016\/j.iatssr.2019.11.008","volume":"43","author":"H Fujiyoshi","year":"2019","unstructured":"Fujiyoshi, H., Hirakawa, T., & Yamashita, T. (2019). Deep learning-based image recognition for autonomous driving. IATSS Research, 43(4), 244\u2013252.","journal-title":"IATSS Research"},{"issue":"3","key":"2543_CR19","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1609\/aimag.v38i3.2741","volume":"38","author":"B Goodman","year":"2017","unstructured":"Goodman, B., & Flaxman, S. (2017). European union regulations on algorithmic decision-making and a \u201cright to explanation\u2019\u2019. AI magazine, 38(3), 50\u201357.","journal-title":"AI magazine"},{"key":"2543_CR20","doi-asserted-by":"publisher","first-page":"1175","DOI":"10.1007\/s11042-020-09425-0","volume":"80","author":"M Gruosso","year":"2021","unstructured":"Gruosso, M., Capece, N., & Erra, U. (2021). Human segmentation in surveillance video with deep learning. Multimedia Tools and Applications, 80, 1175\u20131199.","journal-title":"Multimedia Tools and Applications"},{"issue":"1","key":"2543_CR21","doi-asserted-by":"publisher","first-page":"1411","DOI":"10.1038\/s41598-019-57261-6","volume":"10","author":"Y Han","year":"2020","unstructured":"Han, Y., Roig, G., Geiger, G., & Poggio, T. (2020). Scale and translation-invariance for novel objects in human vision. Scientific reports, 10(1), 1411.","journal-title":"Scientific reports"},{"key":"2543_CR22","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 770\u2013778).","DOI":"10.1109\/CVPR.2016.90"},{"issue":"6","key":"2543_CR23","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1016\/S0161-6420(84)34247-6","volume":"91","author":"AE Hendrickson","year":"1984","unstructured":"Hendrickson, A. E., & Yuodelis, C. (1984). The morphological development of the human fovea. Ophthalmology, 91(6), 603\u2013612.","journal-title":"Ophthalmology"},{"key":"2543_CR24","unstructured":"Hsiao, J., & Chan, A. (2023). Towards the next generation explainable ai that promotes ai-human mutual understanding. In: XAI in Action: Past, Present, and Future Applications."},{"issue":"5","key":"2543_CR25","doi-asserted-by":"publisher","first-page":"3077","DOI":"10.1109\/TII.2019.2902274","volume":"15","author":"R Iqbal","year":"2019","unstructured":"Iqbal, R., Maniak, T., Doctor, F., & Karyotis, C. (2019). Fault detection and isolation in industrial processes using deep learning approaches. IEEE Transactions on Industrial Informatics, 15(5), 3077\u20133084.","journal-title":"IEEE Transactions on Industrial Informatics"},{"key":"2543_CR26","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1016\/j.patrec.2021.06.030","volume":"150","author":"M Ivanovs","year":"2021","unstructured":"Ivanovs, M., Kadikis, R., & Ozols, K. (2021). Perturbation-based methods for explaining deep neural networks: A survey. Pattern Recognition Letters, 150, 228\u2013234.","journal-title":"Pattern Recognition Letters"},{"key":"2543_CR27","doi-asserted-by":"crossref","unstructured":"Iwana, B. K., Kuroki, R., & Uchida, S. (2019). Explaining convolutional neural networks using softmax gradient layer-wise relevance propagation. In International Conference on Computer Vision Workshops.","DOI":"10.1109\/ICCVW.2019.00513"},{"key":"2543_CR28","doi-asserted-by":"crossref","unstructured":"Jing, T., Xia, H., Tian, R., Ding, H., Luo, X., Domeyer, J., Sherony, R., & Ding, Z. (2022). Inaction: Interpretable action decision making for autonomous driving. In European Conference on Computer Vision (pp. 370\u2013387). Springer.","DOI":"10.1007\/978-3-031-19839-7_22"},{"key":"2543_CR29","unstructured":"Jonnalagadda, A., Wang, W. Y., Manjunath, B., & Eckstein, M. P. (2021). Foveater: Foveated transformer for image classification. arXiv preprint arXiv:2105.14173."},{"key":"2543_CR30","doi-asserted-by":"crossref","unstructured":"Judd, T., Ehinger, K., Durand, F., & Torralba, A. (2009). Learning to predict where humans look. In 2009 IEEE 12th International Conference on Computer Vision (pp. 2106\u20132113). IEEE.","DOI":"10.1109\/ICCV.2009.5459462"},{"key":"2543_CR31","doi-asserted-by":"publisher","first-page":"2086","DOI":"10.1109\/TMM.2020.3007321","volume":"23","author":"Q Lai","year":"2020","unstructured":"Lai, Q., Khan, S., Nie, Y., Sun, H., Shen, J., & Shao, L. (2020). Understanding more about human and machine attention in deep neural networks. IEEE Transactions on Multimedia, 23, 2086\u20132099.","journal-title":"IEEE Transactions on Multimedia"},{"key":"2543_CR32","doi-asserted-by":"crossref","unstructured":"Lee, J. R., Kim, S., Park, I., Eo, T., & Hwang, D. (2021). Relevance-cam: Your model already knows where to look. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (pp. 14944\u201314953).","DOI":"10.1109\/CVPR46437.2021.01470"},{"key":"2543_CR33","doi-asserted-by":"crossref","unstructured":"Li, Z., Wang, W., Li, Z., Huang, Y., & Sato, Y. (2021). Towards visually explaining video understanding networks with perturbation. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (pp. 1120\u20131129).","DOI":"10.1109\/WACV48630.2021.00116"},{"key":"2543_CR34","doi-asserted-by":"crossref","unstructured":"Liang, J., Mahler, J., Laskey, M., Li, P., & Goldberg, K. (2017). Using dvrk teleoperation to facilitate deep learning of automation tasks for an industrial robot. In 2017 13th IEEE Conference on Automation Science and Engineering (CASE), (pp. 1\u20138). IEEE.","DOI":"10.1109\/COASE.2017.8256067"},{"key":"2543_CR35","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., & Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In Computer vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13 (pp. 740\u2013755). Springer.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"2543_CR36","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). Ssd: Single shot multibox detector. In Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11\u201314, 2016, Proceedings, Part I 14 (pp. 21\u201337). Springer.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"2543_CR37","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A convnet for the 2020s. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (pp. 11976\u201311986).","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"2543_CR38","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2024.102301","volume":"106","author":"L Longo","year":"2024","unstructured":"Longo, L., Brcic, M., Cabitza, F., Choi, J., Confalonieri, R., Del Ser, J., Guidotti, R., Hayashi, Y., Herrera, F., Holzinger, A., et al. (2024). Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions. Information Fusion, 106, Article 102301.","journal-title":"Information Fusion"},{"key":"2543_CR39","doi-asserted-by":"crossref","unstructured":"Lucieri, A., Bajwa, M. N., Dengel, A., & Ahmed, S. (2020). Explaining ai-based decision support systems using concept localization maps. In International Conference on Neural Information Processing (pp. 185\u2013193). Springer.","DOI":"10.1007\/978-3-030-63820-7_21"},{"key":"2543_CR40","unstructured":"Malkin, E., Deza, A., & Poggio, T. (2020). Cuda-optimized real-time rendering of a foveated visual system. arXiv preprint arXiv:2012.08655."},{"issue":"2","key":"2543_CR41","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1109\/MIE.2020.3034884","volume":"15","author":"B Maschler","year":"2021","unstructured":"Maschler, B., & Weyrich, M. (2021). Deep transfer learning for industrial automation: A review and discussion of new techniques for data-driven machine learning. IEEE Industrial Electronics Magazine, 15(2), 65\u201375.","journal-title":"IEEE Industrial Electronics Magazine"},{"issue":"CSCW1","key":"2543_CR42","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3579481","volume":"7","author":"K Morrison","year":"2023","unstructured":"Morrison, K., Shin, D., Holstein, K., & Perer, A. (2023). Evaluating the impact of human explanation strategies on human-ai visual decision-making. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1), 1\u201337.","journal-title":"Proceedings of the ACM on Human-Computer Interaction"},{"key":"2543_CR43","doi-asserted-by":"crossref","unstructured":"Najibi, M., Ji, J., Zhou, Y., Qi, C. R., Yan, X., Ettinger, S., & Anguelov, D. (2022). Motion inspired unsupervised perception and prediction in autonomous driving. In European Conference on Computer Vision (pp. 424\u2013443). Springer.","DOI":"10.1007\/978-3-031-19839-7_25"},{"key":"2543_CR44","doi-asserted-by":"crossref","unstructured":"Nguyen, T. T. H., Truong, V. B., Nguyen, V. T. K., Cao, Q. H., & Nguyen, Q. K. (2023). Towards trust of explainable ai in thyroid nodule diagnosis. arXiv preprint arXiv:2303.04731.","DOI":"10.1007\/978-3-031-36938-4_2"},{"issue":"4","key":"2543_CR45","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1109\/MSP.2022.3142719","volume":"39","author":"IE Nielsen","year":"2022","unstructured":"Nielsen, I. E., Dera, D., Rasool, G., Ramachandran, R. P., & Bouaynaya, N. C. (2022). Robust explainability: A tutorial on gradient-based attribution methods for deep neural networks. IEEE Signal Processing Magazine, 39(4), 73\u201384.","journal-title":"IEEE Signal Processing Magazine"},{"key":"2543_CR46","doi-asserted-by":"crossref","unstructured":"Ouyang, C., Biffi, C., Chen, C., Kart, T., Qiu, H., & Rueckert, D. (2020). Self-supervision with superpixels: Training few-shot medical image segmentation without annotation. In: Computer Vision\u2013ECCV 2020: 16th European Conference, August 23\u201328, 2020, Proceedings, Part XXIX 16 (pp. 762\u2013780). Springer, Glasgow, UK.","DOI":"10.1007\/978-3-030-58526-6_45"},{"key":"2543_CR47","doi-asserted-by":"crossref","unstructured":"Pamplona, D., & Bernardino, A. (2009). Smooth foveal vision with gaussian receptive fields. In 2009 9th IEEE-RAS International Conference on Humanoid Robots (pp. 223\u2013229). IEEE.","DOI":"10.1109\/ICHR.2009.5379575"},{"key":"2543_CR48","unstructured":"Petsiuk, V., Das, A., & Saenko, K. (2018). Rise: Randomized input sampling for explanation of black-box models. In Proceedings of the British Machine Vision Conference (BMVC)."},{"key":"2543_CR49","doi-asserted-by":"crossref","unstructured":"Petsiuk, V., Jain, R., Manjunatha, V., Morariu, V. I., Mehra, A., Ordonez, V., & Saenko, K. (2021). Black-box explanation of object detectors via saliency maps. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (pp. 11443\u201311452).","DOI":"10.1109\/CVPR46437.2021.01128"},{"key":"2543_CR50","unstructured":"Qi, R., Zheng, Y., Yang, Y., Cao, C. C., & Hsiao, J. H. (2023). Explanation strategies for image classification in humans vs. current explainable ai. arXiv preprint arXiv:2304.04448."},{"key":"2543_CR51","unstructured":"Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., & Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (pp. 8748\u20138763). PmLR."},{"key":"2543_CR52","unstructured":"Ridnik, T., Ben-Baruch, E., Noy, A., & Zelnik-Manor, L. (2021). Imagenet-21k pretraining for the masses. arXiv preprint arXiv:2104.10972."},{"issue":"3","key":"2543_CR53","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","volume":"115","author":"O Russakovsky","year":"2015","unstructured":"Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., & Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3), 211\u2013252.","journal-title":"International Journal of Computer Vision (IJCV)"},{"key":"2543_CR54","unstructured":"Schwinn, L., Precup, D., Eskofier, B., & Zanca, D. (2022). Behind the machine\u2019s gaze: Neural networks with biologically-inspired constraints exhibit human-like visual attention. arXiv preprint arXiv:2204.09093."},{"key":"2543_CR55","doi-asserted-by":"crossref","unstructured":"Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-cam: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV) (pp. 618\u2013626).","DOI":"10.1109\/ICCV.2017.74"},{"key":"2543_CR56","doi-asserted-by":"crossref","unstructured":"Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, (pp. 618\u2013626).","DOI":"10.1109\/ICCV.2017.74"},{"key":"2543_CR57","unstructured":"Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. In Workshop at International Conference on Learning Representations."},{"key":"2543_CR58","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1016\/j.procs.2020.06.030","volume":"173","author":"V Singh","year":"2020","unstructured":"Singh, V., Singh, S., & Gupta, P. (2020). Real-time anomaly recognition through cctv using neural networks. Procedia Computer Science, 173, 254\u2013263.","journal-title":"Procedia Computer Science"},{"key":"2543_CR59","unstructured":"Smilkov, D., Thorat, N., Kim, B., Vi\u00e9gas, F., & Wattenberg, M. (2017). Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825."},{"key":"2543_CR60","doi-asserted-by":"crossref","unstructured":"Tiezzi, M., Marullo, S., Betti, A., Meloni, E., Faggi, L., Gori, M., & Melacci, S. (2022). Foveated neural computation. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 19\u201335). Springer.","DOI":"10.1007\/978-3-031-26409-2_2"},{"key":"2543_CR61","volume-title":"A Computational Perspective on Visual Attention","author":"JK Tsotsos","year":"2021","unstructured":"Tsotsos, J. K. (2021). A Computational Perspective on Visual Attention. MIT Press."},{"key":"2543_CR62","first-page":"40030","volume":"36","author":"MG Vilas","year":"2023","unstructured":"Vilas, M. G., Schauml\u00f6ffel, T., & Roig, G. (2023). Analyzing Vision Transformers for Image Classification in Class Embedding Space. Advances in Neural Information Processing Systems, 36, 40030\u201340041.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2543_CR63","unstructured":"Volokitin, A., Roig, G., & Poggio, T. A. (2017). Do deep neural networks suffer from crowding? Advances in neural information processing systems,30."},{"key":"2543_CR64","doi-asserted-by":"crossref","unstructured":"Wagner, J., Kohler, J. M., Gindele, T., Hetzel, L., Wiedemer, J. T., & Behnke, S. (2019). Interpretable and fine-grained visual explanations for convolutional neural networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (pp. 9097\u20139107).","DOI":"10.1109\/CVPR.2019.00931"},{"key":"2543_CR65","doi-asserted-by":"crossref","unstructured":"Wang, J., Jin, Y., & Wang, L. (2022). Personalizing federated medical image segmentation via local calibration. In: European Conference on Computer Vision (pp. 456\u2013472). Springer.","DOI":"10.1007\/978-3-031-19803-8_27"},{"key":"2543_CR66","doi-asserted-by":"crossref","unstructured":"Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., & Hu, X. (2020). Score-cam: Score-weighted visual explanations for convolutional neural networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 24\u201325).","DOI":"10.1109\/CVPRW50498.2020.00020"},{"key":"2543_CR67","doi-asserted-by":"publisher","DOI":"10.1093\/oso\/9780195126938.001.0001","volume-title":"Visual Attention","author":"RD Wright","year":"1998","unstructured":"Wright, R. D. (1998). Visual Attention. Oxford University Press."},{"key":"2543_CR68","doi-asserted-by":"publisher","first-page":"29245","DOI":"10.1007\/s11042-018-5953-1","volume":"77","author":"J Wu","year":"2018","unstructured":"Wu, J., Zhong, S.-H., Ma, Z., Heinen, S. J., & Jiang, J. (2018). Foveated convolutional neural networks for video summarization. Multimedia Tools and Applications, 77, 29245\u201329267.","journal-title":"Multimedia Tools and Applications"},{"issue":"4","key":"2543_CR69","doi-asserted-by":"publisher","first-page":"5495","DOI":"10.1007\/s11042-020-09964-6","volume":"80","author":"J Xu","year":"2021","unstructured":"Xu, J. (2021). A deep learning approach to building an intelligent video surveillance system. Multimedia Tools and Applications, 80(4), 5495\u20135515.","journal-title":"Multimedia Tools and Applications"},{"key":"2543_CR70","unstructured":"Zanca, D., Serchi, V., Piu, P., Rosini, F., & Rufa, A. (2018). Fixatons: A collection of human fixations datasets and metrics for scanpath similarity. arXiv preprint arXiv:1802.02534."},{"key":"2543_CR71","unstructured":"Zanca, D., Zugarini, A., Dietz, S., Altstidl, T. R., Ndjeuha, M. A. T., Schwinn, L., & Eskofier, B. M. (2023). Contrastive language-image pretrained models are zero-shot human scanpath predictors. CoRR abs\/2305.12380."},{"key":"2543_CR72","doi-asserted-by":"crossref","unstructured":"Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2921\u20132929).","DOI":"10.1109\/CVPR.2016.319"}],"updated-by":[{"DOI":"10.1007\/s11263-025-02580-7","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T00:00:00Z","timestamp":1761782400000}}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02543-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-025-02543-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02543-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T06:41:54Z","timestamp":1761892914000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-025-02543-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,30]]},"references-count":72,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10]]}},"alternative-id":["2543"],"URL":"https:\/\/doi.org\/10.1007\/s11263-025-02543-y","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,30]]},"assertion":[{"value":"7 August 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 July 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 July 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 October 2025","order":5,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":6,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"This article was revised due to a retrospective Open Access order","order":7,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 October 2025","order":8,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":9,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"A Correction to this paper has been published:","order":10,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"https:\/\/doi.org\/10.1007\/s11263-025-02580-7","URL":"https:\/\/doi.org\/10.1007\/s11263-025-02580-7","order":11,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}}]}}