{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:48:51Z","timestamp":1760233731438,"version":"build-2065373602"},"reference-count":33,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2021,2,11]],"date-time":"2021-02-11T00:00:00Z","timestamp":1613001600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["2019R1F1A1061941"],"award-info":[{"award-number":["2019R1F1A1061941"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Explaining the prediction of deep neural networks makes the networks more understandable and trusted, leading to their use in various mission critical tasks. Recent progress in the learning capability of networks has primarily been due to the enormous number of model parameters, so that it is usually hard to interpret their operations, as opposed to classical white-box models. For this purpose, generating saliency maps is a popular approach to identify the important input features used for the model prediction. Existing explanation methods typically only use the output of the last convolution layer of the model to generate a saliency map, lacking the information included in intermediate layers. Thus, the corresponding explanations are coarse and result in limited accuracy. Although the accuracy can be improved by iteratively developing a saliency map, this is too time-consuming and is thus impractical. To address these problems, we proposed a novel approach to explain the model prediction by developing an attentive surrogate network using the knowledge distillation. 
The surrogate network aims to generate a fine-grained saliency map corresponding to the model prediction using meaningful regional information presented over all network layers. Experiments demonstrated that the saliency maps are the result of spatially attentive features learned from the distillation. Thus, they are useful for fine-grained classification tasks. Moreover, the proposed method runs at the rate of 24.3 frames per second, which is much faster than the existing methods by orders of magnitude.<\/jats:p>","DOI":"10.3390\/s21041280","type":"journal-article","created":{"date-parts":[[2021,2,12]],"date-time":"2021-02-12T16:12:10Z","timestamp":1613146330000},"page":"1280","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Explaining Neural Networks Using Attentive Knowledge Distillation"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9733-671X","authenticated-orcid":false,"given":"Hyeonseok","family":"Lee","sequence":"first","affiliation":[{"name":"Division of Computer Science and Engineering, Jeonbuk National University, Jeollabuk-do 54896, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5887-5606","authenticated-orcid":false,"given":"Sungchan","family":"Kim","sequence":"additional","affiliation":[{"name":"Division of Computer Science and Engineering, Jeonbuk National University, Jeollabuk-do 54896, Korea"},{"name":"Research Center for Artificial Intelligence Technology, Jeonbuk National University, Jeollabuk-do 54896, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,11]]},"reference":[{"key":"ref_1","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201322). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A.A. (2017, January 4\u20139). Inception-v4, inception-resnet and the impact of residual connections on learning. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and So Kweon, I. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Doll\u00e1r, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. (arXiv, 2016). Inception-v4, inception-resnet and the impact of residual connections on learning, arXiv.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"52138","DOI":"10.1109\/ACCESS.2018.2870052","article-title":"Peeking inside the black-box: A survey on Explainable Artificial Intelligence (XAI)","volume":"6","author":"Adadi","year":"2018","journal-title":"IEEE Access"},{"key":"ref_9","unstructured":"Petsiuk, V., Das, A., and Saenko, K. (arXiv, 2018). Rise: Randomized input sampling for explanation of black-box models, arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ribeiro, M.T., Singh, S., and Guestrin, C. (2016, January 13\u201317). \u201cWhy should I trust you?\u201d Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939778"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017, January 22\u201329). Grad-cam: Visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.74"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"8572","DOI":"10.1109\/ACCESS.2019.2963055","article-title":"Regional multi-scale approach for visually pleasing explanations of deep neural networks","volume":"8","author":"Seo","year":"2019","journal-title":"IEEE Access"},{"key":"ref_13","unstructured":"Simonyan, K., Vedaldi, A., and Zisserman, A. (arXiv, 2013). Deep inside convolutional networks: Visualising image classification models and saliency maps, arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wagner, J., Kohler, J.M., Gindele, T., Hetzel, L., Wiedemer, J.T., and Behnke, S. (2019, January 16\u201320). 
Interpretable and fine-grained visual explanations for convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00931"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zeiler, M.D., and Fergus, R. (2014). Visualizing and understanding convolutional networks. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6\u201312 September 2014, Springer.","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"ref_16","unstructured":"Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (July, January 26). Learning deep features for discriminative localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Fong, R.C., and Vedaldi, A. (2017, January 22\u201329). Interpretable explanations of black boxes by meaningful perturbation. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.371"},{"key":"ref_18","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (arXiv, 2015). Distilling the knowledge in a neural network, arXiv."},{"key":"ref_19","unstructured":"Dabkowski, P., and Gal, Y. (2017, January 4\u20139). Real time image saliency for black box classifiers. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_20","unstructured":"Bahdanau, D., Cho, K., and Bengio, Y. (arXiv, 2014). Neural machine translation by jointly learning to align and translate, arXiv."},{"key":"ref_21","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (arXiv, 2018). Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201322). 
Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1002\/mp.13933","article-title":"CT prostate segmentation based on synthetic MRI-aided deep attention fully convolution network","volume":"47","author":"Lei","year":"2020","journal-title":"Med. Phys."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5\u20139 October 2015, Springer.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Isola, P., Zhu, J.Y., Zhou, T., and Efros, A.A. (2017, January 21\u201326). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.632"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhu, J.Y., Park, T., Isola, P., and Efros, A.A. (2017, January 22\u201329). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.244"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., and Catanzaro, B. (2018, January 18\u201322). High-resolution image synthesis and semantic manipulation with conditional gans. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00917"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_29","unstructured":"Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. (2011). The Caltech-Ucsd Birds-200-2011 Dataset, California Institute of Technology."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Krause, J., Stark, M., Deng, J., and Fei-Fei, L. (2013, January 1\u20138). 3d object representations for fine-grained categorization. Proceedings of the IEEE International Conference on Computer Vision Workshops, Sydney, Australia.","DOI":"10.1109\/ICCVW.2013.77"},{"key":"ref_31","unstructured":"Maji, S., Rahtu, E., Kannala, J., Blaschko, M., and Vedaldi, A. (arXiv, 2013). Fine-grained visual classification of aircraft, arXiv."},{"key":"ref_32","unstructured":"Nesterov, Y. (2013). Introductory Lectures on Convex Optimization: A Basic Course, Springer Science & Business Media."},{"key":"ref_33","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). Pytorch: An imperative style, high-performance deep learning library. 
Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/4\/1280\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:22:47Z","timestamp":1760160167000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/4\/1280"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,11]]},"references-count":33,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["s21041280"],"URL":"https:\/\/doi.org\/10.3390\/s21041280","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,2,11]]}}}