{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T19:35:42Z","timestamp":1768073742261,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":44,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,14]],"date-time":"2021-08-14T00:00:00Z","timestamp":1628899200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,14]]},"DOI":"10.1145\/3447548.3467307","type":"proceedings-article","created":{"date-parts":[[2021,8,13]],"date-time":"2021-08-13T18:21:39Z","timestamp":1628878899000},"page":"25-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":48,"title":["Why Attentions May Not Be Interpretable?"],"prefix":"10.1145","author":[{"given":"Bing","family":"Bai","sequence":"first","affiliation":[{"name":"Tencent Inc., Beijing, China"}]},{"given":"Jian","family":"Liang","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}]},{"given":"Guanhua","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tencent Inc., Guangzhou, China"}]},{"given":"Hao","family":"Li","sequence":"additional","affiliation":[{"name":"Tencent Inc., Beijing, China"}]},{"given":"Kun","family":"Bai","sequence":"additional","affiliation":[{"name":"Tencent Inc., Guangzhou, China"}]},{"given":"Fei","family":"Wang","sequence":"additional","affiliation":[{"name":"Weill Cornell Medicine, New York, NY, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,8,14]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau , Kyunghyun Cho , 
and Yoshua Bengio . 2015 . Neural Machine Translation by Jointly Learning to Align and Translate . In Proceedings of the International Conference on Learning Representations. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_2_2_1","volume-title":"CSRN: Collaborative Sequential Recommendation Networks for News Retrieval. arXiv preprint arXiv:2004.04816","author":"Bai Bing","year":"2020","unstructured":"Bing Bai , Guanhua Zhang , Ye Lin , Hao Li , Kun Bai , and Bo Luo . 2020 . CSRN: Collaborative Sequential Recommendation Networks for News Retrieval. arXiv preprint arXiv:2004.04816 (2020). Bing Bai, Guanhua Zhang, Ye Lin, Hao Li, Kun Bai, and Bo Luo. 2020. CSRN: Collaborative Sequential Recommendation Networks for News Retrieval. arXiv preprint arXiv:2004.04816 (2020)."},{"key":"e_1_3_2_2_3_1","volume-title":"Explaining a Black-box using Deep Variational Information Bottleneck Approach. arXiv preprint arXiv:1902.06918","author":"Bang Seojin","year":"2019","unstructured":"Seojin Bang , Pengtao Xie , Wei Wu , and Eric Xing . 2019. Explaining a Black-box using Deep Variational Information Bottleneck Approach. arXiv preprint arXiv:1902.06918 ( 2019 ). Seojin Bang, Pengtao Xie, Wei Wu, and Eric Xing. 2019. Explaining a Black-box using Deep Variational Information Bottleneck Approach. arXiv preprint arXiv:1902.06918 (2019)."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00254"},{"key":"e_1_3_2_2_5_1","volume-title":"The Emerging Landscape of Interpretable Agent Behavior. In Proceedings of the International Conference on Automated Planning and Scheduling","volume":"96","author":"Chakraborti Tathagata","year":"2019","unstructured":"Tathagata Chakraborti , Anagha Kulkarni , Sarath Sreedharan , David E Smith , and Subbarao Kambhampati . 2019 . Explicability? Legibility? 
Predictability? Transparency? Privacy? Security ? The Emerging Landscape of Interpretable Agent Behavior. In Proceedings of the International Conference on Automated Planning and Scheduling , Vol. 29(1). 86-- 96 . Tathagata Chakraborti, Anagha Kulkarni, Sarath Sreedharan, David E Smith, and Subbarao Kambhampati. 2019. Explicability? Legibility? Predictability? Transparency? Privacy? Security? The Emerging Landscape of Interpretable Agent Behavior. In Proceedings of the International Conference on Automated Planning and Scheduling, Vol. 29(1). 86--96."},{"key":"e_1_3_2_2_6_1","volume-title":"International Conference on Machine Learning. 883--892","author":"Chen Jianbo","year":"2018","unstructured":"Jianbo Chen , Le Song , Martin Wainwright , and Michael Jordan . 2018 . Learning to Explain: An Information-Theoretic Perspective on Model Interpretation . In International Conference on Machine Learning. 883--892 . Jianbo Chen, Le Song, Martin Wainwright, and Michael Jordan. 2018. Learning to Explain: An Information-Theoretic Perspective on Model Interpretation. In International Conference on Machine Learning. 883--892."},{"key":"e_1_3_2_2_7_1","volume-title":"Jimeng Sun, Joshua Kulas, Andy Schuetz, and Walter Stewart.","author":"Choi Edward","year":"2016","unstructured":"Edward Choi , Mohammad Taha Bahadori , Jimeng Sun, Joshua Kulas, Andy Schuetz, and Walter Stewart. 2016 . RETAIN : An Interpretable Predictive Model for Healthcare Using Reverse Time Attention Mechanism. In Advances in Neural Information Processing Systems . 3504--3512. Edward Choi, Mohammad Taha Bahadori, Jimeng Sun, Joshua Kulas, Andy Schuetz, and Walter Stewart. 2016. RETAIN: An Interpretable Predictive Model for Healthcare Using Reverse Time Attention Mechanism. In Advances in Neural Information Processing Systems. 
3504--3512."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-4828"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.2202\/1557-4679.1198"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2005.24"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2642953"},{"key":"e_1_3_2_2_12_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Jain Sarthak","year":"2019","unstructured":"Sarthak Jain and Byron C Wallace . 2019 . Attention is Not Explanation . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , Volume 1 (Long and Short Papers). 3543--3556. Sarthak Jain and Byron C Wallace. 2019. Attention is Not Explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 3543--3556."},{"key":"e_1_3_2_2_13_1","volume-title":"Categorical Reparametrization with Gumble-Softmax. In International Conference on Learning Representations.","author":"Jang Eric","year":"2017","unstructured":"Eric Jang , Shixiang Gu , and Ben Poole . 2017 . Categorical Reparametrization with Gumble-Softmax. In International Conference on Learning Representations. Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical Reparametrization with Gumble-Softmax. 
In International Conference on Learning Representations."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v29i1.9513"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00719"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403071"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3236386.3241340"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002491"},{"key":"e_1_3_2_2_20_1","volume-title":"International Conference on Machine Learning. 1614--1623","author":"Martins Andre","year":"2016","unstructured":"Andre Martins and Ramon Astudillo . 2016 . From Softmax to Sparsemax: A Sparse Model of Attention and Multi-label Classification . In International Conference on Machine Learning. 1614--1623 . Andre Martins and Ramon Astudillo. 2016. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-label Classification. In International Conference on Machine Learning. 1614--1623."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/70.1.41"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0037350"},{"key":"e_1_3_2_2_25_1","unstructured":"Patrick Schwab and Walter Karlen. 2019. CXPlain: Causal Explanations for Model Interpretation Under Uncertainty. In Advances in Neural Information Processing Systems. 10220--10230.  Patrick Schwab and Walter Karlen. 2019. CXPlain: Causal Explanations for Model Interpretation Under Uncertainty. In Advances in Neural Information Processing Systems. 
10220--10230."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1282"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1198\/016214508000000733"},{"key":"e_1_3_2_2_28_1","volume-title":"Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv preprint arXiv:1312.6034","author":"Simonyan Karen","year":"2013","unstructured":"Karen Simonyan , Andrea Vedaldi , and Andrew Zisserman . 2013. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv preprint arXiv:1312.6034 ( 2013 ). Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2013. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv preprint arXiv:1312.6034 (2013)."},{"key":"e_1_3_2_2_29_1","unstructured":"Vladimir Vapnik. 1992. Principles of Risk Minimization for Learning Theory. In Advances in Neural Information Processing Systems. 831--838.  Vladimir Vapnik. 1992. Principles of Risk Minimization for Learning Theory. In Advances in Neural Information Processing Systems. 831--838."},{"key":"e_1_3_2_2_30_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is All You Need. In Advances in Neural Information Processing Systems. 5998--6008.  Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is All You Need. In Advances in Neural Information Processing Systems. 5998--6008."},{"key":"e_1_3_2_2_31_1","unstructured":"Oriol Vinyals \u0141ukasz Kaiser Terry Koo Slav Petrov Ilya Sutskever and Geoffrey Hinton. 2015. Grammar as a Foreign Language. In Advances in Neural Information Processing Systems. 2773--2781.  Oriol Vinyals \u0141ukasz Kaiser Terry Koo Slav Petrov Ilya Sutskever and Geoffrey Hinton. 2015. Grammar as a Foreign Language. 
In Advances in Neural Information Processing Systems. 2773--2781."},{"key":"e_1_3_2_2_32_1","volume-title":"Should Health Care Demand Interpretable Artificial Intelligence or Accept \"Black Box\" Medicine? Annals of Internal Medicine","author":"Wang Fei","year":"2019","unstructured":"Fei Wang , Rainu Kaushal , and Dhruv Khullar . 2019. Should Health Care Demand Interpretable Artificial Intelligence or Accept \"Black Box\" Medicine? Annals of Internal Medicine ( 2019 ). Fei Wang, Rainu Kaushal, and Dhruv Khullar. 2019. Should Health Care Demand Interpretable Artificial Intelligence or Accept \"Black Box\" Medicine? Annals of Internal Medicine (2019)."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1058"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1002"},{"key":"e_1_3_2_2_35_1","volume-title":"The Estimation of Causal Effects From Observational Data. Annual review of sociology","author":"Winship Christopher","year":"1999","unstructured":"Christopher Winship and Stephen L Morgan . 1999. The Estimation of Causal Effects From Observational Data. Annual review of sociology , Vol. 25 , 1 ( 1999 ), 659--706. Christopher Winship and Stephen L Morgan. 1999. The Estimation of Causal Effects From Observational Data. Annual review of sociology, Vol. 25, 1 (1999), 659--706."},{"key":"e_1_3_2_2_36_1","volume-title":"Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv preprint arXiv:1708.07747","author":"Xiao Han","year":"2017","unstructured":"Han Xiao , Kashif Rasul , and Roland Vollgraf . 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv preprint arXiv:1708.07747 ( 2017 ). Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. 
arXiv preprint arXiv:1708.07747 (2017)."},{"key":"e_1_3_2_2_37_1","volume-title":"Attend and Tell: Neural Image Caption Generation with Visual Attention. In International Conference on Machine Learning. 2048--2057","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu , Jimmy Ba , Ryan Kiros , Kyunghyun Cho , Aaron Courville , Ruslan Salakhudinov , Rich Zemel , and Yoshua Bengio . 2015 . Show , Attend and Tell: Neural Image Caption Generation with Visual Attention. In International Conference on Machine Learning. 2048--2057 . Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In International Conference on Machine Learning. 2048--2057."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1420"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015425"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1435"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.380"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403334"},{"key":"e_1_3_2_2_43_1","unstructured":"Xiang Zhang Junbo Zhao and Yann LeCun. 2015. Character-level Convolutional Networks for Text Classification. In Advances in Neural Information Processing Systems. 649--657.  Xiang Zhang Junbo Zhao and Yann LeCun. 2015. Character-level Convolutional Networks for Text Classification. In Advances in Neural Information Processing Systems. 
649--657."},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.219"}],"event":{"name":"KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Virtual Event Singapore","acronym":"KDD '21","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447548.3467307","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447548.3467307","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:22Z","timestamp":1750191502000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447548.3467307"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,14]]},"references-count":44,"alternative-id":["10.1145\/3447548.3467307","10.1145\/3447548"],"URL":"https:\/\/doi.org\/10.1145\/3447548.3467307","relation":{},"subject":[],"published":{"date-parts":[[2021,8,14]]},"assertion":[{"value":"2021-08-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}