{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T19:38:51Z","timestamp":1776109131624,"version":"3.50.1"},"reference-count":138,"publisher":"Association for Computing Machinery (ACM)","issue":"8","license":[{"start":{"date-parts":[[2022,12,23]],"date-time":"2022-12-23T00:00:00Z","timestamp":1671753600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Canada CIFAR AI Chairs program"},{"name":"NSERC Discovery"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2023,8,31]]},"abstract":"<jats:p>Neural networks for NLP are becoming increasingly complex and widespread, and there is a growing concern if these models are responsible to use. Explaining models helps to address the safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Additionally, post-hoc methods provide explanations after a model is learned and are generally model-agnostic. 
This survey categorizes how recent post-hoc interpretability methods communicate explanations to humans, discusses each method in depth, and describes how these methods are validated, as validation is a common concern.<\/jats:p>","DOI":"10.1145\/3546577","type":"journal-article","created":{"date-parts":[[2022,7,9]],"date-time":"2022-07-09T09:14:08Z","timestamp":1657358048000},"page":"1-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":144,"title":["Post-hoc Interpretability for Neural NLP: A Survey"],"prefix":"10.1145","volume":"55","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1487-2796","authenticated-orcid":false,"given":"Andreas","family":"Madsen","sequence":"first","affiliation":[{"name":"Mila &amp; Polytechnic Montreal, Montr\u00e9al, Quebec, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3753-0323","authenticated-orcid":false,"given":"Siva","family":"Reddy","sequence":"additional","affiliation":[{"name":"Mila &amp; McGill, Montr\u00e9al, QC, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9678-2830","authenticated-orcid":false,"given":"Sarath","family":"Chandar","sequence":"additional","affiliation":[{"name":"Mila &amp; Polytechnique Montreal, Montreal, Quebec, Canada"}]}],"member":"320","published-online":{"date-parts":[[2022,12,23]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.385"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2870052"},{"key":"e_1_3_3_4_2","first-page":"9505","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Adebayo Julius","year":"2018","unstructured":"Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. 2018. Sanity checks for saliency maps. In Proceedings of the Advances in Neural Information Processing Systems. 
Curran Associates, Inc., 9505\u20139515."},{"key":"e_1_3_3_5_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Adi Yossi","year":"2017","unstructured":"Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. In Proceedings of the International Conference on Learning Representations. 1\u201312."},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1042"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1022"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.12"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.5555\/1756006.1859912"},{"key":"e_1_3_3_10_2","first-page":"1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyung Hyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations. 1\u201315."},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00009"},{"key":"e_1_3_3_12_2","doi-asserted-by":"crossref","unstructured":"Jasmijn Bastings Sebastian Ebert Polina Zablotskaia Anders Sandholm and Katja Filippova. 2021. \u201cWill you find these shortcuts?\u201d A protocol for evaluating the faithfulness of input salience methods for text classification. arXiv:2111.07367. 
Retrieved from http:\/\/arxiv.org\/abs\/2111.07367.","DOI":"10.18653\/v1\/2022.emnlp-main.64"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.blackboxnlp-1.14"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00254\/43503\/Analysis-Methods-in-Neural-Language-Processing-A"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-tutorials.1"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00254"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3351095.3375624"},{"key":"e_1_3_3_19_2","article-title":"Latent Dirichlet allocation","author":"Blei David M.","year":"2003","unstructured":"David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (2003), 993\u20131022. Retrieved from https:\/\/jmlr.org\/papers\/v3\/blei03a.html.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_20_2","first-page":"4356","article-title":"Man is to computer programmer as woman is to homemaker? Debiasing word embeddings","author":"Bolukbasi Tolga","year":"2016","unstructured":"Tolga Bolukbasi, Kai Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems (2016), 4356\u20134364.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_21_2","first-page":"1877","volume-title":"Proceedings of the Advances in Neural Information Processing Systems.","volume":"33","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. 
Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Proceedings of the Advances in Neural Information Processing Systems.H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1877\u20131901. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf."},{"key":"e_1_3_3_22_2","article-title":"On identifiability in transformers","author":"Brunner Gino","year":"2020","unstructured":"Gino Brunner, Yang Liu, Dami\u00e1n Pascual, Oliver Richter, Massimiliano Ciaramita, and Roger Wattenhofer. 2020. On identifiability in transformers. International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=BJg1f6EFDB.","journal-title":"International Conference on Learning Representations"},{"key":"e_1_3_3_23_2","unstructured":"Gino Brunner Yuyi Wang Roger Wattenhofer and Michael Weigelt. 2019. Natural language multitasking analyzing and improving syntactic saliency of latent representations. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS\u201917) arXiv:1801.06024. Retrieved from http:\/\/arxiv.org\/abs\/1801.06024."},{"key":"e_1_3_3_24_2","first-page":"9539","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Camburu Oana-Maria","year":"2018","unstructured":"Oana-Maria Camburu, Tim Rockt\u00e4schel, Thomas Lukasiewicz, and Phil Blunsom. 2018. e-SNLI: Natural language inference with natural language explanations. 
In Proceedings of the Advances in Neural Information Processing Systems. 9539\u20139549."},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics8080832"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/UIC-ATC.2017.8397411"},{"key":"e_1_3_3_27_2","first-page":"288","volume-title":"Proceedings of the Advances in Neural Information Processing Systems.","volume":"22","author":"Chang Jonathan","year":"2009","unstructured":"Jonathan Chang, Jordan Boyd-graber, Sean Gerrish, Chong Wang, and David M. Blei. 2009. Reading tea leaves: How humans interpret topic models. In Proceedings of the Advances in Neural Information Processing Systems. Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta (Eds.), Vol. 22. Curran Associates, Inc., 288\u2013296. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2009\/file\/f92586a25bb3145facd64ab20fd554ff-Paper.pdf."},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14034"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-4828"},{"key":"e_1_3_3_30_2","doi-asserted-by":"crossref","unstructured":"Louis Clouatre, Prasanna Parthasarathi, Amal Zouaq, and Sarath Chandar. 2022. Local structure matters most: Perturbation study in NLU. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022. Association for Computational Linguistics, Stroudsburg, PA, 3712\u20133731. 
http:\/\/arxiv.org\/abs\/2107.13955 https:\/\/aclanthology.org\/2022.findings-acl.293.","DOI":"10.18653\/v1\/2022.findings-acl.293"},{"key":"e_1_3_3_31_2","first-page":"8594","volume-title":"Proceedings of the Advances in Neural Information Processing Systems.","volume":"32","author":"Coenen Andy","year":"2019","unstructured":"Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Vi\u00e9gas, and Martin Wattenberg. 2019. Visualizing and measuring the geometry of BERT. In Proceedings of the Advances in Neural Information Processing Systems. H. Wallach, H. Larochelle, A. Beygelzimer, F. d\u2019Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc., 8594\u20138603. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2019\/file\/159c1ffe5b61b41b3c4d8f4c2150f6c4-Paper.pdf."},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1198"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1080\/00401706.1980.10486199"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.7275\/jyj1-4868"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02310792"},{"key":"e_1_3_3_36_2","unstructured":"Marina Danilevsky, Kun Qian, Ranit Aharonov, Yannis Katsis, Ban Kawas, and Prithviraj Sen. 2020. A survey of the state of explainable AI for natural language processing. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, Suzhou, 447\u2013459. 
https:\/\/aclanthology.org\/2020.aacl-main.46."},{"key":"e_1_3_3_37_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_3_38_2","unstructured":"Finale Doshi-Velez and Been Kim. 2017. Towards A rigorous science of interpretable machine learning. arXiv:1702.08608. Retrieved from http:\/\/arxiv.org\/abs\/1702.08608."},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.3064761"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3359786"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-2006"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.117"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.3390\/app11073184"},{"key":"e_1_3_3_44_2","article-title":"Towards automatic concept-based explanations","author":"Ghorbani Amirata","year":"2019","unstructured":"Amirata Ghorbani, James Wexler, James Zou, and Been Kim. 2019. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, Vol. 32.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_45_2","unstructured":"Yash Goyal Uri Shalit and Been Kim. 2019. Explaining classifiers with causal concept effect (CaCE). arXiv:1907.07165. 
Retrieved from http:\/\/arxiv.org\/abs\/1907.07165."},{"key":"e_1_3_3_46_2","doi-asserted-by":"crossref","unstructured":"Han Guo Nazneen Fatema Rajani Peter Hase Mohit Bansal and Caiming Xiong. 2021. FastIF: Scalable influence functions for efficient model interpretation and debugging. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing Association for Computational Linguistics Stroudsburg PA 10333\u201310350. https:\/\/aclanthology.org\/2021.emnlp-main.808.","DOI":"10.18653\/v1\/2021.emnlp-main.808"},{"key":"e_1_3_3_47_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Gupta Nitish","year":"2020","unstructured":"Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, and Matt Gardner. 2020. Neural module networks for reasoning over text. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=SygWvAVFPr."},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.492"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.390"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1275"},{"key":"e_1_3_3_51_2","unstructured":"Daniel E. Ho and Alice Xiang. 2020. Affirmative algorithms: The legal grounds for fairness as awareness. The University of Chicago Law Review Online . http:\/\/arxiv.org\/abs\/2012.14285."},{"key":"e_1_3_3_52_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Hooker Sara","year":"2019","unstructured":"Sara Hooker, Dumitru Erhan, Pieter-Jan Jan Kindermans, and Been Kim. 2019. A benchmark for interpretability methods in deep neural networks. 
In Proceedings of the Advances in Neural Information Processing Systems."},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3306618.3314230"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.386"},{"key":"e_1_3_3_55_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1357"},{"key":"e_1_3_3_56_2","unstructured":"Myeongjun Jang and Thomas Lukasiewicz. 2021. Are training resources insufficient? Predict first then explain! arXiv:2110.02056. Retrieved from http:\/\/arxiv.org\/abs\/2110.02056."},{"key":"e_1_3_3_57_2","unstructured":"Jared Kaplan Sam McCandlish Tom Henighan Tom B. Brown Benjamin Chess Rewon Child Scott Gray Alec Radford Jeffrey Wu and Dario Amodei. 2020. Scaling laws for neural language models. arXiv:2001.08361. Retrieved from http:\/\/arxiv.org\/abs\/2001.08361."},{"key":"e_1_3_3_58_2","volume-title":"Proceedings of the Journal of Machine Learning Research","author":"Kaufmann Emilie","year":"2013","unstructured":"Emilie Kaufmann and Shivaram Kalyanakrishnan. 2013. Information complexity in bandit subset selection. In Proceedings of the Journal of Machine Learning Research. Retrieved fromhttp:\/\/proceedings.mlr.press\/v30\/Kaufmann13.pdf."},{"key":"e_1_3_3_59_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Kaushik Divyansh","year":"2020","unstructured":"Divyansh Kaushik, Eduard Hovy, and Zachary C. Lipton. 2020. Learning the difference that makes a difference with counterfactually-augmented data. In Proceedings of the International Conference on Learning Representations. 
Retrieved fromhttps:\/\/openreview.net\/forum?id=Sklgs0NFvr."},{"key":"e_1_3_3_60_2","first-page":"4186","article-title":"Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)","volume":"6","author":"Kim Been","year":"2018","unstructured":"Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. 2018. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). 35th International Conference on Machine Learning. 6(2018), 4186\u20134195.","journal-title":"35th International Conference on Machine Learning."},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-28954-6_14"},{"key":"e_1_3_3_62_2","first-page":"2976","article-title":"Understanding black-box predictions via influence functions","author":"Koh Pang Wei","year":"2017","unstructured":"Pang Wei Koh and Percy Liang. 2017. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning.2976\u20132987.","journal-title":"Proceedings of the 34th International Conference on Machine Learning."},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1246"},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.771"},{"key":"e_1_3_3_65_2","unstructured":"Veronica Latcinnik and Jonathan Berant. 2020. Explaining question answering models through text generation. arXiv:2004.05569. 
Retrieved from http:\/\/arxiv.org\/abs\/2004.05569"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1011"},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1082"},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00115"},{"key":"e_1_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3233231"},{"key":"e_1_3_3_70_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1560"},{"key":"e_1_3_3_71_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692. Retrieved fromhttps:\/\/github.com\/pytorch\/fairseq."},{"key":"e_1_3_3_72_2","first-page":"4766","article-title":"A unified approach to interpreting model predictions","author":"Lundberg Scott","year":"2017","unstructured":"Scott Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems.4766\u20134775. http:\/\/arxiv.org\/abs\/1705.07874.","journal-title":"Advances in Neural Information Processing Systems."},{"key":"e_1_3_3_73_2","doi-asserted-by":"publisher","DOI":"10.23915\/distill.00016"},{"key":"e_1_3_3_74_2","doi-asserted-by":"crossref","unstructured":"Andreas Madsen Nicholas Meade Vaibhav Adlakha and Siva Reddy. 2021. Evaluating the faithfulness of importance measures in NLP by recursively masking allegedly important tokens and retraining. arXiv:2110.08412. 
Retrieved from http:\/\/arxiv.org\/abs\/2110.08412.","DOI":"10.18653\/v1\/2022.findings-emnlp.125"},{"key":"e_1_3_3_75_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1334"},{"key":"e_1_3_3_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/3457607"},{"key":"e_1_3_3_77_2","first-page":"1","article-title":"Are sixteen heads really better than one?","volume":"32","author":"Michel Paul","year":"2019","unstructured":"Paul Michel, Omer Levy, and Graham Neubig. 2019. Are sixteen heads really better than one? Advances in Neural Information Processing Systems 32(2019), 1\u201313.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_78_2","volume-title":"Proceedings of the 1st International Conference on Learning Representations","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of the 1st International Conference on Learning Representations. Retrieved from http:\/\/ronan.collobert.com\/senna\/."},{"key":"e_1_3_3_79_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2018.07.007"},{"key":"e_1_3_3_80_2","volume-title":"Interpretable Machine Learning","author":"Molnar Christoph","year":"2019","unstructured":"Christoph Molnar. 2019. Interpretable Machine Learning. Independent. 318 pages. Retrieved fromhttps:\/\/christophm.github.io\/interpretable-ml-book\/."},{"key":"e_1_3_3_81_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Mu Jesse","year":"2020","unstructured":"Jesse Mu and Jacob Andreas. 2020. Compositional explanations of neurons. 
In Proceedings of the Advances in Neural Information Processing Systems."},{"key":"e_1_3_3_82_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/p18-1176"},{"key":"e_1_3_3_83_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aax2342"},{"key":"e_1_3_3_84_2","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_3_3_85_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1041"},{"key":"e_1_3_3_86_2","doi-asserted-by":"publisher","DOI":"10.5555\/2074022.2074073"},{"key":"e_1_3_3_87_2","doi-asserted-by":"publisher","DOI":"10.1080\/14786440109462720"},{"key":"e_1_3_3_88_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_3_89_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Pruthi Garima","year":"2020","unstructured":"Garima Pruthi, Frederick Liu, Mukund Sundararajan, and Satyen Kale. 2020. Estimating training data influence by tracing gradient descent. In Proceedings of the Advances in Neural Information Processing Systems."},{"key":"e_1_3_3_90_2","article-title":"Improving language understanding by generative pre-training","author":"Radford Alec","year":"2018","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. OpenAI (2018). Retrieved fromhttps:\/\/openai.com\/blog\/language-unsupervised\/.","journal-title":"OpenAI"},{"issue":"8","key":"e_1_3_3_91_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9. 
Retrieved fromhttps:\/\/openai.com\/blog\/better-language-models\/.","journal-title":"OpenAI blog"},{"key":"e_1_3_3_92_2","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21 (2020), 1\u201367. Retrieved fromhttps:\/\/jmlr.org\/papers\/v21\/20-074.html.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_93_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1487"},{"key":"e_1_3_3_94_2","unstructured":"Karthikeyan Natesan Ramamurthy Bhanukiran Vinzamuri Yunfeng Zhang and Amit Dhurandhar. 2020. Model agnostic multilevel explanations. arXiv:2003.06005. Retrieved from http:\/\/arxiv.org\/abs\/2003.06005."},{"key":"e_1_3_3_95_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_3_3_96_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11491"},{"key":"e_1_3_3_97_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1079"},{"key":"e_1_3_3_98_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-90403-0_9"},{"key":"e_1_3_3_99_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00349"},{"key":"e_1_3_3_100_2","doi-asserted-by":"crossref","unstructured":"Alexis Ross Ana Marasovi\u0107 and Matthew E. Peters. 2021. Explaining NLP models via minimal contrastive editing (MiCE). In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 Association for Computational Linguistics Stroudsburg PA 3840\u20133852. 
https:\/\/aclanthology.org\/2021.findings-acl.336.","DOI":"10.18653\/v1\/2021.findings-acl.336"},{"key":"e_1_3_3_101_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-019-0048-x"},{"key":"e_1_3_3_102_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6399"},{"key":"e_1_3_3_103_2","volume-title":"Proceedings of the CEUR Workshop","author":"Sangroya Amit","year":"2020","unstructured":"Amit Sangroya, Mouli Rastogi, C. Anantaram, and Lovekesh Vig. 2020. Guided-LIME: Structured sampling based hybrid approach towards explaining blackbox machine learning models. In Proceedings of the CEUR Workshop."},{"key":"e_1_3_3_104_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-44581-1_27"},{"key":"e_1_3_3_105_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1282"},{"key":"e_1_3_3_106_2","first-page":"307","article-title":"A value for N-Person games","year":"1953","unstructured":"Shapley. 1953. A value for N-Person games. Contributions to the Theory of Games (AM-28), Volume II (1953), 307\u2013317. Retrieved from https:\/\/apps.dtic.mil\/dtic\/tr\/fulltext\/u2\/604084.pdf.","journal-title":"Contributions to the Theory of Games (AM-28), Volume II"},{"key":"e_1_3_3_107_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/d16-1159"},{"key":"e_1_3_3_108_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.569"},{"key":"e_1_3_3_109_2","doi-asserted-by":"publisher","DOI":"10.1145\/3375627.3375830"},{"key":"e_1_3_3_110_2","first-page":"455","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics","author":"Socher Richard","year":"2013","unstructured":"Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. 2013. Parsing with compositional vector grammars. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 455\u2013465. 
Retrieved fromhttps:\/\/aclanthology.org\/P13-1045\/."},{"key":"e_1_3_3_111_2","first-page":"1631","article-title":"Recursive deep models for semantic compositionality over a sentiment treebank","author":"Socher Richard","year":"2013","unstructured":"Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.1631\u20131642.","journal-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing."},{"key":"e_1_3_3_112_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2865044"},{"key":"e_1_3_3_113_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.495"},{"key":"e_1_3_3_114_2","unstructured":"Xiaofei Sun Diyi Yang Xiaoya Li Tianwei Zhang Yuxian Meng Han Qiu Guoyin Wang Eduard Hovy and Jiwei Li. 2021. Interpreting deep learning models in natural language processing: A review. arXiv:2110.10470. Retrieved from http:\/\/arxiv.org\/abs\/2110.10470."},{"key":"e_1_3_3_115_2","first-page":"5109","volume-title":"Proceedings of the 34th International Conference on Machine Learning","author":"Sundararajan Mukund","year":"2017","unstructured":"Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning. 5109\u20135118."},{"key":"e_1_3_3_116_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1421"},{"key":"e_1_3_3_117_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1452"},{"key":"e_1_3_3_118_2","doi-asserted-by":"crossref","unstructured":"Ian Tenney James Wexler Jasmijn Bastings Tolga Bolukbasi Andy Coenen Sebastian Gehrmann Ellen Jiang Mahima Pushkarna Carey Radebaugh Emily Reif and Ann Yuan. 2020. 
The language interpretability tool: Extensible interactive visualizations and analysis for NLP models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations Association for Computational Linguistics Stroudsburg PA 107\u2013118. https:\/\/www.aclweb.org\/anthology\/2020.emnlp-demos.15.","DOI":"10.18653\/v1\/2020.emnlp-demos.15"},{"key":"e_1_3_3_119_2","first-page":"1","volume-title":"Proceedings of the 7th International Conference on Learning Representations","author":"Tenney Ian","year":"2019","unstructured":"Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, and Ellie Pavlick. 2019. What do you learn from context? Probing for sentence structure in contextualized word representations. In Proceedings of the 7th International Conference on Learning Representations. 1\u201317. Retrieved fromhttps:\/\/openreview.net\/forum?id=SJzSgnRcKX."},{"key":"e_1_3_3_120_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.3027314"},{"key":"e_1_3_3_121_2","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten Laurens Van Der","year":"2008","unstructured":"Laurens Van Der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579\u20132605. Retrieved from https:\/\/www.jmlr.org\/papers\/v9\/vandermaaten08a.html.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_122_2","unstructured":"Shikhar Vashishth Shyam Upadhyay Gaurav Singh Tomar and Manaal Faruqui. 2019. Attention interpretability across NLP tasks. arXiv:1909.11218. 
Retrieved from http:\/\/arxiv.org\/abs\/1909.11218."},{"key":"e_1_3_3_123_2","first-page":"12388","volume-title":"Proceedings of the Advances in Neural Information Processing Systems.","volume":"33","author":"Vig Jesse","year":"2020","unstructured":"Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, and Stuart Shieber. 2020. Investigating gender bias in language models using causal mediation analysis. In Proceedings of the Advances in Neural Information Processing Systems.H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 12388\u201312401. Retrieved fromhttps:\/\/proceedings.neurips.cc\/paper\/2020\/file\/92650b2e92217715fe312e6fa7b90d82-Paper.pdf."},{"key":"e_1_3_3_124_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.14"},{"key":"e_1_3_3_125_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1221"},{"key":"e_1_3_3_126_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-tutorials.3"},{"key":"e_1_3_3_127_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Wang Alex","year":"2019","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. 2019. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=rJ4km2R5t7."},{"key":"e_1_3_3_128_2","doi-asserted-by":"crossref","unstructured":"Wenqi Wang Run Wang Lina Wang Zhibo Wang and Aoshuang Ye. 2021. Towards a robust deep neural network against adversarial texts: A survey. IEEE Transactions on Knowledge and Data Engineering 1\u20131. https:\/\/ieeexplore.ieee.org\/document\/9557814\/.","DOI":"10.1109\/TKDE.2021.3117608"},{"key":"e_1_3_3_129_2","unstructured":"Sarah Wiegreffe and Ana Marasovi\u0107. 2021. 
Teach me to explain: A review of datasets for explainable natural language processing. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS\u201921). Retrieved from http:\/\/arxiv.org\/abs\/2102.12060."},{"key":"e_1_3_3_130_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1002"},{"key":"e_1_3_3_131_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1042"},{"key":"e_1_3_3_132_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1101"},{"key":"e_1_3_3_133_2","doi-asserted-by":"publisher","DOI":"10.1145\/2876034.2876042"},{"key":"e_1_3_3_134_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_3_135_2","unstructured":"Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, and Daniel S. Weld. 2021. Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Stroudsburg, PA, 6707\u20136723. https:\/\/aclanthology.org\/2021.acl-long.523."},{"key":"e_1_3_3_136_2","first-page":"10967","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 32.","author":"Yeh Chih-Kuan","year":"2019","unstructured":"Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Sai Suggala, David I. Inouye, and Pradeep K. Ravikumar. 2019. On the (In)fidelity and sensitivity of explanations. In Proceedings of the Advances in Neural Information Processing Systems 32. H. Wallach, H. Larochelle, A. Beygelzimer, F. d\u2019Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.), Curran Associates, Inc., Vancouver, Canada, 10967\u201310978.
Retrieved from http:\/\/papers.nips.cc\/paper\/9278-on-the-infidelity-and-sensitivity-of-explanations.pdf."},{"key":"e_1_3_3_137_2","first-page":"9291","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Yeh Chih-Kuan","year":"2018","unstructured":"Chih-Kuan Yeh, Joon Sik Kim, Ian E. H. Yen, and Pradeep Ravikumar. 2018. Representer point selection for explaining deep neural networks. In Proceedings of the Advances in Neural Information Processing Systems. 9291\u20139301."},{"key":"e_1_3_3_138_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5448"},{"key":"e_1_3_3_139_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1131"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3546577","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3546577","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:44:02Z","timestamp":1750272242000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3546577"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,23]]},"references-count":138,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,8,31]]}},"alternative-id":["10.1145\/3546577"],"URL":"https:\/\/doi.org\/10.1145\/3546577","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,23]]},"assertion":[{"value":"2021-10-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-20","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication
History"}},{"value":"2022-12-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}