{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,16]],"date-time":"2026-06-16T09:18:50Z","timestamp":1781601530678,"version":"3.54.5"},"reference-count":248,"publisher":"MIT Press","issue":"2","license":[{"start":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T00:00:00Z","timestamp":1705881600000},"content-version":"vor","delay-in-days":21,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand. This has given rise to numerous efforts towards model explainability in recent years. One desideratum of model explanation is faithfulness, that is, an explanation should accurately represent the reasoning process behind the model\u2019s prediction. In this survey, we review over 110 model explanation methods in NLP through the lens of faithfulness. We first discuss the definition and evaluation of faithfulness, as well as its significance for explainability. We then introduce recent advances in faithful explanation, grouping existing approaches into five categories: similarity-based methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models. For each category, we synthesize its representative studies, strengths, and weaknesses. Finally, we summarize their common virtues and remaining challenges, and reflect on future work directions towards faithful explainability in NLP.<\/jats:p>","DOI":"10.1162\/coli_a_00511","type":"journal-article","created":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T17:21:31Z","timestamp":1705944091000},"page":"657-723","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":76,"title":["Towards Faithful Model Explanation in NLP: A Survey"],"prefix":"10.1162","volume":"50","author":[{"given":"Qing","family":"Lyu","sequence":"first","affiliation":[{"name":"University of Pennsylvania, Department of Computer and Information Science. lyuqing@sas.upenn.edu"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marianna","family":"Apidianaki","sequence":"additional","affiliation":[{"name":"University of Pennsylvania, Department of Computer and Information Science. marapi@seas.upenn.edu"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chris","family":"Callison-Burch","sequence":"additional","affiliation":[{"name":"University of Pennsylvania, Department of Computer and Information Science. ccb@seas.upenn.edu"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","published-online":{"date-parts":[[2024,6,1]]},"reference":[{"key":"2024070814304341200_bib1","doi-asserted-by":"publisher","first-page":"4190","DOI":"10.18653\/v1\/2020.acl-main.385","article-title":"Quantifying attention flow in\n                        transformers","volume-title":"Proceedings of the 58th Annual\n                        Meeting of the Association for Computational Linguistics","author":"Abnar","year":"2020"},{"key":"2024070814304341200_bib2","first-page":"17582","article-title":"CEBaB: Estimating the causal effects of\n                        real-world concepts on NLP model behavior","volume":"35","author":"Abraham","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024070814304341200_bib3","first-page":"9525","article-title":"Sanity checks for saliency maps","volume-title":"Advances in Neural Information Processing Systems 31: Annual\n                        Conference on Neural Information Processing Systems 2018, NeurIPS\n                        2018","author":"Adebayo","year":"2018"},{"key":"2024070814304341200_bib4","first-page":"700","article-title":"Debugging tests for model explanations","volume":"33","author":"Adebayo","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024070814304341200_bib5","article-title":"Fine-grained analysis of sentence\n                        embeddings using auxiliary prediction tasks","volume-title":"5th\n                        International Conference on Learning Representations, ICLR\n                    2017","author":"Adi","year":"2017"},{"key":"2024070814304341200_bib6","article-title":"On the robustness of interpretability\n                        methods","author":"Alvarez-Melis","year":"2018","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib7","first-page":"7786","article-title":"Towards robust interpretability with\n                        self-explaining neural networks","volume-title":"Advances in\n                        Neural Information Processing Systems 31: Annual Conference on Neural\n                        Information Processing Systems 2018, NeurIPS 2018","author":"Alvarez-Melis","year":"2018"},{"key":"2024070814304341200_bib8","doi-asserted-by":"publisher","first-page":"384","DOI":"10.1162\/tacl_a_00554","article-title":"Naturalistic causal probing for\n                        morpho-syntax","volume":"11","author":"Amini","year":"2022","journal-title":"Transactions of the Association\n                        for Computational Linguistics"},{"key":"2024070814304341200_bib9","doi-asserted-by":"publisher","first-page":"1545","DOI":"10.18653\/v1\/N16-1181","article-title":"Learning to compose neural networks for\n                        question answering","volume-title":"Proceedings of the 2016\n                        Conference of the North American Chapter of the Association for\n                        Computational Linguistics: Human Language Technologies","author":"Andreas","year":"2016"},{"key":"2024070814304341200_bib10","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1109\/CVPR.2016.12","article-title":"Neural module networks","volume-title":"2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR\n                        2016","author":"Andreas","year":"2016"},{"key":"2024070814304341200_bib11","doi-asserted-by":"publisher","first-page":"2425","DOI":"10.1109\/ICCV.2015.279","article-title":"VQA: Visual question\n                        answering","volume-title":"2015 IEEE International Conference on\n                        Computer Vision, ICCV 2015","author":"Antol","year":"2015"},{"key":"2024070814304341200_bib12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/W16-1601","article-title":"Explaining predictions of non-linear\n                        classifiers in NLP","volume-title":"Proceedings of the 1st\n                        Workshop on Representation Learning for NLP","author":"Arras","year":"2016"},{"key":"2024070814304341200_bib13","doi-asserted-by":"publisher","first-page":"159","DOI":"10.18653\/v1\/W17-5221","article-title":"Explaining recurrent neural network\n                        predictions in sentiment analysis","volume-title":"Proceedings of\n                        the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and\n                        Social Media Analysis","author":"Arras","year":"2017"},{"key":"2024070814304341200_bib14","doi-asserted-by":"publisher","first-page":"3256","DOI":"10.18653\/v1\/2020.emnlp-main.263","article-title":"A diagnostic study of explainability\n                        techniques for text classification","volume-title":"Proceedings\n                        of the 2020 Conference on Empirical Methods in Natural Language Processing\n                        (EMNLP)","author":"Atanasova","year":"2020"},{"issue":"7","key":"2024070814304341200_bib15","doi-asserted-by":"publisher","first-page":"e0130140","DOI":"10.1371\/journal.pone.0130140","article-title":"On pixel-wise explanations for non-linear\n                        classifier decisions by layer-wise relevance propagation","volume":"10","author":"Bach","year":"2015","journal-title":"PLOS ONE"},{"key":"2024070814304341200_bib16","first-page":"1803","article-title":"How to explain individual classification\n                        decisions","volume":"11","author":"Baehrens","year":"2010","journal-title":"Journal of Machine Learning\n                        Research"},{"key":"2024070814304341200_bib17","article-title":"Neural machine translation by jointly\n                        learning to align and translate","volume-title":"3rd International Conference on Learning Representations, ICLR\n                        2015","author":"Bahdanau","year":"2015"},{"key":"2024070814304341200_bib18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3411764.3445717","article-title":"Does the whole exceed its parts? The effect of AI\n                        explanations on complementary team performance","volume-title":"Proceedings of the 2021 CHI Conference on Human Factors in Computing\n                        Systems","author":"Bansal","year":"2021"},{"key":"2024070814304341200_bib19","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.inffus.2019.12.012","article-title":"Explainable Artificial Intelligence (XAI):\n                        Concepts, taxonomies, opportunities and challenges toward responsible\n                        AI","volume":"58","author":"Barredo Arrieta","year":"2020","journal-title":"Information Fusion"},{"key":"2024070814304341200_bib20","doi-asserted-by":"publisher","first-page":"2963","DOI":"10.18653\/v1\/P19-1284","article-title":"Interpretable neural predictions with\n                        differentiable binary variables","volume-title":"Proceedings of\n                        the 57th Annual Meeting of the Association for Computational\n                        Linguistics","author":"Bastings","year":"2019"},{"key":"2024070814304341200_bib21","doi-asserted-by":"publisher","first-page":"976","DOI":"10.18653\/v1\/2022.emnlp-main.64","article-title":"\u201cWill you find these\n                        shortcuts?\u201d A protocol for evaluating the faithfulness of input\n                        salience methods for text classification","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural\n                        Language Processing","author":"Bastings","year":"2022"},{"key":"2024070814304341200_bib22","doi-asserted-by":"publisher","first-page":"149","DOI":"10.18653\/v1\/2020.blackboxnlp-1.14","article-title":"The elephant in the interpretability room:\n                        Why use attention as explanation when we have saliency\n                        methods?","volume-title":"Proceedings of the Third BlackboxNLP\n                        Workshop on Analyzing and Interpreting Neural Networks for NLP","author":"Bastings","year":"2020"},{"key":"2024070814304341200_bib23","article-title":"Influence functions in deep learning are\n                        fragile","volume-title":"9th International Conference on Learning\n                        Representations, ICLR 2021","author":"Basu","year":"2021"},{"key":"2024070814304341200_bib24","article-title":"Identifying and controlling important neurons\n                        in neural machine translation","volume-title":"7th International\n                        Conference on Learning Representations, ICLR 2019","author":"Bau","year":"2019"},{"issue":"1","key":"2024070814304341200_bib25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1162\/coli_a_00367","article-title":"On the linguistic representational power of\n                        neural machine translation models","volume":"46","author":"Belinkov","year":"2020","journal-title":"Computational\n                        Linguistics"},{"key":"2024070814304341200_bib26","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1162\/tacl_a_00254","article-title":"Analysis methods in neural language\n                        processing: A survey","volume":"7","author":"Belinkov","year":"2019","journal-title":"Transactions of the\n                        Association for Computational Linguistics"},{"key":"2024070814304341200_bib27","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1162\/tacl_a_00361","article-title":"Latent compositional representations\n                        improve systematic generalization in grounded question\n                        answering","volume":"9","author":"Bogin","year":"2021","journal-title":"Transactions of the Association for\n                        Computational Linguistics"},{"key":"2024070814304341200_bib28","article-title":"On the opportunities and risks of\n                        foundation models","author":"Bommasani","year":"2021","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib29","doi-asserted-by":"publisher","first-page":"632","DOI":"10.18653\/v1\/D15-1075","article-title":"A large annotated corpus for learning\n                        natural language inference","volume-title":"Proceedings of the\n                        2015 Conference on Empirical Methods in Natural Language\n                    Processing","author":"Bowman","year":"2015"},{"key":"2024070814304341200_bib30","first-page":"1877","article-title":"Language models are few-shot\n                        learners","volume-title":"Advances in Neural Information\n                        Processing Systems 33: Annual Conference on Neural Information Processing\n                        Systems 2020, NeurIPS 2020","author":"Brown","year":"2020"},{"key":"2024070814304341200_bib31","article-title":"Natural language multitasking: Analyzing\n                        and improving syntactic saliency of hidden representations","author":"Brunner","year":"2018","journal-title":"arXiv preprint arXiv:1801.06024"},{"key":"2024070814304341200_bib32","doi-asserted-by":"publisher","first-page":"7727","DOI":"10.18653\/v1\/2022.acl-long.533","article-title":"DoCoGen: Domain counterfactual generation\n                        for low resource domain adaptation","volume-title":"Proceedings\n                        of the 60th Annual Meeting of the Association for Computational Linguistics\n                        (Volume 1: Long Papers)","author":"Calderon","year":"2022"},{"key":"2024070814304341200_bib33","first-page":"9560","article-title":"e-SNLI: Natural language inference with\n                        natural language explanations","volume-title":"Advances in Neural\n                        Information Processing Systems 31: Annual Conference on Neural Information\n                        Processing Systems 2018, NeurIPS 2018","author":"Camburu","year":"2018"},{"key":"2024070814304341200_bib34","doi-asserted-by":"publisher","first-page":"4157","DOI":"10.18653\/v1\/2020.acl-main.382","article-title":"Make up your mind! Adversarial generation\n                        of inconsistent natural language explanations","volume-title":"Proceedings of the 58th Annual Meeting of the Association for\n                        Computational Linguistics","author":"Camburu","year":"2020"},{"key":"2024070814304341200_bib35","first-page":"212","article-title":"Case-based explanation of non-case-based\n                        learning methods.","author":"Caruana","year":"1999","journal-title":"Proceedings of the AMIA\n                        Symposium"},{"issue":"6","key":"2024070814304341200_bib36","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1109\/MCG.2018.2878902","article-title":"RNNbow: Visualizing learning via\n                        backpropagation gradients in RNNs","volume":"38","author":"Cashman","year":"2018","journal-title":"IEEE Computer\n                        Graphics and Applications"},{"issue":"1","key":"2024070814304341200_bib37","doi-asserted-by":"publisher","first-page":"134","DOI":"10.1038\/s42003-022-03036-1","article-title":"Brains and algorithms partially converge in natural language\n                        processing","volume":"5","author":"Caucheteux","year":"2022","journal-title":"Communications Biology"},{"key":"2024070814304341200_bib38","doi-asserted-by":"publisher","first-page":"5029","DOI":"10.18653\/v1\/2022.acl-long.345","article-title":"A comparative study of faithfulness\n                        metrics for model interpretability methods","volume-title":"Proceedings of the 60th Annual Meeting of the Association for\n                        Computational Linguistics (Volume 1: Long Papers)","author":"Chan","year":"2022"},{"key":"2024070814304341200_bib39","doi-asserted-by":"publisher","first-page":"782","DOI":"10.1109\/CVPR46437.2021.00084","article-title":"Transformer interpretability beyond attention\n                        visualization","volume-title":"IEEE Conference on Computer Vision\n                        and Pattern Recognition, CVPR 2021","author":"Chefer","year":"2021"},{"key":"2024070814304341200_bib40","doi-asserted-by":"publisher","first-page":"2007","DOI":"10.18653\/v1\/2023.acl-long.112","article-title":"REV: Information-theoretic evaluation of\n                        free-text rationales","author":"Chen","year":"2022","journal-title":"Proceedings of the 61st\n                        Annual Meeting of the Association for Computational Linguistics (Volume 1:\n                        Long Papers)"},{"key":"2024070814304341200_bib41","first-page":"882","article-title":"Learning to explain: An\n                        information-theoretic perspective on model interpretation","volume-title":"Proceedings of the 35th International Conference on Machine Learning,\n                        ICML 2018","author":"Chen","year":"2018"},{"key":"2024070814304341200_bib42","article-title":"Program of thoughts prompting:\n                        Disentangling computation from reasoning for numerical reasoning\n                        tasks","author":"Chen","year":"2022","journal-title":"Transactions on Machine Learning\n                        Research"},{"key":"2024070814304341200_bib43","doi-asserted-by":"publisher","first-page":"1477","DOI":"10.18653\/v1\/2021.emnlp-main.111","article-title":"Stepmothers are mean and academics are\n                        pretentious: What do pretrained language models learn about\n                        you?","volume-title":"Proceedings of the 2021 Conference on\n                        Empirical Methods in Natural Language Processing","author":"Choenni","year":"2021"},{"key":"2024070814304341200_bib44","doi-asserted-by":"publisher","DOI":"10.21236\/AD0616323","volume-title":"Aspects of the Theory of Syntax","author":"Chomsky","year":"1965"},{"key":"2024070814304341200_bib45","article-title":"ELECTRA: Pre-training text encoders as\n                        discriminators rather than generators","volume-title":"8th\n                        International Conference on Learning Representations, ICLR\n                    2020","author":"Clark","year":"2020"},{"key":"2024070814304341200_bib46","article-title":"Think you have solved question answering?\n                        Try ARC, the AI2 Reasoning Challenge","author":"Clark","year":"2018","journal-title":"abs\/1803.05457"},{"key":"2024070814304341200_bib47","doi-asserted-by":"publisher","first-page":"2376","DOI":"10.18653\/v1\/2021.eacl-main.202","article-title":"A study of automatic metrics for the\n                        evaluation of natural language explanations","volume-title":"Proceedings of the 16th Conference of the European Chapter of the\n                        Association for Computational Linguistics: Main Volume","author":"Clinciu","year":"2021"},{"key":"2024070814304341200_bib48","article-title":"Training verifiers to solve math word\n                        problems","author":"Cobbe","year":"2021","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib49","doi-asserted-by":"publisher","first-page":"2126","DOI":"10.18653\/v1\/P18-1198","article-title":"What you can cram into a single\n                        $&!#* vector: Probing sentence embeddings for linguistic\n                        properties","volume-title":"Proceedings of the 56th Annual\n                        Meeting of the Association for Computational Linguistics (Volume 1: Long\n                        Papers)","author":"Conneau","year":"2018"},{"key":"2024070814304341200_bib50","article-title":"Faithful reasoning using large language\n                        models","author":"Creswell","year":"2022","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib51","doi-asserted-by":"publisher","first-page":"7358","DOI":"10.18653\/v1\/2021.emnlp-main.585","article-title":"Explaining answers with entailment\n                        trees","volume-title":"Proceedings of the 2021 Conference on\n                        Empirical Methods in Natural Language Processing","author":"Dalvi","year":"2021"},{"key":"2024070814304341200_bib52","article-title":"Discovering latent concepts learned in\n                        BERT","volume-title":"The Tenth International Conference on\n                        Learning Representations, ICLR 2022","author":"Dalvi","year":"2022"},{"key":"2024070814304341200_bib53","first-page":"447","article-title":"A survey of the state of explainable AI for natural language\n                        processing","volume-title":"Proceedings of the 1st Conference of\n                        the Asia-Pacific Chapter of the Association for Computational Linguistics\n                        and the 10th International Joint Conference on Natural Language\n                        Processing","author":"Danilevsky","year":"2020"},{"key":"2024070814304341200_bib54","doi-asserted-by":"publisher","first-page":"3243","DOI":"10.18653\/v1\/2020.emnlp-main.262","article-title":"How do decisions emerge across layers in\n                        neural models? Interpretation with differentiable masking","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural\n                        Language Processing (EMNLP)","author":"De Cao","year":"2020"},{"key":"2024070814304341200_bib55","doi-asserted-by":"publisher","first-page":"16","DOI":"10.18653\/v1\/2022.blackboxnlp-1.2","article-title":"Sparse interventions in language models with\n                        differentiable masking","volume-title":"Proceedings of the Fifth\n                        BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for\n                        NLP","author":"De Cao","year":"2022"},{"key":"2024070814304341200_bib56","article-title":"Extraction of salient sentences from\n                        labelled documents","author":"Denil","year":"2015","journal-title":"ArXiv preprint, arXiv:1412.6815\n                        [cs]"},{"key":"2024070814304341200_bib57","doi-asserted-by":"publisher","first-page":"482","DOI":"10.18653\/v1\/K19-1045","article-title":"A general-purpose algorithm for constrained sequential\n                        inference","volume-title":"Proceedings of the 23rd Conference on\n                        Computational Natural Language Learning (CoNLL)","author":"Deutsch","year":"2019"},{"key":"2024070814304341200_bib58","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional\n                        transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of\n                        the Association for Computational Linguistics: Human Language Technologies,\n                        Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2024070814304341200_bib59","doi-asserted-by":"publisher","first-page":"4443","DOI":"10.18653\/v1\/2020.acl-main.408","article-title":"ERASER: A benchmark to evaluate\n                        rationalized NLP models","volume-title":"Proceedings of the 58th\n                        Annual Meeting of the Association for Computational Linguistics","author":"DeYoung","year":"2020"},{"key":"2024070814304341200_bib60","doi-asserted-by":"publisher","first-page":"5034","DOI":"10.18653\/v1\/2021.naacl-main.399","article-title":"Evaluating saliency methods for neural\n                        language models","volume-title":"Proceedings of the 2021\n                        Conference of the North American Chapter of the Association for\n                        Computational Linguistics: Human Language Technologies","author":"Ding","year":"2021"},{"key":"2024070814304341200_bib61","article-title":"Towards a rigorous science of interpretable machine\n                        learning","author":"Doshi-Velez","year":"2017","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib62","first-page":"2368","article-title":"DROP: A reading comprehension benchmark\n                        requiring discrete reasoning over paragraphs","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of\n                        the Association for Computational Linguistics: Human Language Technologies,\n                        Volume 1 (Long and Short Papers)","author":"Dua","year":"2019"},{"key":"2024070814304341200_bib63","doi-asserted-by":"publisher","first-page":"4295","DOI":"10.18653\/v1\/2022.acl-long.296","article-title":"Do transformer models show similar\n                        attention patterns to task-specific human gaze?","volume-title":"Proceedings of the 60th Annual Meeting of the Association for\n                        Computational Linguistics (Volume 1: Long Papers)","author":"Eberle","year":"2022"},{"key":"2024070814304341200_bib64","doi-asserted-by":"publisher","first-page":"31","DOI":"10.18653\/v1\/P18-2006","article-title":"HotFlip: White-box adversarial examples for text\n                        classification","volume-title":"Proceedings of the 56th Annual\n                        Meeting of the Association for Computational Linguistics (Volume 2: Short\n                        Papers)","author":"Ebrahimi","year":"2018"},{"key":"2024070814304341200_bib65","doi-asserted-by":"publisher","first-page":"160","DOI":"10.1162\/tacl_a_00359","article-title":"Amnesic probing: Behavioral explanation\n                        with amnesic counterfactuals","volume":"9","author":"Elazar","year":"2021","journal-title":"Transactions of the\n                        Association for Computational Linguistics"},{"key":"2024070814304341200_bib66","doi-asserted-by":"publisher","first-page":"55","DOI":"10.18653\/v1\/D19-1006","article-title":"How contextual are contextualized word\n                        representations? Comparing the geometry of BERT, ELMo, and GPT-2\n                        embeddings","volume-title":"Proceedings of the 2019 Conference on\n                        Empirical Methods in Natural Language Processing and the 9th International\n                        Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Ethayarajh","year":"2019"},{"key":"2024070814304341200_bib67","doi-asserted-by":"publisher","first-page":"49","DOI":"10.18653\/v1\/2021.acl-short.8","article-title":"Attention flows are Shapley Value\n                        explanations","volume-title":"Proceedings of the 59th Annual\n                        Meeting of the Association for Computational Linguistics and the 11th\n                        International Joint Conference on Natural Language Processing (Volume 2:\n                        Short Papers)","author":"Ethayarajh","year":"2021"},{"key":"2024070814304341200_bib68","doi-asserted-by":"publisher","first-page":"1138","DOI":"10.1162\/tacl_a_00511","article-title":"Causal inference in Natural Language Processing: Estimation,\n                        prediction, interpretation and beyond","volume":"10","author":"Feder","year":"2022","journal-title":"Transactions\n                        of the Association for Computational Linguistics"},{"issue":"2","key":"2024070814304341200_bib69","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1162\/coli_a_00404","article-title":"CausaLM: Causal model explanation through\n                        counterfactual language models","volume":"47","author":"Feder","year":"2021","journal-title":"Computational\n                        Linguistics"},{"key":"2024070814304341200_bib70","doi-asserted-by":"publisher","first-page":"3719","DOI":"10.18653\/v1\/D18-1407","article-title":"Pathologies of neural models make\n                        interpretations difficult","volume-title":"Proceedings of the\n                        2018 Conference on Empirical Methods in Natural Language\n                    Processing","author":"Feng","year":"2018"},{"key":"2024070814304341200_bib71","doi-asserted-by":"publisher","first-page":"1828","DOI":"10.18653\/v1\/2021.acl-long.144","article-title":"Causal analysis of syntactic agreement\n                        mechanisms in neural language models","volume-title":"Proceedings\n                        of the 59th Annual Meeting of the Association for Computational Linguistics\n                        and the 11th International Joint Conference on Natural Language Processing\n                        (Volume 1: Long Papers)","author":"Finlayson","year":"2021"},{"key":"2024070814304341200_bib72","first-page":"10764","article-title":"PAL: Program-aided Language\n                        Models","volume-title":"International Conference on Machine\n                        Learning","author":"Gao","year":"2022"},{"key":"2024070814304341200_bib73","doi-asserted-by":"publisher","first-page":"1307","DOI":"10.18653\/v1\/2020.findings-emnlp.117","article-title":"Evaluating models\u2019 local decision\n                        boundaries via contrast sets","volume-title":"Findings of the\n                        Association for Computational Linguistics: EMNLP 2020","author":"Gardner","year":"2020"},{"key":"2024070814304341200_bib74","doi-asserted-by":"publisher","first-page":"1161","DOI":"10.18653\/v1\/D19-1107","article-title":"Are we modeling the task or the annotator?\n                        An investigation of annotator bias in natural language understanding\n                        datasets","volume-title":"Proceedings of the 2019 Conference on\n                        Empirical Methods in Natural Language Processing and the 9th International\n                        Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Geva","year":"2019"},{"key":"2024070814304341200_bib75","doi-asserted-by":"publisher","first-page":"3681","DOI":"10.1609\/aaai.v33i01.33013681","article-title":"Interpretation of neural networks is fragile","volume-title":"The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI\n                        2019, The Thirty-First Innovative Applications of Artificial Intelligence\n                        Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in\n                        Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 -\n                        February 1, 2019","author":"Ghorbani","year":"2019"},{"key":"2024070814304341200_bib76","article-title":"Neural module networks for reasoning over\n                        text","volume-title":"8th International Conference on Learning\n                        Representations, ICLR 2020","author":"Gupta","year":"2020"},{"key":"2024070814304341200_bib77","doi-asserted-by":"publisher","first-page":"8395","DOI":"10.18653\/v1\/2022.emnlp-main.575","article-title":"Better hit the nail on the head than beat\n                        around the bush: Removing protected attributes with a single\n                        projection","volume-title":"Proceedings of the 2022 Conference on\n                        Empirical Methods in Natural Language Processing","author":"Haghighatkhah","year":"2022"},{"issue":"4","key":"2024070814304341200_bib78","doi-asserted-by":"publisher","first-page":"843","DOI":"10.1093\/bjps\/axi147","article-title":"Causes and explanations: A structural-model\n                        approach. Part I: Causes","volume":"56","author":"Halpern","year":"2005","journal-title":"The British Journal for\n                        the Philosophy of Science"},{"key":"2024070814304341200_bib79","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3233\/SW-223228","article-title":"Is neuro-symbolic AI meeting its promises in\n                        natural language processing? A structured review","volume-title":"Semantic Web","author":"Hamilton","year":"2022"},{"key":"2024070814304341200_bib80","doi-asserted-by":"publisher","first-page":"5553","DOI":"10.18653\/v1\/2020.acl-main.492","article-title":"Explaining black box predictions and\n                        unveiling data artifacts through influence functions","volume-title":"Proceedings of the 58th Annual Meeting of the Association for\n                        Computational Linguistics","author":"Han","year":"2020"},{"key":"2024070814304341200_bib81","doi-asserted-by":"publisher","first-page":"12963","DOI":"10.1609\/aaai.v35i14.17533","article-title":"Self-attention attribution: Interpreting\n                        information interactions inside Transformer","volume-title":"Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021,\n                        Thirty-Third Conference on Innovative Applications of Artificial\n                        Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in\n                        Artificial Intelligence, EAAI 2021","author":"Hao","year":"2021"},{"key":"2024070814304341200_bib82","volume-title":"Harvey Friedman\u2019s Research on the\n                        Foundations of Mathematics","author":"Harrington","year":"1985"},{"key":"2024070814304341200_bib83","doi-asserted-by":"publisher","first-page":"5540","DOI":"10.18653\/v1\/2020.acl-main.491","article-title":"Evaluating explainable AI: Which\n                        algorithmic explanations help users predict model behavior?","volume-title":"Proceedings of the 58th Annual Meeting of the Association for\n                        Computational Linguistics","author":"Hase","year":"2020"},{"key":"2024070814304341200_bib84","doi-asserted-by":"publisher","first-page":"29","DOI":"10.18653\/v1\/2022.lnls-1.4","article-title":"When can models learn from explanations? A\n                        formal framework for understanding the roles of explanation\n                        data","volume-title":"Proceedings of the First Workshop on\n                        Learning with Natural Language Supervision","author":"Hase","year":"2022"},{"key":"2024070814304341200_bib85","doi-asserted-by":"publisher","first-page":"4351","DOI":"10.18653\/v1\/2020.findings-emnlp.390","article-title":"Leakage-adjusted simulatability: Can\n                        models generate non-trivial explanations of their behavior in natural\n                        language?","volume-title":"Findings of the Association for\n                        Computational Linguistics: EMNLP 2020","author":"Hase","year":"2020"},{"key":"2024070814304341200_bib86","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/978-3-319-46493-0_1","article-title":"Generating visual\n                        explanations","volume-title":"Computer Vision \u2013 ECCV\n                        2016","author":"Hendricks","year":"2016"},{"key":"2024070814304341200_bib87","article-title":"Measuring massive multitask language\n                        understanding","volume-title":"9th International Conference on\n                        Learning Representations, ICLR 2021","author":"Hendrycks","year":"2021"},{"key":"2024070814304341200_bib88","article-title":"The promise and peril of human evaluation\n                        for model interpretability","author":"Herman","year":"2017","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib89","doi-asserted-by":"publisher","first-page":"2733","DOI":"10.18653\/v1\/D19-1275","article-title":"Designing and interpreting probes with control\n                        tasks","volume-title":"Proceedings of the 2019 Conference on\n                        Empirical Methods in Natural Language Processing and the 9th International\n                        Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Hewitt","year":"2019"},{"key":"2024070814304341200_bib90","doi-asserted-by":"publisher","first-page":"258","DOI":"10.18653\/v1\/W18-5428","article-title":"Interpreting word-level hidden state behaviour\n                        of character-level LSTM language models","volume-title":"Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and\n                        Interpreting Neural Networks for NLP","author":"Hiebert","year":"2018"},{"key":"2024070814304341200_bib91","doi-asserted-by":"publisher","first-page":"1887","DOI":"10.18653\/v1\/2022.findings-naacl.145","article-title":"METGEN: A module-based entailment tree\n                        generation framework for answer explanation","volume-title":"Findings of the Association for Computational Linguistics: NAACL\n                        2022","author":"Hong","year":"2022"},{"key":"2024070814304341200_bib92","first-page":"9734","article-title":"A benchmark for interpretability methods in deep neural\n                        networks","volume-title":"Advances in Neural Information\n                        Processing Systems 32: Annual Conference on Neural Information Processing\n                        Systems 2019, NeurIPS 2019","author":"Hooker","year":"2019"},{"key":"2024070814304341200_bib93","doi-asserted-by":"publisher","first-page":"804","DOI":"10.1109\/ICCV.2017.93","article-title":"Learning to reason: End-to-end module\n                        networks for visual question answering","volume-title":"IEEE\n                        International Conference on Computer Vision, ICCV 2017","author":"Hu","year":"2017"},{"key":"2024070814304341200_bib94","doi-asserted-by":"publisher","first-page":"4198","DOI":"10.18653\/v1\/2020.acl-main.386","article-title":"Towards faithfully interpretable NLP\n                        systems: How should we define and evaluate faithfulness?","volume-title":"Proceedings of the 58th Annual Meeting of the Association for\n                        Computational Linguistics","author":"Jacovi","year":"2020"},{"key":"2024070814304341200_bib95","doi-asserted-by":"publisher","first-page":"294","DOI":"10.1162\/tacl_a_00367","article-title":"Aligning faithful interpretations with\n                        their social attribution","volume":"9","author":"Jacovi","year":"2021","journal-title":"Transactions of the\n                        Association for Computational Linguistics"},{"key":"2024070814304341200_bib96","doi-asserted-by":"publisher","first-page":"1597","DOI":"10.18653\/v1\/2021.emnlp-main.120","article-title":"Contrastive explanations for model\n                        interpretability","volume-title":"Proceedings of the 2021\n                        Conference on Empirical Methods in Natural Language Processing","author":"Jacovi","year":"2021"},{"key":"2024070814304341200_bib97","first-page":"3543","article-title":"Attention is not\n                        explanation","volume-title":"Proceedings of the 2019 Conference\n                        of the North American Chapter of the Association for Computational\n                        Linguistics: Human Language Technologies, Volume 1 (Long and Short\n                        Papers)","author":"Jain","year":"2019"},{"key":"2024070814304341200_bib98","doi-asserted-by":"publisher","first-page":"4459","DOI":"10.18653\/v1\/2020.acl-main.409","article-title":"Learning to faithfully rationalize by\n                        construction","volume-title":"Proceedings of the 58th Annual\n                        Meeting of the Association for Computational Linguistics","author":"Jain","year":"2020"},{"key":"2024070814304341200_bib99","first-page":"104:1","article-title":"Explaining explanations: Axiomatic feature interactions for\n                        deep networks","volume":"22","author":"Janizek","year":"2021","journal-title":"Journal of Machine Learning\n                        Research"},{"key":"2024070814304341200_bib100","doi-asserted-by":"publisher","first-page":"3193","DOI":"10.18653\/v1\/2020.emnlp-main.258","article-title":"Cold-start and interpretability: Turning\n                        regular expressions into trainable recurrent neural\n                    networks","volume-title":"Proceedings of the 2020 Conference on\n                        Empirical Methods in Natural Language Processing (EMNLP)","author":"Jiang","year":"2020"},{"key":"2024070814304341200_bib101","doi-asserted-by":"publisher","first-page":"2714","DOI":"10.18653\/v1\/P19-1261","article-title":"Explore, propose, and assemble: An\n                        interpretable model for multi-hop reading comprehension","volume-title":"Proceedings of the 57th Annual Meeting of the Association for\n                        Computational Linguistics","author":"Jiang","year":"2019"},{"key":"2024070814304341200_bib102","doi-asserted-by":"publisher","first-page":"1988","DOI":"10.1109\/CVPR.2017.215","article-title":"CLEVR: A diagnostic dataset for\n                        compositional language and elementary visual reasoning","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition,\n                        CVPR 2017","author":"Johnson","year":"2017"},{"key":"2024070814304341200_bib103","doi-asserted-by":"publisher","first-page":"5911","DOI":"10.18653\/v1\/2022.acl-long.407","article-title":"Logic traps in evaluating attribution scores","volume-title":"Proceedings of the 60th Annual Meeting of the Association for\n                        Computational Linguistics (Volume 1: Long Papers)","author":"Ju","year":"2022"},{"key":"2024070814304341200_bib104","doi-asserted-by":"publisher","first-page":"1266","DOI":"10.18653\/v1\/2022.emnlp-main.82","article-title":"Maieutic prompting: Logically consistent reasoning with\n                        recursive explanations","volume-title":"Proceedings of the 2022\n                        Conference on Empirical Methods in Natural Language Processing","author":"Jung","year":"2022"},{"issue":"4","key":"2024070814304341200_bib105","doi-asserted-by":"publisher","first-page":"761","DOI":"10.1162\/COLI_a_00300","article-title":"Representation of linguistic form and\n                        function in recurrent neural networks","volume":"43","author":"K\u00e1d\u00e1r","year":"2017","journal-title":"Computational\n                        Linguistics"},{"key":"2024070814304341200_bib106","doi-asserted-by":"publisher","first-page":"10300","DOI":"10.18653\/v1\/2021.emnlp-main.806","article-title":"Putting words in BERT\u2019s mouth:\n                        Navigating contextualized vector spaces with pseudowords","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural\n                        Language Processing","author":"Karidi","year":"2021"},{"key":"2024070814304341200_bib107","article-title":"Visualizing and understanding recurrent\n                        networks","author":"Karpathy","year":"2015","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib108","doi-asserted-by":"publisher","first-page":"8849","DOI":"10.18653\/v1\/2021.emnlp-main.697","article-title":"BeliefBank: Adding memory to a pre-trained\n                        language model for a systematic notion of belief","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural\n                        Language Processing","author":"Kassner","year":"2021"},{"key":"2024070814304341200_bib109","article-title":"Learning the difference that makes a\n                        difference with counterfactually-augmented data","volume-title":"8th International Conference on Learning Representations, ICLR\n                        2020","author":"Kaushik","year":"2020"},{"key":"2024070814304341200_bib110","doi-asserted-by":"publisher","first-page":"5010","DOI":"10.18653\/v1\/D18-1546","article-title":"How much reading does reading\n                        comprehension require? A critical investigation of popular\n                        benchmarks","volume-title":"Proceedings of the 2018 Conference on\n                        Empirical Methods in Natural Language Processing","author":"Kaushik","year":"2018"},{"key":"2024070814304341200_bib111","first-page":"2673","article-title":"Interpretability beyond feature\n                        attribution: Quantitative testing with concept activation vectors\n                        (TCAV)","volume-title":"Proceedings of the 35th International\n                        Conference on Machine Learning, ICML 2018","author":"Kim","year":"2018"},{"key":"2024070814304341200_bib112","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1007\/978-3-030-28954-6_14","article-title":"The (un)reliability of saliency methods","volume-title":"Explainable AI: Interpreting, Explaining and Visualizing Deep\n                        Learning","author":"Kindermans","year":"2019"},{"key":"2024070814304341200_bib113","article-title":"Learning how to explain neural networks:\n                        PatternNet and PatternAttribution","volume-title":"6th\n                        International Conference on Learning Representations, ICLR\n                    2018","author":"Kindermans","year":"2018"},{"key":"2024070814304341200_bib114","first-page":"1885","article-title":"Understanding black-box predictions via\n                        influence functions","volume-title":"Proceedings of the 34th\n                        International Conference on Machine Learning, ICML 2017","author":"Koh","year":"2017"},{"key":"2024070814304341200_bib115","first-page":"22199","article-title":"Large language models are zero-shot\n                        reasoners","volume":"35","author":"Kojima","year":"2022","journal-title":"Advances in Neural Information Processing\n                        Systems"},{"key":"2024070814304341200_bib116","article-title":"Captum: A unified and generic model\n                        interpretability library for PyTorch","author":"Kokhlikyan","year":"2020","journal-title":"ArXiv\n                        preprint"},{"key":"2024070814304341200_bib117","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1162\/tacl_a_00220","article-title":"Jointly learning to parse and perceive:\n                        Connecting natural language to the physical world","volume":"1","author":"Krishnamurthy","year":"2013","journal-title":"Transactions of the Association for Computational\n                        Linguistics"},{"key":"2024070814304341200_bib118","first-page":"17994","article-title":"Probing classifiers are unreliable for\n                        concept removal and detection","volume":"35","author":"Kumar","year":"2022","journal-title":"Advances in Neural\n                        Information Processing Systems"},{"key":"2024070814304341200_bib119","doi-asserted-by":"publisher","first-page":"8730","DOI":"10.18653\/v1\/2020.acl-main.771","article-title":"NILE: Natural language inference with\n                        faithful natural language explanations","volume-title":"Proceedings of the 58th Annual Meeting of the Association for\n                        Computational Linguistics","author":"Kumar","year":"2020"},{"key":"2024070814304341200_bib120","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1145\/3290605.3300717","article-title":"Let me explain: Impact of personal and\n                        impersonal explanations on trust in recommender systems","volume-title":"Proceedings of the 2019 CHI Conference on Human Factors in Computing\n                        Systems, CHI 2019","author":"Kunkel","year":"2019"},{"key":"2024070814304341200_bib121","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1145\/3375627.3375833","article-title":"\u201cHow do I fool you?\u201d:\n                        Manipulating user trust via misleading black box\n                        explanations","volume-title":"Proceedings of the AAAI\/ACM\n                        Conference on AI, Ethics, and Society","author":"Lakkaraju","year":"2020"},{"key":"2024070814304341200_bib122","doi-asserted-by":"publisher","first-page":"537","DOI":"10.18653\/v1\/2022.findings-emnlp.38","article-title":"Can language models learn from explanations in\n                        context?","volume-title":"Findings of the Association for\n                        Computational Linguistics: EMNLP 2022","author":"Lampinen","year":"2022"},{"key":"2024070814304341200_bib123","article-title":"Defining locality for surrogates in\n                        post-hoc interpretablity","volume-title":"Workshop on Human\n                        Interpretability for Machine Learning (WHI)-International Conference on\n                        Machine Learning (ICML)","author":"Laugel","year":"2018"},{"key":"2024070814304341200_bib124","doi-asserted-by":"publisher","first-page":"107","DOI":"10.18653\/v1\/D16-1011","article-title":"Rationalizing neural\n                        predictions","volume-title":"Proceedings of the 2016 Conference\n                        on Empirical Methods in Natural Language Processing","author":"Lei","year":"2016"},{"key":"2024070814304341200_bib125","first-page":"10","article-title":"The Winograd Schema\n                        Challenge","volume-title":"Thirteenth International Conference on\n                        the Principles of Knowledge Representation and Reasoning","author":"Levesque","year":"2012"},{"key":"2024070814304341200_bib126","first-page":"3843","article-title":"Solving quantitative reasoning problems\n                        with language models","volume":"35","author":"Lewkowycz","year":"2022","journal-title":"Advances in Neural Information\n                        Processing Systems"},{"key":"2024070814304341200_bib127","doi-asserted-by":"publisher","first-page":"365","DOI":"10.18653\/v1\/2020.acl-main.35","article-title":"Evaluating explanation methods for neural machine\n                        translation","volume-title":"Proceedings of the 58th Annual\n                        Meeting of the Association for Computational Linguistics","author":"Li","year":"2020"},{"key":"2024070814304341200_bib128","doi-asserted-by":"publisher","first-page":"681","DOI":"10.18653\/v1\/N16-1082","article-title":"Visualizing and understanding neural\n                        models in NLP","volume-title":"Proceedings of the 2016 Conference\n                        of the North American Chapter of the Association for Computational\n                        Linguistics: Human Language Technologies","author":"Li","year":"2016"},{"key":"2024070814304341200_bib129","article-title":"Understanding neural networks through\n                        representation erasure","author":"Li","year":"2016","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib130","doi-asserted-by":"publisher","first-page":"5315","DOI":"10.18653\/v1\/2023.acl-long.291","article-title":"On the Advance of Making Language Models Better\n                        Reasoners","volume-title":"Proceedings of the 61st Annual Meeting\n                        of the Association for Computational Linguistics (Volume 1: Long\n                        Papers)","author":"Li","year":"2022"},{"key":"2024070814304341200_bib131","doi-asserted-by":"publisher","first-page":"158","DOI":"10.18653\/v1\/P17-1015","article-title":"Program induction by rationale generation:\n                        Learning to solve and explain algebraic word problems","volume-title":"Proceedings of the 55th Annual Meeting of the Association for\n                        Computational Linguistics (Volume 1: Long Papers)","author":"Ling","year":"2017"},{"key":"2024070814304341200_bib132","article-title":"The Mythos of Model\n                        Interpretability","author":"Lipton","year":"2016","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib133","first-page":"13807","article-title":"Rethinking attention-model explainability through\n                        faithfulness violation test","volume-title":"International\n                        Conference on Machine Learning, ICML 2022","author":"Liu","year":"2022"},{"key":"2024070814304341200_bib134","article-title":"RoBERTa: A robustly optimized BERT\n                        pretraining approach","author":"Liu","year":"2019","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib135","article-title":"Information-theoretic probing explains\n                        reliance on spurious features","volume-title":"International\n                        Conference on Learning Representations","author":"Lovering","year":"2020"},{"key":"2024070814304341200_bib136","first-page":"4461","article-title":"Influence patterns for explaining information\n                        flow in BERT","volume-title":"Advances in Neural Information\n                        Processing Systems 34: Annual Conference on Neural Information Processing\n                        Systems 2021, NeurIPS 2021","author":"Lu","year":"2021"},{"key":"2024070814304341200_bib137","first-page":"4765","article-title":"A unified approach to interpreting model\n                        predictions","volume-title":"Advances in Neural Information\n                        Processing Systems 30: Annual Conference on Neural Information Processing\n                        Systems 2017","author":"Lundberg","year":"2017"},{"key":"2024070814304341200_bib138","doi-asserted-by":"publisher","first-page":"305","DOI":"10.18653\/v1\/2023.ijcnlp-main.20","article-title":"Faithful chain-of-thought\n                        reasoning","volume-title":"Proceedings of the 13th International\n                        Joint Conference on Natural Language Processing and the 3rd Conference of\n                        the Asia-Pacific Chapter of the Association for Computational Linguistics\n                        (Volume 1: Long Papers)","author":"Lyu","year":"2023"},{"key":"2024070814304341200_bib139","article-title":"Improving neural model performance through\n                        natural language feedback on their explanations","author":"Madaan","year":"2021","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib140","article-title":"The neuro-symbolic concept learner:\n                        Interpreting scenes, words, and sentences from natural\n                        supervision","volume-title":"7th International Conference on\n                        Learning Representations, ICLR 2019","author":"Mao","year":"2019"},{"key":"2024070814304341200_bib141","doi-asserted-by":"publisher","first-page":"410","DOI":"10.18653\/v1\/2022.findings-naacl.31","article-title":"Few-shot self-rationalization with natural\n                        language prompts","volume-title":"Findings of the Association for\n                        Computational Linguistics: NAACL 2022","author":"Marasovic","year":"2022"},{"key":"2024070814304341200_bib142","first-page":"1614","article-title":"From softmax to sparsemax: A sparse model\n                        of attention and multi-label classification","volume-title":"Proceedings of the 33nd International Conference on Machine\n                        Learning, ICML 2016","author":"Martins","year":"2016"},{"key":"2024070814304341200_bib143","doi-asserted-by":"publisher","first-page":"3428","DOI":"10.18653\/v1\/P19-1334","article-title":"Right for the wrong reasons: Diagnosing\n                        syntactic heuristics in natural language inference","volume-title":"Proceedings of the 57th Annual Meeting of the Association for\n                        Computational Linguistics","author":"McCoy","year":"2019"},{"key":"2024070814304341200_bib144","article-title":"Explanation in artificial intelligence:\n                        Insights from the social sciences","author":"Miller","year":"2017","journal-title":"ArXiv\n                        preprint"},{"key":"2024070814304341200_bib145","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1109\/VAST.2017.8585721","article-title":"Understanding hidden memories of recurrent\n                        neural networks","volume-title":"2017 IEEE Conference on Visual\n                        Analytics Science and Technology (VAST)","author":"Ming","year":"2017"},{"key":"2024070814304341200_bib146","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/978-3-030-28954-6_10","article-title":"Layer-wise relevance propagation: An\n                        overview","volume-title":"Explainable AI: Interpreting, Explaining and Visualizing Deep\n                        Learning","author":"Montavon","year":"2019"},{"key":"2024070814304341200_bib147","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1016\/j.patcog.2016.11.008","article-title":"Explaining nonlinear classification\n                        decisions with deep Taylor decomposition","volume":"65","author":"Montavon","year":"2017","journal-title":"Pattern\n                        Recognition"},{"key":"2024070814304341200_bib148","first-page":"4593","article-title":"SHAP-based explanation methods: A review for NLP\n                        interpretability","volume-title":"Proceedings of the 29th\n                        International Conference on Computational Linguistics","author":"Mosca","year":"2022"},{"key":"2024070814304341200_bib149","doi-asserted-by":"publisher","first-page":"95","DOI":"10.18653\/v1\/2022.conll-1.8","article-title":"Causal analysis of syntactic agreement\n                        neurons in multilingual language models","volume-title":"Proceedings of the 26th Conference on Computational Natural Language\n                        Learning (CoNLL)","author":"Mueller","year":"2022"},{"key":"2024070814304341200_bib150","doi-asserted-by":"publisher","first-page":"1101","DOI":"10.18653\/v1\/N18-1100","article-title":"Explainable prediction of medical codes\n                        from clinical text","volume-title":"Proceedings of the 2018\n                        Conference of the North American Chapter of the Association for\n                        Computational Linguistics: Human Language Technologies, Volume 1 (Long\n                        Papers)","author":"Mullenbach","year":"2018"},{"issue":"44","key":"2024070814304341200_bib151","doi-asserted-by":"publisher","first-page":"22071","DOI":"10.1073\/pnas.1900654116","article-title":"Definitions, methods, and applications in\n                        interpretable machine learning","volume":"116","author":"Murdoch","year":"2019","journal-title":"Proceedings of the\n                        National Academy of Sciences"},{"key":"2024070814304341200_bib152","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1007\/s10618-023-00962-4","article-title":"An attention matrix for every decision:\n                        Faithfulness-based arbitration among multiple attention-based\n                        interpretations of transformers in text classification","volume":"38","author":"Mylonas","year":"2022","journal-title":"Data Mining and Knowledge Discovery"},{"key":"2024070814304341200_bib153","article-title":"WT5?! Training Text-to-Text Models to\n                        Explain their predictions","author":"Narang","year":"2020","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib154","first-page":"3806","article-title":"A theoretical explanation for perplexing\n                        behaviors of backpropagation-based visualizations","volume-title":"Proceedings of the 35th International Conference on Machine\n                        Learning, ICML 2018","author":"Nie","year":"2018"},{"key":"2024070814304341200_bib155","article-title":"Show your work: Scratchpads for\n                        intermediate computation with language models","author":"Nye","year":"2021","journal-title":"Deep\n                        Learning for Code Workshop"},{"key":"2024070814304341200_bib156","unstructured":"OpenAI. 2023. GPT-4 technical\n                        report. arXiv preprint\n                    arXiv:2303.08774."},{"key":"2024070814304341200_bib157","article-title":"On measuring faithfulness or self-consistency\n                        of natural language explanations","author":"Parcalabescu","year":"2024","journal-title":"arXiv"},{"key":"2024070814304341200_bib158","doi-asserted-by":"publisher","first-page":"8779","DOI":"10.1109\/CVPR.2018.00915","article-title":"Multimodal explanations: Justifying\n                        decisions and pointing to the evidence","volume-title":"2018 IEEE\n                        Conference on Computer Vision and Pattern Recognition, CVPR 2018","author":"Park","year":"2018"},{"key":"2024070814304341200_bib159","doi-asserted-by":"publisher","first-page":"105","DOI":"10.18653\/v1\/2021.eacl-main.9","article-title":"Telling BERT\u2019s full story: From\n                        local attention to global aggregation","volume-title":"Proceedings of the 16th Conference of the European Chapter of the\n                        Association for Computational Linguistics: Main Volume","author":"Pascual","year":"2021"},{"key":"2024070814304341200_bib160","doi-asserted-by":"publisher","first-page":"2463","DOI":"10.18653\/v1\/D19-1250","article-title":"Language models as knowledge\n                        bases?","volume-title":"Proceedings of the 2019 Conference on\n                        Empirical Methods in Natural Language Processing and the 9th International\n                        Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Petroni","year":"2019"},{"key":"2024070814304341200_bib161","doi-asserted-by":"publisher","first-page":"967","DOI":"10.18653\/v1\/2021.naacl-main.75","article-title":"An empirical comparison of instance\n                        attribution methods for NLP","volume-title":"Proceedings of the\n                        2021 Conference of the North American Chapter of the Association for\n                        Computational Linguistics: Human Language Technologies","author":"Pezeshkpour","year":"2021"},{"key":"2024070814304341200_bib162","doi-asserted-by":"publisher","first-page":"325","DOI":"10.18653\/v1\/W18-5437","article-title":"Interpretable textual neuron\n                        representations for NLP","volume-title":"Proceedings of the 2018\n                        EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for\n                        NLP","author":"Poerner","year":"2018"},{"key":"2024070814304341200_bib163","doi-asserted-by":"publisher","first-page":"340","DOI":"10.18653\/v1\/P18-1032","article-title":"Evaluating neural network explanation methods using hybrid\n                        documents and morphosyntactic agreement","volume-title":"Proceedings of the 56th Annual Meeting of the Association for\n                        Computational Linguistics (Volume 1: Long Papers)","author":"Poerner","year":"2018"},{"key":"2024070814304341200_bib164","doi-asserted-by":"publisher","first-page":"180","DOI":"10.18653\/v1\/S18-2023","article-title":"Hypothesis only baselines in natural\n                        language inference","volume-title":"Proceedings of the Seventh\n                        Joint Conference on Lexical and Computational Semantics","author":"Poliak","year":"2018"},{"key":"2024070814304341200_bib165","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1162\/tacl_a_00465","article-title":"Evaluating explanations: How much do\n                        explanations from the teacher aid students?","volume":"10","author":"Pruthi","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024070814304341200_bib166","doi-asserted-by":"publisher","first-page":"4782","DOI":"10.18653\/v1\/2020.acl-main.432","article-title":"Learning to deceive with attention-based\n                        explanations","volume-title":"Proceedings of the 58th Annual\n                        Meeting of the Association for Computational Linguistics","author":"Pruthi","year":"2020"},{"key":"2024070814304341200_bib167","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.516","article-title":"Limitations of language models in arithmetic and symbolic\n                        induction","volume-title":"Proceedings of the 61st Annual Meeting\n                        of the Association for Computational Linguistics (Volume 1: Long\n                        Papers)","author":"Qian","year":"2022"},{"key":"2024070814304341200_bib168","doi-asserted-by":"publisher","first-page":"826","DOI":"10.18653\/v1\/D16-1079","article-title":"Analyzing linguistic knowledge in sequential\n                        model of sentence","volume-title":"Proceedings of the 2016\n                        Conference on Empirical Methods in Natural Language Processing","author":"Qian","year":"2016"},{"key":"2024070814304341200_bib169","first-page":"140:1","article-title":"Exploring the limits of transfer learning with a unified\n                        text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine\n                        Learning Research"},{"key":"2024070814304341200_bib170","doi-asserted-by":"publisher","first-page":"556","DOI":"10.18653\/v1\/2020.findings-emnlp.49","article-title":"Fixed encoder self-attention patterns in\n                        transformer-based machine translation","volume-title":"Findings\n                        of the Association for Computational Linguistics: EMNLP 2020","author":"Raganato","year":"2020"},{"key":"2024070814304341200_bib171","doi-asserted-by":"publisher","first-page":"836","DOI":"10.18653\/v1\/2021.emnlp-main.64","article-title":"SELFEXPLAIN: A self-explaining\n                        architecture for neural text classifiers","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural\n                        Language Processing","author":"Rajagopal","year":"2021"},{"key":"2024070814304341200_bib172","doi-asserted-by":"publisher","first-page":"4932","DOI":"10.18653\/v1\/P19-1487","article-title":"Explain yourself! Leveraging language\n                        models for commonsense reasoning","volume-title":"Proceedings of\n                        the 57th Annual Meeting of the Association for Computational\n                        Linguistics","author":"Rajani","year":"2019"},{"key":"2024070814304341200_bib173","first-page":"5968","article-title":"Model agnostic multilevel\n                        explanations","volume-title":"Advances in Neural Information\n                        Processing Systems 33: Annual Conference on Neural Information Processing\n                        Systems 2020, NeurIPS 2020","author":"Ramamurthy","year":"2020"},{"key":"2024070814304341200_bib174","doi-asserted-by":"publisher","first-page":"7237","DOI":"10.18653\/v1\/2020.acl-main.647","article-title":"Null it out: Guarding protected attributes\n                        by iterative nullspace projection","volume-title":"Proceedings of\n                        the 58th Annual Meeting of the Association for Computational\n                        Linguistics","author":"Ravfogel","year":"2020"},{"key":"2024070814304341200_bib175","doi-asserted-by":"publisher","first-page":"9413","DOI":"10.18653\/v1\/2023.acl-long.523","article-title":"Log-linear guardedness and its\n                        implications","volume-title":"Proceedings of the 61st Annual\n                        Meeting of the Association for Computational Linguistics (Volume 1: Long\n                        Papers)","author":"Ravfogel","year":"2022"},{"key":"2024070814304341200_bib176","doi-asserted-by":"publisher","first-page":"194","DOI":"10.18653\/v1\/2021.conll-1.15","article-title":"Counterfactual interventions reveal the\n                        causal effect of relative clause representations on agreement\n                        prediction","volume-title":"Proceedings of the 25th Conference on\n                        Computational Natural Language Learning","author":"Ravfogel","year":"2021"},{"key":"2024070814304341200_bib177","doi-asserted-by":"publisher","first-page":"3363","DOI":"10.18653\/v1\/2021.eacl-main.295","article-title":"Probing the probing paradigm: Does probing accuracy entail\n                        task relevance?","volume-title":"Proceedings of the 16th\n                        Conference of the European Chapter of the Association for Computational\n                        Linguistics: Main Volume","author":"Ravichander","year":"2021"},{"key":"2024070814304341200_bib178","first-page":"8592","article-title":"Visualizing and measuring the geometry of\n                        BERT","volume-title":"Advances in Neural Information Processing\n                        Systems 32: Annual Conference on Neural Information Processing Systems 2019,\n                        NeurIPS 2019","author":"Reif","year":"2019"},{"key":"2024070814304341200_bib179","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1145\/2939672.2939778","article-title":"\u201cWhy should I trust you?\u201d:\n                        Explaining the predictions of any classifier","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on\n                        Knowledge Discovery and Data Mining","author":"Ribeiro","year":"2016"},{"key":"2024070814304341200_bib180","doi-asserted-by":"publisher","first-page":"1527","DOI":"10.1609\/aaai.v32i1.11491","article-title":"Anchors: High-precision model-agnostic\n                        explanations","volume-title":"Proceedings of the Thirty-Second\n                        AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th Innovative\n                        Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI\n                        Symposium on Educational Advances in Artificial Intelligence\n                        (EAAI-18)","author":"Ribeiro","year":"2018"},{"key":"2024070814304341200_bib181","first-page":"1","article-title":"Counterfactual thinking: A critical\n                        overview","volume-title":"What Might Have Been: The Social\n                        Psychology of Counterfactual Thinking","author":"Roese","year":"1995"},{"key":"2024070814304341200_bib182","doi-asserted-by":"publisher","first-page":"1285","DOI":"10.1162\/tacl_a_00519","article-title":"Neuron-level interpretation of deep NLP\n                        models: A survey","volume":"10","author":"Sajjad","year":"2022","journal-title":"Transactions of the Association\n                        for Computational Linguistics"},{"key":"2024070814304341200_bib183","doi-asserted-by":"publisher","first-page":"8732","DOI":"10.1609\/aaai.v34i05.6399","article-title":"WinoGrande: An adversarial winograd schema challenge at\n                        scale","volume-title":"The Thirty-Fourth AAAI Conference on\n                        Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative\n                        Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth\n                        AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI\n                        2020","author":"Sakaguchi","year":"2020"},{"issue":"11","key":"2024070814304341200_bib184","doi-asserted-by":"publisher","first-page":"2660","DOI":"10.1109\/TNNLS.2016.2599820","article-title":"Evaluating the visualization of what a\n                        Deep Neural Network has learned","volume":"28","author":"Samek","year":"2016","journal-title":"IEEE transactions\n                        on neural networks and learning systems"},{"key":"2024070814304341200_bib185","doi-asserted-by":"publisher","first-page":"295","DOI":"10.18653\/v1\/P18-1028","article-title":"Bridging CNNs, RNNs, and weighted finite-state\n                        machines","volume-title":"Proceedings of the 56th Annual Meeting\n                        of the Association for Computational Linguistics (Volume 1: Long\n                        Papers)","author":"Schwartz","year":"2018"},{"key":"2024070814304341200_bib186","doi-asserted-by":"publisher","first-page":"2931","DOI":"10.18653\/v1\/P19-1282","article-title":"Is attention interpretable?","volume-title":"Proceedings of the 57th Annual Meeting of the Association for\n                        Computational Linguistics","author":"Serrano","year":"2019"},{"key":"2024070814304341200_bib187","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1515\/9781400881970-018","article-title":"17. A value for n-person\n                    games","volume-title":"Contributions to the Theory of Games (AM-28), Volume II","author":"Shapley","year":"1953"},{"key":"2024070814304341200_bib188","first-page":"3145","article-title":"Learning important features through\n                        propagating activation differences","volume-title":"Proceedings\n                        of the 34th International Conference on Machine Learning, ICML\n                    2017","author":"Shrikumar","year":"2017"},{"key":"2024070814304341200_bib189","article-title":"Not just a black box: Learning important\n                        features through propagating activation differences","author":"Shrikumar","year":"2017","journal-title":"ArXiv preprint, arXiv:1605.01713 [cs]"},{"key":"2024070814304341200_bib190","doi-asserted-by":"publisher","first-page":"9837","DOI":"10.1609\/aaai.v37i8.26174","article-title":"Logical satisfiability of counterfactuals\n                        for faithful explanations in NLI","volume-title":"Proceedings of the\n                        AAAI Conference on Artificial Intelligence","author":"Sia","year":"2022"},{"key":"2024070814304341200_bib191","article-title":"Deep inside convolutional networks:\n                        Visualising image classification models and saliency maps","volume-title":"Workshop at International Conference on Learning\n                        Representations","author":"Simonyan","year":"2014"},{"key":"2024070814304341200_bib192","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1145\/3375627.3375830","article-title":"Fooling LIME and SHAP: Adversarial attacks\n                        on post hoc explanation methods","volume-title":"Proceedings of\n                        the AAAI\/ACM Conference on AI, Ethics, and Society","author":"Slack","year":"2020"},{"key":"2024070814304341200_bib193","article-title":"SmoothGrad: Removing noise by adding\n                        noise","author":"Smilkov","year":"2017","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib194","article-title":"Striving for simplicity: The all\n                        convolutional net","volume-title":"arXiv preprint\n                        arXiv:1412.6806","author":"Springenberg","year":"2015"},{"issue":"1","key":"2024070814304341200_bib195","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1109\/TVCG.2018.2865044","article-title":"Seq2seq-Vis: A visual debugging tool for sequence-to-sequence\n                        models","volume":"25","author":"Strobelt","year":"2019","journal-title":"IEEE Transactions on Visualization and\n                        Computer Graphics"},{"issue":"1","key":"2024070814304341200_bib196","doi-asserted-by":"publisher","first-page":"667","DOI":"10.1109\/TVCG.2017.2744158","article-title":"LSTMVis: A tool for visual analysis of hidden state dynamics\n                        in recurrent neural networks","volume":"24","author":"Strobelt","year":"2018","journal-title":"IEEE Transactions on\n                        Visualization and Computer Graphics"},{"key":"2024070814304341200_bib197","doi-asserted-by":"publisher","first-page":"5594","DOI":"10.18653\/v1\/2020.acl-main.495","article-title":"Obtaining faithful interpretations from\n                        compositional neural networks","volume-title":"Proceedings of the\n                        58th Annual Meeting of the Association for Computational\n                        Linguistics","author":"Subramanian","year":"2020"},{"key":"2024070814304341200_bib198","first-page":"3319","article-title":"Axiomatic attribution for deep networks","volume-title":"Proceedings of the 34th International Conference on Machine\n                        Learning, ICML 2017","author":"Sundararajan","year":"2017"},{"key":"2024070814304341200_bib199","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1016\/j.jbi.2018.06.016","article-title":"Patient representation learning and\n                        interpretable evaluation using clinical notes","volume":"84","author":"Sushil","year":"2018","journal-title":"Journal of Biomedical Informatics"},{"key":"2024070814304341200_bib200","doi-asserted-by":"publisher","first-page":"3621","DOI":"10.18653\/v1\/2021.findings-acl.317","article-title":"ProofWriter: Generating implications, proofs,\n                        and abductive statements over natural language","volume-title":"Findings of the Association for Computational Linguistics:\n                        ACL-IJCNLP 2021","author":"Tafjord","year":"2021"},{"key":"2024070814304341200_bib201","doi-asserted-by":"publisher","first-page":"107","DOI":"10.18653\/v1\/2020.emnlp-demos.15","article-title":"The language interpretability tool: Extensible, interactive\n                        visualizations and analysis for NLP models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural\n                        Language Processing: System Demonstrations","author":"Tenney","year":"2020"},{"key":"2024070814304341200_bib202","first-page":"6147","article-title":"How does this interaction affect me? Interpretable\n                        attribution for feature interactions","volume-title":"Advances in\n                        Neural Information Processing Systems 33: Annual Conference on Neural\n                        Information Processing Systems 2020, NeurIPS 2020","author":"Tsang","year":"2020"},{"key":"2024070814304341200_bib203","doi-asserted-by":"publisher","first-page":"862","DOI":"10.18653\/v1\/2021.findings-acl.76","article-title":"What if this modified that? Syntactic interventions with\n                        counterfactual embeddings","volume-title":"Findings of the\n                        Association for Computational Linguistics: ACL-IJCNLP 2021","author":"Tucker","year":"2021"},{"key":"2024070814304341200_bib204","doi-asserted-by":"publisher","first-page":"131","DOI":"10.18653\/v1\/2020.repl4nlp-1.17","article-title":"Staying true to your word: (How) can\n                        attention become explanation?","volume-title":"Proceedings of the\n                        5th Workshop on Representation Learning for NLP","author":"Tutek","year":"2020"},{"key":"2024070814304341200_bib205","article-title":"Attention interpretability across NLP\n                        tasks","author":"Vashishth","year":"2019","journal-title":"ArXiv preprint"},{"key":"2024070814304341200_bib206","first-page":"5998","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems 30: Annual\n                        Conference on Neural Information Processing Systems 2017","author":"Vaswani","year":"2017"},{"key":"2024070814304341200_bib207","first-page":"69","article-title":"Diagnostic classifiers: Revealing how\n                        neural networks process hierarchical structure","author":"Veldhoen","year":"2016","journal-title":"CoCo@ NIPS"},{"key":"2024070814304341200_bib208","article-title":"Visualizing attention in transformer-based\n                        language representation models","author":"Vig","year":"2019","journal-title":"ArXiv\n                        preprint"},{"key":"2024070814304341200_bib209","first-page":"12388","article-title":"Investigating gender bias in language\n                        models using causal mediation analysis","volume-title":"Advances\n                        in Neural Information Processing Systems 33: Annual Conference on Neural\n                        Information Processing Systems 2020, NeurIPS 2020","author":"Vig","year":"2020"},{"key":"2024070814304341200_bib210","doi-asserted-by":"publisher","first-page":"5797","DOI":"10.18653\/v1\/P19-1580","article-title":"Analyzing multi-head self-attention:\n                        Specialized heads do the heavy lifting, the rest can be\n                        pruned","volume-title":"Proceedings of the 57th Annual Meeting of\n                        the Association for Computational Linguistics","author":"Voita","year":"2019"},{"key":"2024070814304341200_bib211","doi-asserted-by":"publisher","first-page":"183","DOI":"10.18653\/v1\/2020.emnlp-main.14","article-title":"Information-theoretic probing with minimum\n                        description length","volume-title":"Proceedings of the 2020\n                        Conference on Empirical Methods in Natural Language Processing\n                        (EMNLP)","author":"Voita","year":"2020"},{"key":"2024070814304341200_bib212","doi-asserted-by":"publisher","first-page":"136","DOI":"10.18653\/v1\/W18-5416","article-title":"Interpreting neural networks with nearest\n                        neighbors","volume-title":"Proceedings of the 2018 EMNLP Workshop\n                        BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Wallace","year":"2018"},{"key":"2024070814304341200_bib213","doi-asserted-by":"publisher","first-page":"2153","DOI":"10.18653\/v1\/D19-1221","article-title":"Universal adversarial triggers for\n                        attacking and analyzing NLP","volume-title":"Proceedings of the\n                        2019 Conference on Empirical Methods in Natural Language Processing and the\n                        9th International Joint Conference on Natural Language Processing\n                        (EMNLP-IJCNLP)","author":"Wallace","year":"2019"},{"key":"2024070814304341200_bib214","doi-asserted-by":"publisher","first-page":"20","DOI":"10.18653\/v1\/2020.emnlp-tutorials.3","article-title":"Interpreting predictions of NLP\n                        models","volume-title":"Proceedings of the 2020 Conference on\n                        Empirical Methods in Natural Language Processing: Tutorial\n                        Abstracts","author":"Wallace","year":"2020"},{"key":"2024070814304341200_bib215","doi-asserted-by":"publisher","first-page":"7","DOI":"10.18653\/v1\/D19-3002","article-title":"AllenNLP interpret: A framework for\n                        explaining predictions of NLP models","volume-title":"Proceedings\n                        of the 2019 Conference on Empirical Methods in Natural Language Processing\n                        and the 9th International Joint Conference on Natural Language Processing\n                        (EMNLP-IJCNLP): System Demonstrations","author":"Wallace","year":"2019"},{"key":"2024070814304341200_bib216","first-page":"3261","article-title":"SuperGLUE: A stickier benchmark for\n                        general-purpose language understanding systems","volume-title":"Advances in Neural Information Processing Systems 32: Annual\n                        Conference on Neural Information Processing Systems 2019, NeurIPS\n                        2019","author":"Wang","year":"2019"},{"key":"2024070814304341200_bib217","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5446","article-title":"GLUE: A multi-task benchmark and analysis\n                        platform for natural language understanding","volume-title":"7th\n                        International Conference on Learning Representations, ICLR 2019","author":"Wang","year":"2019"},{"key":"2024070814304341200_bib218","doi-asserted-by":"publisher","first-page":"247","DOI":"10.18653\/v1\/2020.findings-emnlp.24","article-title":"Gradient-based analysis of NLP models is\n                        manipulable","volume-title":"Findings of the Association for\n                        Computational Linguistics: EMNLP 2020","author":"Wang","year":"2020"},{"key":"2024070814304341200_bib219","doi-asserted-by":"publisher","first-page":"70","DOI":"10.18653\/v1\/2022.conll-1.6","article-title":"A fine-grained interpretability evaluation\n                        benchmark for neural NLP","volume-title":"Proceedings of the 26th\n                        Conference on Computational Natural Language Learning (CoNLL)","author":"Wang","year":"2022"},{"key":"2024070814304341200_bib220","article-title":"Self-consistency improves chain of thought\n                        reasoning in language models","volume-title":"The Eleventh\n                        International Conference on Learning\n                    Representations","author":"Wang","year":"2022"},{"key":"2024070814304341200_bib221","article-title":"Chain of thought prompting elicits reasoning in large\n                        language models","volume-title":"ArXiv preprint","author":"Wei","year":"2022"},{"issue":"1","key":"2024070814304341200_bib222","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1109\/TVCG.2019.2934619","article-title":"The What-If Tool: Interactive probing of\n                        machine learning models","volume":"26","author":"Wexler","year":"2020","journal-title":"IEEE Transactions on\n                        Visualization and Computer Graphics"},{"key":"2024070814304341200_bib223","doi-asserted-by":"publisher","first-page":"632","DOI":"10.18653\/v1\/2022.naacl-main.47","article-title":"Reframing human-AI collaboration for generating free-text\n                        explanations","volume-title":"Proceedings of the 2022 Conference\n                        of the North American Chapter of the Association for Computational\n                        Linguistics: Human Language Technologies","author":"Wiegreffe","year":"2022"},{"key":"2024070814304341200_bib224","doi-asserted-by":"publisher","first-page":"10266","DOI":"10.18653\/v1\/2021.emnlp-main.804","article-title":"Measuring association between labels and\n                        free-text rationales","volume-title":"Proceedings of the 2021\n                        Conference on Empirical Methods in Natural Language Processing","author":"Wiegreffe","year":"2021"},{"key":"2024070814304341200_bib225","doi-asserted-by":"publisher","first-page":"11","DOI":"10.18653\/v1\/D19-1002","article-title":"Attention is not not\n                        explanation","volume-title":"Proceedings of the 2019 Conference\n                        on Empirical Methods in Natural Language Processing and the 9th\n                        International Joint Conference on Natural Language Processing\n                        (EMNLP-IJCNLP)","author":"Wiegreffe","year":"2019"},{"issue":"1","key":"2024070814304341200_bib226","doi-asserted-by":"publisher","first-page":"659","DOI":"10.1146\/annurev.soc.25.1.659","article-title":"The estimation of causal effects from\n                        observational data","volume":"25","author":"Winship","year":"1999","journal-title":"Annual Review of\n                        Sociology"},{"key":"2024070814304341200_bib227","doi-asserted-by":"publisher","first-page":"6707","DOI":"10.18653\/v1\/2021.acl-long.523","article-title":"Polyjuice: Generating counterfactuals for explaining,\n                        evaluating, and improving models","volume-title":"Proceedings of\n                        the 59th Annual Meeting of the Association for Computational Linguistics and\n                        the 11th International Joint Conference on Natural Language Processing\n                        (Volume 1: Long Papers)","author":"Wu","year":"2021"},{"key":"2024070814304341200_bib228","doi-asserted-by":"publisher","first-page":"950","DOI":"10.18653\/v1\/P17-1088","article-title":"An interpretable knowledge transfer model for knowledge base\n                        completion","volume-title":"Proceedings of the 55th Annual\n                        Meeting of the Association for Computational Linguistics (Volume 1: Long\n                        Papers)","author":"Xie","year":"2017"},{"key":"2024070814304341200_bib229","article-title":"Benchmarking attribution methods with relative feature\n                        importance","author":"Yang","year":"2019","journal-title":"arXiv preprint\n                    arXiv:1907.09701"},{"key":"2024070814304341200_bib230","doi-asserted-by":"publisher","first-page":"2369","DOI":"10.18653\/v1\/D18-1259","article-title":"HotpotQA: A dataset for diverse,\n                        explainable multi-hop question answering","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural\n                        Language Processing","author":"Yang","year":"2018"},{"key":"2024070814304341200_bib231","article-title":"The unreliability of explanations in\n                        few-shot prompting for textual reasoning","author":"Ye","year":"2022"},{"key":"2024070814304341200_bib232","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.41","article-title":"Explanation Selection Using Unlabeled Data\n                        for In-Context Learning","author":"Ye","year":"2023"},{"key":"2024070814304341200_bib233","doi-asserted-by":"publisher","first-page":"4469","DOI":"10.18653\/v1\/2023.findings-acl.273","article-title":"Complementary explanations for effective\n                        in-context learning","volume-title":"Findings of the Association\n                        for Computational Linguistics: ACL 2023","author":"Ye","year":"2022"},{"key":"2024070814304341200_bib234","doi-asserted-by":"publisher","first-page":"5496","DOI":"10.18653\/v1\/2021.emnlp-main.447","article-title":"Connecting attributions and QA model\n                        behavior on realistic counterfactuals","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural\n                        Language Processing","author":"Ye","year":"2021"},{"key":"2024070814304341200_bib235","first-page":"10965","article-title":"On the (in)fidelity and sensitivity of\n                        explanations","volume-title":"Advances in Neural Information\n                        Processing Systems 32: Annual Conference on Neural Information Processing\n                        Systems 2019, NeurIPS 2019","author":"Yeh","year":"2019"},{"key":"2024070814304341200_bib236","first-page":"20554","article-title":"On completeness-aware concept-based\n                        explanations in deep neural networks","volume-title":"Advances in\n                        Neural Information Processing Systems 33: Annual Conference on Neural\n                        Information Processing Systems 2020, NeurIPS 2020","author":"Yeh","year":"2020"},{"key":"2024070814304341200_bib237","first-page":"1039","article-title":"Neural-symbolic VQA: Disentangling\n                        reasoning from vision and language understanding","volume-title":"Advances in Neural Information Processing Systems 31: Annual\n                        Conference on Neural Information Processing Systems 2018, NeurIPS\n                        2018","author":"Yi","year":"2018"},{"key":"2024070814304341200_bib238","doi-asserted-by":"publisher","first-page":"2631","DOI":"10.18653\/v1\/2022.acl-long.188","article-title":"On the sensitivity and stability of model\n                        interpretations in NLP","volume-title":"Proceedings of the 60th\n                        Annual Meeting of the Association for Computational Linguistics (Volume 1:\n                        Long Papers)","author":"Yin","year":"2022"},{"key":"2024070814304341200_bib239","doi-asserted-by":"publisher","first-page":"184","DOI":"10.18653\/v1\/2022.emnlp-main.14","article-title":"Interpreting language models with\n                        contrastive explanations","volume-title":"Proceedings of the 2022\n                        Conference on Empirical Methods in Natural Language Processing","author":"Yin","year":"2022"},{"key":"2024070814304341200_bib240","first-page":"260","article-title":"Using \u201cAnnotator Rationales\u201d\n                        to Improve Machine Learning for Text Categorization","volume-title":"Human Language Technologies 2007: The Conference of the North\n                        American Chapter of the Association for Computational Linguistics;\n                        Proceedings of the Main Conference","author":"Zaidan","year":"2007"},{"key":"2024070814304341200_bib241","doi-asserted-by":"publisher","first-page":"818","DOI":"10.1007\/978-3-319-10590-1_53","article-title":"Visualizing and understanding\n                        convolutional networks","volume-title":"Computer Vision \u2013 ECCV 2014","author":"Zeiler","year":"2014"},{"key":"2024070814304341200_bib242","doi-asserted-by":"publisher","first-page":"4791","DOI":"10.18653\/v1\/P19-1472","article-title":"HellaSwag: Can a machine really finish your\n                        sentence?","volume-title":"Proceedings of the 57th Annual Meeting\n                        of the Association for Computational Linguistics","author":"Zellers","year":"2019"},{"key":"2024070814304341200_bib243","doi-asserted-by":"publisher","first-page":"64","DOI":"10.18653\/v1\/2022.trustnlp-1.6","article-title":"The irrationality of neural rationale models","volume-title":"Proceedings of the 2nd Workshop on Trustworthy Natural Language\n                        Processing (TrustNLP 2022)","author":"Zheng","year":"2022"},{"key":"2024070814304341200_bib244","article-title":"Least-to-most prompting enables complex\n                        reasoning in large language models","volume-title":"The Eleventh\n                        International Conference on Learning\n                    Representations","author":"Zhou","year":"2022"},{"key":"2024070814304341200_bib245","doi-asserted-by":"publisher","first-page":"9623","DOI":"10.1609\/aaai.v36i9.21196","article-title":"Do feature attribution methods correctly\n                        attribute features?","volume-title":"Thirty-Sixth AAAI Conference\n                        on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on\n                        Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth\n                        Symposium on Educational Advances in Artificial Intelligence, EAAI 2022\n                        Virtual Event","author":"Zhou","year":"2022"},{"key":"2024070814304341200_bib246","doi-asserted-by":"publisher","first-page":"5359","DOI":"10.18653\/v1\/2022.naacl-main.392","article-title":"ExSum: From local explanations to model\n                        understanding","volume-title":"Proceedings of the 2022 Conference\n                        of the North American Chapter of the Association for Computational\n                        Linguistics: Human Language Technologies","author":"Zhou","year":"2022"},{"key":"2024070814304341200_bib247","doi-asserted-by":"publisher","first-page":"2399","DOI":"10.18653\/v1\/2023.findings-eacl.182","article-title":"The Solvability of Interpretability Evaluation\n                        Metrics","volume-title":"Findings of the Association for\n                        Computational Linguistics: EACL 2023","author":"Zhou","year":"2023"},{"key":"2024070814304341200_bib248","doi-asserted-by":"publisher","first-page":"1651","DOI":"10.18653\/v1\/P19-1161","article-title":"Counterfactual data augmentation for\n                        mitigating gender stereotypes in languages with rich\n                        morphology","volume-title":"Proceedings of the 57th Annual\n                        Meeting of the Association for Computational Linguistics","author":"Zmigrod","year":"2019"}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/50\/2\/657\/2457495\/coli_a_00511.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/50\/2\/657\/2457495\/coli_a_00511.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,8]],"date-time":"2024-07-08T14:35:51Z","timestamp":1720449351000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/50\/2\/657\/119158\/Towards-Faithful-Model-Explanation-in-NLP-A-Survey"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":248,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,6,1]]},"published-print":{"date-parts":[[2024,6,1]]}},"URL":"https:\/\/doi.org\/10.1162\/coli_a_00511","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}