{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T20:02:41Z","timestamp":1773432161584,"version":"3.50.1"},"reference-count":41,"publisher":"MIT Press - Journals","license":[{"start":{"date-parts":[[2021,8,6]],"date-time":"2021-08-06T00:00:00Z","timestamp":1628208000000},"content-version":"vor","delay-in-days":217,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,2]]},"abstract":"<jats:p>A question answering system that in addition to providing an answer provides an explanation of the reasoning that leads to that answer has potential advantages in terms of debuggability, extensibility, and trust. To this end, we propose QED, a linguistically informed, extensible framework for explanations in question answering. A QED explanation specifies the relationship between a question and answer according to formal semantic notions such as referential equality, sentencehood, and entailment. We describe and publicly release an expert-annotated dataset of QED explanations built upon a subset of the Google Natural Questions dataset, and report baseline models on two tasks\u2014post-hoc explanation generation given an answer, and joint question answering and explanation generation. In the joint setting, a promising result suggests that training on a relatively small amount of QED data can improve question answering. 
In addition to describing the formal, language-theoretic motivations for the QED approach, we describe a large user study showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.<\/jats:p>","DOI":"10.1162\/tacl_a_00398","type":"journal-article","created":{"date-parts":[[2021,9,20]],"date-time":"2021-09-20T19:29:42Z","timestamp":1632166182000},"page":"790-806","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":20,"title":["QED: A Framework and Dataset for Explanations in Question Answering"],"prefix":"10.1162","volume":"9","author":[{"given":"Matthew","family":"Lamm","sequence":"first","affiliation":[{"name":"Department of Linguistics, Stanford University, United States. mrlamm@google.com"}]},{"given":"Jennimaria","family":"Palomaki","sequence":"additional","affiliation":[{"name":"Google Research, United States. jpalomaki@google.com"}]},{"given":"Chris","family":"Alberti","sequence":"additional","affiliation":[{"name":"Google Research, United States. chrisalberti@google.com"}]},{"given":"Daniel","family":"Andor","sequence":"additional","affiliation":[{"name":"Google Research, United States. andor@google.com"}]},{"given":"Eunsol","family":"Choi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, The University of Texas at Austin, United States. eunsol@cs.utexas.edu"}]},{"given":"Livio Baldini","family":"Soares","sequence":"additional","affiliation":[{"name":"Google Research, United States. liviobs@google.com"}]},{"given":"Michael","family":"Collins","sequence":"additional","affiliation":[{"name":"Google Research, United States. 
mjcollins@google.com"}]}],"member":"281","published-online":{"date-parts":[[2021,8,2]]},"reference":[{"key":"2021080620254590300_bib1","doi-asserted-by":"publisher","DOI":"10.1002\/9780470756959.ch6","article-title":"Definiteness and Indefiniteness","volume":"122","author":"Abbott","year":"2004","journal-title":"The Handbook of Pragmatics"},{"key":"2021080620254590300_bib2","doi-asserted-by":"publisher","first-page":"242","DOI":"10.18653\/v1\/E17-2039","article-title":"The Parallel Meaning Bank: Towards a multilingual corpus of translations annotated with compositional meaning representations","volume-title":"Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers","author":"Abzianidze","year":"2017"},{"key":"2021080620254590300_bib3","article-title":"A BERT baseline for the natural questions","author":"Alberti","year":"2019","journal-title":"arXiv preprint arXiv:1901.08634"},{"key":"2021080620254590300_bib4","first-page":"9539","article-title":"e-SNLI: Natural language inference with natural language explanations","volume-title":"Advances in Neural Information Processing Systems","author":"Camburu","year":"2018"},{"key":"2021080620254590300_bib5","doi-asserted-by":"publisher","first-page":"4157","DOI":"10.18653\/v1\/2020.acl-main.382","article-title":"Make up your mind! 
Adversarial generation of inconsistent natural language explanations","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Camburu","year":"2020"},{"issue":"3","key":"2021080620254590300_bib6","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1007\/BF00353456","article-title":"A unified analysis of the English bare plural","volume":"1","author":"Carlson","year":"1977","journal-title":"Linguistics and Philosophy"},{"key":"2021080620254590300_bib7","first-page":"2924","article-title":"BoolQ: Exploring the surprising difficulty of natural yes\/no questions","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Clark","year":"2019"},{"key":"2021080620254590300_bib8","doi-asserted-by":"publisher","first-page":"169","DOI":"10.3115\/980190.980237","article-title":"Bridging","volume-title":"Proceedings of the 1975 Workshop on Theoretical Issues in Natural Language Processing","author":"Clark","year":"1975"},{"key":"2021080620254590300_bib9","article-title":"Definite reference and mutual knowledge","author":"Clark","year":"1981","journal-title":"Elements of Discourse Understanding"},{"key":"2021080620254590300_bib10","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2021080620254590300_bib11","article-title":"Towards a rigorous science of interpretable machine learning","author":"Doshi-Velez","year":"2017","journal-title":"arXiv preprint 
arXiv:1702.08608"},{"key":"2021080620254590300_bib12","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1145\/3301275.3302316","article-title":"Automated rationale generation: A technique for explainable ai and its effects on human perceptions","volume-title":"Proceedings of the 24th International Conference on Intelligent User Interfaces","author":"Ehsan","year":"2019"},{"key":"2021080620254590300_bib13","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511790942","volume-title":"Data Analysis Using Regression and Multilevel\/Hierarchical Models","author":"Gelman","year":"2006"},{"key":"2021080620254590300_bib14","unstructured":"Ben Goodrich, Jonah Gabry, Imad Ali, and Sam Brilleman. 2020. rstanarm: Bayesian Applied Regression Modeling via Stan. R package version 2.19.3."},{"key":"2021080620254590300_bib15","doi-asserted-by":"publisher","DOI":"10.3115\/992133.992154","article-title":"Automatic acquisition of hyponyms from large text corpora","volume-title":"COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics","author":"Hearst","year":"1992"},{"key":"2021080620254590300_bib16","doi-asserted-by":"publisher","first-page":"4198","DOI":"10.18653\/v1\/2020.acl-main.386","article-title":"Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness?","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Jacovi","year":"2020"},{"key":"2021080620254590300_bib17","first-page":"3543","article-title":"Attention is not explanation","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Jain","year":"2019"},{"key":"2021080620254590300_bib18","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1162\/tacl_a_00300","article-title":"SpanBERT: Improving pre-training by 
representing and predicting spans","volume":"8","author":"Joshi","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021080620254590300_bib19","first-page":"180","article-title":"Bare NPs: Kind-referring, indefinites, both, or neither?","volume-title":"Semantics and Linguistic Theory","author":"Krifka","year":"2003"},{"key":"2021080620254590300_bib20","doi-asserted-by":"publisher","DOI":"10.3765\/salt.v13i0.2880","article-title":"Nile: Natural language inference with faithful natural language explanations","author":"Kumar","year":"2020"},{"key":"2021080620254590300_bib21","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1162\/tacl_a_00276","article-title":"Natural questions: A benchmark for question answering research","volume":"7","author":"Kwiatkowski","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021080620254590300_bib22","doi-asserted-by":"publisher","first-page":"82","DOI":"10.18653\/v1\/D18-1008","article-title":"Textual analogy parsing: What\u2019s shared and what\u2019s compared among analogous facts","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Lamm","year":"2018"},{"key":"2021080620254590300_bib23","first-page":"188","article-title":"End-to-end neural coreference resolution","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"Lee","year":"2017"},{"key":"2021080620254590300_bib24","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1007\/978-94-015-9731-9_2","article-title":"What good is an explanation?","volume-title":"Explanation: Theoretical Approaches and Applications","author":"Lipton","year":"2001"},{"key":"2021080620254590300_bib25","first-page":"1805","article-title":"Copular clauses","volume-title":"Semantics: An International Handbook of Natural Language 
Meaning","author":"Mikkelsen","year":"2011"},{"key":"2021080620254590300_bib26","article-title":"The Penn Discourse Treebank.","volume-title":"LREC","author":"Miltsakaki","year":"2004"},{"key":"2021080620254590300_bib27","doi-asserted-by":"publisher","first-page":"4206","DOI":"10.18653\/v1\/2020.acl-main.387","article-title":"Towards transparent and explainable attention models","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Mohankumar","year":"2020"},{"key":"2021080620254590300_bib28","article-title":"Wt5?! Training text-to-text models to explain their predictions","author":"Narang","year":"2020"},{"key":"2021080620254590300_bib29","first-page":"13","article-title":"The limitations of opaque learning machines","author":"Pearl","year":"2019","journal-title":"Possible Minds: Twenty-Five Ways of Looking at AI"},{"key":"2021080620254590300_bib30","article-title":"CoNLL-2012 Shared Task: Modeling multilingual unrestricted coreference in OntoNotes","volume-title":"EMNLP-CoNLL Shared Task","author":"Pradhan","year":"2012"},{"issue":"140","key":"2021080620254590300_bib31","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"2021080620254590300_bib32","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1264","article-title":"SQuAD: 100,000+ Questions for machine comprehension of text","author":"Rajpurkar","year":"2016","journal-title":"Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing"},{"key":"2021080620254590300_bib33","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1162\/tacl_a_00266","article-title":"CoQA: A conversational question answering challenge","volume":"7","author":"Reddy","year":"2019","journal-title":"Transactions of the Association for Computational 
Linguistics"},{"issue":"05","key":"2021080620254590300_bib34","doi-asserted-by":"publisher","first-page":"8722","DOI":"10.1609\/aaai.v34i05.6398","article-title":"Getting closer to AI complete question answering: A set of prerequisite real tasks","volume":"34","author":"Rogers","year":"2020","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"2021080620254590300_bib35","first-page":"2662","article-title":"Right for the right reasons: Training differentiable models by constraining their explanations","volume-title":"Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17","author":"Ross","year":"2017"},{"issue":"56","key":"2021080620254590300_bib36","doi-asserted-by":"publisher","first-page":"479","DOI":"10.1093\/mind\/XIV.4.479","article-title":"On denoting","volume":"14","author":"Russell","year":"1905","journal-title":"Mind"},{"key":"2021080620254590300_bib37","doi-asserted-by":"publisher","first-page":"6078","DOI":"10.18653\/v1\/D19-1629","article-title":"WIQA: A dataset for \u201cWhat if\u2026\u201d reasoning over procedural text","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Tandon","year":"2019"},{"issue":"3","key":"2021080620254590300_bib38","doi-asserted-by":"publisher","first-page":"705","DOI":"10.1111\/j.1467-8624.2007.01025.x","article-title":"A new look at infant pointing","volume":"78","author":"Tomasello","year":"2007","journal-title":"Child Development"},{"key":"2021080620254590300_bib39","doi-asserted-by":"publisher","first-page":"11","DOI":"10.18653\/v1\/D19-1002","article-title":"Attention is not not explanation","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing 
(EMNLP-IJCNLP)","author":"Wiegreffe","year":"2019"},{"key":"2021080620254590300_bib40","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1162\/tacl_a_00309","article-title":"Break it down: A question understanding benchmark","volume":"8","author":"Wolfson","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021080620254590300_bib41","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1259","article-title":"HotpotQA: A dataset for diverse, explainable multi-hop question answering","volume-title":"Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Yang","year":"2018"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00398\/1955181\/tacl_a_00398.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00398\/1955181\/tacl_a_00398.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,20]],"date-time":"2021-09-20T19:30:10Z","timestamp":1632166210000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00398\/106795\/QED-A-Framework-and-Dataset-for-Explanations-in"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":41,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00398","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}