{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T21:14:42Z","timestamp":1773177282935,"version":"3.50.1"},"reference-count":55,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2023,1,3]],"date-time":"2023-01-03T00:00:00Z","timestamp":1672704000000},"content-version":"vor","delay-in-days":367,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,12,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded on knowledge sources. However, dialogue systems often produce unsupported utterances, a phenomenon known as hallucination. To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark. We observe that FaithDial is more faithful than WoW while also maintaining engaging conversations. We show that FaithDial can serve as training signal for: i) a hallucination critic, which discriminates whether an utterance is faithful or not, and boosts the performance by 12.8 F1 score on the BEGIN benchmark compared to existing datasets for dialogue coherence; ii) high-quality dialogue generation. We benchmark a series of state-of-the-art models and propose an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness based on several automated metrics. Further, we find that the benefits of FaithDial generalize to zero-shot transfer on other datasets, such as CMU-Dog and TopicalChat. 
Finally, human evaluation reveals that responses generated by models trained on FaithDial are perceived as more interpretable, cooperative, and engaging.<\/jats:p>","DOI":"10.1162\/tacl_a_00529","type":"journal-article","created":{"date-parts":[[2023,1,3]],"date-time":"2023-01-03T19:30:39Z","timestamp":1672774239000},"page":"1473-1490","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":27,"title":["<scp>FaithDial<\/scp>: A Faithful Benchmark for Information-Seeking Dialogue"],"prefix":"10.1162","volume":"10","author":[{"given":"Nouha","family":"Dziri","sequence":"first","affiliation":[{"name":"University of Alberta, Canada"},{"name":"Mila \u2013 Quebec AI Institute, Canada"},{"name":"Alberta Machine Intelligence Institute (Amii), Canada. dziri@cs.ualberta.ca"}]},{"given":"Ehsan","family":"Kamalloo","sequence":"additional","affiliation":[{"name":"University of Alberta, Canada"}]},{"given":"Sivan","family":"Milton","sequence":"additional","affiliation":[{"name":"McGill University, Canada"}]},{"given":"Osmar","family":"Zaiane","sequence":"additional","affiliation":[{"name":"University of Alberta, Canada"},{"name":"Alberta Machine Intelligence Institute (Amii), Canada"}]},{"given":"Mo","family":"Yu","sequence":"additional","affiliation":[{"name":"WeChat AI, Tencent, USA"}]},{"given":"Edoardo M.","family":"Ponti","sequence":"additional","affiliation":[{"name":"University of Edinburgh, UK"}]},{"given":"Siva","family":"Reddy","sequence":"additional","affiliation":[{"name":"Mila \u2013 Quebec AI Institute, Canada"},{"name":"McGill University, Canada"}]}],"member":"281","published-online":{"date-parts":[[2022,12,23]]},"reference":[{"key":"2023010319173589300_bib1","doi-asserted-by":"publisher","first-page":"610","DOI":"10.1145\/3442188.3445922","article-title":"On the dangers of stochastic parrots: Can language models be too big?","volume-title":"Proceedings of the 2021 ACM Conference on Fairness, 
Accountability, and Transparency","author":"Bender","year":"2021"},{"key":"2023010319173589300_bib2","doi-asserted-by":"crossref","first-page":"6633","DOI":"10.18653\/v1\/2021.emnlp-main.532","article-title":"CLIFF: Contrastive learning for improving faithfulness and factuality in abstractive summarization","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Cao","year":"2021"},{"issue":"2","key":"2023010319173589300_bib3","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1007\/s11251-009-9110-0","article-title":"Cognitive load theory, educational research, and instructional design: Some food for thought","volume":"38","author":"De Jong","year":"2010","journal-title":"Instructional Science"},{"issue":"3","key":"2023010319173589300_bib4","doi-asserted-by":"publisher","first-page":"1616","DOI":"10.1016\/j.chb.2005.08.012","article-title":"Cognitive load in hypertext reading: A review","volume":"23","author":"DeStefano","year":"2007","journal-title":"Computers in Human Behavior"},{"key":"2023010319173589300_bib5","article-title":"Wizard of Wikipedia: Knowledge-powered conversational agents","volume-title":"7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6\u20139, 2019","author":"Dinan","year":"2019"},{"key":"2023010319173589300_bib6","doi-asserted-by":"publisher","first-page":"5055","DOI":"10.18653\/v1\/2020.acl-main.454","article-title":"FEQA: A question answering evaluation framework for faithfulness assessment in abstractive summarization","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Durmus","year":"2020"},{"key":"2023010319173589300_bib7","doi-asserted-by":"publisher","first-page":"3806","DOI":"10.18653\/v1\/N19-1381","article-title":"Evaluating coherence in dialogue systems using entailment","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the 
Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Dziri","year":"2019"},{"key":"2023010319173589300_bib8","doi-asserted-by":"publisher","first-page":"2197","DOI":"10.18653\/v1\/2021.emnlp-main.168","article-title":"Neural path hunter: Reducing hallucination in dialogue systems via path grounding","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Dziri","year":"2021"},{"key":"2023010319173589300_bib9","doi-asserted-by":"publisher","first-page":"5271","DOI":"10.18653\/v1\/2022.naacl-main.387","article-title":"On the origin of hallucinations in conversational models: Is it the datasets or the models?","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Dziri","year":"2022"},{"key":"2023010319173589300_bib10","doi-asserted-by":"publisher","first-page":"1066","DOI":"10.1162\/tacl_a_00506","article-title":"Evaluating attribution in dialogue systems: The BEGIN benchmark","volume":"10","author":"Dziri","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2023010319173589300_bib11","doi-asserted-by":"publisher","first-page":"1891","DOI":"10.21437\/Interspeech.2019-3079","article-title":"Topical-Chat: Towards knowledge-grounded open-domain conversations","volume-title":"Proceedings of Interspeech 2019","author":"Gopalakrishnan","year":"2019"},{"key":"2023010319173589300_bib12","doi-asserted-by":"publisher","first-page":"1449","DOI":"10.18653\/v1\/2021.naacl-main.114","article-title":"Annotating and modeling fine-grained factuality in summarization","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language 
Technologies","author":"Goyal","year":"2021"},{"key":"2023010319173589300_bib13","doi-asserted-by":"publisher","first-page":"708","DOI":"10.18653\/v1\/N18-1065","article-title":"Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Grusky","year":"2018"},{"key":"2023010319173589300_bib14","doi-asserted-by":"publisher","first-page":"3785","DOI":"10.18653\/v1\/2022.acl-long.263","article-title":"DialFact: A benchmark for fact-checking in dialogue","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Gupta","year":"2022"},{"key":"2023010319173589300_bib15","doi-asserted-by":"publisher","first-page":"7856","DOI":"10.18653\/v1\/2021.emnlp-main.619","article-title":"q2: Evaluating factual consistency in knowledge-grounded dialogues via question generation and question answering","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Or","year":"2021"},{"key":"2023010319173589300_bib16","article-title":"Survey of hallucination in natural language generation","author":"Ji","year":"2022","journal-title":"CoRR"},{"key":"2023010319173589300_bib17","doi-asserted-by":"publisher","first-page":"718","DOI":"10.18653\/v1\/2020.acl-main.66","article-title":"Improved natural language generation via loss truncation","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Kang","year":"2020"},{"key":"2023010319173589300_bib18","doi-asserted-by":"publisher","first-page":"1410","DOI":"10.18653\/v1\/2022.acl-long.100","article-title":"Faithful or extractive? 
On mitigating the faithfulness-abstractiveness trade-off in abstractive summarization","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Ladhak","year":"2022"},{"key":"2023010319173589300_bib19","doi-asserted-by":"crossref","first-page":"159","DOI":"10.2307\/2529310","article-title":"The measurement of observer agreement for categorical data","author":"Richard Landis","year":"1977","journal-title":"Biometrics"},{"key":"2023010319173589300_bib20","doi-asserted-by":"publisher","first-page":"7871","DOI":"10.18653\/v1\/2020.acl-main.703","article-title":"BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Lewis","year":"2020"},{"key":"2023010319173589300_bib21","doi-asserted-by":"publisher","first-page":"942","DOI":"10.18653\/v1\/2021.acl-short.118","article-title":"Addressing semantic drift in generative question answering with auxiliary extraction","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)","author":"Li","year":"2021"},{"key":"2023010319173589300_bib22","first-page":"74","article-title":"ROUGE: A package for automatic evaluation of summaries","volume-title":"Text Summarization Branches Out","author":"Lin","year":"2004"},{"key":"2023010319173589300_bib23","article-title":"Roberta: A robustly optimized BERT pretraining approach","author":"Liu","year":"2019","journal-title":"CoRR"},{"key":"2023010319173589300_bib24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/2020.eval4nlp-1.1","article-title":"Truth or error? 
Towards systematic analysis of factual errors in abstractive summaries","volume-title":"Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems","author":"Lux","year":"2020"},{"key":"2023010319173589300_bib25","doi-asserted-by":"crossref","first-page":"857","DOI":"10.1162\/tacl_a_00494","article-title":"Reducing conversational agents\u2019 overconfidence through linguistic calibration","volume":"10","author":"Mielke","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2023010319173589300_bib26","doi-asserted-by":"publisher","first-page":"1699","DOI":"10.18653\/v1\/2021.acl-long.134","article-title":"I like fish, especially dolphins: Addressing contradictions in dialogue modeling","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Nie","year":"2021"},{"key":"2023010319173589300_bib27","article-title":"Representation learning with contrastive predictive coding","author":"van den Oord","year":"2018","journal-title":"CoRR"},{"key":"2023010319173589300_bib28","doi-asserted-by":"publisher","first-page":"311","DOI":"10.3115\/1073083.1073135","article-title":"BLEU: A method for automatic evaluation of machine translation","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni","year":"2002"},{"key":"2023010319173589300_bib29","doi-asserted-by":"publisher","first-page":"1173","DOI":"10.18653\/v1\/2020.emnlp-main.89","article-title":"ToTTo: A controlled table-to-text generation dataset","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing 
(EMNLP)","author":"Parikh","year":"2020"},{"key":"2023010319173589300_bib30","doi-asserted-by":"publisher","first-page":"4274","DOI":"10.18653\/v1\/2021.naacl-main.338","article-title":"Focused attention improves document-grounded generation","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Prabhumoye","year":"2021"},{"issue":"8","key":"2023010319173589300_bib31","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"2023010319173589300_bib32","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"2023010319173589300_bib33","article-title":"Measuring attribution in natural language generation models","author":"Rashkin","year":"2021","journal-title":"CoRR"},{"key":"2023010319173589300_bib34","doi-asserted-by":"publisher","first-page":"704","DOI":"10.18653\/v1\/2021.acl-long.58","article-title":"Increasing faithfulness in knowledge-grounded dialogue with controllable features","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Rashkin","year":"2021"},{"key":"2023010319173589300_bib35","doi-asserted-by":"publisher","first-page":"1172","DOI":"10.18653\/v1\/2021.naacl-main.92","article-title":"The curious case of hallucinations in neural machine translation","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language 
Technologies","author":"Raunak","year":"2021"},{"key":"2023010319173589300_bib36","doi-asserted-by":"publisher","first-page":"300","DOI":"10.18653\/v1\/2021.eacl-main.24","article-title":"Recipes for building an open-domain chatbot","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Roller","year":"2021"},{"key":"2023010319173589300_bib37","article-title":"Rome was built in 1776: A case study on factual correctness in knowledge-grounded response generation","author":"Santhanam","year":"2021","journal-title":"CoRR"},{"issue":"1","key":"2023010319173589300_bib38","doi-asserted-by":"publisher","first-page":"140","DOI":"10.1080\/03637751.2017.1342043","article-title":"Crowdsourcing research: Data collection with Amazon\u2019s Mechanical Turk","volume":"85","author":"Sheehan","year":"2018","journal-title":"Communication Monographs"},{"key":"2023010319173589300_bib39","doi-asserted-by":"publisher","first-page":"3784","DOI":"10.18653\/v1\/2021.findings-emnlp.320","article-title":"Retrieval augmentation reduces hallucination in conversation","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2021","author":"Shuster","year":"2021"},{"key":"2023010319173589300_bib40","first-page":"224","article-title":"Large-margin learning of submodular summarization models","volume-title":"Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics","author":"Sipos","year":"2012"},{"key":"2023010319173589300_bib41","volume-title":"Describing Talk: A Taxonomy of Verbal Response Modes","author":"Stiles","year":"1992"},{"key":"2023010319173589300_bib42","doi-asserted-by":"publisher","first-page":"5657","DOI":"10.18653\/v1\/2022.naacl-main.415","article-title":"CONFIT: Toward faithful dialogue summarization with linguistically-informed contrastive fine-tuning","volume-title":"Proceedings of the 2022 Conference of the 
North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Tang","year":"2022"},{"key":"2023010319173589300_bib43","article-title":"Lamda: Language models for dialog applications","author":"Thoppilan","year":"2022","journal-title":"CoRR"},{"key":"2023010319173589300_bib44","doi-asserted-by":"crossref","first-page":"5008","DOI":"10.18653\/v1\/2020.acl-main.450","article-title":"Asking and answering questions to evaluate the factual consistency of summaries","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Wang","year":"2020"},{"key":"2023010319173589300_bib45","doi-asserted-by":"publisher","first-page":"3544","DOI":"10.18653\/v1\/2020.acl-main.326","article-title":"On exposure bias, hallucination and domain shift in neural machine translation","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Wang","year":"2020"},{"key":"2023010319173589300_bib46","doi-asserted-by":"crossref","first-page":"3731","DOI":"10.18653\/v1\/P19-1363","article-title":"Dialogue natural language inference","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Welleck","year":"2019"},{"key":"2023010319173589300_bib47","doi-asserted-by":"publisher","first-page":"3731","DOI":"10.18653\/v1\/P19-1363","article-title":"Dialogue natural language inference","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Welleck","year":"2019"},{"key":"2023010319173589300_bib48","doi-asserted-by":"publisher","first-page":"1112","DOI":"10.18653\/v1\/N18-1101","article-title":"A broad-coverage challenge corpus for sentence understanding through inference","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 
Volume 1 (Long Papers)","author":"Williams","year":"2018"},{"key":"2023010319173589300_bib49","doi-asserted-by":"publisher","first-page":"2253","DOI":"10.18653\/v1\/D17-1239","article-title":"Challenges in data-to-document generation","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"Wiseman","year":"2017"},{"key":"2023010319173589300_bib50","doi-asserted-by":"publisher","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Transformers: State-of-the-art natural language processing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Wolf","year":"2020"},{"key":"2023010319173589300_bib51","doi-asserted-by":"publisher","first-page":"5131","DOI":"10.18653\/v1\/2021.naacl-main.404","article-title":"On the inductive bias of masked language modeling: From statistical to syntactic dependencies","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Zhang","year":"2021"},{"key":"2023010319173589300_bib52","article-title":"Bertscore: Evaluating text generation with BERT","volume-title":"8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26\u201330, 2020","author":"Zhang","year":"2020"},{"key":"2023010319173589300_bib53","doi-asserted-by":"publisher","first-page":"270","DOI":"10.18653\/v1\/2020.acl-demos.30","article-title":"DIALOGPT: Large-scale generative pre-training for conversational response generation","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations","author":"Zhang","year":"2020"},{"key":"2023010319173589300_bib54","doi-asserted-by":"publisher","first-page":"3377","DOI":"10.18653\/v1\/2020.emnlp-main.272","article-title":"Knowledge-grounded dialogue generation 
with pre-trained language models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Zhao","year":"2020"},{"key":"2023010319173589300_bib55","doi-asserted-by":"publisher","first-page":"708","DOI":"10.18653\/v1\/D18-1076","article-title":"A dataset for document grounded conversations","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Zhou","year":"2018"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00529\/2065956\/tacl_a_00529.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00529\/2065956\/tacl_a_00529.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,3]],"date-time":"2023-01-03T19:30:53Z","timestamp":1672774253000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00529\/114373\/FaithDial-A-Faithful-Benchmark-for-Information"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"references-count":55,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00529","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022]]},"published":{"date-parts":[[2022]]}}}