{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T22:53:48Z","timestamp":1776120828897,"version":"3.50.1"},"reference-count":43,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2022,8,11]],"date-time":"2022-08-11T00:00:00Z","timestamp":1660176000000},"content-version":"vor","delay-in-days":222,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,8,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>While improving neural dialogue agents\u2019 factual accuracy is the object of much research, another important aspect of communication, less studied in the setting of neural dialogue, is transparency about ignorance. In this work, we analyze to what extent state-of-the-art chit-chat models are linguistically calibrated in the sense that their verbalized expression of doubt (or confidence) matches the likelihood that the model\u2019s responses are factually incorrect (or correct). We find that these models are poorly calibrated, yet we show that likelihood of correctness can accurately be predicted. 
By incorporating such metacognitive features into the training of a controllable generation model, we obtain a dialogue agent with greatly improved linguistic calibration.<\/jats:p>","DOI":"10.1162\/tacl_a_00494","type":"journal-article","created":{"date-parts":[[2022,8,11]],"date-time":"2022-08-11T19:43:54Z","timestamp":1660247034000},"page":"857-872","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":44,"title":["Reducing Conversational Agents\u2019 Overconfidence Through Linguistic Calibration"],"prefix":"10.1162","volume":"10","author":[{"given":"Sabrina J.","family":"Mielke","sequence":"first","affiliation":[{"name":"Department of Computer Science, Johns Hopkins University, USA"},{"name":"Facebook AI Research, USA. sjmielke@jhu.edu"}]},{"given":"Arthur","family":"Szlam","sequence":"additional","affiliation":[{"name":"Facebook AI Research, USA. aszlam@fb.com"}]},{"given":"Emily","family":"Dinan","sequence":"additional","affiliation":[{"name":"Facebook AI Research, USA. edinan@fb.com"}]},{"given":"Y-Lan","family":"Boureau","sequence":"additional","affiliation":[{"name":"Facebook AI Research, USA. 
ylan@fb.com"}]}],"member":"281","published-online":{"date-parts":[[2022,8,12]]},"reference":[{"key":"2022081119434551400_bib1","article-title":"Towards a human-like open-domain chatbot","author":"Adiwardana","year":"2020","journal-title":"arXiv preprint arXiv:2001.09977v3"},{"key":"2022081119434551400_bib2","article-title":"The Pushshift Reddit dataset","author":"Baumgartner","year":"2020","journal-title":"arXiv preprint arXiv:2001.08435v1"},{"key":"2022081119434551400_bib3","doi-asserted-by":"publisher","first-page":"610","DOI":"10.1145\/3442188.3445922","article-title":"On the dangers of stochastic parrots: Can language models be too big?","volume-title":"Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency","author":"Bender","year":"2021"},{"key":"2022081119434551400_bib4","first-page":"1533","article-title":"Semantic parsing on Freebase from question-answer pairs","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL","author":"Berant","year":"2013"},{"key":"2022081119434551400_bib5","article-title":"Language models are few-shot learners","author":"Brown","year":"2020","journal-title":"arXiv preprint arXiv:2005.14165v4"},{"key":"2022081119434551400_bib6","doi-asserted-by":"publisher","first-page":"295","DOI":"10.18653\/v1\/2020.emnlp-main.21","article-title":"Calibration of pre-trained Transformers","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Desai","year":"2020"},{"key":"2022081119434551400_bib7","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 
Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2022081119434551400_bib8","article-title":"Wizard of Wikipedia: Knowledge-powered conversational agents","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Dinan","year":"2019"},{"key":"2022081119434551400_bib9","doi-asserted-by":"publisher","first-page":"1891","DOI":"10.21437\/Interspeech.2019-3079","article-title":"Topical-chat: Towards knowledge-grounded open-domain conversations","volume-title":"Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019","author":"Gopalakrishnan","year":"2019"},{"key":"2022081119434551400_bib10","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1163\/9789004368811_003","article-title":"Logic and conversation","volume-title":"Speech Acts","author":"Grice","year":"1975"},{"key":"2022081119434551400_bib11","first-page":"1321","article-title":"On calibration of modern neural networks","volume-title":"Proceedings of the 34th International Conference on Machine Learning","author":"Guo","year":"2017"},{"key":"2022081119434551400_bib12","article-title":"Gaussian error linear units (GELUs)","author":"Hendrycks","year":"2016","journal-title":"arXiv preprint arXiv:1606.08415v3"},{"key":"2022081119434551400_bib13","doi-asserted-by":"publisher","first-page":"2078","DOI":"10.18653\/v1\/2020.acl-main.188","article-title":"Calibrating structured output predictors for natural language processing","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020","author":"Jagannatha","year":"2020"},{"key":"2022081119434551400_bib14","doi-asserted-by":"publisher","first-page":"962","DOI":"10.1162\/tacl_a_00407","article-title":"How can we know when language models know? 
On the calibration of language models for question answering","volume":"9","author":"Jiang","year":"2021","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2022081119434551400_bib15","doi-asserted-by":"publisher","first-page":"1601","DOI":"10.18653\/v1\/P17-1147","article-title":"TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Joshi","year":"2017"},{"key":"2022081119434551400_bib16","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1006\/obhd.1994.1013","article-title":"The overconfidence phenomenon as a consequence of informal experimenter-guided selection of almanac items","volume":"57","author":"Juslin","year":"1994","journal-title":"Organizational Behavior and Human Decision Processes"},{"key":"2022081119434551400_bib17","doi-asserted-by":"publisher","first-page":"5684","DOI":"10.18653\/v1\/2020.acl-main.503","article-title":"Selective question answering under domain shift","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Kamath","year":"2020"},{"key":"2022081119434551400_bib18","article-title":"CTRL: A conditional transformer language model for controllable generation","author":"Keskar","year":"2019","journal-title":"arXiv preprint arXiv:1909.05858v2"},{"key":"2022081119434551400_bib19","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1002\/acp.705","article-title":"Ecological and person-oriented aspects of metacognitive processes in test-taking","volume":"15","author":"Kleitman","year":"2001","journal-title":"Applied Cognitive Psychology"},{"key":"2022081119434551400_bib20","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1162\/tacl_a_00276","article-title":"Natural questions: A benchmark for question answering 
research","volume":"7","author":"Kwiatkowski","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2022081119434551400_bib21","article-title":"Multiple-attribute text rewriting","volume-title":"International Conference on Learning Representations","author":"Lample","year":"2019"},{"key":"2022081119434551400_bib22","doi-asserted-by":"crossref","first-page":"4715","DOI":"10.18653\/v1\/2020.acl-main.428","article-title":"Don\u2019t say that! Making inconsistent dialogue unlikely with unlikelihood training","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Li","year":"2020"},{"key":"2022081119434551400_bib23","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu","year":"2019","journal-title":"arXiv preprint arXiv:1907.11692v1"},{"key":"2022081119434551400_bib24","doi-asserted-by":"publisher","first-page":"2422","DOI":"10.18653\/v1\/2020.findings-emnlp.219","article-title":"Plug-and-play conversational models","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Madotto","year":"2020"},{"key":"2022081119434551400_bib25","doi-asserted-by":"publisher","first-page":"79","DOI":"10.18653\/v1\/D17-2014","article-title":"ParlAI: A dialog research software platform","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Miller","year":"2017"},{"key":"2022081119434551400_bib26","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1145\/1102351.1102430","article-title":"Predicting good probabilities with supervised learning","volume-title":"Machine Learning, Proceedings of the Twenty-Second International Conference (ICML 2005), Bonn, Germany, August 7-11, 
2005","author":"Niculescu-Mizil","year":"2005"},{"key":"2022081119434551400_bib27","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1080\/00221300209602099","article-title":"The role of individual differences in the accuracy of confidence judgments","volume":"129","author":"Pallier","year":"2002","journal-title":"The Journal of General Psychology"},{"issue":"8","key":"2022081119434551400_bib28","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"2022081119434551400_bib29","first-page":"140:1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"issue":"2","key":"2022081119434551400_bib30","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1177\/107554709301500203","article-title":"The sin of science: Ignorance of ignorance","volume":"15","author":"Ravetz","year":"1993","journal-title":"Knowledge"},{"key":"2022081119434551400_bib31","doi-asserted-by":"publisher","first-page":"300","DOI":"10.18653\/v1\/2021.eacl-main.24","article-title":"Recipes for building an open-domain chatbot","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Roller","year":"2021"},{"key":"2022081119434551400_bib32","article-title":"DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter","author":"Sanh","year":"2019","journal-title":"arXiv preprint arXiv:1910.01108v4"},{"key":"2022081119434551400_bib33","first-page":"1702","article-title":"What makes a good conversation? 
How controllable attributes affect human judgments","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"See","year":"2019"},{"key":"2022081119434551400_bib34","article-title":"Controlling style in generated dialogue","author":"Smith","year":"2020","journal-title":"arXiv preprint arXiv:2009.10855v1"},{"key":"2022081119434551400_bib35","doi-asserted-by":"crossref","first-page":"2021","DOI":"10.18653\/v1\/2020.acl-main.183","article-title":"Can you put it all together: Evaluating conversational agents\u2019 ability to blend skills","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Smith","year":"2020"},{"key":"2022081119434551400_bib36","volume-title":"Ignorance and Uncertainty: Emerging Paradigms","author":"Smithson","year":"2012"},{"key":"2022081119434551400_bib37","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1016\/S1041-6080(99)80141-1","article-title":"Calibration curves, scatterplots and the distinction between general knowledge and perceptual tasks","volume":"10","author":"Stankov","year":"1998","journal-title":"Learning and Individual Differences"},{"issue":"6","key":"2022081119434551400_bib38","doi-asserted-by":"publisher","first-page":"971","DOI":"10.1016\/S0191-8869(96)00130-4","article-title":"Confidence judgments in studies of individual differences","volume":"21","author":"Stankov","year":"1996","journal-title":"Personality and Individual Differences"},{"key":"2022081119434551400_bib39","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2022081119434551400_bib40","article-title":"Neural text generation with unlikelihood training","volume-title":"International Conference on Learning 
Representations","author":"Welleck","year":"2020"},{"key":"2022081119434551400_bib41","doi-asserted-by":"publisher","first-page":"87","DOI":"10.18653\/v1\/W18-5713","article-title":"Retrieve and refine: Improved sequence generation models for dialogue","volume-title":"Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI","author":"Weston","year":"2018"},{"key":"2022081119434551400_bib42","article-title":"Recipes for safety in open-domain chatbots","author":"Xu","year":"2020","journal-title":"arXiv preprint arXiv:2010.07079v2"},{"key":"2022081119434551400_bib43","doi-asserted-by":"publisher","first-page":"270","DOI":"10.18653\/v1\/2020.acl-demos.30","article-title":"DialoGPT: Large-scale generative pre-training for conversational response generation","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations","author":"Zhang","year":"2020"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00494\/2038516\/tacl_a_00494.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00494\/2038516\/tacl_a_00494.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,11]],"date-time":"2022-08-11T19:44:19Z","timestamp":1660247059000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00494\/112606\/Reducing-Conversational-Agents-Overconfidence"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"references-count":43,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00494","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022]]},"published":{"date-parts":[[2022]]}}}