{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T23:30:29Z","timestamp":1774308629711,"version":"3.50.1"},"reference-count":72,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2025,4,7]],"date-time":"2025-04-07T00:00:00Z","timestamp":1743984000000},"content-version":"vor","delay-in-days":96,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,3,19]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The rapid proliferation of large language models (LLMs) has stimulated researchers to seek effective and efficient approaches to deal with LLM hallucinations and low-quality outputs. Uncertainty quantification (UQ) is a key element of machine learning applications in dealing with such challenges. However, research to date on UQ for LLMs has been fragmented in terms of techniques and evaluation methodologies. In this work, we address this issue by introducing a novel benchmark that implements a collection of state-of-the-art UQ baselines and offers an environment for controllable and consistent evaluation of novel UQ techniques over various text generation tasks. Our benchmark also supports the assessment of confidence normalization methods in terms of their ability to provide interpretable scores. 
Using our benchmark, we conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.<\/jats:p>","DOI":"10.1162\/tacl_a_00737","type":"journal-article","created":{"date-parts":[[2025,4,7]],"date-time":"2025-04-07T18:49:19Z","timestamp":1744051759000},"page":"220-248","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":7,"title":["Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph"],"prefix":"10.1162","volume":"13","author":[{"given":"Roman","family":"Vashurin","sequence":"first","affiliation":[{"name":"MBZUAI, UAE. roman.vashurin@mbzuai.ac.ae"}]},{"given":"Ekaterina","family":"Fadeeva","sequence":"additional","affiliation":[{"name":"ETH Zurich, Switzerland. ekaterina.fadeeva@inf.ethz.ch"}]},{"given":"Artem","family":"Vazhentsev","sequence":"additional","affiliation":[{"name":"Center for Artificial Intelligence Technology, Russia. vazhentsev@airi.net, artiomvazh99@gmail.com"}]},{"given":"Lyudmila","family":"Rvanova","sequence":"additional","affiliation":[{"name":"Laboratory for Analysis and Controllable Text Generation Technologies RAS, Russia. Milarv99@gmail.com"},{"name":"Weakly-Supervised NLP Group, Russia"}]},{"given":"Daniil","family":"Vasilev","sequence":"additional","affiliation":[{"name":"HSE University, Russia. davasilev_4@edu.hse.ru"}]},{"given":"Akim","family":"Tsvigun","sequence":"additional","affiliation":[{"name":"Nebius. aktsvigun@gmail.com"}]},{"given":"Sergey","family":"Petrakov","sequence":"additional","affiliation":[{"name":"Center for Artificial Intelligence Technology, Russia. sergeypetrakof@gmail.com"}]},{"given":"Rui","family":"Xing","sequence":"additional","affiliation":[{"name":"MBZUAI, UAE. 
Rui.Xing@mbzuai.ac.ae"},{"name":"The University of Melbourne, Australia"}]},{"given":"Abdelrahman","family":"Sadallah","sequence":"additional","affiliation":[{"name":"MBZUAI, UAE. Abdelrahman.Sadallah@mbzuai.ac.ae"}]},{"given":"Kirill","family":"Grishchenkov","sequence":"additional","affiliation":[{"name":"Independent Researcher. kirillgrish@gmail.com"}]},{"given":"Alexander","family":"Panchenko","sequence":"additional","affiliation":[{"name":"Center for Artificial Intelligence Technology, Russia. panchenkoalexander@gmail.com, panchenko@airi.net"}]},{"given":"Timothy","family":"Baldwin","sequence":"additional","affiliation":[{"name":"MBZUAI, UAE. Timothy.Baldwin@mbzuai.ac.ae"},{"name":"The University of Melbourne, Australia"}]},{"given":"Preslav","family":"Nakov","sequence":"additional","affiliation":[{"name":"MBZUAI, UAE. Preslav.Nakov@mbzuai.ac.ae"}]},{"given":"Maxim","family":"Panov","sequence":"additional","affiliation":[{"name":"MBZUAI, UAE. maxim.panov@mbzuai.ac.ae"}]},{"given":"Artem","family":"Shelmanov","sequence":"additional","affiliation":[{"name":"MBZUAI, UAE. 
artem.shelmanov@mbzuai.ac.ae"}]}],"member":"281","published-online":{"date-parts":[[2025,3,19]]},"reference":[{"key":"2025051914253280900_bib1","article-title":"Pitfalls of in-domain uncertainty estimation and ensembling in deep learning","volume-title":"International Conference on Learning Representations","author":"Ashukha","year":"2019"},{"key":"2025051914253280900_bib2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/W19-5301","article-title":"Findings of the 2019 Conference on Machine Translation (WMT19)","volume-title":"Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)","author":"Barrault","year":"2019"},{"key":"2025051914253280900_bib3","article-title":"Stable LM 2 1.6b technical report","author":"Bellagente","year":"2024","journal-title":"arXiv preprint arXiv:2402.17834"},{"key":"2025051914253280900_bib4","first-page":"1613","article-title":"Weight uncertainty in neural network","volume-title":"International Conference on Machine Learning","author":"Blundell","year":"2015"},{"key":"2025051914253280900_bib5","doi-asserted-by":"publisher","first-page":"12","DOI":"10.3115\/v1\/W14-3302","article-title":"Findings of the 2014 workshop on statistical machine translation","volume-title":"Proceedings of the Ninth Workshop on Statistical Machine Translation","author":"Bojar","year":"2014"},{"key":"2025051914253280900_bib6","article-title":"Accelerating large language model decoding with speculative sampling","author":"Chen","year":"2023","journal-title":"arXiv preprint arXiv:2302.01318"},{"key":"2025051914253280900_bib7","article-title":"Training verifiers to solve math word problems","author":"Cobbe","year":"2021","journal-title":"arXiv preprint arXiv:2110.14168"},{"key":"2025051914253280900_bib8","doi-asserted-by":"publisher","first-page":"5831","DOI":"10.18653\/v1\/2023.emnlp-main.357","article-title":"RainProof: An umbrella to shield text generator from out-of-distribution 
data","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing","author":"Darrin","year":"2023"},{"key":"2025051914253280900_bib9","doi-asserted-by":"publisher","first-page":"5050","DOI":"10.18653\/v1\/2024.acl-long.276","article-title":"Shifting attention to relevance: Towards the predictive uncertainty quantification of free-form large language models","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Duan","year":"2024"},{"key":"2025051914253280900_bib10","doi-asserted-by":"publisher","first-page":"5271","DOI":"10.18653\/v1\/2022.naacl-main.387","article-title":"On the origin of hallucinations in conversational models: Is it the datasets or the models?","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Dziri","year":"2022"},{"issue":"5","key":"2025051914253280900_bib11","article-title":"On the foundations of noise-free selective classification","volume":"11","author":"El-Yaniv","year":"2010","journal-title":"Journal of Machine Learning Research"},{"key":"2025051914253280900_bib12","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.558","article-title":"Fact-checking the output of large language models via token-level uncertainty quantification","volume-title":"Findings of the Association for Computational Linguistics: ACL 2024","author":"Fadeeva","year":"2024"},{"key":"2025051914253280900_bib13","doi-asserted-by":"publisher","first-page":"446","DOI":"10.18653\/v1\/2023.emnlp-demo.41","article-title":"LM-Polygraph: Uncertainty estimation for language models","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Fadeeva","year":"2023"},{"key":"2025051914253280900_bib14","article-title":"Perception of probability 
words","author":"Fagen-Ulmschneider","year":"2023"},{"key":"2025051914253280900_bib15","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1162\/tacl_a_00330","article-title":"Unsupervised quality estimation for neural machine translation","volume":"8","author":"Fomicheva","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2025051914253280900_bib16","unstructured":"Yarin Gal. 2016. Uncertainty in Deep Learning. Ph.D. thesis, University of Cambridge."},{"key":"2025051914253280900_bib17","first-page":"1183","article-title":"Deep Bayesian active learning with image data","volume-title":"International conference on machine learning","author":"Gal","year":"2017"},{"key":"2025051914253280900_bib18","article-title":"A framework for few-shot language model evaluation","author":"Gao","year":"2023"},{"key":"2025051914253280900_bib19","doi-asserted-by":"publisher","first-page":"8362","DOI":"10.18653\/v1\/2020.emnlp-main.671","article-title":"Towards more accurate uncertainty estimation in text classification","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16\u201320, 2020","author":"He","year":"2020"},{"key":"2025051914253280900_bib20","article-title":"DeBERTa: Decoding-enhanced BERT with disentangled attention","volume-title":"9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3\u20137, 2021","author":"He","year":"2021"},{"key":"2025051914253280900_bib21","article-title":"Measuring massive multitask language understanding","volume-title":"9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3\u20137, 2021","author":"Hendrycks","year":"2021"},{"key":"2025051914253280900_bib22","article-title":"Quantifying aleatoric and epistemic uncertainty with proper scoring 
rules","author":"Hofman","year":"2024","journal-title":"arXiv preprint arXiv:2404.12215"},{"key":"2025051914253280900_bib23","article-title":"Mistral 7b","author":"Jiang","year":"2023","journal-title":"CoRR"},{"key":"2025051914253280900_bib24","doi-asserted-by":"publisher","first-page":"1601","DOI":"10.18653\/v1\/P17-1147","article-title":"TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Joshi","year":"2017"},{"key":"2025051914253280900_bib25","article-title":"Language models (mostly) know what they know","author":"Kadavath","year":"2022","journal-title":"arXiv preprint arXiv:2207.05221"},{"key":"2025051914253280900_bib26","first-page":"36308","article-title":"Nonparametric uncertainty quantification for single deterministic neural network","volume-title":"Advances in Neural Information Processing Systems","author":"Kotelevskii","year":"2022"},{"key":"2025051914253280900_bib27","article-title":"Predictive uncertainty quantification via risk decompositions for strictly proper scoring rules","author":"Kotelevskii","year":"2024","journal-title":"arXiv preprint arXiv:2402.10727"},{"key":"2025051914253280900_bib28","article-title":"Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation","volume-title":"The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1\u20135, 2023","author":"Kuhn","year":"2023"},{"key":"2025051914253280900_bib29","doi-asserted-by":"publisher","first-page":"744","DOI":"10.18653\/v1\/2023.ijcnlp-main.48","article-title":"Uncertainty estimation for debiased models: Does fairness hurt reliability?","volume-title":"Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational 
Linguistics","author":"Kuzmin","year":"2023"},{"key":"2025051914253280900_bib30","first-page":"7167","article-title":"A simple unified framework for detecting out-of-distribution samples and adversarial attacks","volume-title":"Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3\u20138, 2018, Montr\u00e9al, Canada","author":"Lee","year":"2018"},{"key":"2025051914253280900_bib31","first-page":"19274","article-title":"Fast inference from transformers via speculative decoding","volume-title":"International Conference on Machine Learning","author":"Leviathan","year":"2023"},{"key":"2025051914253280900_bib32","article-title":"Generating with confidence: Uncertainty quantification for black-box large language models","author":"Lin","year":"2024","journal-title":"Transactions of Machine Learning Research"},{"key":"2025051914253280900_bib33","doi-asserted-by":"publisher","first-page":"6804","DOI":"10.18653\/v1\/2022.acl-long.469","article-title":"ParaDetox: Detoxification with parallel data","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Logacheva","year":"2022"},{"key":"2025051914253280900_bib34","article-title":"Uncertainty estimation in autoregressive structured prediction","volume-title":"9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3\u20137, 2021","author":"Malinin","year":"2021"},{"key":"2025051914253280900_bib35","doi-asserted-by":"publisher","first-page":"45","DOI":"10.18653\/v1\/P17-2008","article-title":"Incorporating uncertainty into deep learning for spoken language assessment","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short 
Papers)","author":"Malinin","year":"2017"},{"key":"2025051914253280900_bib36","doi-asserted-by":"publisher","first-page":"9004","DOI":"10.18653\/v1\/2023.emnlp-main.557","article-title":"SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing","author":"Manakul","year":"2023"},{"key":"2025051914253280900_bib37","doi-asserted-by":"publisher","first-page":"1797","DOI":"10.18653\/v1\/D18-1206","article-title":"Don\u2019t give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 \u2013 November 4, 2018","author":"Narayan","year":"2018"},{"key":"2025051914253280900_bib38","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.mrl-1.15","article-title":"Vikhr: Constructing a state-of-the-art bilingual open-source instruction-following large language model for Russian","volume-title":"Proceedings of the 4th Workshop on Multilingual Representation Learning (MRL) @ EMNLP-2024","author":"Nikolich","year":"2024"},{"key":"2025051914253280900_bib39","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1080\/19466315.2017.1286256","article-title":"Centered isotonic regression: Point and interval estimation for dose-response studies","volume":"9","author":"Oron","year":"2017","journal-title":"Statistics in Biopharmaceutical Research"},{"key":"2025051914253280900_bib40","doi-asserted-by":"publisher","first-page":"13675","DOI":"10.1609\/aaai.v35i15.17612","article-title":"Revisiting Mahalanobis distance for transformer-based out-of-domain detection","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Podolskiy","year":"2021"},{"key":"2025051914253280900_bib41","article-title":"Direct preference optimization: Your 
language model is secretly a reward model","volume":"36","author":"Rafailov","year":"2024","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025051914253280900_bib42","first-page":"140:1\u2013140:67","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"2025051914253280900_bib43","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1162\/tacl_a_00266","article-title":"CoQA: A conversational question answering challenge","volume":"7","author":"Reddy","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2025051914253280900_bib44","doi-asserted-by":"publisher","first-page":"2685","DOI":"10.18653\/v1\/2020.emnlp-main.213","article-title":"COMET: A neural framework for MT evaluation","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Rei","year":"2020"},{"key":"2025051914253280900_bib45","article-title":"Out-of-distribution detection and selective generation for conditional language models","volume-title":"The Eleventh International Conference on Learning Representations","author":"Ren","year":"2023"},{"issue":"388","key":"2025051914253280900_bib46","doi-asserted-by":"publisher","first-page":"871","DOI":"10.1080\/01621459.1984.10477105","article-title":"Least median of squares regression","volume":"79","author":"Rousseeuw","year":"1984","journal-title":"Journal of the American Statistical Association"},{"key":"2025051914253280900_bib47","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1137\/1.9781611977653.ch83","article-title":"Scalable batch acquisition for deep bayesian active learning","volume-title":"Proceedings of the 2023 SIAM International Conference on Data Mining 
(SDM)","author":"Rubashevskii","year":"2023"},{"key":"2025051914253280900_bib48","first-page":"17456","article-title":"Confident adaptive language modeling","volume":"35","author":"Schuster","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025051914253280900_bib49","doi-asserted-by":"publisher","first-page":"6640","DOI":"10.18653\/v1\/2020.acl-main.593","article-title":"The right tool for the job: Matching model and instance complexities","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Schwartz","year":"2020"},{"key":"2025051914253280900_bib50","article-title":"Jais and Jais-chat: Arabic-centric foundation and instruction-tuned open generative large language models","author":"Sengupta","year":"2023","journal-title":"arXiv preprint arXiv:2308.16149"},{"key":"2025051914253280900_bib51","doi-asserted-by":"publisher","first-page":"1698","DOI":"10.18653\/v1\/2021.eacl-main.145","article-title":"Active learning for sequence tagging with deep pre-trained models and Bayesian uncertainty estimates","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Shelmanov","year":"2021"},{"key":"2025051914253280900_bib52","doi-asserted-by":"publisher","first-page":"1833","DOI":"10.18653\/v1\/2021.eacl-main.157","article-title":"How certain is your transformer?","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Shelmanov","year":"2021"},{"key":"2025051914253280900_bib53","first-page":"560","article-title":"Understanding measures of uncertainty for adversarial example detection","volume-title":"Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, Monterey, California, USA, August 6\u201310, 
2018","author":"Smith","year":"2018"},{"key":"2025051914253280900_bib54","doi-asserted-by":"publisher","first-page":"133","DOI":"10.18653\/v1\/W19-4115","article-title":"Relevant and informative response generation using pointwise mutual information","volume-title":"Proceedings of the First Workshop on NLP for Conversational AI","author":"Takayama","year":"2019"},{"key":"2025051914253280900_bib55","doi-asserted-by":"publisher","first-page":"5433","DOI":"10.18653\/v1\/2023.emnlp-main.330","article-title":"Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models fine-tuned with human feedback","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing","author":"Tian","year":"2023"},{"key":"2025051914253280900_bib56","doi-asserted-by":"publisher","first-page":"1198","DOI":"10.18653\/v1\/2022.findings-naacl.90","article-title":"Towards computationally feasible deep active learning","volume-title":"Findings of the Association for Computational Linguistics: NAACL 2022","author":"Tsvigun","year":"2022"},{"key":"2025051914253280900_bib57","doi-asserted-by":"publisher","first-page":"5956","DOI":"10.18653\/v1\/2022.emnlp-main.399","article-title":"Mutual information alleviates hallucinations in abstractive summarization","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"van der Poel","year":"2022"},{"key":"2025051914253280900_bib58","doi-asserted-by":"publisher","first-page":"8237","DOI":"10.18653\/v1\/2022.acl-long.566","article-title":"Uncertainty estimation of transformer predictions for misclassification detection","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long 
Papers)","author":"Vazhentsev","year":"2022"},{"key":"2025051914253280900_bib59","doi-asserted-by":"publisher","first-page":"11659","DOI":"10.18653\/v1\/2023.acl-long.652","article-title":"Hybrid uncertainty quantification for selective text classification in ambiguous tasks","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Vazhentsev","year":"2023"},{"key":"2025051914253280900_bib60","doi-asserted-by":"publisher","first-page":"1430","DOI":"10.18653\/v1\/2023.findings-acl.93","article-title":"Efficient out-of-domain detection for sequence to sequence models","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Vazhentsev","year":"2023"},{"key":"2025051914253280900_bib61","doi-asserted-by":"publisher","first-page":"680","DOI":"10.1162\/tacl_a_00483","article-title":"Uncertainty estimation and reduction of pre-trained models for text regression","volume":"10","author":"Wang","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2025051914253280900_bib62","article-title":"Factcheck-gpt: End-to-end fine-grained document-level fact-checking and correction of llm output","author":"Wang","year":"2023","journal-title":"arXiv preprint arXiv:2311.09000"},{"key":"2025051914253280900_bib63","doi-asserted-by":"publisher","first-page":"2734","DOI":"10.18653\/v1\/2021.eacl-main.236","article-title":"On hallucination and predictive uncertainty in conditional language generation","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Xiao","year":"2021"},{"key":"2025051914253280900_bib64","doi-asserted-by":"publisher","first-page":"2246","DOI":"10.18653\/v1\/2020.acl-main.204","article-title":"DeeBERT: Dynamic early exiting for accelerating BERT inference","volume-title":"Proceedings of the 58th Annual Meeting of 
the Association for Computational Linguistics","author":"Xin","year":"2020"},{"key":"2025051914253280900_bib65","doi-asserted-by":"publisher","first-page":"1040","DOI":"10.18653\/v1\/2021.acl-long.84","article-title":"The art of abstention: Selective prediction and error regularization for natural language processing","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Xin","year":"2021"},{"key":"2025051914253280900_bib66","doi-asserted-by":"publisher","first-page":"546","DOI":"10.1162\/tacl_a_00563","article-title":"Understanding and detecting hallucinations in neural machine translation via model introspection","volume":"11","author":"Xu","year":"2023","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2025051914253280900_bib67","doi-asserted-by":"publisher","first-page":"3656","DOI":"10.18653\/v1\/2022.findings-acl.289","article-title":"Detection of adversarial examples in text classification: Benchmark and baseline via robust density estimation","volume-title":"Findings of the Association for Computational Linguistics: ACL 2022","author":"Yoo","year":"2022"},{"key":"2025051914253280900_bib68","doi-asserted-by":"crossref","first-page":"11328","DOI":"10.18653\/v1\/2023.acl-long.634","article-title":"AlignScore: Evaluating factual consistency with a unified alignment function","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Zha","year":"2023"},{"key":"2025051914253280900_bib69","doi-asserted-by":"publisher","first-page":"5244","DOI":"10.18653\/v1\/2024.emnlp-main.299","article-title":"LUQ: Long-text uncertainty quantification for LLMs","volume-title":"Proceedings of the 2024 Conference on Empirical Methods in Natural Language 
Processing","author":"Zhang","year":"2024"},{"key":"2025051914253280900_bib70","doi-asserted-by":"publisher","first-page":"915","DOI":"10.18653\/v1\/2023.emnlp-main.58","article-title":"Enhancing uncertainty-based hallucination detection with stronger focus","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing","author":"Zhang","year":"2023"},{"key":"2025051914253280900_bib71","doi-asserted-by":"publisher","first-page":"3126","DOI":"10.18653\/v1\/N19-1316","article-title":"Mitigating uncertainty in document classification","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Zhang","year":"2019"},{"key":"2025051914253280900_bib72","first-page":"46595","article-title":"Judging llm-as-a-judge with mt-bench and chatbot arena","volume":"36","author":"Zheng","year":"2023","journal-title":"Advances in Neural Information Processing Systems"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00737\/2511955\/tacl_a_00737.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00737\/2511955\/tacl_a_00737.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,19]],"date-time":"2025-05-19T18:25:44Z","timestamp":1747679144000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00737\/128713\/Benchmarking-Uncertainty-Quantification-Methods"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":72,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00737","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025]]},"published":{"date-parts":[[2025]]}}}