{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:57:46Z","timestamp":1773802666766,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"21","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Uncertainty visualizations, such as hurricane cones and ensemble tracks, are essential for risk communication but are often misinterpreted, leading to harmful decisions. As AI assistants like large language models (LLMs) increasingly support understanding of graphics and decision-making, they offer a promising pathway to enhance the interpretation of complex visualizations and a new opportunity to examine and improve the interpretation of uncertainty. We introduce UnReason, the first benchmark that systematically compares how humans and LLMs reason about hurricane forecast uncertainty visualizations. UnReason spans two escalating phases, seven representative visualization formats, six real hurricane cases, and three agent types (humans, LLMs with context, and LLMs without context), including 880 visualizations and 117,600 structured question\u2013answer pairs under matched evaluation conditions. Phase 1 evaluates reasoning across implicit and explicit uncertainty encodings; Phase 2 examines reasoning under single- versus multi-dimensional uncertainty representations. We thoroughly assess damage estimation, reasoning strategies, and comprehension patterns, revealing that LLMs have a stronger semantic and conceptual understanding of uncertainty, and are less misled by visual variability, but still replicate key human biases during decision-making. Our findings offer insights into aligning LLM behavior with human cognition in uncertainty-rich visual reasoning tasks.<\/jats:p>","DOI":"10.1609\/aaai.v40i21.38812","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:57:04Z","timestamp":1773795424000},"page":"17571-17579","source":"Crossref","is-referenced-by-count":0,"title":["Do Large Language Models Reason About Uncertainty Like Humans? A Benchmark on Hurricane Forecast Visualization Comprehension"],"prefix":"10.1609","volume":"40","author":[{"given":"Le","family":"Liu","sequence":"first","affiliation":[]},{"given":"Yuhao","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Bohan","family":"Shen","sequence":"additional","affiliation":[]},{"given":"Wei","family":"Zeng","sequence":"additional","affiliation":[]},{"given":"Shizhou","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Di","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Peng","family":"Wang","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38812\/42774","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38812\/42774","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:57:04Z","timestamp":1773795424000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/38812"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i21.38812","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}
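The record above is a Crossref REST API "work" message. As a minimal sketch of how its structure nests (using only the Python standard library; the JSON literal below is an excerpt of the full record above, trimmed for brevity, and this is illustrative parsing, not an official Crossref client):

```python
import json

# Excerpt of the Crossref "work" message above; the payload lives under
# the top-level "message" key, and titles/authors are lists.
record = json.loads('{"status":"ok","message":{"DOI":"10.1609/aaai.v40i21.38812","title":["Do Large Language Models Reason About Uncertainty Like Humans? A Benchmark on Hurricane Forecast Visualization Comprehension"],"author":[{"given":"Le","family":"Liu","sequence":"first"}],"issued":{"date-parts":[[2026,3,14]]}}}')

work = record["message"]
doi = work["DOI"]
title = work["title"][0]  # Crossref stores titles as a list
authors = [f'{a["given"]} {a["family"]}' for a in work["author"]]
year = work["issued"]["date-parts"][0][0]  # "date-parts" is a list of [Y, M, D]

print(doi, title, authors, year)
```

The same access pattern applies to the full record, where fields such as "container-title", "ISSN", and "link" are likewise lists.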