{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T03:03:31Z","timestamp":1773803011884,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"25","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Large language models performing chain-of-thought (CoT) reasoning generate extensive intermediate sequences that consume substantial memory through key-value (KV) cache storage. Unlike conventional text generation, reasoning sequences exhibit unique characteristics, including repetitive logic patterns and low information density, making existing KV cache compression methods suboptimal. We propose DesireKV, a novel compression framework that first constructs a two-dimensional coordinate system based on attention-derived importance and outlier-based quantization sensitivity. It then applies a dedicated protection mechanism for tokens critical to the reasoning process itself. Our approach makes differentiated compression decisions: retaining important and sensitive tokens, quantizing important but insensitive tokens, and evicting unimportant tokens. Through comprehensive evaluation on reasoning benchmarks, we demonstrate that DesireKV achieves up to 2.93\u00d7 throughput improvement while maintaining nearly 99% of original reasoning accuracy.<\/jats:p>","DOI":"10.1609\/aaai.v40i25.39187","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T01:19:06Z","timestamp":1773796746000},"page":"20518-20526","source":"Crossref","is-referenced-by-count":0,"title":["DesireKV: Decoupling Sensitivity and Importance for Reasoning-Aware KV Cache Compression"],"prefix":"10.1609","volume":"40","author":[{"given":"Pengyu","family":"Cheng","sequence":"first","affiliation":[]},{"given":"Jiacheng","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Tianle","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Bei","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Xiaofeng","family":"Hou","sequence":"additional","affiliation":[]},{"given":"Jiacheng","family":"Liu","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/39187\/43148","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/39187\/43148","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T01:19:06Z","timestamp":1773796746000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/39187"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"25","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i25.39187","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}