{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T04:36:06Z","timestamp":1779251766627,"version":"3.51.4"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686318","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,21]]},"abstract":"<jats:p>It is generally well understood that predictive classification and compression are intrinsically related concepts in information theory. Indeed, many deep learning methods are explained as learning a kind of compression, and that better compression leads to better performance. We interrogate this hypothesis via the Normalized Compression Distance (NCD), which explicitly relies on compression as the means of measuring similarity between sequences and thus enables nearest-neighbor classification. By turning popular large language models (LLMs) into lossless compressors, we develop a Neural NCD and compare LLMs to classic general-purpose algorithms like gzip in a compression-distance-based classification setting. We find that whether neural compressors achieve better accuracy over a gzip baseline is dataset-dependent, despite consistently superior compression ratios. Though the best neural compressor achieves up to 7.49 times better compression, we observe an up to 22.2% decrease in accuracy relative to the gzip baseline.<\/jats:p>","DOI":"10.3233\/faia251322","type":"book-chapter","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:58:10Z","timestamp":1761127090000},"source":"Crossref","is-referenced-by-count":1,"title":["Large Language Models and Normalized Compression Distance: Better Compression Yet Worse Accuracy"],"prefix":"10.3233","author":[{"given":"John","family":"Hurwitz","sequence":"first","affiliation":[{"name":"University of Maryland, Baltimore County"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Charles","family":"Nicholas","sequence":"additional","affiliation":[{"name":"University of Maryland, Baltimore County"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Edward","family":"Raff","sequence":"additional","affiliation":[{"name":"University of Maryland, Baltimore County"},{"name":"CrowdStrike"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2025"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA251322","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:58:10Z","timestamp":1761127090000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA251322"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"ISBN":["9781643686318"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia251322","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}