{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T00:15:31Z","timestamp":1758672931246,"version":"3.44.0"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:p>Quantization has gained attention as a promising solution for the cost-effective deployment of large and small language models. However, most prior work has been limited to perplexity or basic knowledge tasks and lacks a comprehensive evaluation of recent models like Llama-3.3. \n\nIn this paper, we conduct a comprehensive evaluation of instruction-tuned models spanning 1B to 405B parameters, applying four quantization methods across 13 datasets. \n\nOur findings reveal that (1) quantized models generally surpass smaller FP16 baselines, yet they often struggle with instruction-following and hallucination detection; (2) FP8 consistently emerges as the most robust option across tasks, and AWQ tends to outperform GPTQ in weight-only quantization; \n\n(3) smaller models can suffer severe accuracy drops at 4-bit quantization, while 70B-scale models maintain stable performance;\n\n(4) notably, <jats:italic>hard<\/jats:italic> tasks do not always experience the largest accuracy losses, indicating that quantization magnifies a model\u2019s inherent weaknesses rather than simply correlating with task difficulty; and (5) an LLM-based judge (MT-Bench) highlights significant performance declines in Coding and STEM tasks, though it occasionally reports improvements in reasoning.<\/jats:p>","DOI":"10.24963\/ijcai.2025\/902","type":"proceedings-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T08:10:40Z","timestamp":1758269440000},"page":"8113-8121","source":"Crossref","is-referenced-by-count":0,"title":["Exploring the Trade-Offs: Quantization Methods, Task Difficulty, and Model Size in Large Language Models From Edge to Giant"],"prefix":"10.24963","author":[{"given":"Jemin","family":"Lee","sequence":"first","affiliation":[{"name":"Electronics and Telecommunications Research Institute"}]},{"given":"Sihyeong","family":"Park","sequence":"additional","affiliation":[{"name":"Korea Electronics Technology Institute"}]},{"given":"Jinse","family":"Kwon","sequence":"additional","affiliation":[{"name":"Electronics and Telecommunications Research Institute"}]},{"given":"Jihun","family":"Oh","sequence":"additional","affiliation":[{"name":"Neubla"}]},{"given":"Yongin","family":"Kwon","sequence":"additional","affiliation":[{"name":"Electronics and Telecommunications Research Institute"}]}],"member":"10584","event":{"number":"34","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"acronym":"IJCAI-2025","name":"Thirty-Fourth International Joint Conference on Artificial Intelligence {IJCAI-25}","start":{"date-parts":[[2025,8,16]]},"theme":"Artificial Intelligence","location":"Montreal, Canada","end":{"date-parts":[[2025,8,22]]}},"container-title":["Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T11:35:26Z","timestamp":1758627326000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2025\/902"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2025,9]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2025\/902","relation":{},"subject":[],"published":{"date-parts":[[2025,9]]}}}