{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T03:37:46Z","timestamp":1773805066426,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"38","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Real-world text classification datasets frequently exhibit long-tail distributions, where numerous classes have sparse data, significantly degrading model performance on these underrepresented categories. While Large Language Models (LLMs) offer promise for data augmentation, existing methods often produce semantically limited samples, neglect \"implicit long-tails\" (sparse sub-patterns within classes), and lack cost-effective optimization. To address these challenges, we propose \\textbf{DEALT (LLM-driven Diversity-Enhanced Data Augmentation for Long-Tail Text Classification)}, a novel cognitive-inspired framework emulating the human learning process of \"recognize, explore, generate, and optimize.\" DEALT systematically enhances augmented data diversity by first detecting both explicit and implicit long-tails. It then employs an LLM for diversity-aware planning of augmentation strategies, followed by conditional generation. A low-overhead quality and diversity validator filters the synthetic data, and an adaptive incremental sampler refines future augmentation efforts based on proxy model feedback, ensuring efficient and budget-aware optimization. Extensive experiments on multiple public text classification datasets demonstrate DEALT's superiority over state-of-the-art methods in improving tail-class performance and overall model robustness by generating more diverse and high-fidelity augmented data.<\/jats:p>","DOI":"10.1609\/aaai.v40i38.40508","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:53:19Z","timestamp":1773802399000},"page":"32338-32346","source":"Crossref","is-referenced-by-count":0,"title":["DEALT: LLM-driven Diversity-Enhanced Data Augmentation for Long-Tail Text Classification"],"prefix":"10.1609","volume":"40","author":[{"given":"Wayne","family":"Lu","sequence":"first","affiliation":[]},{"given":"Xiaoxi","family":"Cui","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/40508\/44469","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/40508\/44469","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:53:19Z","timestamp":1773802399000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/40508"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"38","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i38.40508","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}