{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T23:37:54Z","timestamp":1761176274200,"version":"build-2065373602"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686318","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,21]]},"abstract":"<jats:p>Fine-tuning large language models (LLMs) is challenged by the presence of noisy data and the high computational cost when training on large-scale datasets. While data selection has emerged as a promising approach to reduce training cost and improve data quality, existing methods often rely on static heuristics or manual metrics. These approaches struggle to adapt to the model\u2019s evolving capabilities during training, as its understanding of tasks improves. As the model becomes more powerful, its requirements for data that can enhance performance also change, making it crucial to incorporate this dynamic into the data selection process. Moreover, ensuring data diversity throughout different stages of training is essential for preventing redundancy, reducing overfitting. To address these issues, we propose DSP, a Diversity-Aware Self-Paced data selection framework that evolves with the model. DSP progressively selects training samples based on the model\u2019s own outputs and incorporates a diversity-aware mechanism to enhance generalization and mitigate overfitting. Unlike prior static or rule-based strategies, DSP adaptively adjusts to the model\u2019s internal feedback and training stage. Experiments on two public benchmarks demonstrate that DSP consistently outperforms static and heuristic-based baselines across multiple datasets and backbone models. Our findings highlight the critical role of dynamic, diversity-aware data selection in effective LLM fine-tuning.<\/jats:p>","DOI":"10.3233\/faia251329","type":"book-chapter","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:58:25Z","timestamp":1761127105000},"source":"Crossref","is-referenced-by-count":0,"title":["Diversity-Aware Self-Paced Data Selection for LLM Fine-Tuning"],"prefix":"10.3233","author":[{"given":"Yingxuan","family":"Yang","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Huayi","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Muning","family":"Wen","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Xiaoyun","family":"Mo","sequence":"additional","affiliation":[{"name":"Oppo Research Institute, Shenzhen, China"}]},{"given":"Qiuying","family":"Peng","sequence":"additional","affiliation":[{"name":"Oppo Research Institute, Shenzhen, China"}]},{"given":"Jun","family":"Wang","sequence":"additional","affiliation":[{"name":"Oppo Research Institute, Shenzhen, China"}]},{"given":"Weinan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2025"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA251329","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:58:25Z","timestamp":1761127105000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA251329"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"ISBN":["9781643686318"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia251329","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}