{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T23:37:47Z","timestamp":1761176267295,"version":"build-2065373602"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686318","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,21]]},"abstract":"<jats:p>We investigate the continual enhancement of mathematical reasoning abilities in small language models (SLMs). While large language models (LLMs) demonstrate impressive reasoning performance, their deployment is often constrained by substantial computational costs. Existing approaches to improving SLMs mainly rely on knowledge distillation from costly teacher LLMs, which typically improves mathematical reasoning at the expense of general capabilities. In this work, we show that continual pre-training (CPT) has strong potential to enhance the mathematical reasoning ability of SLMs without relying on large teacher models. We also find that its effectiveness critically depends on the quality of the training data. To maximize efficiency and performance, we propose Dual-Metric Selection for Continual Pre-training (DRIFT), a novel data selection strategy that identifies optimal training data through task-aligned loss differences and distributional regularization. To further enhance task-specific reasoning while preserving general capabilities, we introduce a metadata-aware data mixture that integrates diverse sources during CPT. Extensive experiments on multiple arithmetic reasoning benchmarks demonstrate the effectiveness of DRIFT: SLMs trained with DRIFT achieve substantial gains in reasoning performance, surpassing larger models on specific tasks, while largely preserving general capabilities.<\/jats:p>","DOI":"10.3233\/faia251297","type":"book-chapter","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:57:30Z","timestamp":1761127050000},"source":"Crossref","is-referenced-by-count":0,"title":["Don\u2019t Stop Pre-Training Small Language Models for Continual Enhancement of Reasoning"],"prefix":"10.3233","author":[{"given":"Qing","family":"Li","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qibin","family":"Zheng","sequence":"additional","affiliation":[{"name":"Advanced Institute of Big Data, Beijing, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi","family":"Liu","sequence":"additional","affiliation":[{"name":"Advanced Institute of Big Data, Beijing, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xingchun","family":"Diao","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2025"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA251297","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:57:41Z","timestamp":1761127061000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA251297"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"ISBN":["9781643686318"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia251297","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}