{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T03:34:13Z","timestamp":1764646453651},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>In this paper, we propose Stochastic Knowledge Distillation (SKD) to obtain a compact BERT-style language model dubbed SKDBERT. In each distillation iteration, SKD samples a teacher model from a pre-defined teacher team, which consists of multiple teacher models with multi-level capacities, to transfer knowledge into the student model in a one-to-one manner. The sampling distribution plays an important role in SKD. We heuristically present three types of sampling distributions to assign appropriate probabilities to the multi-level teacher models. SKD has two advantages: 1) it preserves the diversity of the multi-level teacher models by stochastically sampling a single teacher model in each distillation iteration, and 2) it improves the efficacy of knowledge distillation via multi-level teacher models when a large capacity gap exists between the teacher model and the student model. Experimental results on the GLUE benchmark show that SKDBERT reduces the size of a BERT model by 40% while retaining 99.5% of its language-understanding performance and being 100% faster.<\/jats:p>","DOI":"10.1609\/aaai.v37i6.25902","type":"journal-article","created":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T17:09:28Z","timestamp":1687885768000},"page":"7414-7422","source":"Crossref","is-referenced-by-count":10,"title":["SKDBERT: Compressing BERT via Stochastic Knowledge Distillation"],"prefix":"10.1609","volume":"37","author":[{"given":"Zixiang","family":"Ding","sequence":"first","affiliation":[]},{"given":"Guoqing","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Shuai","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Lin","family":"Guo","sequence":"additional","affiliation":[]},{"given":"Wei","family":"Lin","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2023,6,26]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/25902\/25674","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/25902\/25674","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T17:09:29Z","timestamp":1687885769000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/25902"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,26]]},"references-count":0,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2023,6,27]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v37i6.25902","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2023,6,26]]}}}