{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,4,7]],"date-time":"2024-04-07T09:06:21Z","timestamp":1712480781302},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"14","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>The pre-training models such as BERT have achieved great results in various natural language processing problems. However, a large number of parameters need significant amounts of memory and the consumption of inference time, which makes it difficult to deploy them on edge devices. In this work, we propose a knowledge distillation method LRC-BERT based on contrastive learning to fit the output of the intermediate layer from the angular distance aspect, which is not considered by the existing distillation methods. Furthermore, we introduce a gradient perturbation-based training architecture in the training phase to increase the robustness of LRC-BERT, which is the first attempt in knowledge distillation. Additionally, in order to better capture the distribution characteristics of the intermediate layer, we design a two-stage training method for the total distillation loss. Finally, by verifying 8 datasets on the General Language Understanding Evaluation (GLUE) benchmark, the performance of the proposed LRC-BERT exceeds the existing state-of-the-art methods, which proves the effectiveness of our method.<\/jats:p>","DOI":"10.1609\/aaai.v35i14.17518","type":"journal-article","created":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T19:56:51Z","timestamp":1662667011000},"page":"12830-12838","source":"Crossref","is-referenced-by-count":16,"title":["LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding"],"prefix":"10.1609","volume":"35","author":[{"given":"Hao","family":"Fu","sequence":"first","affiliation":[]},{"given":"Shaojun","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Qihong","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Junjie","family":"Tang","sequence":"additional","affiliation":[]},{"given":"Guiquan","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Kaikui","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Xiaolong","family":"Li","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2021,5,18]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/17518\/17325","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/17518\/17325","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T19:56:51Z","timestamp":1662667011000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/17518"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,18]]},"references-count":0,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2021,5,28]]}},"URL":"http:\/\/dx.doi.org\/10.1609\/aaai.v35i14.17518","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2021,5,18]]}}}