{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:55:06Z","timestamp":1750308906120,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":22,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,11,25]],"date-time":"2022-11-25T00:00:00Z","timestamp":1669334400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"University Innovation Team Project of Jinan","award":["2019GXRC015"],"award-info":[{"award-number":["2019GXRC015"]}]},{"name":"Shandong Proviencial Natural Science Foundation, China","award":["ZR2021MF036"],"award-info":[{"award-number":["ZR2021MF036"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,11,25]]},"DOI":"10.1145\/3573834.3574482","type":"proceedings-article","created":{"date-parts":[[2023,1,18]],"date-time":"2023-01-18T03:29:17Z","timestamp":1674012557000},"page":"1-4","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["BERT Model Compression With Decoupled Knowledge Distillation And Representation Learning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3128-8812","authenticated-orcid":false,"given":"Linna","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Information Science and Engineering, University of Jinan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5265-903X","authenticated-orcid":false,"given":"Yuehui","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, University of Jinan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1973-5010","authenticated-orcid":false,"given":"Yi","family":"Cao","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, University of Jinan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4134-2352","authenticated-orcid":false,"given":"Yaou","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, University of Jinan, China"}]}],"member":"320","published-online":{"date-parts":[[2023,1,17]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks. 2020 25th International Conference on Pattern Recognition (ICPR)","author":"Choi Hyunjin","year":"2021","unstructured":"Hyunjin Choi , Judong Kim , Seongho Joe , and Youngjune Gwon . 2021 . Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks. 2020 25th International Conference on Pattern Recognition (ICPR) (2021), 5482\u20135487. Hyunjin Choi, Judong Kim, Seongho Joe, and Youngjune Gwon. 2021. Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks. 2020 25th International Conference on Pattern Recognition (ICPR) (2021), 5482\u20135487."},{"key":"e_1_3_2_1_2_1","unstructured":"Emily\u00a0L. Denton Wojciech Zaremba Joan Bruna Yann LeCun and Rob Fergus. 2014. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. ArXiv abs\/1404.0736(2014).  Emily\u00a0L. Denton Wojciech Zaremba Joan Bruna Yann LeCun and Rob Fergus. 2014. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. 
ArXiv abs\/1404.0736(2014)."},{"key":"e_1_3_2_1_3_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805(2018).","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805(2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805(2018)."},{"key":"e_1_3_2_1_4_1","volume-title":"Dolan and Chris Brockett","author":"B.","year":"2005","unstructured":"William\u00a0 B. Dolan and Chris Brockett . 2005 . Automatically Constructing a Corpus of Sentential Paraphrases. In IJCNLP. William\u00a0B. Dolan and Chris Brockett. 2005. Automatically Constructing a Corpus of Sentential Paraphrases. In IJCNLP."},{"key":"e_1_3_2_1_5_1","unstructured":"Angela Fan Edouard Grave and Armand Joulin. 2020. Reducing Transformer Depth on Demand with Structured Dropout. ArXiv abs\/1909.11556(2020).  Angela Fan Edouard Grave and Armand Joulin. 2020. Reducing Transformer Depth on Demand with Structured Dropout. ArXiv abs\/1909.11556(2020)."},{"key":"e_1_3_2_1_6_1","unstructured":"Yunchao Gong L. Liu Ming Yang and Lubomir\u00a0D. Bourdev. 2014. Compressing Deep Convolutional Networks using Vector Quantization. ArXiv abs\/1412.6115(2014).  Yunchao Gong L. Liu Ming Yang and Lubomir\u00a0D. Bourdev. 2014. Compressing Deep Convolutional Networks using Vector Quantization. ArXiv abs\/1412.6115(2014)."},{"key":"e_1_3_2_1_7_1","volume":"201","author":"Han Song","unstructured":"Song Han , Huizi Mao , and William\u00a0 J. Dally. 201 6. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. arXiv: Computer Vision and Pattern Recognition (2016). Song Han, Huizi Mao, and William\u00a0J. Dally. 2016. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. arXiv: Computer Vision and Pattern Recognition (2016).","journal-title":"J. Dally."},{"key":"e_1_3_2_1_8_1","volume-title":"Channel Pruning for Accelerating Very Deep Neural Networks. 2017 IEEE International Conference on Computer Vision (ICCV)","author":"He Yihui","year":"2017","unstructured":"Yihui He , Xiangyu Zhang , and Jian Sun . 2017 . Channel Pruning for Accelerating Very Deep Neural Networks. 2017 IEEE International Conference on Computer Vision (ICCV) (2017), 1398\u20131406. Yihui He, Xiangyu Zhang, and Jian Sun. 2017. Channel Pruning for Accelerating Very Deep Neural Networks. 2017 IEEE International Conference on Computer Vision (ICCV) (2017), 1398\u20131406."},{"key":"e_1_3_2_1_9_1","unstructured":"Geoffrey\u00a0E. Hinton Oriol Vinyals and Jeffrey Dean. 2015. Distilling the Knowledge in a Neural Network. ArXiv abs\/1503.02531(2015).  Geoffrey\u00a0E. Hinton Oriol Vinyals and Jeffrey Dean. 2015. Distilling the Knowledge in a Neural Network. ArXiv abs\/1503.02531(2015)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Xiaoqi Jiao Yichun Yin Lifeng Shang Xin Jiang Xiao Chen Linlin Li Fang Wang and Qun Liu. 2020. TinyBERT: Distilling BERT for Natural Language Understanding. ArXiv abs\/1909.10351(2020).  Xiaoqi Jiao Yichun Yin Lifeng Shang Xin Jiang Xiao Chen Linlin Li Fang Wang and Qun Liu. 2020. TinyBERT: Distilling BERT for Natural Language Understanding. 
ArXiv abs\/1909.10351(2020).","DOI":"10.18653\/v1\/2020.findings-emnlp.372"},{"key":"e_1_3_2_1_11_1","volume-title":"Train Large","author":"Li Zhuohan","year":"2002","unstructured":"Zhuohan Li , Eric Wallace , Sheng Shen , Kevin Lin , Kurt Keutzer , Dan Klein , and Joseph Gonzalez . 2020. Train Large , Then Compress : Rethinking Model Size for Efficient Training and Inference of Transformers. ArXiv abs\/ 2002 .11794(2020). Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, and Joseph Gonzalez. 2020. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers. ArXiv abs\/2002.11794(2020)."},{"key":"e_1_3_2_1_12_1","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv abs\/1907.11692.  Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv abs\/1907.11692."},{"key":"e_1_3_2_1_13_1","unstructured":"Alec Radford Jeff Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners.  Alec Radford Jeff Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners."},{"key":"e_1_3_2_1_14_1","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: smaller faster cheaper and lighter. ArXiv abs\/1910.01108(2019).  Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: smaller faster cheaper and lighter. ArXiv abs\/1910.01108(2019)."},{"key":"e_1_3_2_1_15_1","unstructured":"Richard Socher Alex Perelygin Jean Wu Jason Chuang Christopher\u00a0D. Manning A. Ng and Christopher Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In EMNLP.  Richard Socher Alex Perelygin Jean Wu Jason Chuang Christopher\u00a0D. Manning A. Ng and Christopher Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In EMNLP."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"S. Sun Yu Cheng Zhe Gan and Jingjing Liu. 2019. Patient Knowledge Distillation for BERT Model Compression. In EMNLP.  S. Sun Yu Cheng Zhe Gan and Jingjing Liu. 2019. Patient Knowledge Distillation for BERT Model Compression. In EMNLP.","DOI":"10.18653\/v1\/D19-1441"},{"key":"e_1_3_2_1_17_1","unstructured":"Zhiqing Sun Hongkun Yu Xiaodan Song Renjie Liu Yiming Yang and Denny Zhou. 2020. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. ArXiv abs\/2004.02984(2020).  Zhiqing Sun Hongkun Yu Xiaodan Song Renjie Liu Yiming Yang and Denny Zhou. 2020. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. ArXiv abs\/2004.02984(2020)."},{"key":"e_1_3_2_1_18_1","volume":"201","author":"Tang Raphael","unstructured":"Raphael Tang , Yao Lu , Linqing Liu , Lili Mou , Olga Vechtomova , and Jimmy\u00a0 J. Lin. 201 9. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. ArXiv abs\/1903.12136(2019). Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, and Jimmy\u00a0J. Lin. 2019. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. ArXiv abs\/1903.12136(2019).","journal-title":"J. 
Lin."},{"key":"e_1_3_2_1_19_1","unstructured":"Wenhui Wang Furu Wei Li Dong Hangbo Bao Nan Yang and Ming Zhou. 2020. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. ArXiv abs\/2002.10957(2020).  Wenhui Wang Furu Wei Li Dong Hangbo Bao Nan Yang and Ming Zhou. 2020. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. ArXiv abs\/2002.10957(2020)."},{"key":"e_1_3_2_1_20_1","unstructured":"Canwen Xu Wangchunshu Zhou Tao Ge Furu Wei and Ming Zhou. 2020. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. In EMNLP.  Canwen Xu Wangchunshu Zhou Tao Ge Furu Wei and Ming Zhou. 2020. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. In EMNLP."},{"key":"e_1_3_2_1_21_1","volume-title":"Decoupled Knowledge Distillation. 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022","author":"Zhao Borui","year":"2022","unstructured":"Borui Zhao , Quan Cui , Renjie Song , Yiyu Qiu , and Jiajun Liang . 2022 . Decoupled Knowledge Distillation. 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022 ), 11943\u201311952. Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, and Jiajun Liang. 2022. Decoupled Knowledge Distillation. 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), 11943\u201311952."},{"key":"e_1_3_2_1_22_1","unstructured":"Wangchunshu Zhou Canwen Xu and Julian McAuley. 2022. BERT Learns to Teach: Knowledge Distillation with Meta Learning. In ACL.  Wangchunshu Zhou Canwen Xu and Julian McAuley. 2022. BERT Learns to Teach: Knowledge Distillation with Meta Learning. In ACL."}],"event":{"name":"AISS 2022: 2022 4th International Conference on Advanced Information Science and System","acronym":"AISS 2022","location":"Sanya China"},"container-title":["Proceedings of the 4th International Conference on Advanced Information Science and System"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3573834.3574482","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3573834.3574482","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:26:11Z","timestamp":1750281971000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3573834.3574482"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,25]]},"references-count":22,"alternative-id":["10.1145\/3573834.3574482","10.1145\/3573834"],"URL":"https:\/\/doi.org\/10.1145\/3573834.3574482","relation":{},"subject":[],"published":{"date-parts":[[2022,11,25]]},"assertion":[{"value":"2023-01-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}