{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T21:36:25Z","timestamp":1773524185640,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,19]],"date-time":"2020-10-19T00:00:00Z","timestamp":1603065600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,19]]},"DOI":"10.1145\/3340531.3412171","type":"proceedings-article","created":{"date-parts":[[2020,10,19]],"date-time":"2020-10-19T05:31:03Z","timestamp":1603085463000},"page":"3507-3508","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Compression of Deep Learning Models for NLP"],"prefix":"10.1145","author":[{"given":"Manish","family":"Gupta","sequence":"first","affiliation":[{"name":"Microsoft, Hyderabad, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vasudeva","family":"Varma","sequence":"additional","affiliation":[{"name":"International Institute of Information Technology Hyderabad, Hyderabad, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sonam","family":"Damani","sequence":"additional","affiliation":[{"name":"Microsoft, Hyderabad, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kedhar Nath","family":"Narahari","sequence":"additional","affiliation":[{"name":"Microsoft, Hyderabad, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,10,19]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"R. Anil G. Pereyra A. Passos R. Ormandi G. E. Dahl and G. E. Hinton. 2018. 
Large scale distributed neural network training through online distillation. arXiv:1804.03235 (2018).  R. Anil G. Pereyra A. Passos R. Ormandi G. E. Dahl and G. E. Hinton. 2018. Large scale distributed neural network training through online distillation. arXiv:1804.03235 (2018)."},{"key":"e_1_3_2_2_2_1","unstructured":"J. Ba and R. Caruana. 2014. Do deep nets really need to be deep?. In NIPS. 2654--2662.  J. Ba and R. Caruana. 2014. Do deep nets really need to be deep?. In NIPS. 2654--2662."},{"key":"e_1_3_2_2_3_1","unstructured":"S. Bai J. Z. Kolter and V. Koltun. 2019. Deep equilibrium models. arXiv:1909.01377 (2019).  S. Bai J. Z. Kolter and V. Koltun. 2019. Deep equilibrium models. arXiv:1909.01377 (2019)."},{"key":"e_1_3_2_2_4_1","volume-title":"Proceedings of the 2019 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 63--72","author":"Cao S.","unstructured":"S. Cao , C. Zhang , Z. Yao , W. Xiao , L. Nie , D. Zhan , Y. Liu , M. Wu , and L. Zhang . 2019. Efficient and effective sparse LSTM on FPGA with Bank-Balanced Sparsity . In Proceedings of the 2019 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 63--72 . S. Cao, C. Zhang, Z. Yao, W. Xiao, L. Nie, D. Zhan, Y. Liu, M. Wu, and L. Zhang. 2019. Efficient and effective sparse LSTM on FPGA with Bank-Balanced Sparsity. In Proceedings of the 2019 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 63--72."},{"key":"e_1_3_2_2_5_1","volume-title":"Stanford University","author":"Cheong R.","year":"2019","unstructured":"R. Cheong and R. Daniel . 2019. transformers. zip: Compressing Transformers with Pruning and Quantization. Technical Report. Technical report , Stanford University , Stanford, California , 2019 . R. Cheong and R. Daniel. 2019. transformers. zip: Compressing Transformers with Pruning and Quantization. Technical Report. Technical report, Stanford University, Stanford, California, 2019."},{"key":"e_1_3_2_2_6_1","unstructured":"R. 
Child S. Gray A. Radford and I. Sutskever. 2019. Generating long sequences with sparse transformers. arXiv:1904.10509 (2019).  R. Child S. Gray A. Radford and I. Sutskever. 2019. Generating long sequences with sparse transformers. arXiv:1904.10509 (2019)."},{"key":"e_1_3_2_2_7_1","volume-title":"Binaryconnect: Training deep neural networks with binary weights during propagations. In NIPS. 3123--3131.","author":"Courbariaux M.","year":"2015","unstructured":"M. Courbariaux , Y. Bengio , and J.-P. David . 2015 . Binaryconnect: Training deep neural networks with binary weights during propagations. In NIPS. 3123--3131. M. Courbariaux, Y. Bengio, and J.-P. David. 2015. Binaryconnect: Training deep neural networks with binary weights during propagations. In NIPS. 3123--3131."},{"key":"e_1_3_2_2_8_1","volume-title":"Universal transformers. arXiv:1807.03819","author":"Dehghani M.","year":"2018","unstructured":"M. Dehghani , S. Gouws , O. Vinyals , J. Uszkoreit , and \u0141. Kaiser. 2018. Universal transformers. arXiv:1807.03819 ( 2018 ). M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, and \u0141. Kaiser. 2018. Universal transformers. arXiv:1807.03819 (2018)."},{"key":"e_1_3_2_2_9_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805","author":"Devlin J.","year":"2018","unstructured":"J. Devlin , M.-W. Chang , K. Lee , and K. Toutanova . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018). J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2019.03.057"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Q. Guo X. Qiu P. Liu Y. Shao X. Xue and Z. Zhang. 2019. Star-Transformer. In NAACL-HLT. 1315--1325.  Q. Guo X. Qiu P. Liu Y. Shao X. Xue and Z. 
Zhang. 2019. Star-Transformer. In NAACL-HLT. 1315--1325.","DOI":"10.18653\/v1\/N19-1133"},{"key":"e_1_3_2_2_12_1","unstructured":"S. Han H. Mao and W. J. Dally. 2015. Deep compression: Compressing deep neural networks with pruning trained quantization and huffman coding. arXiv:1510.00149 (2015).  S. Han H. Mao and W. J. Dally. 2015. Deep compression: Compressing deep neural networks with pruning trained quantization and huffman coding. arXiv:1510.00149 (2015)."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"crossref","unstructured":"T. He Y. Fan Y. Qian T. Tan and K. Yu. 2014. Reshaping deep neural network for fast decoding by node-pruning. In ICASSP. IEEE 245--249.  T. He Y. Fan Y. Qian T. Tan and K. Yu. 2014. Reshaping deep neural network for fast decoding by node-pruning. In ICASSP. IEEE 245--249.","DOI":"10.1109\/ICASSP.2014.6853595"},{"key":"e_1_3_2_2_14_1","unstructured":"G. Hinton O. Vinyals and J. Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531 (2015).  G. Hinton O. Vinyals and J. Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531 (2015)."},{"key":"e_1_3_2_2_15_1","unstructured":"L. Hou and J. T. Kwok. 2018. Loss-aware weight quantization of deep networks. arXiv:1802.08635 (2018).  L. Hou and J. T. Kwok. 2018. Loss-aware weight quantization of deep networks. arXiv:1802.08635 (2018)."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3242044"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"crossref","unstructured":"V. Khrulkov O. Hrinchuk L. Mirvakhabova and I. Oseledets. 2019. Tensorized Embedding Layers for Efficient Model Compression. arXiv:1901.10787 (2019).  V. Khrulkov O. Hrinchuk L. Mirvakhabova and I. Oseledets. 2019. Tensorized Embedding Layers for Efficient Model Compression. arXiv:1901.10787 (2019).","DOI":"10.18653\/v1\/2020.findings-emnlp.436"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"crossref","unstructured":"Y. Kim and A. M. Rush. 2016. 
Sequence-level knowledge distillation. arXiv:1606.07947 (2016).  Y. Kim and A. M. Rush. 2016. Sequence-level knowledge distillation. arXiv:1606.07947 (2016).","DOI":"10.18653\/v1\/D16-1139"},{"key":"e_1_3_2_2_19_1","volume-title":"ALBERT: A lite BERT for self-supervised learning of language representations. arXiv:1909.11942","author":"Lan Z.","year":"2019","unstructured":"Z. Lan , M. Chen , S. Goodman , K. Gimpel , P. Sharma , and R. Soricut . 2019 . ALBERT: A lite BERT for self-supervised learning of language representations. arXiv:1909.11942 (2019). Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv:1909.11942 (2019)."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"crossref","unstructured":"Z. Li R. Kulhanek S. Wang Y. Zhao and S. Wu. 2018. Slim embedding layers for recurrent neural language models. In AAAI.  Z. Li R. Kulhanek S. Wang Y. Zhao and S. Wu. 2018. Slim embedding layers for recurrent neural language models. In AAAI.","DOI":"10.1609\/aaai.v32i1.12000"},{"key":"e_1_3_2_2_21_1","unstructured":"X. Liu P. He W. Chen and J. Gao. 2019. Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding. arXiv:1904.09482 (2019).  X. Liu P. He W. Chen and J. Gao. 2019. Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding. arXiv:1904.09482 (2019)."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"crossref","unstructured":"Z. Lu V. Sindhwani and T. N. Sainath. 2016. Learning compact recurrent neural networks. In ICASSP. IEEE 5960--5964.  Z. Lu V. Sindhwani and T. N. Sainath. 2016. Learning compact recurrent neural networks. In ICASSP. IEEE 5960--5964.","DOI":"10.1109\/ICASSP.2016.7472821"},{"key":"e_1_3_2_2_23_1","unstructured":"X. Ma P. Zhang S. Zhang N. Duan Y. Hou D. Song and M. Zhou. 2019. A Tensorized Transformer for Language Modeling. arXiv:1906.09777 (2019).  X. Ma P. 
Zhang S. Zhang N. Duan Y. Hou D. Song and M. Zhou. 2019. A Tensorized Transformer for Language Modeling. arXiv:1906.09777 (2019)."},{"key":"e_1_3_2_2_24_1","unstructured":"P. Michel O. Levy and G. Neubig. 2019. Are Sixteen Heads Really Better than One? arXiv:1905.10650 (2019).  P. Michel O. Levy and G. Neubig. 2019. Are Sixteen Heads Really Better than One? arXiv:1905.10650 (2019)."},{"key":"e_1_3_2_2_25_1","unstructured":"S. Narang E. Undersander and G. Diamos. 2017. Block-sparse recurrent neural networks. arXiv:1711.02782 (2017).  S. Narang E. Undersander and G. Diamos. 2017. Block-sparse recurrent neural networks. arXiv:1711.02782 (2017)."},{"key":"e_1_3_2_2_26_1","unstructured":"J. Ott Z. Lin Y. Zhang S.-C. Liu and Y. Bengio. 2016. Recurrent neural networks with limited numerical precision. arXiv:1608.06902 (2016).  J. Ott Z. Lin Y. Zhang S.-C. Liu and Y. Bengio. 2016. Recurrent neural networks with limited numerical precision. arXiv:1608.06902 (2016)."},{"key":"e_1_3_2_2_27_1","unstructured":"A. Polino R. Pascanu and D. Alistarh. 2018. Model compression via distillation and quantization. arXiv:1802.05668 (2018).  A. Polino R. Pascanu and D. Alistarh. 2018. Model compression via distillation and quantization. arXiv:1802.05668 (2018)."},{"key":"e_1_3_2_2_28_1","volume-title":"On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition","author":"Prabhavalkar R.","unstructured":"R. Prabhavalkar , O. Alsharif , A. Bruguier , and L. McGraw . 2016. On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition . In ICASSP. IEEE , 5970--5974. R. Prabhavalkar, O. Alsharif, A. Bruguier, and L. McGraw. 2016. On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition. In ICASSP. IEEE, 5970--5974."},{"key":"e_1_3_2_2_29_1","volume-title":"Fitnets: Hints for thin deep nets. 
arXiv:1412.6550","author":"Romero A.","year":"2014","unstructured":"A. Romero , N. Ballas , S. E. Kahou , A. Chassang , C. Gatta , and Y. Bengio . 2014 . Fitnets: Hints for thin deep nets. arXiv:1412.6550 (2014). A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. 2014. Fitnets: Hints for thin deep nets. arXiv:1412.6550 (2014)."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"crossref","unstructured":"A. See M.-T. Luong and C. D. Manning. 2016. Compression of neural machine translation models via pruning. arXiv:1606.09274 (2016).  A. See M.-T. Luong and C. D. Manning. 2016. Compression of neural machine translation models via pruning. arXiv:1606.09274 (2016).","DOI":"10.18653\/v1\/K16-1029"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"crossref","unstructured":"S. Sun Y. Cheng Z. Gan and J. Liu. 2019. Patient knowledge distillation for bert model compression. arXiv:1908.09355 (2019).  S. Sun Y. Cheng Z. Gan and J. Liu. 2019. Patient knowledge distillation for bert model compression. arXiv:1908.09355 (2019).","DOI":"10.18653\/v1\/D19-1441"},{"key":"e_1_3_2_2_32_1","unstructured":"R. Tang Y. Lu L. Liu L. Mou O. Vechtomova and J. Lin. 2019. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. arXiv:1903.12136 (2019).  R. Tang Y. Lu L. Liu L. Mou O. Vechtomova and J. Lin. 2019. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. arXiv:1903.12136 (2019)."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"crossref","unstructured":"Y. Tay A. Zhang L. A. Tuan J. Rao S. Zhang S. Wang J. Fu and S. C. Hui. 2019. Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks. arXiv:1906.04393 (2019).  Y. Tay A. Zhang L. A. Tuan J. Rao S. Zhang S. Wang J. Fu and S. C. Hui. 2019. Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks. 
arXiv:1906.04393 (2019).","DOI":"10.18653\/v1\/P19-1145"},{"key":"e_1_3_2_2_34_1","volume-title":"WEST: Word Encoded Sequence Transducers","author":"Variani E.","year":"2019","unstructured":"E. Variani , A. T. Suresh , and M. Weintraub . 2019 . WEST: Word Encoded Sequence Transducers . In ICASSP. IEEE , 7340--7344. E. Variani, A. T. Suresh, and M. Weintraub. 2019. WEST: Word Encoded Sequence Transducers. In ICASSP. IEEE, 7340--7344."},{"key":"e_1_3_2_2_35_1","unstructured":"A. Vaswani N. Shazeer N. Parmar J. Uszkoreit L. Jones A. N. Gomez \u0141. Kaiser and I. Polosukhin. 2017. Attention is all you need. In NIPS. 5998--6008.  A. Vaswani N. Shazeer N. Parmar J. Uszkoreit L. Jones A. N. Gomez \u0141. Kaiser and I. Polosukhin. 2017. Attention is all you need. In NIPS. 5998--6008."},{"key":"e_1_3_2_2_36_1","unstructured":"Z. Wang J. Wohlwend and T. Lei. 2019. Structured Pruning of Large Language Models. arXiv:1910.04732 (2019).  Z. Wang J. Wohlwend and T. Lei. 2019. Structured Pruning of Large Language Models. arXiv:1910.04732 (2019)."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"crossref","unstructured":"Y. Yang K. Liang X. Xiao Z. Xie L. Jin J. Sun and W. Zhou. 2018. Accelerating and Compressing LSTM Based Model for Online Handwritten Chinese Character Recognition. In ICFHR. IEEE 110--115.  Y. Yang K. Liang X. Xiao Z. Xie L. Jin J. Sun and W. Zhou. 2018. Accelerating and Compressing LSTM Based Model for Online Handwritten Chinese Character Recognition. In ICFHR. IEEE 110--115.","DOI":"10.1109\/ICFHR-2018.2018.00028"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"crossref","unstructured":"J. Ye L. Wang G. Li D. Chen S. Zhe X. Chu and Z. Xu. 2018. Learning compact recurrent neural networks with block-term tensor decomposition. In CVPR. 9378--9387.  J. Ye L. Wang G. Li D. Chen S. Zhe X. Chu and Z. Xu. 2018. Learning compact recurrent neural networks with block-term tensor decomposition. In CVPR. 
9378--9387.","DOI":"10.1109\/CVPR.2018.00977"},{"key":"e_1_3_2_2_39_1","unstructured":"S. Zhao R. Gupta Y. Song and D. Zhou. 2019. Extreme Language Model Compression with Optimal Subwords and Shared Projections. arXiv:1909.11687 (2019).  S. Zhao R. Gupta Y. Song and D. Zhou. 2019. Extreme Language Model Compression with Optimal Subwords and Shared Projections. arXiv:1909.11687 (2019)."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-017-1750-y"},{"key":"e_1_3_2_2_41_1","unstructured":"M. Zhu and S. Gupta. 2017. To prune or not to prune: exploring the efficacy of pruning for model compression. arXiv:1710.01878 (2017).  M. Zhu and S. Gupta. 2017. To prune or not to prune: exploring the efficacy of pruning for model compression. arXiv:1710.01878 (2017)."}],"event":{"name":"CIKM '20: The 29th ACM International Conference on Information and Knowledge Management","location":"Virtual Event Ireland","acronym":"CIKM '20","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 29th ACM International Conference on Information & Knowledge 
Management"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3340531.3412171","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3340531.3412171","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:02:54Z","timestamp":1750197774000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3340531.3412171"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,19]]},"references-count":41,"alternative-id":["10.1145\/3340531.3412171","10.1145\/3340531"],"URL":"https:\/\/doi.org\/10.1145\/3340531.3412171","relation":{},"subject":[],"published":{"date-parts":[[2020,10,19]]},"assertion":[{"value":"2020-10-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}