{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:23:11Z","timestamp":1774628591163,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":54,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,11]],"date-time":"2022-06-11T00:00:00Z","timestamp":1654905600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSERC"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,18]]},"DOI":"10.1145\/3470496.3527438","type":"proceedings-article","created":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T19:06:01Z","timestamp":1654023961000},"page":"888-901","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["Mokey"],"prefix":"10.1145","author":[{"given":"Ali Hadi","family":"Zadeh","sequence":"first","affiliation":[{"name":"University of Toronto, Toronto, Canada"}]},{"given":"Mostafa","family":"Mahmoud","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Canada"}]},{"given":"Ameer","family":"Abdelhadi","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Canada"}]},{"given":"Andreas","family":"Moshovos","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Canada"}]}],"member":"320","published-online":{"date-parts":[[2022,6,11]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6229"},{"key":"e_1_3_2_1_2_1","volume-title":"Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150","author":"Beltagy Iz","year":"2020","unstructured":"Iz Beltagy, Matthew E Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020)."},{"key":"e_1_3_2_1_3_1","unstructured":"Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)."},{"key":"e_1_3_2_1_4_1","volume-title":"Cadence Innovus User Guide","author":"Cadence Design Systems Inc. 2021.","unstructured":"Cadence Design Systems Inc. 2021. Cadence Innovus User Guide. Cadence Design Systems Inc. Ver. 20.13."},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the 23rd National Conference on Artificial Intelligence -","volume":"2","author":"Chang Ming-Wei","year":"2008","unstructured":"Ming-Wei Chang, Lev Ratinov, Dan Roth, and Vivek Srikumar. 2008. Importance of Semantic Representation: Dataless Classification. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2 (Chicago, Illinois) (AAAI'08). AAAI Press, 830--835."},{"key":"e_1_3_2_1_6_1","volume-title":"Yongkweon Jeon, Baeseong Park, Sangha Kim, and Dongsoo Lee.","author":"Chung Insoo","year":"2020","unstructured":"Insoo Chung, Byeongwook Kim, Yoonjung Choi, Se Jung Kwon, Yongkweon Jeon, Baeseong Park, Sangha Kim, and Dongsoo Lee. 2020. Extremely low bit transformer quantization for on-device neural machine translation. arXiv preprint arXiv:2009.07453 (2020)."},{"key":"e_1_3_2_1_7_1","volume-title":"Compute and Energy Consumption Trends in Deep Learning Inference. arXiv preprint arXiv:2109.05472","author":"Desislavov Radosvet","year":"2021","unstructured":"Radosvet Desislavov, Fernando Mart\u00ednez-Plumed, and Jos\u00e9 Hern\u00e1ndez-Orallo. 2021. Compute and Energy Consumption Trends in Deep Learning Inference. arXiv preprint arXiv:2109.05472 (2021)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_2_1_9_1","unstructured":"Luke Durant, Olivier Giroux, Mark Harris, and Nick Stam. 2017. NVIDIA Developer Blog. https:\/\/devblogs.nvidia.com\/inside-volta\/"},{"key":"e_1_3_2_1_10_1","volume-title":"Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. CoRR abs\/2101.03961","author":"Fedus William","year":"2021","unstructured":"William Fedus, Barret Zoph, and Noam Shazeer. 2021. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. CoRR abs\/2101.03961 (2021). arXiv:2101.03961 https:\/\/arxiv.org\/abs\/2101.03961"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00413"},{"key":"e_1_3_2_1_12_1","volume-title":"Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning. arXiv preprint arXiv:2002.08307","author":"Gordon Mitchell A","year":"2020","unstructured":"Mitchell A Gordon, Kevin Duh, and Nicholas Andrews. 2020. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning. arXiv preprint arXiv:2002.08307 (2020)."},{"key":"e_1_3_2_1_13_1","volume-title":"Dally","author":"Han Song","year":"2016","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2--4, 2016, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1510.00149"},{"key":"e_1_3_2_1_14_1","volume-title":"Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654","author":"He Pengcheng","year":"2021","unstructured":"Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. 2021. Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654 (2021)."},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings of Machine Learning and Systems 3","author":"Ivanov Andrei","year":"2021","unstructured":"Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, and Torsten Hoefler. 2021. Data Movement Is All You Need: A Case Study on Optimizing Transformers. Proceedings of Machine Learning and Systems 3 (2021)."},{"key":"e_1_3_2_1_16_1","volume-title":"Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. CoRR abs\/1804.06826","author":"Jia Zhe","year":"2018","unstructured":"Zhe Jia, Marco Maggioni, Benjamin Staiger, and Daniele Paolo Scarpazza. 2018. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. CoRR abs\/1804.06826 (2018). arXiv:1804.06826 http:\/\/arxiv.org\/abs\/1804.06826"},{"key":"e_1_3_2_1_17_1","volume-title":"Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification. arXiv preprint arXiv:2108.02598","author":"Jiang Yidi","year":"2021","unstructured":"Yidi Jiang, Bidisha Sharma, Maulik Madhavi, and Haizhou Li. 2021. Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification. arXiv preprint arXiv:2108.02598 (2021)."},{"key":"e_1_3_2_1_18_1","volume-title":"International Conference on Machine Learning. PMLR, 5156--5165","author":"Katharopoulos Angelos","year":"2020","unstructured":"Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and Fran\u00e7ois Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International Conference on Machine Learning. PMLR, 5156--5165."},{"key":"e_1_3_2_1_19_1","volume-title":"I-bert: Integer-only bert quantization. arXiv preprint arXiv:2101.01321","author":"Kim Sehoon","year":"2021","unstructured":"Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W Mahoney, and Kurt Keutzer. 2021. I-bert: Integer-only bert quantization. arXiv preprint arXiv:2101.01321 (2021)."},{"key":"e_1_3_2_1_20_1","volume-title":"Learned token pruning for transformers. arXiv preprint arXiv:2107.00910","author":"Kim Sehoon","year":"2021","unstructured":"Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Woosuk Kwon, Joseph Hassoun, and Kurt Keutzer. 2021. Learned token pruning for transformers. arXiv preprint arXiv:2107.00910 (2021)."},{"key":"e_1_3_2_1_21_1","volume-title":"Block pruning for faster transformers. arXiv preprint arXiv:2109.04838","author":"Lagunas Fran\u00e7ois","year":"2021","unstructured":"Fran\u00e7ois Lagunas, Ella Charlaix, Victor Sanh, and Alexander M Rush. 2021. Block pruning for faster transformers. arXiv preprint arXiv:2109.04838 (2021)."},{"key":"e_1_3_2_1_22_1","volume-title":"Differentiable Subset Pruning of Transformer Heads. arXiv preprint arXiv:2108.04657","author":"Li Jiaoda","year":"2021","unstructured":"Jiaoda Li, Ryan Cotterell, and Mrinmaya Sachan. 2021. Differentiable Subset Pruning of Transformer Heads. arXiv preprint arXiv:2108.04657 (2021)."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2020.2973991"},{"key":"e_1_3_2_1_24_1","volume-title":"Mkd: a multi-task knowledge distillation approach for pretrained language models. arXiv preprint arXiv:1911.03588","author":"Liu Linqing","year":"2019","unstructured":"Linqing Liu, Huan Wang, Jimmy Lin, Richard Socher, and Caiming Xiong. 2019. Mkd: a multi-task knowledge distillation approach for pretrained language models. arXiv preprint arXiv:1911.03588 (2019)."},{"key":"e_1_3_2_1_25_1","volume-title":"Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3446640"},{"key":"e_1_3_2_1_27_1","volume-title":"ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Masumura Ryo","unstructured":"Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, and Shota Orihashi. 2021. Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 5879--5883."},{"key":"e_1_3_2_1_28_1","volume-title":"version 9.9.0 (R2020a)","author":"MATLAB.","unstructured":"MATLAB. 2020. version 9.9.0 (R2020a). The MathWorks Inc., Natick, Massachusetts."},{"key":"e_1_3_2_1_29_1","volume-title":"Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data. arXiv preprint arXiv:1910.01769","author":"Mukherjee Subhabrata","year":"2019","unstructured":"Subhabrata Mukherjee and Ahmed Hassan Awadallah. 2019. Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data. arXiv preprint arXiv:1910.01769 (2019)."},{"key":"e_1_3_2_1_30_1","volume-title":"A tool to model large caches. HP Laboratories (01","author":"Muralimanohar Naveen","year":"2009","unstructured":"Naveen Muralimanohar, Rajeev Balasubramonian, and Norman Jouppi. 2009. Cacti 6.0: A tool to model large caches. HP Laboratories (01 2009)."},{"key":"e_1_3_2_1_31_1","unstructured":"Common Crawl nonprofit 501(c)(3) Organization. 2021. https:\/\/commoncrawl.org\/the-data\/get-started\/"},{"key":"e_1_3_2_1_32_1","volume-title":"Pruning Attention Heads of Transformer Models Using A* Search: A Novel Approach to Compress Big NLP Architectures. arXiv preprint arXiv:2110.15225","author":"Parnami Archit","year":"2021","unstructured":"Archit Parnami, Rahul Singh, and Tarun Joshi. 2021. Pruning Attention Heads of Transformer Models Using A* Search: A Novel Approach to Compress Big NLP Architectures. arXiv preprint arXiv:2110.15225 (2021)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_3_2_1_34_1","volume-title":"Greedy Layer Pruning: Decreasing Inference Time of Transformer Models. arXiv preprint arXiv:2105.14839","author":"Peer David","year":"2021","unstructured":"David Peer, Sebastian Stabinger, Stefan Engl, and Antonio Rodriguez-Sanchez. 2021. Greedy Layer Pruning: Decreasing Inference Time of Transformer Models. arXiv preprint arXiv:2105.14839 (2021)."},{"key":"e_1_3_2_1_35_1","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. (2018)."},{"key":"e_1_3_2_1_36_1","volume-title":"100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250","author":"Rajpurkar Pranav","year":"2016","unstructured":"Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250 (2016)."},{"key":"e_1_3_2_1_37_1","volume-title":"5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS","author":"Sanh Victor","year":"2019","unstructured":"Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS (2019)."},{"key":"e_1_3_2_1_38_1","volume-title":"The Cost of Training NLP Models: A Concise Overview. CoRR abs\/2004.08900","author":"Sharir Or","year":"2020","unstructured":"Or Sharir, Barak Peleg, and Yoav Shoham. 2020. The Cost of Training NLP Models: A Concise Overview. CoRR abs\/2004.08900 (2020). arXiv:2004.08900 https:\/\/arxiv.org\/abs\/2004.08900"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W Mahoney, and Kurt Keutzer. 2020. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. In AAAI. 8815--8821.","DOI":"10.1609\/aaai.v34i05.6409"},{"key":"e_1_3_2_1_40_1","volume-title":"Layerwise Pruning of Transformer Attention Heads for Efficient Language Modeling. arXiv preprint arXiv:2110.03252","author":"Shim Kyuhong","year":"2021","unstructured":"Kyuhong Shim, Iksoo Choi, Wonyong Sung, and Jungwook Choi. 2021. Layerwise Pruning of Transformer Attention Heads for Efficient Language Modeling. arXiv preprint arXiv:2110.03252 (2021)."},{"key":"e_1_3_2_1_41_1","volume-title":"DesignWare\u00ae Building Block IP User Guide","author":"Synopsys Inc. 2016.","year":"2016","unstructured":"Synopsys Inc. 2016. DesignWare\u00ae Building Block IP User Guide. Synopsys Inc. Ver. M-2016.12."},{"key":"e_1_3_2_1_42_1","volume-title":"Design Compiler\u00ae User Guide","author":"Synopsys Inc. 2017.","year":"2017","unstructured":"Synopsys Inc. 2017. Design Compiler\u00ae User Guide. Synopsys Inc. Ver. N-2017.09."},{"key":"e_1_3_2_1_43_1","volume-title":"International Conference on Machine Learning. PMLR, 10347--10357","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning. PMLR, 10347--10357."},{"key":"e_1_3_2_1_44_1","volume-title":"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In 7th International Conference on Learning Representations, ICLR 2019","author":"Wang Alex","year":"2019","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6--9, 2019. OpenReview.net. https:\/\/openreview.net\/forum?id=rJ4km2R5t7"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3309551"},{"key":"e_1_3_2_1_46_1","volume-title":"Hat: Hardware-aware transformers for efficient natural language processing. arXiv preprint arXiv:2005.14187","author":"Wang Hanrui","year":"2020","unstructured":"Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, and Song Han. 2020. Hat: Hardware-aware transformers for efficient natural language processing. arXiv preprint arXiv:2005.14187 (2020)."},{"key":"e_1_3_2_1_47_1","volume-title":"SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 97--110","author":"Wang Hanrui","year":"2021","unstructured":"Hanrui Wang, Zhekai Zhang, and Song Han. 2021. SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 97--110."},{"key":"e_1_3_2_1_48_1","volume-title":"Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768","author":"Wang Sinong","year":"2020","unstructured":"Sinong Wang, Belinda Z Li, Madian Khabsa, Han Fang, and Hao Ma. 2020. Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768 (2020)."},{"key":"e_1_3_2_1_49_1","volume-title":"HuggingFace's Transformers: State-of-the-art Natural Language Processing. ArXiv abs\/1910.03771","author":"Wolf Thomas","year":"2019","unstructured":"Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, R\u00e9mi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art Natural Language Processing. ArXiv abs\/1910.03771 (2019)."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00071"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/EMC2-NIPS53020.2019.00016"},{"key":"e_1_3_2_1_52_1","volume-title":"Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, et al.","author":"Zaheer Manzil","year":"2020","unstructured":"Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, et al. 2020. Big Bird: Transformers for Longer Sequences. In NeurIPS."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4419-9863-7_1371"},{"key":"e_1_3_2_1_54_1","volume-title":"Ternarybert: Distillation-aware ultra-low bit bert. arXiv preprint arXiv:2009.12812","author":"Zhang Wei","year":"2020","unstructured":"Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, and Qun Liu. 2020. Ternarybert: Distillation-aware ultra-low bit bert. arXiv preprint arXiv:2009.12812 (2020)."}],"event":{"name":"ISCA '22: The 49th Annual International Symposium on Computer Architecture","location":"New York New York","acronym":"ISCA '22","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","IEEE CS TCAA IEEE CS technical committee on architectural acoustics"]},"container-title":["Proceedings of the 49th Annual International Symposium on Computer Architecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470496.3527438","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3470496.3527438","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:54Z","timestamp":1750191534000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470496.3527438"}},"subtitle":["enabling narrow fixed-point inference for out-of-the-box floating-point transformer models"],"short-title":[],"issued":{"date-parts":[[2022,6,11]]},"references-count":54,"alternative-id":["10.1145\/3470496.3527438","10.1145\/3470496"],"URL":"https:\/\/doi.org\/10.1145\/3470496.3527438","relation":{},"subject":[],"published":{"date-parts":[[2022,6,11]]},"assertion":[{"value":"2022-06-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}