{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T16:22:31Z","timestamp":1781022151490,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":88,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T00:00:00Z","timestamp":1634428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000028","name":"Semiconductor Research Corporation JUMP ADA","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000028","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1704834, 1718160"],"award-info":[{"award-number":["1704834, 1718160"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,18]]},"DOI":"10.1145\/3466752.3480095","type":"proceedings-article","created":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T19:12:05Z","timestamp":1634497925000},"page":"830-844","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":87,"title":["EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference"],"prefix":"10.1145","author":[{"given":"Thierry","family":"Tambe","sequence":"first","affiliation":[{"name":"Harvard University, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Coleman","family":"Hooper","sequence":"additional","affiliation":[{"name":"Harvard University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lillian","family":"Pentecost","sequence":"additional","affiliation":[{"name":"Harvard University, United States of 
America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tianyu","family":"Jia","sequence":"additional","affiliation":[{"name":"Harvard University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"En-Yu","family":"Yang","sequence":"additional","affiliation":[{"name":"Harvard University, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marco","family":"Donato","sequence":"additional","affiliation":[{"name":"Tufts University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Victor","family":"Sanh","sequence":"additional","affiliation":[{"name":"Hugging Face"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Paul","family":"Whatmough","sequence":"additional","affiliation":[{"name":"Arm Research \/ Harvard, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alexander M.","family":"Rush","sequence":"additional","affiliation":[{"name":"Cornell University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David","family":"Brooks","sequence":"additional","affiliation":[{"name":"Harvard University, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gu-Yeon","family":"Wei","sequence":"additional","affiliation":[{"name":"Harvard University"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,10,17]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"accessed Oct 1 2020. Catapult High-Level Synthesis. https:\/\/www.mentor.com\/hls-lp\/catapult-high-level-synthesis  accessed Oct 1 2020. Catapult High-Level Synthesis. https:\/\/www.mentor.com\/hls-lp\/catapult-high-level-synthesis"},{"key":"e_1_3_2_1_2_1","unstructured":"accessed Oct 1 2020. Jetson TX2 Module. https:\/\/developer.nvidia.com\/embedded\/jetson-tx2  accessed Oct 1 2020. Jetson TX2 Module. 
https:\/\/developer.nvidia.com\/embedded\/jetson-tx2"},{"key":"e_1_3_2_1_3_1","volume-title":"2021 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).","author":"Agrawal A.","unstructured":"A. Agrawal , S. Lee , J. Silberman , M. Ziegler , M. Kang , S. Venkataramani , N. Cao , B. Fleischer , M. Guillorn , M. Cohen , S. Mueller , J. Oh , M. Lutz , J. Jung , S. Koswatta , C. Zhou , V. Zalani , J. Bonanno , R. Casatuta , C. Chen , J. Choi , H. Haynie , A. Herbert , R. Jain , M. Kar , K. Kim , Y. Li , Z. Ren , S. Rider , M. Schaal , K. Schelm , M. Scheuermann , X. Sun , H. Tran , N. Wang , W. Wang , X. Zhang , V. Shah , B. Curran , V. Srinivasan , P. Lu , S. Shukla , L. Chang , and K. Gopalakrishnan . 2021. 9.1 A 7nm 4-Core AI chip with 25.6 TFLOPS hybrid FP8 training, 102.4 TOPS INT4 inference and workload-Aware throttling . In 2021 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). A. Agrawal, S. Lee, J. Silberman, M. Ziegler, M. Kang, S. Venkataramani, N. Cao, B. Fleischer, M. Guillorn, M. Cohen, S. Mueller, J. Oh, M. Lutz, J. Jung, S. Koswatta, C. Zhou, V. Zalani, J. Bonanno, R. Casatuta, C. Chen, J. Choi, H. Haynie, A. Herbert, R. Jain, M. Kar, K. Kim, Y. Li, Z. Ren, S. Rider, M. Schaal, K. Schelm, M. Scheuermann, X. Sun, H. Tran, N. Wang, W. Wang, X. Zhang, V. Shah, B. Curran, V. Srinivasan, P. Lu, S. Shukla, L. Chang, and K. Gopalakrishnan. 2021. 9.1 A 7nm 4-Core AI chip with 25.6 TFLOPS hybrid FP8 training, 102.4 TOPS INT4 inference and workload-Aware throttling. In 2021 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"crossref","unstructured":"T. Ajayi S. Kamineni Y. Cherivirala M. Fayazi K. Kwon M. Saligane S. Gupta C. Chen D. Sylvester D. Dreslinski B. Calhoun and D. Wentzloff. 2020. An Open-source Framework for Autonomous SoC Design with Analog Block Generation. 
In 2020 IFIP\/IEEE 28th International Conference on Very Large Scale Integration (VLSI-SoC).  T. Ajayi S. Kamineni Y. Cherivirala M. Fayazi K. Kwon M. Saligane S. Gupta C. Chen D. Sylvester D. Dreslinski B. Calhoun and D. Wentzloff. 2020. An Open-source Framework for Autonomous SoC Design with Analog Block Generation. In 2020 IFIP\/IEEE 28th International Conference on Very Large Scale Integration (VLSI-SoC).","DOI":"10.1109\/VLSI-SOC46417.2020.9344104"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00061"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001138"},{"key":"e_1_3_2_1_7_1","unstructured":"L.\u00a0J. Ba 2016. Layer Normalization. ArXiv abs\/1607.06450(2016).  L.\u00a0J. Ba 2016. Layer Normalization. ArXiv abs\/1607.06450(2016)."},{"key":"e_1_3_2_1_8_1","volume-title":"3rd International Conference on Learning Representations, ICLR","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau , Kyunghyun Cho , and Yoshua Bengio . 2015 . Neural Machine Translation by Jointly Learning to Align and Translate . In 3rd International Conference on Learning Representations, ICLR 2015. http:\/\/arxiv.org\/abs\/1409.0473 Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In 3rd International Conference on Learning Representations, ICLR 2015. http:\/\/arxiv.org\/abs\/1409.0473"},{"key":"e_1_3_2_1_9_1","volume-title":"2020 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).","author":"Bang S.","unstructured":"S. Bang , W. Lim , C. Augustine , A. Malavasi , M. Khellah , J. Tschanz , and V. De . 2020. 25.1 A Fully Synthesizable Distributed and Scalable All-Digital LDO in 10nm CMOS . In 2020 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). S. Bang, W. Lim, C. Augustine, A. Malavasi, M. Khellah, J. Tschanz, and V. De. 2020. 
25.1 A Fully Synthesizable Distributed and Scalable All-Digital LDO in 10nm CMOS. In 2020 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)."},{"key":"e_1_3_2_1_10_1","volume-title":"2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).","author":"Chang M.","unstructured":"M. Chang , J. Wu , T. Chien , Y. Liu , T. Yang , W. Shen , Y. King , C. Lin , K. Lin , Y. Chih , S. Natarajan , and J. Chang . 2014. 19.4 embedded 1Mb ReRAM in 28nm CMOS with 0.27-to-1V read using swing-sample-and-couple sense amplifier and self-boost-write-termination scheme . In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). M. Chang, J. Wu, T. Chien, Y. Liu, T. Yang, W. Shen, Y. King, C. Lin, K. Lin, Y. Chih, S. Natarajan, and J. Chang. 2014. 19.4 embedded 1Mb ReRAM in 28nm CMOS with 0.27-to-1V read using swing-sample-and-couple sense amplifier and self-boost-write-termination scheme. In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541967"},{"key":"e_1_3_2_1_12_1","volume-title":"Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 367\u2013379","author":"Chen Y.","unstructured":"Y. Chen , J. Emer , and V. Sze . 2016 . Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 367\u2013379 . Y. Chen, J. Emer, and V. Sze. 2016. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 
367\u2013379."},{"key":"e_1_3_2_1_13_1","volume-title":"2019 56th ACM\/IEEE Design Automation Conference (DAC). 1\u20136.","author":"Choi J.","unstructured":"J. Choi , Z. Hakimi , P.\u00a0 W. Shin , J. Sampson , and V. Narayanan . 2019. Context-Aware Convolutional Neural Network over Distributed System in Collaborative Computing . In 2019 56th ACM\/IEEE Design Automation Conference (DAC). 1\u20136. J. Choi, Z. Hakimi, P.\u00a0W. Shin, J. Sampson, and V. Narayanan. 2019. Context-Aware Convolutional Neural Network over Distributed System in Collaborative Computing. In 2019 56th ACM\/IEEE Design Automation Conference (DAC). 1\u20136."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2012.2220459"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463209.2488867"},{"key":"e_1_3_2_1_16_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR abs\/1810.04805(2018). arxiv:1810.04805http:\/\/arxiv.org\/abs\/1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR abs\/1810.04805(2018). arxiv:1810.04805http:\/\/arxiv.org\/abs\/1810.04805 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR abs\/1810.04805(2018). arxiv:1810.04805http:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2019.2944782"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2012.2185930"},{"key":"e_1_3_2_1_19_1","unstructured":"Robert Eisele. 2016. The log-sum-exp trick in Machine Learning. https:\/\/www.xarg.org\/2016\/06\/the-log-sum-exp-trick-in-machine-learning\/  Robert Eisele. 2016. The log-sum-exp trick in Machine Learning. 
https:\/\/www.xarg.org\/2016\/06\/the-log-sum-exp-trick-in-machine-learning\/"},{"key":"e_1_3_2_1_20_1","first-page":"4054","article-title":"TinyLSTMs","volume":"2020","author":"Fedorov Igor","year":"2020","unstructured":"Igor Fedorov , Marko Stamenovic , Carl Jensen , Li-Chia Yang , Ari Mandell , Yiming Gan , Matthew Mattina , and Paul\u00a0 N. Whatmough . 2020 . TinyLSTMs : Efficient Neural Speech Enhancement for Hearing Aids. In Proc. Interspeech 2020. 4054 \u2013 4058 . https:\/\/doi.org\/10.21437\/Interspeech.2020-1864 10.21437\/Interspeech.2020-1864 Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, and Paul\u00a0N. Whatmough. 2020. TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids. In Proc. Interspeech 2020. 4054\u20134058. https:\/\/doi.org\/10.21437\/Interspeech.2020-1864","journal-title":"Efficient Neural Speech Enhancement for Hearing Aids. In Proc. Interspeech"},{"key":"e_1_3_2_1_21_1","volume-title":"Y. Yang, D. Chen, M. Winslett, Hassan Sajjad, and Preslav Nakov.","author":"Ganesh Prakhar","year":"2020","unstructured":"Prakhar Ganesh , Yao Chen , Xin Lou , Mohammad Haris\u00a0Ali Khan , Y. Yang, D. Chen, M. Winslett, Hassan Sajjad, and Preslav Nakov. 2020 . Compressing Large-Scale Transformer-Based Models: A Case Study on BERT. ArXiv abs\/2002.11985(2020). Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Haris\u00a0Ali Khan, Y. Yang, D. Chen, M. Winslett, Hassan Sajjad, and Preslav Nakov. 2020. Compressing Large-Scale Transformer-Based Models: A Case Study on BERT. 
ArXiv abs\/2002.11985(2020)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304014"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00035"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001163"},{"key":"e_1_3_2_1_25_1","volume-title":"Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. CoRR abs\/1510.00149(2015).","author":"Han Song","year":"2015","unstructured":"Song Han , Huizi Mao , and William\u00a0 J. Dally . 2015 . Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. CoRR abs\/1510.00149(2015). Song Han, Huizi Mao, and William\u00a0J. Dally. 2015. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. CoRR abs\/1510.00149(2015)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00062"},{"key":"e_1_3_2_1_28_1","volume-title":"2021 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).","author":"Huang B.","unstructured":"B. Huang , E. Fang , S. Hsueh , R. Huang , A. Lin , C. Chiang , Y. Lin , W. Hsieh , B. Chen , Y. Zhuang , C. Wu , J. Chen , Y. Chen , C. Wan , E. Wang , A. Chiou , P. Kao , Y. Tsai , H. Chen , and S. Hwang . 2021. 35.1 An octa-core 2.8\/2GHz dual-gear sensor-assisted high-speed and power-efficient CPU in 7nm FinFET 5G smartphone SoC . In 2021 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). B. Huang, E. Fang, S. Hsueh, R. Huang, A. Lin, C. Chiang, Y. Lin, W. Hsieh, B. Chen, Y. Zhuang, C. Wu, J. Chen, Y. Chen, C. Wan, E. Wang, A. Chiou, P. Kao, Y. Tsai, H. Chen, and S. Hwang. 2021. 35.1 An octa-core 2.8\/2GHz dual-gear sensor-assisted high-speed and power-efficient CPU in 7nm FinFET 5G smartphone SoC. 
In 2021 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Forrest\u00a0N. Iandola Albert\u00a0Eaton Shaw R. Krishna and K. Keutzer. 2020. SqueezeBERT: What can computer vision teach NLP about efficient neural networks?ArXiv abs\/2006.11316(2020).  Forrest\u00a0N. Iandola Albert\u00a0Eaton Shaw R. Krishna and K. Keutzer. 2020. SqueezeBERT: What can computer vision teach NLP about efficient neural networks?ArXiv abs\/2006.11316(2020).","DOI":"10.18653\/v1\/2020.sustainlp-1.17"},{"key":"e_1_3_2_1_30_1","volume-title":"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. CoRR abs\/1502.03167(2015). arxiv:1502.03167http:\/\/arxiv.org\/abs\/1502.03167","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy . 2015 . Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. CoRR abs\/1502.03167(2015). arxiv:1502.03167http:\/\/arxiv.org\/abs\/1502.03167 Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. CoRR abs\/1502.03167(2015). arxiv:1502.03167http:\/\/arxiv.org\/abs\/1502.03167"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00070"},{"key":"e_1_3_2_1_32_1","unstructured":"Jeff Johnson. 2018. Rethinking floating point for deep learning. CoRR abs\/1811.01721(2018). arxiv:1811.01721http:\/\/arxiv.org\/abs\/1811.01721  Jeff Johnson. 2018. Rethinking floating point for deep learning. CoRR abs\/1811.01721(2018). 
arxiv:1811.01721http:\/\/arxiv.org\/abs\/1811.01721"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3195970.3199846"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLSICircuits18222.2020.9162784"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3296957.3173176"},{"key":"e_1_3_2_1_37_1","volume-title":"ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ArXiv abs\/1909.11942(2020).","author":"Lan Zhenzhong","year":"2020","unstructured":"Zhenzhong Lan , Mingda Chen , Sebastian Goodman , Kevin Gimpel , Piyush Sharma , and Radu Soricut . 2020 . ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ArXiv abs\/1909.11942(2020). Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ArXiv abs\/1909.11942(2020)."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2019.2913098"},{"key":"e_1_3_2_1_39_1","volume-title":"On-Chip Memory Technology Design Space Explorations for Mobile Deep Neural Network Accelerators. In 2019 56th ACM\/IEEE Design Automation Conference (DAC). 1\u20136.","author":"Li Haitong","year":"2019","unstructured":"Haitong Li , Mudit Bhargav , Paul\u00a0 N. Whatmough , and H.-S. Philip\u00a0Wong . 2019 . On-Chip Memory Technology Design Space Explorations for Mobile Deep Neural Network Accelerators. In 2019 56th ACM\/IEEE Design Automation Conference (DAC). 1\u20136. Haitong Li, Mudit Bhargav, Paul\u00a0N. Whatmough, and H.-S. Philip\u00a0Wong. 2019. On-Chip Memory Technology Design Space Explorations for Mobile Deep Neural Network Accelerators. In 2019 56th ACM\/IEEE Design Automation Conference (DAC). 
1\u20136."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2020.2973991"},{"key":"e_1_3_2_1_41_1","first-page":"4","article-title":"PuDianNao","volume":"50","author":"Liu Daofu","year":"2015","unstructured":"Daofu Liu , Tianshi Chen , Shaoli Liu , Jinhong Zhou , Shengyuan Zhou , Olivier Temam , Xiaobing Feng , Xuehai Zhou , and Yunji Chen . 2015 . PuDianNao : A Polyvalent Machine Learning Accelerator. SIGPLAN Not. 50 , 4 (March 2015), 369\u2013381. https:\/\/doi.org\/10.1145\/2775054.2694358 10.1145\/2775054.2694358 Daofu Liu, Tianshi Chen, Shaoli Liu, Jinhong Zhou, Shengyuan Zhou, Olivier Temam, Xiaobing Feng, Xuehai Zhou, and Yunji Chen. 2015. PuDianNao: A Polyvalent Machine Learning Accelerator. SIGPLAN Not. 50, 4 (March 2015), 369\u2013381. https:\/\/doi.org\/10.1145\/2775054.2694358","journal-title":"A Polyvalent Machine Learning Accelerator. SIGPLAN Not."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001179"},{"key":"e_1_3_2_1_43_1","volume-title":"2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers.","author":"Liu T.","unstructured":"T. Liu , T.\u00a0 H. Yan , R. Scheuerlein , Y. Chen , J.\u00a0 K. Lee , G. Balakrishnan , G. Yee , H. Zhang , A. Yap , J. Ouyang , T. Sasaki , S. Addepalli , A. Al-Shamma , C. Chen , M. Gupta , G. Hilton , S. Joshi , A. Kathuria , V. Lai , D. Masiwal , M. Matsumoto , A. Nigam , A. Pai , J. Pakhale , C.\u00a0 H. Siau , X. Wu , R. Yin , L. Peng , J.\u00a0 Y. Kang , S. Huynh , H. Wang , N. Nagel , Y. Tanaka , M. Higashitani , T. Minvielle , C. Gorla , T. Tsukamoto , T. Yamaguchi , M. Okajima , T. Okamura , S. Takase , T. Hara , H. Inoue , L. Fasoli , M. Mofidi , R. Shrivastava , and K. Quader . 2013. A 130.7mm2 2-layer 32Gb ReRAM memory device in 24nm technology . In 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers. T. Liu, T.\u00a0H. Yan, R. Scheuerlein, Y. Chen, J.\u00a0K. Lee, G. Balakrishnan, G. Yee, H. 
Zhang, A. Yap, J. Ouyang, T. Sasaki, S. Addepalli, A. Al-Shamma, C. Chen, M. Gupta, G. Hilton, S. Joshi, A. Kathuria, V. Lai, D. Masiwal, M. Matsumoto, A. Nigam, A. Pai, J. Pakhale, C.\u00a0H. Siau, X. Wu, R. Yin, L. Peng, J.\u00a0Y. Kang, S. Huynh, H. Wang, N. Nagel, Y. Tanaka, M. Higashitani, T. Minvielle, C. Gorla, T. Tsukamoto, T. Yamaguchi, M. Okajima, T. Okamura, S. Takase, T. Hara, H. Inoue, L. Fasoli, M. Mofidi, R. Shrivastava, and K. Quader. 2013. A 130.7mm2 2-layer 32Gb ReRAM memory device in 24nm technology. In 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers."},{"key":"e_1_3_2_1_44_1","unstructured":"Y. Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy M. Lewis L. Zettlemoyer and V. Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv abs\/1907.11692(2019).  Y. Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy M. Lewis L. Zettlemoyer and V. Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv abs\/1907.11692(2019)."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2020.2979965"},{"key":"e_1_3_2_1_46_1","unstructured":"Siyuan Lu Meiqi Wang S. Liang J. Lin and Z. Wang. 2020. Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer. ArXiv abs\/2009.08605(2020).  Siyuan Lu Meiqi Wang S. Liang J. Lin and Z. Wang. 2020. Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer. ArXiv abs\/2009.08605(2020)."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446050"},{"key":"e_1_3_2_1_48_1","unstructured":"James McCaffrey. 2016. The Max Trick when Computing Softmax. https:\/\/jamesmccaffrey.wordpress.com\/2016\/03\/04\/the-max-trick-when-computing-softmax\/  James McCaffrey. 2016. The Max Trick when Computing Softmax. 
https:\/\/jamesmccaffrey.wordpress.com\/2016\/03\/04\/the-max-trick-when-computing-softmax\/"},{"key":"e_1_3_2_1_49_1","unstructured":"J.\u00a0Scott McCarley. 2019. Pruning a BERT-based Question Answering Model. ArXiv abs\/1910.06360(2019).  J.\u00a0Scott McCarley. 2019. Pruning a BERT-based Question Answering Model. ArXiv abs\/1910.06360(2019)."},{"key":"e_1_3_2_1_50_1","volume-title":"2018 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).","author":"Meinerzhagen P.","unstructured":"P. Meinerzhagen , C. Tokunaga , A. Malavasi , V. Vaidya , A. Mendon , D. Mathaikutty , J. Kulkarni , C. Augustine , M. Cho , S. Kim , G. Matthew , R. Jain , J. Ryan , C. Peng , S. Paul , S. Vangal , B. Esparza , L. Cuellar , M. Woodman , B. Iyer , S. Maiyuran , G. Chinya , C. Zou , Y. Liao , K. Ravichandran , H. Wang , M. Khellah , J. Tschanz , and V. De . 2018. 2.3 An energy-efficient graphics processor featuring fine-grain DVFS with integrated voltage regulators, execution-unit turbo, and retentive sleep in 14nm tri-gate CMOS . In 2018 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). P. Meinerzhagen, C. Tokunaga, A. Malavasi, V. Vaidya, A. Mendon, D. Mathaikutty, J. Kulkarni, C. Augustine, M. Cho, S. Kim, G. Matthew, R. Jain, J. Ryan, C. Peng, S. Paul, S. Vangal, B. Esparza, L. Cuellar, M. Woodman, B. Iyer, S. Maiyuran, G. Chinya, C. Zou, Y. Liao, K. Ravichandran, H. Wang, M. Khellah, J. Tschanz, and V. De. 2018. 2.3 An energy-efficient graphics processor featuring fine-grain DVFS with integrated voltage regulators, execution-unit turbo, and retentive sleep in 14nm tri-gate CMOS. In 2018 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)."},{"key":"e_1_3_2_1_51_1","unstructured":"Paul Michel Omer Levy and Graham Neubig. 2019. Are Sixteen Heads Really Better than One?ArXiv abs\/1905.10650(2019).  Paul Michel Omer Levy and Graham Neubig. 2019. 
Are Sixteen Heads Really Better than One?ArXiv abs\/1905.10650(2019)."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080254"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00063"},{"key":"e_1_3_2_1_55_1","volume-title":"Proceedings of Machine Learning and Systems, I.\u00a0Dhillon, D.\u00a0Papailiopoulos, and V.\u00a0Sze (Eds.). Vol.\u00a02. 363\u2013378","author":"Park Junki","year":"2020","unstructured":"Junki Park , Hyunsung Yoon , Daehyun Ahn , Jungwook Choi , and Jae-Joon Kim . 2020 . OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator . In Proceedings of Machine Learning and Systems, I.\u00a0Dhillon, D.\u00a0Papailiopoulos, and V.\u00a0Sze (Eds.). Vol.\u00a02. 363\u2013378 . Junki Park, Hyunsung Yoon, Daehyun Ahn, Jungwook Choi, and Jae-Joon Kim. 2020. OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator. In Proceedings of Machine Learning and Systems, I.\u00a0Dhillon, D.\u00a0Papailiopoulos, and V.\u00a0Sze (Eds.). Vol.\u00a02. 363\u2013378."},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358258"},{"key":"e_1_3_2_1_57_1","unstructured":"PubNub. 2015. How Fast is Real-time? Human Perception and Technology. https:\/\/www.pubnub.com\/blog\/how-fast-is-realtime-human-perception-and-technology\/  PubNub. 2015. How Fast is Real-time? Human Perception and Technology. https:\/\/www.pubnub.com\/blog\/how-fast-is-realtime-human-perception-and-technology\/"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"crossref","unstructured":"Pranav Rajpurkar Jian Zhang Konstantin Lopyrev and Percy Liang. 2016. SQuAD: 100 000+ Questions for Machine Comprehension of Text. ArXiv abs\/1606.05250(2016).  Pranav Rajpurkar Jian Zhang Konstantin Lopyrev and Percy Liang. 2016. SQuAD: 100 000+ Questions for Machine Comprehension of Text. 
ArXiv abs\/1606.05250(2016).","DOI":"10.18653\/v1\/D16-1264"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/DAC.2018.8465834"},{"key":"e_1_3_2_1_60_1","volume-title":"Highly-Accurate Deep Neural Network Accelerators. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 267\u2013278","author":"Reagen B.","year":"2016","unstructured":"B. Reagen , P. Whatmough , R. Adolf , S. Rama , H. Lee , S.\u00a0 K. Lee , J.\u00a0 M. Hern\u00e1ndez-Lobato , G. Wei , and D. Brooks . 2016. Minerva: Enabling Low-Power , Highly-Accurate Deep Neural Network Accelerators. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 267\u2013278 . https:\/\/doi.org\/10.1109\/ISCA. 2016 .32 10.1109\/ISCA.2016.32 B. Reagen, P. Whatmough, R. Adolf, S. Rama, H. Lee, S.\u00a0K. Lee, J.\u00a0M. Hern\u00e1ndez-Lobato, G. Wei, and D. Brooks. 2016. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 267\u2013278. https:\/\/doi.org\/10.1109\/ISCA.2016.32"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00016"},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS48437.2020.00016"},{"key":"e_1_3_2_1_63_1","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: smaller faster cheaper and lighter. ArXiv abs\/1910.01108(2019).  Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: smaller faster cheaper and lighter. ArXiv abs\/1910.01108(2019)."},{"key":"e_1_3_2_1_64_1","volume-title":"34th Conference on Neural Information Processing Systems (NeurIPS). http:\/\/arxiv.org\/abs\/2005","author":"Sanh Victor","year":"2020","unstructured":"Victor Sanh , Thomas Wolf , and Alexander\u00a0 M. Rush . 2020 . 
Movement Pruning: Adaptive Sparsity by Fine-Tuning . In 34th Conference on Neural Information Processing Systems (NeurIPS). http:\/\/arxiv.org\/abs\/2005 .07683 Victor Sanh, Thomas Wolf, and Alexander\u00a0M. Rush. 2020. Movement Pruning: Adaptive Sparsity by Fine-Tuning. In 34th Conference on Neural Information Processing Systems (NeurIPS). http:\/\/arxiv.org\/abs\/2005.07683"},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"crossref","unstructured":"Roy Schwartz Gabi Stanovsky Swabha Swayamdipta Jesse Dodge and Noah\u00a0A. Smith. 2020. The Right Tool for the Job: Matching Model and Instance Complexities. In ACL.  Roy Schwartz Gabi Stanovsky Swabha Swayamdipta Jesse Dodge and Noah\u00a0A. Smith. 2020. The Right Tool for the Job: Matching Model and Instance Complexities. In ACL.","DOI":"10.18653\/v1\/2020.acl-main.593"},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358302"},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783720"},{"key":"e_1_3_2_1_68_1","doi-asserted-by":"crossref","unstructured":"Sheng Shen Zhen Dong J. Ye L. Ma Zhewei Yao A. Gholami M. Mahoney and K. Keutzer. 2020. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. In AAAI.  Sheng Shen Zhen Dong J. Ye L. Ma Zhewei Yao A. Gholami M. Mahoney and K. Keutzer. 2020. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. In AAAI.","DOI":"10.1609\/aaai.v34i05.6409"},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"crossref","unstructured":"Sainbayar Sukhbaatar E. Grave P. Bojanowski and Armand Joulin. 2019. Adaptive Attention Span in Transformers. In ACL.  Sainbayar Sukhbaatar E. Grave P. Bojanowski and Armand Joulin. 2019. Adaptive Attention Span in Transformers. In ACL.","DOI":"10.18653\/v1\/P19-1032"},{"key":"e_1_3_2_1_70_1","unstructured":"Zhiqing Sun H. Yu Xiaodan Song Renjie Liu Yiming Yang and Denny Zhou. 2020. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. In ACL.  Zhiqing Sun H. 
Yu Xiaodan Song Renjie Liu Yiming Yang and Denny Zhou. 2020. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. In ACL."},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42613.2021.9366062"},{"key":"e_1_3_2_1_72_1","unstructured":"Thierry Tambe En-Yu Yang Zishen Wan Y. Deng V. Reddi Alexander\u00a0M. Rush D. Brooks and Gu-Yeon Wei. 2019. AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference. ArXiv abs\/1909.13271(2019).  Thierry Tambe En-Yu Yang Zishen Wan Y. Deng V. Reddi Alexander\u00a0M. Rush D. Brooks and Gu-Yeon Wei. 2019. AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference. ArXiv abs\/1909.13271(2019)."},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2016.7900006"},{"key":"e_1_3_2_1_74_1","volume-title":"2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).","author":"Toprak-Deniz Z.","unstructured":"Z. Toprak-Deniz , M. Sperling , J. Bulzacchelli , G. Still , R. Kruse , S. Kim , D. Boerstler , T. Gloekler , R. Robertazzi , K. Stawiasz , T. Diemoz , G. English , D. Hui , P. Muench , and J. Friedrich . 2014. 5.2 Distributed system of digitally controlled microregulators enabling per-core dvfs for the Power8 tm microprocessor . In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). Z. Toprak-Deniz, M. Sperling, J. Bulzacchelli, G. Still, R. Kruse, S. Kim, D. Boerstler, T. Gloekler, R. Robertazzi, K. Stawiasz, T. Diemoz, G. English, D. Hui, P. Muench, and J. Friedrich. 2014. 5.2 Distributed system of digitally controlled microregulators enabling per-core dvfs for the Power8 tm microprocessor. In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)."},{"key":"e_1_3_2_1_75_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan\u00a0N. Gomez Lukasz Kaiser and Illia Polosukhin. 
2017. Attention Is All You Need. CoRR abs\/1706.03762(2017). arxiv:1706.03762http:\/\/arxiv.org\/abs\/1706.03762"},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"crossref","unstructured":"Swagath Venkataramani Ashish Ranjan Subarno Banerjee Dipankar Das Sasikanth Avancha Ashok Jagannathan Ajaya Durg Dheemanth Nagaraj Bharat Kaul Pradeep Dubey and Anand Raghunathan. 2017. ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks. SIGARCH Comput. Archit. News(2017).","DOI":"10.1145\/3079856.3080244"},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"crossref","unstructured":"B. Venkatesan 2019. MAGNet : A Modular Accelerator Generator for Neural Networks. In ICCAD.","DOI":"10.1109\/ICCAD45719.2019.8942127"},{"key":"e_1_3_2_1_78_1","volume-title":"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. CoRR abs\/1804.07461(2018). arxiv:1804.07461http:\/\/arxiv.org\/abs\/1804.07461","author":"Wang Alex","year":"2018","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel\u00a0R. Bowman. 
2018. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. CoRR abs\/1804.07461(2018). arxiv:1804.07461http:\/\/arxiv.org\/abs\/1804.07461"},{"key":"e_1_3_2_1_79_1","volume-title":"SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning. 2021 IEEE International Symposium on High Performance Computer Architecture (HPCA)","author":"Wang Hanrui","year":"2021","unstructured":"Hanrui Wang, Zhekai Zhang, and Song Han. 2021. SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning. 2021 IEEE International Symposium on High Performance Computer Architecture (HPCA) (2021)."},{"key":"e_1_3_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00032"},{"key":"e_1_3_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2995809"},{"key":"e_1_3_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2018.2841824"},{"key":"e_1_3_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.23919\/VLSIC.2019.8778002"},{"key":"e_1_3_2_1_84_1","volume-title":"Proceedings of the 2nd SysML Conference","author":"Whatmough N.","year":"2019","unstructured":"Paul\u00a0N. Whatmough, Chuteng Zhou, Patrick Hansen, Shreyas\u00a0Kolala Venkataramanaiah, Jae sun Seo, and Matthew Mattina. 2019. FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning. 
In Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA."},{"key":"e_1_3_2_1_85_1","unstructured":"Thomas Wolf Lysandre Debut Victor Sanh Julien Chaumond Clement Delangue Anthony Moi Pierric Cistac Tim Rault R\u00e9mi Louf Morgan Funtowicz and Jamie Brew. 2019. HuggingFace\u2019s Transformers: State-of-the-art Natural Language Processing. ArXiv abs\/1910.03771(2019)."},{"key":"e_1_3_2_1_86_1","article-title":"SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads","volume":"17","author":"Yao Yuan","year":"2020","unstructured":"Sam\u00a0(Likun) Xi, Yuan Yao, Kshitij Bhardwaj, Paul Whatmough, Gu-Yeon Wei, and David Brooks. 2020. SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads. ACM Trans. Archit. Code Optim. 17, 4, Article 39 (Nov. 2020), 26\u00a0pages. https:\/\/doi.org\/10.1145\/3424669","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"e_1_3_2_1_87_1","doi-asserted-by":"crossref","unstructured":"J. Xin Raphael Tang J. Lee Y. Yu and Jimmy Lin. 2020. DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference. 
ArXiv abs\/2004.12993(2020).","DOI":"10.18653\/v1\/2020.acl-main.204"},{"key":"e_1_3_2_1_88_1","volume-title":"GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference. In 53rd IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Zadeh Ali\u00a0Hadi","unstructured":"Ali\u00a0Hadi Zadeh and A. Moshovos. 2020. GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference. In 53rd IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1109\/EMC2-NIPS53020.2019.00016"},{"key":"e_1_3_2_1_90_1","unstructured":"Wangchunshu Zhou Canwen Xu Tao Ge Julian McAuley Ke Xu and Furu Wei. 2020. BERT Loses Patience: Fast and Robust Inference with Early Exit. 
ArXiv abs\/2006.04152(2020)."}],"event":{"name":"MICRO '21: 54th Annual IEEE\/ACM International Symposium on Microarchitecture","location":"Virtual Event Greece","acronym":"MICRO '21","sponsor":["SIGMICRO ACM Special Interest Group on Microarchitectural Research and Processing"]},"container-title":["MICRO-54: 54th Annual IEEE\/ACM International Symposium on Microarchitecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3466752.3480095","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3466752.3480095","content-type":"text\/html","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3466752.3480095","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3466752.3480095","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:56Z","timestamp":1750191536000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3466752.3480095"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,17]]},"references-count":88,"alternative-id":["10.1145\/3466752.3480095","10.1145\/3466752"],"URL":"https:\/\/doi.org\/10.1145\/3466752.3480095","relation":{},"subject":[],"published":{"date-parts":[[2021,10,17]]},"assertion":[{"value":"2021-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}