{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,24]],"date-time":"2025-10-24T16:46:17Z","timestamp":1761324377054,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,6,22]],"date-time":"2021-06-22T00:00:00Z","timestamp":1624320000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,6,22]]},"DOI":"10.1145\/3453688.3461740","type":"proceedings-article","created":{"date-parts":[[2021,6,18]],"date-time":"2021-06-18T23:13:45Z","timestamp":1624058025000},"page":"169-174","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["HMC-TRAN"],"prefix":"10.1145","author":[{"given":"Shaoyi","family":"Huang","sequence":"first","affiliation":[{"name":"University of Connecticut, Storrs, CT, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shiyang","family":"Chen","sequence":"additional","affiliation":[{"name":"Stevens Institute of Technology, Hoboken, NJ, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongwu","family":"Peng","sequence":"additional","affiliation":[{"name":"University of Connecticut, Storrs, CT, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Manu","sequence":"additional","affiliation":[{"name":"University of New Mexico, Albuquerque, NM, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhenglun","family":"Kong","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, MA, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Geng","family":"Yuan","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Yang","sequence":"additional","affiliation":[{"name":"University of New Mexico, Albuquerque, NM, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shusen","family":"Wang","sequence":"additional","affiliation":[{"name":"Stevens Institute of Technology, Hoboken, NJ, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hang","family":"Liu","sequence":"additional","affiliation":[{"name":"Stevens Institute of Technology, Hoboken, NJ, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Caiwen","family":"Ding","sequence":"additional","affiliation":[{"name":"University of Connecticut, Storrs, CT, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,6,22]]},"reference":[{"key":"e_1_3_2_3_1_1","unstructured":"Hangbo Bao et al. 2020. Unilmv2: Pseudo-masked language models for unified language model pre-training. arXiv preprint arXiv:2002.12804 (2020)."},{"key":"e_1_3_2_3_2_1","volume-title":"Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150","author":"Iz Beltagy","year":"2020","unstructured":"Iz Beltagy et al. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020)."},{"key":"e_1_3_2_3_3_1","unstructured":"Emily L Denton et al. 2014. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in neural information processing systems. 1269--1277."},{"key":"e_1_3_2_3_4_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT (1).","author":"Jacob Devlin","year":"2019","unstructured":"Jacob Devlin et al. 2019. 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT (1)."},{"key":"e_1_3_2_3_5_1","volume-title":"Third International Workshop on Paraphrasing (IWP2005)","author":"Dolan Bill","year":"2005","unstructured":"Bill Dolan and Chris Brockett. 2005. Automatically Constructing a Corpus of Sentential Paraphrases. In Third International Workshop on Paraphrasing (IWP2005) (third international workshop on paraphrasing (iwp2005) ed.). Asia Federation of Natural Language Processing. https:\/\/www.microsoft.com\/en-us\/research\/publication\/automatically-constructing-a-corpus-of-sentential-paraphrases\/"},{"key":"e_1_3_2_3_6_1","unstructured":"Scott Gray et al. 2017. GPU Kernels for Block-Sparse Weights."},{"volume-title":"2020 SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE Computer Society, 204--218","author":"Cong","key":"e_1_3_2_3_7_1","unstructured":"Cong Guo et al. [n.d.]. Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity. In 2020 SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE Computer Society, 204--218."},{"key":"e_1_3_2_3_8_1","unstructured":"Song Han et al. 2015. Learning both weights and connections for efficient neural network. In Advances in neural information processing systems. 1135--1143."},{"key":"e_1_3_2_3_9_1","unstructured":"Song Han et al. 2015. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems (NIPS). 1135--1143."},{"volume-title":"Proceedings of the 43rd International Symposium on Computer Architecture. IEEE Press, 243--254","author":"Song","key":"e_1_3_2_3_10_1","unstructured":"Song Han et al. 2016. EIE: efficient inference engine on compressed deep neural network. In Proceedings of the 43rd International Symposium on Computer Architecture. 
IEEE Press, 243--254."},{"key":"e_1_3_2_3_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_3_12_1","volume-title":"Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415","author":"Hendrycks Dan","year":"2016","unstructured":"Dan Hendrycks and Kevin Gimpel. 2016. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016)."},{"key":"e_1_3_2_3_13_1","volume-title":"NVIDIA TESLA V100 GPU ARCHITECTURE. Retrieved from https:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf. Accessed","author":"Nvidia Inc. [n.d.].","year":"2021","unstructured":"Nvidia Inc. [n.d.]. NVIDIA TESLA V100 GPU ARCHITECTURE. Retrieved from https:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf. Accessed: 2021, March 6."},{"key":"e_1_3_2_3_14_1","volume-title":"Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451","author":"Nikita Kitaev","year":"2020","unstructured":"Nikita Kitaev et al. 2020. Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020)."},{"volume-title":"Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning.","author":"Hector","key":"e_1_3_2_3_15_1","unstructured":"Hector Levesque et al. 2012. The winograd schema challenge. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning."},{"volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).","author":"Bingbing","key":"e_1_3_2_3_16_1","unstructured":"Bingbing Li et al. 2020. Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)."},{"key":"e_1_3_2_3_17_1","volume-title":"Roberta: A robustly optimized bert pretraining approach. 
arXiv preprint arXiv:1907.11692","author":"Yinhan Liu","year":"2019","unstructured":"Yinhan Liu et al. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)."},{"volume-title":"Pointer Sentinel Mixture Models. In 5th International Conference on Learning Representations (ICLR).","author":"Stephen","key":"e_1_3_2_3_18_1","unstructured":"Stephen Merity et al. 2017. Pointer Sentinel Mixture Models. In 5th International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_3_19_1","unstructured":"Sharan Narang et al. 2017. Block-Sparse Recurrent Neural Networks. arXiv:1711.02782 [cs.LG]"},{"key":"e_1_3_2_3_20_1","unstructured":"NVIDIA. [n.d.]. CUDA C++ Programming Guide. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html#wmma."},{"key":"e_1_3_2_3_21_1","first-page":"31","article-title":"Cublas library. NVIDIA Corporation, Santa Clara","volume":"15","author":"Nvidia CUDA","year":"2008","unstructured":"CUDA Nvidia. 2008. Cublas library. NVIDIA Corporation, Santa Clara, California 15, 27 (2008), 31.","journal-title":"California"},{"key":"e_1_3_2_3_22_1","doi-asserted-by":"crossref","unstructured":"Sai Prasanna et al. 2020. When BERT Plays the Lottery All Tickets Are Winning. arXiv preprint arXiv:2005.00561 (2020).","DOI":"10.18653\/v1\/2020.emnlp-main.259"},{"key":"e_1_3_2_3_23_1","unstructured":"PyTorch. [n.d.]. https:\/\/pytorch.org\/tutorials\/beginner\/transformer_tutorial.html."},{"key":"e_1_3_2_3_24_1","unstructured":"Colin Raffel et al. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019)."},{"volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics","author":"Richard","key":"e_1_3_2_3_25_1","unstructured":"Richard Socher et al. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. 
In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, 1631--1642. https:\/\/www.aclweb.org\/anthology\/D13-1170"},{"key":"e_1_3_2_3_26_1","unstructured":"Ashish Vaswani et al. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008."},{"key":"e_1_3_2_3_27_1","volume-title":"Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461","author":"Alex Wang","year":"2018","unstructured":"Alex Wang et al. 2018. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461 (2018)."},{"key":"e_1_3_2_3_28_1","unstructured":"Wei Wen et al. 2016. Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems. 2074--2082."},{"key":"e_1_3_2_3_29_1","unstructured":"Thomas Wolf et al. 2019. Huggingface's transformers: State-of-the-art natural language processing. ArXiv abs\/1910.03771 (2019)."},{"key":"e_1_3_2_3_30_1","unstructured":"Qizhe Xie et al. 2020. Unsupervised data augmentation for consistency training. Advances in Neural Information Processing Systems 33 (2020)."},{"volume-title":"Proceedings of the 44th Annual International Symposium on Computer Architecture. ACM, 548--560","author":"Jiecao","key":"e_1_3_2_3_31_1","unstructured":"Jiecao Yu et al. 2017. Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism. In Proceedings of the 44th Annual International Symposium on Computer Architecture. ACM, 548--560."},{"volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 5059--5069","author":"Xingxing","key":"e_1_3_2_3_32_1","unstructured":"Xingxing Zhang et al. 2019. HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization. 
In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 5059--5069."},{"key":"e_1_3_2_3_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472630"},{"key":"e_1_3_2_3_34_1","volume-title":"An Efficient Deep Reinforcement Learning Framework for UAVs. In 2020 21st International Symposium on Quality Electronic Design (ISQED)","author":"Zhou Shanglin","year":"2020","unstructured":"Shanglin Zhou, Bingbing Li, Caiwu Ding, Lu Lu, and Caiwen Ding. 2020. An Efficient Deep Reinforcement Learning Framework for UAVs. In 2020 21st International Symposium on Quality Electronic Design (ISQED)"}],"event":{"name":"GLSVLSI '21: Great Lakes Symposium on VLSI 2021","sponsor":["SIGDA ACM Special Interest Group on Design Automation"],"location":"Virtual Event USA","acronym":"GLSVLSI '21"},"container-title":["Proceedings of the 2021 Great Lakes Symposium on VLSI"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3453688.3461740","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3453688.3461740","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:47Z","timestamp":1750195727000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3453688.3461740"}},"subtitle":["A Tensor-core Inspired Hierarchical Model Compression for Transformer-based DNNs on GPU"],"short-title":[],"issued":{"date-parts":[[2021,6,22]]},"references-count":34,"alternative-id":["10.1145\/3453688.3461740","10.1145\/3453688"],"URL":"https:\/\/doi.org\/10.1145\/3453688.3461740","relation":{},"subject":[],"published":{"date-parts":[[2021,6,22]]},"assertion":[{"value":"2021-06-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}