{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T08:58:26Z","timestamp":1765961906231,"version":"3.44.0"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"5","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62404256"],"award-info":[{"award-number":["62404256"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Jiangsu Provincial Science and Technology Major Special Project","award":["BG2024032"],"award-info":[{"award-number":["BG2024032"]}]},{"name":"Key Project of Shenzhen Basic Research Program","award":["JCYJ20241206180301003"],"award-info":[{"award-number":["JCYJ20241206180301003"]}]},{"name":"High-performance Computing Public Platform"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>The remarkable progress of Vision Transformer (ViT) models has significantly advanced performance in computer vision tasks. However, the deployment of ViTs in resource-constrained environments remains a challenge, as the attention computation mechanisms within these models form a significant bottleneck, requiring substantial memory and computational resources. To address this challenge, we introduce TAFP-ViT, a tailored hardware-software co-design framework for Vision Transformers. On the software level, TAFP-ViT leverages a learnable compressor to perform multi-head shared compression on feature maps, and fuses decompression reconstruction, QKV generation and QKV processing together for calculation, thereby greatly reducing memory and computation requirements. Furthermore, TAFP-ViT combines dynamic inter-layer token pruning to eliminate unimportant tokens and hardware-friendly intra-block row pruning to diminish redundant computations. The proposed software design converts the calculations before and after SoftMax into dense and sparse triple matrix multiplication (TMM) forms respectively. On the hardware level, TAFP-ViT proposes a configurable systolic array (SA) to efficiently adapt to the QKV fusion computation pattern. The SA has flexible PE units that can effectively support general matrix multiplication (GEMM), dense and sparse TMM. The TMM and flexible dataflows allow TAFP-ViT to avoid handling transpositions and storing intermediate computation results, greatly enhancing computational efficiency. Besides, TAFP-ViT innovatively designs a Top-k engine to support dynamic pruning on the fly with high throughput and low resource consumption. Experiments show that the proposed TAFP-ViT achieves remarkable speedups of 123.91\u00d7, 29.5\u00d7, and 3.01\u223c 20.65\u00d7 compared to conventional CPUs, GPUs, and previous state-of-the-art works, respectively. 
Additionally, TAFP-ViT reaches a throughput of up to 731.5 GOP\/s and an impressive energy efficiency of 77.9 GOPS\/W.<\/jats:p>","DOI":"10.1145\/3745028","type":"journal-article","created":{"date-parts":[[2025,6,21]],"date-time":"2025-06-21T05:37:26Z","timestamp":1750484246000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["TAFP-ViT: A Transformer Accelerator via QKV Computational Fusion and Adaptive Pruning for Vision Transformer"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-3644-6886","authenticated-orcid":false,"given":"Liang","family":"Xu","sequence":"first","affiliation":[{"name":"Sun Yat-Sen University","place":["Shenzhen, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-3411-7800","authenticated-orcid":false,"given":"Hongrui","family":"Song","sequence":"additional","affiliation":[{"name":"Nanjing University","place":["Nanjing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1321-334X","authenticated-orcid":false,"given":"Lan","family":"Tian","sequence":"additional","affiliation":[{"name":"Shandong University","place":["Jinan, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7227-4786","authenticated-orcid":false,"given":"Zhongfeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Sun Yat-Sen University","place":["Shenzhen, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9553-3640","authenticated-orcid":false,"given":"Meiqi","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Integrated Circuits, Sun Yat-Sen University","place":["Shenzhen, China"]}]}],"member":"320","published-online":{"date-parts":[[2025,9,13]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"17302","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Cai Han","year":"2023","unstructured":"Han Cai, Junyan Li, Muyan Hu, Chuang Gan, and Song Han. 2023. EfficientViT: Lightweight multi-scale attention for high-resolution dense prediction. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 17302\u201317313."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00041"},{"key":"e_1_3_1_4_2","unstructured":"Tianlong Chen Yu Cheng Zhe Gan Lu Yuan Lei Zhang and Zhangyang Wang. 2021. Chasing sparsity in vision transformers: An end-to-end exploration. In Advances in Neural Information Processing Systems. Curran Associates Inc. 19974\u201319988."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1587\/transfun.E100.A.1074"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","unstructured":"Weijie Chen Weijun Li and Feng Yu. 2020. A hybrid pipelined architecture for high performance top-K sorting on FPGA. IEEE Transactions on Circuits and Systems II: Express Briefs 67 8 (August 2020) 1449\u20131453. DOI:10.1109\/TCSII.2019.2938892","DOI":"10.1109\/TCSII.2019.2938892"},{"key":"e_1_3_1_7_2","unstructured":"Rewon Child Scott Gray Alec Radford and Ilya Sutskever. 2019. Generating long sequences with sparse transformers. arXiv:1904.10509. Retrieved from https:\/\/arxiv.org\/abs\/1904.10509"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1361"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","unstructured":"Peiyan Dong Mengshu Sun Alec Lu Yanyue Xie Kenneth Liu Zhenglun Kong Xin Meng Zhengang Li Xue Lin Zhenman Fang and Yanzhi Wang. 2023. 
HeatViT: Hardware-Efficient adaptive token pruning for vision transformers. In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 442\u2013455. DOI:10.1109\/HPCA56546.2023.10071047","DOI":"10.1109\/HPCA56546.2023.10071047"},{"key":"e_1_3_1_10_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2021. An image is worth \\(16\\times 16\\) words: Transformers for image recognition at scale. In International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=YicbFdNTTy"},{"key":"e_1_3_1_11_2","first-page":"328","volume-title":"Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","author":"Ham Tae Jun","year":"2020","unstructured":"Tae Jun Ham, Sung Jun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, Yoonho Song, Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W Lee, et\u00a0al. 2020. A\u23033: Accelerating attention mechanisms in neural networks with approximation. In Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 328\u2013341."},{"key":"e_1_3_1_12_2","first-page":"692","volume-title":"Proceedings of the 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)","author":"Ham Tae Jun","year":"2021","unstructured":"Tae Jun Ham, Yejin Lee, Seong Hoon Seo, Soosung Kim, Hyunji Choi, Sung Jun Jung, and Jae W. Lee. 2021. ELSA: Hardware-software co-design for efficient, lightweight self-attention mechanism in neural networks. In Proceedings of the 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE, 692\u2013705."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00548"},{"key":"e_1_3_1_14_2","unstructured":"Lang Huang Yuhui Yuan Jianyuan Guo Chao Zhang Xilin Chen and Jingdong Wang. 2019. Interlaced sparse self-attention for semantic segmentation. arXiv:1907.12273. Retrieved from https:\/\/arxiv.org\/abs\/1907.12273"},{"key":"e_1_3_1_15_2","unstructured":"Zi-Hang Jiang Qibin Hou Li Yuan Daquan Zhou Yujun Shi Xiaojie Jin Anran Wang and Jiashi Feng. 2021. All tokens matter: Token labeling for training better vision transformers. In Advances in Neural Information Processing Systems M. Ranzato A. Beygelzimer Y. Dauphin P. S. Liang and J. Wortman Vaughan (Eds.). Curran Associates Inc. 18590\u201318602. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/9a49a25d845a483fae4be7e341368e36-Paper.pdf"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CISES54857.2022.9844325"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE56975.2023.10137099"},{"key":"e_1_3_1_18_2","first-page":"1","volume-title":"Proceedings of the 2024 Design, Automation and Test in Europe Conference and Exhibition (DATE).","author":"Lee Seungju","year":"2024","unstructured":"Seungju Lee, Kyumin Cho, Eunji Kwon, Sejin Park, Seojeong Kim, and Seokhyeong Kang. 2024. ViT- ToGo: Vision transformer accelerator with grouped token pruning. In Proceedings of the 2024 Design, Automation and Test in Europe Conference and Exhibition (DATE).1\u20136. 
Retrieved from https:\/\/ieeexplore.ieee.org\/document\/10546804\/?arnumber=10546804"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3370748.3406567"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","unstructured":"Yangfan Li Yikun Hu Fan Wu and Kenli Li. 2022. DiVIT: Algorithm and architecture co-design of differential attention in vision transformer. Journal of Systems Architecture 128 (July 2022) 102520. DOI:10.1016\/j.sysarc.2022.102520","DOI":"10.1016\/j.sysarc.2022.102520"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL57034.2022.00027"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480125"},{"key":"e_1_3_1_24_2","unstructured":"Zeyu Lu Zidong Wang Di Huang Chengyue Wu Xihui Liu Wanli Ouyang and Lei Bai. 2024. FiT: Flexible vision Transformer for diffusion model. In Proceedings of the 41st International Conference on Machine Learning (Proceedings of Machine Learning Research) Ruslan Salakhutdinov Zico Kolter Katherine Heller Adrian Weller Nuria Oliver Jonathan Scarlett and Felix Berkenkamp (Eds.). PMLR 33160\u201333176. Retrieved from https:\/\/proceedings.mlr.press\/v235\/lu24k.html"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC58850.2023.00039"},{"key":"e_1_3_1_26_2","unstructured":"Umberto Michelucci. 2022. An Introduction to Autoencoders. arXiv:2201.03898. Retrieved from https:\/\/arxiv.org\/abs\/2201.03898"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICAIS56108.2023.10073937"},{"key":"e_1_3_1_28_2","unstructured":"https:\/\/www.nvidia.com\/en-us\/autonomous-machines\/embedded-systems\/jetson-tx2\/ NVIDIA Inc.2019. NVIDIA jetson TX2."},{"key":"e_1_3_1_29_2","unstructured":"OpenAI Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat Red Avila Igor Babuschkin Suchir Balaji Valerie Balcom Paul Baltescu Haiming Bao Mohammad Bavarian Jeff Belgum Irwan Bello Jake Berdine Gabriel Bernadett-Shapiro Christopher Berner Lenny Bogdonoff Oleg Boiko Madelaine Boyd Anna-Luisa Brakman Greg Brockman Tim Brooks Miles Brundage Kevin Button Trevor Cai Rosie Campbell Andrew Cann Brittany Carey Chelsea Carlson Rory Carmichael Brooke Chan Che Chang Fotis Chantzis Derek Chen Sully Chen Ruby Chen Jason Chen Mark Chen Ben Chess Chester Cho Casey Chu Hyung Won Chung Dave Cummings Jeremiah Currier Yunxing Dai Cory Decareaux Thomas Degry Noah Deutsch Damien Deville Arka Dhar David Dohan Steve Dowling Sheila Dunning Adrien Ecoffet Atty Eleti Tyna Eloundou David Farhi Liam Fedus Niko Felix Sim\u00f3n Posada Fishman Juston Forte Isabella Fulford Leo Gao Elie Georges Christian Gibson Vik Goel Tarun Gogineni Gabriel Goh Rapha Gontijo-Lopes Jonathan Gordon Morgan Grafstein Scott Gray Ryan Greene Joshua Gross Shixiang Shane Gu Yufei Guo Chris Hallacy Jesse Han Jeff Harris Yuchen He Mike Heaton Johannes Heidecke Chris Hesse Alan Hickey Wade Hickey Peter Hoeschele Brandon Houghton Kenny Hsu Shengli Hu Xin Hu Joost Huizinga Shantanu Jain Shawn Jain Joanne Jang Angela Jiang Roger Jiang Haozhun Jin Denny Jin Shino Jomoto Billie Jonn Heewoo Jun Tomer Kaftan \u0141ukasz Kaiser Ali Kamali Ingmar Kanitscheider Nitish Shirish Keskar Tabarak Khan Logan Kilpatrick Jong Wook Kim Christina Kim Yongjik Kim Jan Hendrik Kirchner Jamie Kiros Matt Knight Daniel Kokotajlo \u0141ukasz Kondraciuk Andrew Kondrich Aris 
Konstantinidis Kyle Kosic Gretchen Krueger Vishal Kuo Michael Lampe Ikai Lan Teddy Lee Jan Leike Jade Leung Daniel Levy Chak Ming Li Rachel Lim Molly Lin Stephanie Lin Mateusz Litwin Theresa Lopez Ryan Lowe Patricia Lue Anna Makanju Kim Malfacini Sam Manning Todor Markov Yaniv Markovski Bianca Martin Katie Mayer Andrew Mayne Bob McGrew Scott Mayer McKinney Christine McLeavey Paul McMillan Jake McNeil David Medina Aalok Mehta Jacob Menick Luke Metz Andrey Mishchenko Pamela Mishkin Vinnie Monaco Evan Morikawa Daniel Mossing Tong Mu Mira Murati Oleg Murk David M\u00e9ly Ashvin Nair Reiichiro Nakano Rajeev Nayak Arvind Neelakantan Richard Ngo Hyeonwoo Noh Long Ouyang Cullen O\u2019Keefe Jakub Pachocki Alex Paino Joe Palermo Ashley Pantuliano Giambattista Parascandolo Joel Parish Emy Parparita Alex Passos Mikhail Pavlov Andrew Peng Adam Perelman Filipe de Avila Belbute Peres Michael Petrov Henrique Ponde de Oliveira Pinto Michael Pokorny Michelle Pokrass Vitchyr H. Pong Tolly Powell Alethea Power Boris Power Elizabeth Proehl Raul Puri Alec Radford Jack Rae Aditya Ramesh Cameron Raymond Francis Real Kendra Rimbach Carl Ross Bob Rotsted Henri Roussez Nick Ryder Mario Saltarelli Ted Sanders Shibani Santurkar Girish Sastry Heather Schmidt David Schnurr John Schulman Daniel Selsam Kyla Sheppard Toki Sherbakov Jessica Shieh Sarah Shoker Pranav Shyam Szymon Sidor Eric Sigler Maddie Simens Jordan Sitkin Katarina Slama Ian Sohl Benjamin Sokolowsky Yang Song Natalie Staudacher Felipe Petroski Such Natalie Summers Ilya Sutskever Jie Tang Nikolas Tezak Madeleine B. Thompson Phil Tillet Amin Tootoonchian Elizabeth Tseng Preston Tuggle Nick Turley Jerry Tworek Juan Felipe Cer\u00f3n Uribe Andrea Vallone Arun Vijayvergiya Chelsea Voss Carroll Wainwright Justin Jay Wang Alvin Wang Ben Wang Jonathan Ward Jason Wei C. J. Weinmann Akila Welihinda Peter Welinder Jiayi Weng Lilian Weng Matt Wiethoff Dave Willner Clemens Winter Samuel Wolrich Hannah Wong Lauren Workman Sherwin Wu Jeff Wu Michael Wu Kai Xiao Tao Xu Sarah Yoo Kevin Yu Qiming Yuan Wojciech Zaremba Rowan Zellers Chong Zhang Marvin Zhang Shengjia Zhao Tianhao Zheng Juntang Zhuang William Zhuk and Barret Zoph. 2024. GPT-4 Technical Report. Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","unstructured":"Dhruv Parikh Shouyi Li Bingyi Zhang Rajgopal Kannan Carl Busart and Viktor Prasanna. 2024. Accelerating ViT inference on FPGA through static and dynamic pruning. In 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 78\u201389. DOI:10.1109\/FCCM60383.2024.00018","DOI":"10.1109\/FCCM60383.2024.00018"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","unstructured":"William Peebles and Saining Xie. 2023. Scalable diffusion models with transformers. In 2023 IEEE\/CVF International Conference on Computer Vision (ICCV). IEEE Paris France 4172\u20134182. DOI:10.1109\/ICCV51070.2023.00387","DOI":"10.1109\/ICCV51070.2023.00387"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530585"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISQED51717.2021.9424344"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507738"},{"key":"e_1_3_1_35_2","unstructured":"Yongming Rao Wenliang Zhao Benlin Liu Jiwen Lu Jie Zhou and Cho-Jui Hsieh. 2021. DynamicViT: Efficient vision transformers with dynamic token sparsification. 
In Advances in Neural Information Processing Systems Curran Associates Inc. 13937\u201313949."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","unstructured":"Aurko Roy Mohammad Saffar Ashish Vaswani and David Grangier. 2021. Efficient content-based sparse attention with routing transformers. Transactions of the Association for Computational Linguistics Brian Roark and Ani Nenkova (Eds.). 9 (2021) 53\u201368. DOI:10.1162\/tacl_a_00353","DOI":"10.1162\/tacl_a_00353"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530504"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","unstructured":"Valery Sklyarov and Iouliia Skliarova. 2015. Design and implementation of counting networks. Computing 97 6 (June 2015) 557\u2013577. DOI:10.1007\/s00607-013-0360-y","DOI":"10.1007\/s00607-013-0360-y"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/MWSCAS57524.2023.10406121"},{"key":"e_1_3_1_40_2","volume-title":"Proceedings of the 2022 IEEE International Symposium on Circuits and Systems (ISCAS)","author":"Song HongRui","year":"2022","unstructured":"HongRui Song, Ya Wang, Meiqi Wang, and Zhongfeng Wang. 2022. UCViT: Hardware-friendly vision transformer via unified compression. In Proceedings of the 2022 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE."},{"key":"e_1_3_1_41_2","first-page":"10347","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers and distillation through attention. In Proceedings of the International Conference on Machine Learning. PMLR, 10347\u201310357."},{"key":"e_1_3_1_42_2","doi-asserted-by":"crossref","first-page":"516","DOI":"10.1007\/978-3-031-20053-3_30","volume-title":"Proceedings of the 17th European Conference on Computer Vision\u2013ECCV 2022. ,","author":"Touvron Hugo","year":"2022","unstructured":"Hugo Touvron, Matthieu Cord, and Herv\u00e9 J\u00e9gou. 2022. Deit iii: Revenge of the vit. In Proceedings of the 17th European Conference on Computer Vision\u2013ECCV 2022. , Springer, 516\u2013533."},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","unstructured":"Shikhar Tuli and Niraj K. Jha. 2023. AccelTran: A sparsity-aware accelerator for dynamic inference with transformers. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42 11 (November 2023) 4038\u20134051. DOI:10.1109\/TCAD.2023.3273992","DOI":"10.1109\/TCAD.2023.3273992"},{"key":"e_1_3_1_44_2","first-page":"15909","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Wang Ao","year":"2024","unstructured":"Ao Wang, Hui Chen, Zijia Lin, Jungong Han, and Guiguang Ding. 2024. RepViT: Revisiting mobile CNN from ViT perspective. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 15909\u201315920."},{"key":"e_1_3_1_45_2","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1109\/HPCA51647.2021.00018","volume-title":"Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)","author":"Wang Hanrui","year":"2021","unstructured":"Hanrui Wang, Zhekai Zhang, and Song Han. 2021. Spatten: Efficient sparse attention architecture with cascade token and head pruning. 
In Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 97\u2013110."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2022.3197489"},{"key":"e_1_3_1_47_2","first-page":"5493","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Xia Chunlong","year":"2024","unstructured":"Chunlong Xia, Xinliang Wang, Feng Lv, Xin Hao, and Yifeng Shi. 2024. ViT-CoMer: Vision transformer with convolutional multi-scale feature interaction for dense predictions. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5493\u20135502."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3582649.3582676"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01054"},{"key":"e_1_3_1_50_2","first-page":"273","volume-title":"Proceedings of the 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)","author":"You Haoran","year":"2023","unstructured":"Haoran You, Zhanyi Sun, Huihong Shi, Zhongzhi Yu, Yang Zhao, Yongan Zhang, Chaojian Li, Baopu Li, and Yingyan Lin. 2023. Vitcod: Vision transformer acceleration via dedicated algorithm and accelerator co-design. In Proceedings of the 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 273\u2013286."},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01387"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","unstructured":"Hao Yu and Jianxin Wu. 2023. A Unified pruning framework for vision transformers. Science China Information Sciences 66 7 (April 2023) 179101. DOI:10.1007\/s11432-022-3646-6","DOI":"10.1007\/s11432-022-3646-6"},{"key":"e_1_3_1_53_2","unstructured":"Guangxiang Zhao Junyang Lin Zhiyuan Zhang Xuancheng Ren Qi Su and Xu Sun. 2019. Explicit sparse transformer: Concentrated attention through explicit selection. arXiv:1912.11637. Retrieved from https:\/\/arxiv.org\/abs\/1912.11637"},{"key":"e_1_3_1_54_2","unstructured":"Mingjian Zhu Yehui Tang and Kai Han. 2021. Vision Transformer Pruning. arXiv:2104.08500. 
Retrieved from https:\/\/arxiv.org\/abs\/2104.08500"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3745028","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,13]],"date-time":"2025-09-13T13:44:16Z","timestamp":1757771056000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3745028"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,13]]},"references-count":53,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3745028"],"URL":"https:\/\/doi.org\/10.1145\/3745028","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2025,9,13]]},"assertion":[{"value":"2024-09-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-07","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
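
The abstract in this record describes dynamic inter-layer token pruning driven by an on-the-fly Top-k engine. As an illustration of that general idea only, below is a minimal Python sketch, assuming a single-head attention map and scoring tokens by the mean attention they receive; the keep_ratio value, the scoring rule, and the prune_tokens helper are illustrative assumptions and not the TAFP-ViT implementation or hardware dataflow.

# Minimal sketch (not the TAFP-ViT implementation) of dynamic top-k token
# pruning as described in the abstract: tokens that receive little attention
# are dropped between layers. The mean-attention importance score and the
# keep_ratio of 0.7 are illustrative assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prune_tokens(tokens, wq, wk, keep_ratio=0.7):
    """tokens: (N, d) token embeddings; wq, wk: (d, d_head) projections.

    Returns the kept tokens and their indices, preserving token order."""
    q, k = tokens @ wq, tokens @ wk
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (N, N) attention map
    importance = attn.mean(axis=0)                   # mean attention each token receives
    n_keep = max(1, int(np.ceil(keep_ratio * len(tokens))))
    keep = np.sort(np.argpartition(-importance, n_keep - 1)[:n_keep])
    return tokens[keep], keep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, dh = 197, 64, 64                           # e.g. 196 patch tokens + 1 CLS token
    x = rng.standard_normal((n, d))
    wq, wk = rng.standard_normal((d, dh)), rng.standard_normal((d, dh))
    kept, idx = prune_tokens(x, wq, wk, keep_ratio=0.7)
    print(kept.shape, idx[:5])                       # (138, 64) and the first kept indices

Selecting the top-k indices with argpartition rather than a full sort mirrors the motivation stated in the abstract for a dedicated Top-k engine: only the k largest scores need to be identified, not a complete ordering of all tokens.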