{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T08:39:25Z","timestamp":1766219965693,"version":"3.48.0"},"publisher-location":"New York, NY, USA","reference-count":31,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100002367","name":"Chinese Academy of Sciences","doi-asserted-by":"publisher","award":["XDB0660101"],"award-info":[{"award-number":["XDB0660101"]}],"id":[{"id":"10.13039\/501100002367","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002367","name":"Chinese Academy of Sciences","doi-asserted-by":"publisher","award":["XDB0660000"],"award-info":[{"award-number":["XDB0660000"]}],"id":[{"id":"10.13039\/501100002367","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Chinese Academy of Sciences","award":["XDB0660100"],"award-info":[{"award-number":["XDB0660100"]}]},{"name":"Natural Science Foundation of China","award":["62172380"],"award-info":[{"award-number":["62172380"]}]},{"name":"Jiangsu Provincial Natural Science Foundation","award":["BK20241818"],"award-info":[{"award-number":["BK20241818"]}]},{"name":"Youth Innovation Promotion Association CAS","award":["Y2021121"],"award-info":[{"award-number":["Y2021121"]}]},{"name":"USTC Research Funds of the Double First-Class Initiative","award":["YD2150002011."],"award-info":[{"award-number":["YD2150002011."]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,9,8]]},"DOI":"10.1145\/3754598.3754656","type":"proceedings-article","created":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T08:34:32Z","timestamp":1766219672000},"page":"406-416","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Automated FPGA Accelerator Generation Framework for Transformers with Dataflow Optimization"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2240-6672","authenticated-orcid":false,"given":"Wenqi","family":"Lou","sequence":"first","affiliation":[{"name":"Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9404-581X","authenticated-orcid":false,"given":"Yunji","family":"Qin","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0043-6433","authenticated-orcid":false,"given":"Zihao","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9403-5575","authenticated-orcid":false,"given":"Chao","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8391-5526","authenticated-orcid":false,"given":"Lei","family":"Gong","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China and Anhui Prov. Key Lab of HPC, Hefei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8360-3143","authenticated-orcid":false,"given":"Xuehai","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]}],"member":"320","published-online":{"date-parts":[[2025,12,20]]},"reference":[{"key":"e_1_3_3_1_2_2","doi-asserted-by":"crossref","unstructured":"Jun Bi Yuanbo Wen Xiaqing Li Yongwei Zhao Yuxuan Guo Enshuai Zhou Xing Hu Zidong Du Ling Li Huaping Chen Tianshi Chen and Qi Guo. 2025. Efficient and Fast High-Performance Library Generation for Deep Learning Accelerators. IEEE Trans. Comput. 74 1 (2025) 155\u2013169.","DOI":"10.1109\/TC.2024.3475575"},{"key":"e_1_3_3_1_3_2","unstructured":"Tri Dao Dan Fu Stefano Ermon Atri Rudra and Christopher R\u00e9. 2022. Flashattention: Fast and memory-efficient exact attention with io-awareness. Advances in neural information processing systems 35 (2022) 16344\u201316359."},{"key":"e_1_3_3_1_4_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1810.04805 (2018)."},{"key":"e_1_3_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS56072.2025.11043956"},{"key":"e_1_3_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071047"},{"key":"e_1_3_3_1_7_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et\u00a0al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2010.11929 (2020)."},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Kai Han Yunhe Wang Hanting Chen Xinghao Chen Jianyuan Guo Zhenhua Liu Yehui Tang An Xiao Chunjing Xu Yixing Xu et\u00a0al. 2022. A survey on vision transformer. IEEE transactions on pattern analysis and machine intelligence 45 1 (2022) 87\u2013110.","DOI":"10.1109\/TPAMI.2022.3152247"},{"key":"e_1_3_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL60245.2023.00012"},{"key":"e_1_3_3_1_10_2","unstructured":"Yen-Chang Hsu Ting Hua Sungen Chang Qian Lou Yilin Shen and Hongxia Jin. 2022. Language model compression with weighted low-rank factorization. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2207.00112 (2022)."},{"key":"e_1_3_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00058"},{"key":"e_1_3_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575747"},{"key":"e_1_3_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439477"},{"key":"e_1_3_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3370748.3406567"},{"key":"e_1_3_3_1_15_2","doi-asserted-by":"crossref","unstructured":"Wenqi Lou Lei Gong Chao Wang Zidong Du and Xuehai Zhou. 2021. OctCNN: A high throughput FPGA accelerator for CNNs using octave convolution algorithm. IEEE Trans. Comput. 71 8 (2021) 1847\u20131859.","DOI":"10.1109\/TC.2021.3110413"},{"key":"e_1_3_3_1_16_2","doi-asserted-by":"crossref","unstructured":"Wenqi Lou Lei Gong Chao Wang Jiaming Qian Xuan Wang Changlong Li and Xuehai Zhou. 2024. Unleashing network\/accelerator co-exploration potential on fpgas: A deeper joint search. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2024).","DOI":"10.1109\/TCAD.2024.3391688"},{"key":"e_1_3_3_1_17_2","doi-asserted-by":"crossref","unstructured":"Wenqi Lou Yunji Qin Xuan Wang Lei Gong Chao Wang and Xuehai Zhou. 2024. FlexBCM: Hybrid Block-Circulant Neural Network and Accelerator Co-Search on FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43 11 (2024) 3852\u20133863.","DOI":"10.1109\/TCAD.2024.3439488"},{"key":"e_1_3_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480125"},{"key":"e_1_3_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.00886"},{"key":"e_1_3_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS46773.2023.10181988"},{"key":"e_1_3_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3649476.3658810"},{"key":"e_1_3_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507738"},{"key":"e_1_3_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD57390.2023.10323651"},{"key":"e_1_3_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Atefeh Sohrabizadeh Cody\u00a0Hao Yu Min Gao and Jason Cong. 2022. AutoDSE: Enabling software programmers to design efficient FPGA accelerators. ACM Transactions on Design Automation of Electronic Systems (TODAES) 27 4 (2022) 1\u201327.","DOI":"10.1145\/3494534"},{"key":"e_1_3_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00018"},{"key":"e_1_3_3_1_26_2","doi-asserted-by":"crossref","unstructured":"Teng Wang Lei Gong Chao Wang Yang Yang Yingxue Gao Xuehai Zhou and Huaping Chen. 2022. Via: A novel vision-transformer accelerator based on fpga. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41 11 (2022) 4088\u20134099.","DOI":"10.1109\/TCAD.2022.3197489"},{"key":"e_1_3_3_1_27_2","first-page":"1","volume-title":"2023 60th ACM\/IEEE Design Automation Conference (DAC)","author":"Wang Zhican","year":"2023","unstructured":"Zhican Wang, Gang Wang, Honglan Jiang, Ningyi Xu, and Guanghui He. 2023. Cosa: Co-operative systolic arrays for multi-head attention mechanism in neural network using hybrid data reuse and fusion methodologies. In 2023 60th ACM\/IEEE Design Automation Conference (DAC). IEEE, 1\u20136."},{"key":"e_1_3_3_1_28_2","unstructured":"Rui Xu Sheng Ma Yaohua Wang Yang Guo Dongsheng Li and Yuran Qiao. 2021. Heterogeneous systolic array architecture for compact cnns hardware accelerators. IEEE Transactions on Parallel and Distributed Systems 33 11 (2021) 2860\u20132871."},{"key":"e_1_3_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41406.2024.00028"},{"key":"e_1_3_3_1_30_2","doi-asserted-by":"crossref","unstructured":"Wenhua Ye Xu Zhou Joey Zhou Cen Chen and Kenli Li. 2023. Accelerating attention mechanism on fpgas based on efficient reconfigurable systolic array. ACM Transactions on Embedded Computing Systems 22 6 (2023) 1\u201322.","DOI":"10.1145\/3549937"},{"key":"e_1_3_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO61859.2024.00108"},{"key":"e_1_3_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3400302.3415609"}],"event":{"name":"ICPP '25: 54th International Conference on Parallel Processing","location":"San Diego CA USA","acronym":"ICPP '25"},"container-title":["Proceedings of the 54th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3754598.3754656","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T08:34:40Z","timestamp":1766219680000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3754598.3754656"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,8]]},"references-count":31,"alternative-id":["10.1145\/3754598.3754656","10.1145\/3754598"],"URL":"https:\/\/doi.org\/10.1145\/3754598.3754656","relation":{},"subject":[],"published":{"date-parts":[[2025,9,8]]},"assertion":[{"value":"2025-12-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}