{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T10:31:30Z","timestamp":1770287490787,"version":"3.49.0"},"reference-count":75,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T00:00:00Z","timestamp":1741910400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Research Grants Council (RGC) of Hong Kong under the Research Impact Fund","award":["R7003-21"],"award-info":[{"award-number":["R7003-21"]}]},{"name":"Theme-based Research Scheme (TRS) Project","award":["T45-701-22-R"],"award-info":[{"award-number":["T45-701-22-R"]}]},{"name":"AI Chip Center for Emerging Smart Systems"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>\n            Modern transformer-based deep neural networks present unique technical challenges for effective acceleration in real-world applications. Apart from the vast amount of linear operations needed due to their sizes, modern transformer models are increasingly reliance on precise non-linear computations that make traditional low-bitwidth quantization methods and fixed-dataflow matrix accelerators ineffective for end-to-end acceleration. To address this need to accelerate both linear and non-linear operations in a unified and programmable framework, this article introduces TATAA. TATAA employs 8-bit integer (\n            <jats:monospace>int8<\/jats:monospace>\n            ) arithmetic for quantized linear layer operations through post-training quantization, while it relies on\n            <jats:monospace>bfloat16<\/jats:monospace>\n            floating-point arithmetic to approximate non-linear layers of a transformer model. TATAA hardware features a transformable arithmetic architecture that supports both formats during runtime with minimal overhead, enabling it to switch between a systolic array mode for\n            <jats:monospace>int8<\/jats:monospace>\n            matrix multiplications and a SIMD mode for vectorized\n            <jats:monospace>bfloat16<\/jats:monospace>\n            operations. An end-to-end compiler is presented to enable flexible mapping from emerging transformer models to the proposed hardware. Experimental results indicate that our mixed-precision design incurs only 0.14% to 1.16% accuracy drop when compared with the pre-trained single-precision transformer models across a range of vision, language, and generative text applications. 
Our prototype implementation on the Alveo U280 FPGA currently achieves 2,935.2 GOPS throughput on linear layers and a maximum of 189.5 GFLOPS for non-linear operations, outperforming related works by up to\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(1.45\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            in end-to-end throughput and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(2.29\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            in DSP efficiency, while achieving\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(2.19\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            higher power efficiency than a modern NVIDIA RTX 4090 GPU.\n          <\/jats:p>","DOI":"10.1145\/3714416","type":"journal-article","created":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T13:33:12Z","timestamp":1737725592000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["TATAA: Programmable Mixed-Precision Transformer Acceleration with a Transformable Arithmetic Architecture"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2477-3553","authenticated-orcid":false,"given":"Jiajun","family":"Wu","sequence":"first","affiliation":[{"name":"Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-5225-072X","authenticated-orcid":false,"given":"Mo","family":"Song","sequence":"additional","affiliation":[{"name":"Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4098-2355","authenticated-orcid":false,"given":"Jingmin","family":"Zhao","sequence":"additional","affiliation":[{"name":"Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5673-3746","authenticated-orcid":false,"given":"Yizhao","family":"Gao","sequence":"additional","affiliation":[{"name":"Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-6492-727X","authenticated-orcid":false,"given":"Jia","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6514-0237","authenticated-orcid":false,"given":"Hayden Kwok-Hay","family":"So","sequence":"additional","affiliation":[{"name":"Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong"}]}],"member":"320","published-online":{"date-parts":[[2025,3,14]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2023. Attention is all you need. In Advances in Neural Information Processing Systems. 
Retrieved from https:\/\/arxiv.org\/abs\/1706.03762"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18074.2021.9586134"},{"key":"e_1_3_2_4_2","unstructured":"Jimmy Lei Ba Jamie Ryan Kiros and Geoffrey E. Hinton. 2016. Layer normalization. arXiv:1607.06450. Retrieved from https:\/\/arxiv.org\/abs\/1607.06450"},{"key":"e_1_3_2_5_2","article-title":"Root mean square layer normalization","volume":"32","author":"Zhang Biao","year":"2019","unstructured":"Biao Zhang and Rico Sennrich. 2019. Root mean square layer normalization. In Advances in Neural Information Processing Systems, Vol. 32. Retrieved from https:\/\/openreview.net\/references\/pdf?id=S1qBAf6rr","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_6_2","unstructured":"Dan Hendrycks and Kevin Gimpel. 2016. Gaussian error linear units (GELUs). arXiv:1606.08415. Retrieved from https:\/\/arxiv.org\/abs\/1606.08415"},{"key":"e_1_3_2_7_2","unstructured":"Prajit Ramachandran Barret Zoph and Quoc V. Le. 2017. Searching for activation functions. arXiv:1710.05941. Retrieved from https:\/\/arxiv.org\/abs\/1710.05941"},{"key":"e_1_3_2_8_2","unstructured":"Noam Shazeer. 2020. GLU variants improve transformer. arXiv:2002.05202. Retrieved from https:\/\/arxiv.org\/abs\/2002.05202"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"e_1_3_2_10_2","unstructured":"Dexu Lin Edward Liao Somdeb Majumdar Aaron Lamb and Karamvir Chatha. 2018. Approximation of non-linear functions in fixed point using look-up tables. US Patent No. 10037306."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-20870-7_7"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3626202.3637562"},{"key":"e_1_3_2_13_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et al. 2023. Llama: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.127063"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01565"},{"key":"e_1_3_2_16_2","first-page":"5506","volume-title":"Proceedings of the 38th International Conference on Machine Learning","volume":"139","author":"Kim Sehoon","year":"2021","unstructured":"Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, and Kurt Keutzer. 2021. I-BERT: Integer-only BERT quantization. In Proceedings of the 38th International Conference on Machine Learning 139 (2021), 5506\u20135518. Retrieved from https:\/\/proceedings.mlr.press\/v139\/kim21d.html"},{"key":"e_1_3_2_17_2","first-page":"1","volume-title":"Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN \u201923)","author":"Marchisio Alberto","year":"2023","unstructured":"Alberto Marchisio, Davide Dura, Maurizio Capra, Maurizio Martina, Guido Masera, and Muhammad Shafique. 2023. SwiftTron: An efficient hardware accelerator for quantized transformers. 
In Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN \u201923), 1\u20139."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439477"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3527438"},{"key":"e_1_3_2_20_2","first-page":"17402","article-title":"Outlier suppression: Pushing the limit of low-bit transformer language models","volume":"35","author":"Wei Xiuying","year":"2022","unstructured":"Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang, Qi Zhang, Fengwei Yu, and Xianglong Liu. 2022. Outlier suppression: Pushing the limit of low-bit transformer language models. In Advances in Neural Information Processing Systems, Vol. 35, 17402\u201317414.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2022.3197489"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2023.3312775"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD57390.2023.10323752"},{"key":"e_1_3_2_24_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929. Retrieved from https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_2_25_2","first-page":"10347","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning. PMLR, 10347\u201310357."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_2_27_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_2_28_2","unstructured":"Susan Zhang Stephen Roller Naman Goyal Mikel Artetxe Moya Chen Shuohui Chen Christopher Dewan Mona Diab Xian Li Xi Victoria Lin et al. 2022. OPT: Open pre-trained transformer language models. arXiv:2205.01068. Retrieved from https:\/\/arxiv.org\/abs\/2205.01068"},{"key":"e_1_3_2_29_2","volume-title":"Advances in Neural Information Processing Systems","volume":"31","author":"Drumond Mario","year":"2018","unstructured":"Mario Drumond, Tao Lin, Martin Jaggi, and Babak Falsafi. 2018. Training DNNs with hybrid block floating point. In Advances in Neural Information Processing Systems. S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31, Curran Associates, Inc. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2018\/file\/6a9aeddfc689c1d0e3b9ccc3ab651bc5-Paper.pdf"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT59805.2023.00032"},{"key":"e_1_3_2_31_2","first-page":"1405","article-title":"Towards efficient post-training quantization of pre-trained language models","volume":"35","author":"Bai Haoli","year":"2022","unstructured":"Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, and Michael R. Lyu. 2022. 
Towards efficient post-training quantization of pre-trained language models. In Advances in Neural Information Processing Systems, Vol. 35, 1405\u20131418.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_32_2","first-page":"28092","article-title":"Post-training quantization for vision transformer","volume":"34","author":"Liu Zhenhua","year":"2021","unstructured":"Zhenhua Liu, Yunhe Wang, Kai Han, Wei Zhang, Siwei Ma, and Wen Gao. 2021. Post-training quantization for vision transformer. In Advances in Neural Information Processing Systems, Vol. 34, 28092\u201328103.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2022\/164"},{"key":"e_1_3_2_34_2","unstructured":"Tim Dettmers Artidoro Pagnoni Ari Holtzman and Luke Zettlemoyer. 2023. QLoRA: Efficient finetuning of quantized LLMs. arXiv:2305.14314. Retrieved from https:\/\/arxiv.org\/abs\/2305.14314"},{"key":"e_1_3_2_35_2","unstructured":"Markus Nagel Marios Fournarakis Rana Ali Amjad Yelysei Bondarenko Mart Van Baalen and Tijmen Blankevoort. 2021. A white paper on neural network quantization. arXiv:2106.08295. Retrieved from https:\/\/arxiv.org\/abs\/2106.08295"},{"key":"e_1_3_2_36_2","first-page":"13","volume-title":"Proceedings of the 40th International Conference on Machine Learning (2023)","author":"Xiao Guangxuan","year":"2023","unstructured":"Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, and Song Han. 2023. SmoothQuant: Accurate and efficient post-training quantization for large language models. In Proceedings of the 40th International Conference on Machine Learning (2023), Article 1585, 13 pages."},{"key":"e_1_3_2_37_2","first-page":"11875","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Yao Zhewei","year":"2021","unstructured":"Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael Mahoney, et al. 2021. HAWQ-V3: Dyadic neural network quantization. In Proceedings of the International Conference on Machine Learning. PMLR, 11875\u201311886."},{"key":"e_1_3_2_38_2","article-title":"Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters","volume":"2","author":"Bridle John","year":"1989","unstructured":"John Bridle. 1989. Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. In Advances in Neural Information Processing Systems, Vol. 2.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","unstructured":"Jacob R. Stevens Rangharajan Venkatesan Steve Dai Brucek Khailany and Anand Raghunathan. 2021. Softermax: Hardware\/software co-design of an efficient softmax for transformers. arXiv:2103.09301. Retrieved from https:\/\/arxiv.org\/abs\/2103.09301","DOI":"10.1109\/DAC18074.2021.9586134"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021745"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/IWOFC48002.2019.9078446"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL50879.2020.00055"},{"key":"e_1_3_2_43_2","unstructured":"Mahdi Nazemi Ghasem Pasandi and Massoud Pedram. 2018. NullaNet: Training deep neural networks for reduced-memory-access inference. arXiv:1807.08716. 
Retrieved from https:\/\/arxiv.org\/abs\/1807.08716"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00035"},{"key":"e_1_3_2_45_2","first-page":"109","volume-title":"Proceedings of the 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL \u201922)","author":"Li Zhengang","year":"2022","unstructured":"Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, et al. 2022. Auto-ViT-Acc: An FPGA-aware automatic acceleration framework for vision transformer with mixed-scheme quantization. In Proceedings of the 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL \u201922). IEEE, 109\u2013116."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477002"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE51398.2021.9474043"},{"key":"e_1_3_2_48_2","unstructured":"NVIDIA. 2024. Transformer engine documentation. Retrieved from https:\/\/docs.nvidia.com\/deeplearning\/transformer-engine\/user-guide\/"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3370748.3406567"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3564606"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3626202.3637569"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3656177"},{"key":"e_1_3_2_53_2","unstructured":"Ramya Prabhu Ajay Nayak Jayashree Mohan Ramachandran Ramjee and Ashish Panwar. 2024. vAttention: Dynamic memory management for serving LLMs without PagedAttention. arXiv:2405.04437. Retrieved from https:\/\/arxiv.org\/abs\/2405.04437"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics11213550"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASP-DAC58780.2024.10473931"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC58850.2023.00039"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO56248.2022.00051"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL60245.2023.00048"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3549937"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2018.00091"},{"key":"e_1_3_2_61_2","volume":"32","author":"Lomont Chris","year":"2003","unstructured":"Chris Lomont. 2003. Fast Inverse Square Root. Technical Report 32.","journal-title":"Fast Inverse Square Root"},{"key":"e_1_3_2_62_2","unstructured":"Dhiraj Kalamkar Dheevatsa Mudigere Naveen Mellempudi Dipankar Das Kunal Banerjee Sasikanth Avancha Dharma Teja Vooturi Nataraj Jammalamadaka Jianyu Huang Hector Yuen et al. 2019. A study of BFLOAT16 for deep learning training. arXiv:1905.12322. Retrieved from https:\/\/arxiv.org\/abs\/1905.12322"},{"key":"e_1_3_2_63_2","author":"Fu Yao","year":"2016","unstructured":"Yao Fu, Ephrem Wu, Ashish Sirasao, Sedny Attia, Kamran Khan, and Ralph Wittig. 2016. Deep learning with INT8 optimization on Xilinx devices. Xilinx White Paper.","journal-title":"Deep learning with INT8 optimization on Xilinx devices. Xilinx White Paper"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASP-DAC52403.2022.9712553"},{"key":"e_1_3_2_65_2","unstructured":"Junjie Bai Fang Lu and Ke Zhang. 2019. ONNX: Open neural network exchange. 
Retrieved from https:\/\/github.com\/onnx\/onnx"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3400302.3415609"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM57271.2023.00018"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_69_2","doi-asserted-by":"crossref","unstructured":"Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy and Samuel R. Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv:1804.07461. Retrieved from https:\/\/arxiv.org\/abs\/1804.07461","DOI":"10.18653\/v1\/W18-5446"},{"issue":"8","key":"e_1_3_2_70_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_2_71_2","doi-asserted-by":"crossref","unstructured":"Denis Paperno Germ\u00e1n Kruszewski Angeliki Lazaridou Quan Ngoc Pham Raffaella Bernardi Sandro Pezzelle Marco Baroni Gemma Boleda and Raquel Fern\u00e1ndez. 2016. The LAMBADA dataset: Word prediction requiring a broad discourse context. arXiv:1606.06031. Retrieved from https:\/\/arxiv.org\/abs\/1606.06031","DOI":"10.18653\/v1\/P16-1144"},{"key":"e_1_3_2_72_2","unstructured":"Aohan Zeng Bin Xu Bowen Wang Chenhui Zhang Da Yin Dan Zhang Diego Rojas Guanyu Feng Hanlin Zhao Hanyu Lai et al. 2024. ChatGLM: A family of large language models from GLM-130B to GLM-4 all tools. arXiv:2406.12793. Retrieved from https:\/\/arxiv.org\/abs\/2406.12793"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/SOCC49529.2020.9524802"},{"key":"e_1_3_2_74_2","unstructured":"Tri Dao Daniel Y. Fu Stefano Ermon Atri Rudra and Christopher R\u00e9. 2022. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. arXiv:2205.14135. 
Retrieved from https:\/\/arxiv.org\/abs\/2205.14135"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW63119.2024.00045"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL60245.2023.00012"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3714416","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3714416","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:56Z","timestamp":1750295876000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3714416"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,14]]},"references-count":75,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3714416"],"URL":"https:\/\/doi.org\/10.1145\/3714416","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,14]]},"assertion":[{"value":"2024-10-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-19","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
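
As a loose illustration of the mixed-precision scheme the abstract above describes (int8 post-training-quantized linear layers combined with bfloat16-approximated non-linear layers), the following minimal NumPy sketch may help. It is not the authors' TATAA hardware, compiler, or numerics: the function names, the per-tensor max-abs scale choice, and the tanh-based GELU approximation are all illustrative assumptions, and bfloat16 is emulated here by truncating float32 values to their upper 16 bits.

```python
# Hypothetical sketch (not the TATAA implementation): int8 matmul for a
# linear layer with per-tensor post-training quantization scales, followed
# by a non-linear layer (GELU) evaluated in emulated bfloat16.
import numpy as np

def quantize_int8(x, scale):
    # Symmetric per-tensor quantization: round to nearest, clip to int8 range.
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def int8_linear(x_q, w_q, x_scale, w_scale):
    # Integer matmul accumulating in int32 (as a systolic array would),
    # then dequantized back to float32 with the product of the two scales.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * (x_scale * w_scale)

def to_bfloat16(x):
    # Emulate bfloat16 by zeroing the low 16 mantissa bits of float32
    # (truncation, not round-to-nearest-even).
    bits = x.astype(np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

def gelu_bf16(x):
    # tanh-based GELU approximation with intermediates rounded to bfloat16.
    x = to_bfloat16(x)
    c = to_bfloat16(np.float32(np.sqrt(2.0 / np.pi)))
    return to_bfloat16(0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x**3))))

# Toy usage: quantize activations and weights, run the int8 linear layer,
# then apply the bfloat16-approximated non-linearity.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64), dtype=np.float32)
w = rng.standard_normal((64, 64), dtype=np.float32)
x_s, w_s = np.abs(x).max() / 127.0, np.abs(w).max() / 127.0
y = gelu_bf16(int8_linear(quantize_int8(x, x_s), quantize_int8(w, w_s), x_s, w_s))
print(y.shape, y.dtype)  # (4, 64) float32 holding bfloat16-rounded values
```

The split mirrors the abstract's two hardware modes: the int8 matmul corresponds to the systolic-array path, and the bfloat16 elementwise math corresponds to the SIMD path; the calibration and approximation details above are placeholders, not the paper's.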