{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T16:01:47Z","timestamp":1775145707630,"version":"3.50.1"},"reference-count":108,"publisher":"Association for Computing Machinery (ACM)","issue":"PLDI","license":[{"start":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T00:00:00Z","timestamp":1718841600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2024,6,20]]},"abstract":"<jats:p>Special-purpose hardware accelerators are increasingly pivotal for sustaining performance improvements in emerging applications, especially as the benefits of technology scaling continue to diminish. However, designers currently lack effective tools and methodologies to construct complex, high-performance accelerator architectures in a productive manner. Existing high-level synthesis (HLS) tools often require intrusive source-level changes to attain satisfactory quality of results. Despite the introduction of several new accelerator design languages (ADLs) aiming to enhance or replace HLS, their advantages are more evident in relatively simple applications with a single kernel. Existing ADLs prove less effective for realistic hierarchical designs with multiple kernels, even if the design hierarchy is flattened.<\/jats:p>\n          <jats:p>\n            In this paper, we introduce Allo, a composable programming model for efficient spatial accelerator design. Allo decouples hardware customizations, including compute, memory, communication, and data type from algorithm specification, and encapsulates them as a set of customization primitives. Allo preserves the hierarchical structure of an input program by combining customizations from different functions in a bottom-up, type-safe manner. 
This approach facilitates holistic optimizations that span across function boundaries. We conduct comprehensive experiments on commonly-used HLS benchmarks and several realistic deep learning models. Our evaluation shows that Allo can outperform state-of-the-art HLS tools and ADLs on all test cases in the PolyBench. For the GPT2 model, the inference latency of the Allo generated accelerator is\n            <jats:inline-formula>\n              <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" display=\"inline\">\n                <mml:mn>1.7\u00d7<\/mml:mn>\n              <\/mml:math>\n            <\/jats:inline-formula>\n            faster than the NVIDIA A100 GPU with\n            <jats:inline-formula>\n              <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" display=\"inline\">\n                <mml:mn>5.4\u00d7<\/mml:mn>\n              <\/mml:math>\n            <\/jats:inline-formula>\n            higher energy efficiency, demonstrating the capability of Allo to handle large-scale designs.\n          <\/jats:p>","DOI":"10.1145\/3656401","type":"journal-article","created":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T16:27:20Z","timestamp":1718900840000},"page":"593-620","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["Allo: A Programming Model for Composable Accelerator Design"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6617-0075","authenticated-orcid":false,"given":"Hongzheng","family":"Chen","sequence":"first","affiliation":[{"name":"Cornell University, Ithaca, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2850-0176","authenticated-orcid":false,"given":"Niansong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Cornell University, Ithaca, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6901-8837","authenticated-orcid":false,"given":"Shaojie","family":"Xiang","sequence":"additional","affiliation":[{"name":"Cornell University, Ithaca, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0023-2367","authenticated-orcid":false,"given":"Zhichen","family":"Zeng","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7347-5117","authenticated-orcid":false,"given":"Mengjia","family":"Dai","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0778-0308","authenticated-orcid":false,"given":"Zhiru","family":"Zhang","sequence":"additional","affiliation":[{"name":"Cornell University, Ithaca, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,6,20]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2022.3178580"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640366"},{"key":"e_1_3_2_4_2","unstructured":"AWS. 2023. Inferentia Architecture. https:\/\/awsdocs-neuron.readthedocs-hosted.com\/en\/latest\/general\/arch\/neuron-hardware\/inferentia.html."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228584"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.5555\/3314872.3314896"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356173"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476159"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","unstructured":"Hongzheng Chen Cody Hao Yu Shuai Zheng Zhen Zhang Zhiru Zhang and Yida Wang. 2024. Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training. 
In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems Volume 2 (La Jolla CA USA) (ASPLOS\u201924). Association for Computing Machinery New York NY USA.","DOI":"10.1145\/3620665.3640399"},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","unstructured":"Hongzheng Chen Jiahao Zhang Yixiao Du Shaojie Xiang Zichao Yue Niansong Zhang Yaohui Cai and Zhiru Zhang. 2024. Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference. ACM Trans. Reconfigurable Technol. Syst. (2024).","DOI":"10.1145\/3656177"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","unstructured":"Hongzheng Chen Niansong Zhang Shaojie Xiang Zhichen Zeng Mengjia Dai and Zhiru Zhang. 2024. Artifact for Allo: A Programming Model for Composable Accelerator Design. https:\/\/doi.org\/10.5281\/zenodo.10961342 10.5281\/zenodo.10961342.","DOI":"10.5281\/zenodo.10961342"},{"key":"e_1_3_2_12_2","first-page":"579","volume-title":"Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (Carlsbad, CA, USA) (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (Carlsbad, CA, USA) (OSDI\u201918). USENIX Association, USA, 579\u2013594."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.5555\/3327144.3327258"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240850"},{"key":"e_1_3_2_15_2","unstructured":"CIRCT. 2024. CIRCT: Circuit IR Compilers and Tools. 
https:\/\/github.com\/llvm\/circt."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3530775"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2011.2110592"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240838"},{"key":"e_1_3_2_19_2","volume-title":"Type assignment in programming languages","author":"Damas Luis","year":"1984","unstructured":"Luis Damas. 1984. Type assignment in programming languages. Ph. D. Dissertation. University of Edinburgh."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/582153.582176"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.3039409"},{"key":"e_1_3_2_22_2","first-page":"30318","article-title":"LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale","volume":"35","author":"Dettmers Tim","year":"2022","unstructured":"Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. 2022. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. Advances in Neural Information Processing Systems 35 (2022), 30318\u201330332.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_23_2","unstructured":"IREE Developers. 2022. IREE (Intermediate Representation Execution Environment). https:\/\/google.github.io\/iree\/"},{"key":"e_1_3_2_24_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-Training of Deep Bidirectional Transformers for Language Understanding. 
arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3093333.3009882"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/13\/07\/P07027"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3385983"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2021.3057203"},{"key":"e_1_3_2_29_2","unstructured":"Farah Fahim Benjamin Hawks Christian Herwig James Hirschauer Sergo Jindariani Nhan Tran et al. 2021. hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3576933"},{"key":"e_1_3_2_31_2","unstructured":"Elias Frantar Saleh Ashkboos Torsten Hoefler and Dan Alistarh. 2022. GPTQ: Accurate Post-training Compression for Generative Pretrained Transformers. arXiv preprint arXiv:2210.17323 (2022)."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18074.2021.9586216"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439289"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414632"},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","unstructured":"James Hegarty John Brunhaver Zachary DeVito Jonathan Ragan-Kelley Noy Cohen Steven Bell Artem Vasilyev Mark Horowitz and Pat Hanrahan. 2014. Darkroom: Compiling High-Level Image Processing Code into Hardware Pipelines. ACM Trans. Graph. 33 4 Article 144 (jul 2014) 11 pages. 
https:\/\/doi.org\/10.1145\/2601097.2601174 10.1145\/2601097.2601174","DOI":"10.1145\/2601097.2601174"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925892"},{"key":"e_1_3_2_38_2","first-page":"29","article-title":"The Principal Type-Scheme of an Object in Combinatory Logic","volume":"146","author":"Hindley R.","year":"1969","unstructured":"R. Hindley. 1969. The Principal Type-Scheme of an Object in Combinatory Logic. Trans. Amer. Math. Soc. 146 (1969), 29\u201360. http:\/\/www.jstor.org\/stable\/1995158","journal-title":"Trans. Amer. Math. Soc"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","unstructured":"Seongmin Hong Seungjae Moon Junsoo Kim Sungjae Lee Minsub Kim Dongsoo Lee and Joo-Young Kim. 2022. DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation. In 2022 55th IEEE\/ACM International Symposium on Microarchitecture (MICRO). 616\u2013630. https:\/\/doi.org\/10.1109\/MICRO56248.2022.00051 10.1109\/MICRO56248.2022.00051","DOI":"10.1109\/MICRO56248.2022.00051"},{"key":"e_1_3_2_40_2","unstructured":"Andrew G Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. Mobilenets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint arXiv:1704.04861 (2017)."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","unstructured":"Sitao Huang Kun Wu Hyunmin Jeong Chengyue Wang Deming Chen and Wen-Mei Hwu. 2021. PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow. IEEE Trans. Comput. 70 12 (2021) 2015\u20132028. 
https:\/\/doi.org\/10.1109\/TC.2021.3123465 10.1109\/TC.2021.3123465","DOI":"10.1109\/TC.2021.3123465"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523446"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589350"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/512927.512945"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.20"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192379"},{"key":"e_1_3_2_49_2","unstructured":"H. T. Kung and Charles E. Leiserson. 1978. Systolic Arrays for (VLSI). https:\/\/api.semanticscholar.org\/CorpusID:60531591"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293910"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3400302.3415644"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3469660"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2004.1281665"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO51591.2021.9370308"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375320"},{"key":"e_1_3_2_56_2","unstructured":"TorchVision maintainers and contributors. 2016. TorchVision: PyTorch\u2019s Computer Vision library. https:\/\/github.com\/pytorch\/vision."},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","unstructured":"Stefano Markidis Steven Wei Der Chien Erwin Laure Ivy Bo Peng and Jeffrey S. Vetter. 2018. NVIDIA Tensor Core Programmability Performance & Precision. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 522\u2013531. 
https:\/\/doi.org\/10.1109\/IPDPSW.2018.00091 10.1109\/IPDPSW.2018.00091","DOI":"10.1109\/IPDPSW.2018.00091"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/2159542.2159547"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","unstructured":"William S. Moses Lorenzo Chelini Ruizhe Zhao and Oleksandr Zinenko. 2021. Polygeist: Raising C to Polyhedral MLIR. In 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT). 45\u201359. https:\/\/doi.org\/10.1109\/PACT52795.2021.00011 10.1109\/PACT52795.2021.00011","DOI":"10.1109\/PACT52795.2021.00011"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3385974"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3591234"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446712"},{"key":"e_1_3_2_63_2","unstructured":"OpenAI. 2023. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023)."},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530681"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3409006"},{"key":"e_1_3_2_66_2","first-page":"172","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, et al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. 
IEEE Press, New York, NY, USA, 172\u2013198."},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/2594291.2594339"},{"key":"e_1_3_2_68_2","article-title":"Efficiently Scaling Transformer Inference","volume":"5","author":"Pope Reiner","year":"2023","unstructured":"Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Jonathan Heek, Kefan Xiao, Shivani Agrawal, and Jeff Dean. 2023. Efficiently Scaling Transformer Inference. In Proceedings of Machine Learning and Systems, Vol. 5.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_69_2","unstructured":"Fran\u00e7ois Pottier. 1998. Type Inference in the Presence of Subtyping: From Theory to Practice. Ph. D. Dissertation. INRIA."},{"key":"e_1_3_2_70_2","unstructured":"Louis-No\u00ebl Pouchet et al. 2012. Polybench: The polyhedral benchmark suite. http:\/\/www.cs.ucla.edu\/pouchet\/software\/polybench"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3626202.3637563"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/2435264.2435273"},{"key":"e_1_3_2_73_2","unstructured":"PyBind. 2023. PyBind11. https:\/\/github.com\/pybind\/pybind11."},{"key":"e_1_3_2_74_2","unstructured":"PyTorch. 2022. TorchDynamo Overview. https:\/\/pytorch.org\/docs\/master\/dynamo\/."},{"issue":"8","key":"e_1_3_2_75_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. 
OpenAI blog 1, 8 (2019), 9.","journal-title":"OpenAI blog"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_3_2_77_2","article-title":"torch.fx: Practical Program Capture and Transformation for Deep Learning in Python","volume":"4","author":"Reed James","year":"2022","unstructured":"James Reed, Zachary DeVito, Horace He, Ansley Ussery, and Jason Ansel. 2022. torch.fx: Practical Program Capture and Transformation for Deep Learning in Python. In Proceedings of Machine Learning and Systems, Vol. 4.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","unstructured":"Oliver Reiche M. Akif \u00d6zkan Richard Membarth J\u00fcrgen Teich and Frank Hannig. 2017. Generating FPGA-based image processing accelerators with Hipacc: (Invited paper). In 2017 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD). 1026\u20131033. https:\/\/doi.org\/10.1109\/ICCAD.2017.8203894 10.1109\/ICCAD.2017.8203894","DOI":"10.1109\/ICCAD.2017.8203894"},{"key":"e_1_3_2_79_2","unstructured":"Junru Shao Xiyou Zhou Siyuan Feng Bohan Hou Ruihang Lai Hongyi Jin Wuwei Lin Masahiro Masuda Cody Hao Yu and Tianqi Chen. 2022. Tensor Program Optimization with Probabilistic Programs. In Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783720"},{"key":"e_1_3_2_81_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_82_2","first-page":"55","volume-title":"2022 59th ACM\/IEEE Design Automation Conference (DAC)","author":"Sohrabizadeh Atefeh","year":"2022","unstructured":"Atefeh Sohrabizadeh, Yunsheng Bai, Yizhou Sun, and Jason Cong. 2022. Automated Accelerator Optimization Aided by Graph Neural Networks. In 2022 59th ACM\/IEEE Design Automation Conference (DAC). 
Association for Computing Machinery, New York, NY, USA, 55\u201360."},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439464"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","unstructured":"Nitish Srivastava Hongbo Rong Prithayan Barua Guanyu Feng Huanqi Cao Zhiru Zhang et al. 2019. T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 181\u2013189. https:\/\/doi.org\/10.1109\/FCCM.2019.00033 10.1109\/FCCM.2019.00033","DOI":"10.1109\/FCCM.2019.00033"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1145\/3180481"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.2140\/pjm.1955.5.285"},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378495"},{"key":"e_1_3_2_88_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et al. 2023. Llama: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971 (2023)."},{"key":"e_1_3_2_89_2","unstructured":"Nicolas Vasilache Oleksandr Zinenko Aart JC Bik Mahesh Ravishankar Thomas Raoux Alexander Belyaev et al. 2022. Composable and Modular Code Generation in MLIR: A Structured and Retargetable Approach to Tensor Compiler Construction. arXiv preprint arXiv:2202.03293 (2022)."},{"key":"e_1_3_2_90_2","unstructured":"Nicolas Vasilache Oleksandr Zinenko Theodoros Theodoridis Priya Goyal Zachary DeVito William S Moses Sven Verdoolaege Andrew Adams and Albert Cohen. 2018. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. 
arXiv preprint arXiv:1802.04730 (2018)."},{"key":"e_1_3_2_91_2","first-page":"6000","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS\u201917)","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS\u201917). Curran Associates Inc., Red Hook, NY, USA, 6000\u20136010."},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.1145\/3050220.3050234"},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439292"},{"key":"e_1_3_2_94_2","doi-asserted-by":"crossref","unstructured":"Thomas Wolf Lysandre Debut Victor Sanh Julien Chaumond Clement Delangue Anthony Moi Pierric Cistac Tim Rault R\u00e9mi Louf Morgan Funtowicz et al. 2019. Huggingface\u2019s Transformers: State-of-the-Art Natural Language Processing. arXiv preprint arXiv:1910.03771 (2019).","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.1145\/3490422.3502369"},{"key":"e_1_3_2_96_2","first-page":"38087","volume-title":"International Conference on Machine Learning","author":"Xiao Guangxuan","year":"2023","unstructured":"Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, and Song Han. 2023. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. In International Conference on Machine Learning. PMLR, 38087\u201338099."},{"key":"e_1_3_2_97_2","unstructured":"AMD Xilinx. 2021. Alveo U280 Data Center Accelerator Card. https:\/\/www.xilinx.com\/products\/boards-and-kits\/alveo\/u280.html#specifications."},{"key":"e_1_3_2_98_2","unstructured":"AMD Xilinx. 2022. AI Engines and Their Applications. 
https:\/\/www.xilinx.com\/content\/dam\/xilinx\/support\/documents\/white_papers\/wp506-ai-engine.pdf"},{"key":"e_1_3_2_99_2","unstructured":"AMD Xilinx. 2022. Vitis Accelerated Libraries. https:\/\/github.com\/Xilinx\/Vitis_Libraries."},{"key":"e_1_3_2_100_2","unstructured":"AMD Xilinx. 2022. Vitis AI: Adaptable & Real-Time AI Inference Acceleration. https:\/\/github.com\/Xilinx\/Vitis-AI."},{"key":"e_1_3_2_101_2","unstructured":"AMD Xilinx. 2022. Vitis HLS v2022.1. https:\/\/www.xilinx.com\/products\/design-tools\/vitis\/vitis-platform.html."},{"key":"e_1_3_2_102_2","unstructured":"AMD Xilinx. 2023. Merlin Compiler. https:\/\/github.com\/Xilinx\/merlin-compiler."},{"key":"e_1_3_2_103_2","unstructured":"Hanchen Ye Cong Hao Jianyi Cheng Hyunmin Jeong Jack Huang Stephen Neuendorffer and Deming Chen. 2022. ScaleHLS: A New Scalable High-Level Synthesis Framework on Multi-Level Intermediate Representation. In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_104_2","doi-asserted-by":"publisher","DOI":"10.1145\/3276491"},{"key":"e_1_3_2_105_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.2017.8203809"},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL57034.2022.00044"},{"key":"e_1_3_2_107_2","unstructured":"Yilong Zhao Chien-Yu Lin Kan Zhu Zihao Ye Lequn Chen Size Zheng Luis Ceze Arvind Krishnamurthy Tianqi Chen and Baris Kasikci. 2023. Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving. arXiv preprint arXiv:2310.19102 (2023)."},{"key":"e_1_3_2_108_2","first-page":"17","volume-title":"Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201920)","author":"Zheng Lianmin","year":"2020","unstructured":"Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, and Ion Stoica. 2020. Ansor: Generating High-Performance Tensor Programs for Deep Learning. 
In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201920). USENIX Association, USA, Article 49, 17 pages."},{"key":"e_1_3_2_109_2","unstructured":"Alex Zinenko. 2022. [RFC] Interfaces and Dialects for Precise IR Transformation Control. https:\/\/discourse.llvm.org\/t\/rfc-interfaces-and-dialects-for-precise-ir-transformation-control\/60927"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656401","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3656401","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T20:41:00Z","timestamp":1751661660000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656401"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,20]]},"references-count":108,"journal-issue":{"issue":"PLDI","published-print":{"date-parts":[[2024,6,20]]}},"alternative-id":["10.1145\/3656401"],"URL":"https:\/\/doi.org\/10.1145\/3656401","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,20]]},"assertion":[{"value":"2024-06-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}