{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T16:51:22Z","timestamp":1771951882274,"version":"3.50.1"},"reference-count":94,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,11,13]],"date-time":"2023-11-13T00:00:00Z","timestamp":1699833600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,11,13]]},"abstract":"<jats:p>Compiler optimization plays an increasingly important role to boost the performance of machine learning models for data processing and management. With increasingly complex data, the dynamic tensor shape phenomenon emerges for ML models. However, existing ML compilers either can only handle static shape models or expose a series of performance problems for both operator fusion optimization and code generation in dynamic shape scenes. This paper tackles the main challenges of dynamic shape optimization: the fusion optimization without shape value, and code generation supporting arbitrary shapes. To tackle the fundamental challenge of the absence of shape values, it systematically abstracts and excavates the shape information and designs a cross-level symbolic shape representation. With the insight that what fusion optimization relies upon is tensor shape relationships between adjacent operators rather than exact shape values, it proposes the dynamic shape fusion approach based on shape information propagation. To generate code that adapts to arbitrary shapes efficiently, it proposes a compile-time and runtime combined code generation approach. 
Finally, it presents a complete optimization pipeline for dynamic shape models and implements an industrial-grade ML compiler, named BladeDISC. The extensive evaluation demonstrates that BladeDISC outperforms PyTorch, TorchScript, TVM, ONNX Runtime, XLA, Torch Inductor (dynamic shape), and TensorRT by up to 6.95\u00d7, 6.25\u00d7, 4.08\u00d7, 2.04\u00d7, 2.06\u00d7, 7.92\u00d7, and 4.16\u00d7 (3.54\u00d7, 3.12\u00d7, 1.95\u00d7, 1.47\u00d7, 1.24\u00d7, 2.93\u00d7, and 1.46\u00d7 on average) in terms of end-to-end inference speedup on the A10 and T4 GPU, respectively. BladeDISC's source code is publicly available at https:\/\/github.com\/alibaba\/BladeDISC.<\/jats:p>","DOI":"10.1145\/3617327","type":"journal-article","created":{"date-parts":[[2023,11,13]],"date-time":"2023-11-13T22:28:39Z","timestamp":1699914519000},"page":"1-29","source":"Crossref","is-referenced-by-count":9,"title":["BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-2692-713X","authenticated-orcid":false,"given":"Zhen","family":"Zheng","sequence":"first","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6759-2616","authenticated-orcid":false,"given":"Zaifeng","family":"Pan","sequence":"additional","affiliation":[{"name":"Renmin University of China &amp; Alibaba Group, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9016-8446","authenticated-orcid":false,"given":"Dalin","family":"Wang","sequence":"additional","affiliation":[{"name":"Renmin University of China &amp; Alibaba Group, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-2411-3349","authenticated-orcid":false,"given":"Kai","family":"Zhu","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7308-9542","authenticated-orcid":false,"given":"Wenyi","family":"Zhao","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-4535-9866","authenticated-orcid":false,"given":"Tianyou","family":"Guo","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-8803-928X","authenticated-orcid":false,"given":"Xiafei","family":"Qiu","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-3843-2826","authenticated-orcid":false,"given":"Minmin","family":"Sun","sequence":"additional","affiliation":[{"name":"Alibaba Group, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-6805-4785","authenticated-orcid":false,"given":"Junjie","family":"Bai","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1983-7321","authenticated-orcid":false,"given":"Feng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5757-9135","authenticated-orcid":false,"given":"Xiaoyong","family":"Du","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7656-6428","authenticated-orcid":false,"given":"Jidong","family":"Zhai","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3003-0150","authenticated-orcid":false,"given":"Wei","family":"Lin","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2023,11,13]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Cited April 2023. Stablehlo backward compatible ML compute opset inspired by HLO\/MHLO. 
https:\/\/github.com\/openxla\/stablehlo."},{"key":"e_1_2_1_2_1","unstructured":"Cited January 2023. Basic Linear Algebra on NVIDIA GPUs. https:\/\/developer.nvidia.com\/cublas."},{"key":"e_1_2_1_3_1","unstructured":"Cited January 2023. CUDA Templates for Linear Algebra Subroutines. https:\/\/github.com\/NVIDIA\/cutlass."},{"key":"e_1_2_1_4_1","unstructured":"Cited January 2023. Introduction to TorchScript. https:\/\/pytorch.org\/tutorials\/beginner\/Intro_to_TorchScript_tutorial.html."},{"key":"e_1_2_1_5_1","unstructured":"Cited January 2023. IREE. https:\/\/github.com\/google\/iree."},{"key":"e_1_2_1_6_1","unstructured":"Cited January 2023. MLIR-HLO: A Standalone \"HLO\" MLIR-based Compiler. https:\/\/github.com\/tensorflow\/mlir-hlo."},{"key":"e_1_2_1_7_1","unstructured":"Cited January 2023. NVIDIA A10 GPU Accelerator. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/a10\/pdf\/A10-Product-Brief.pdf."},{"key":"e_1_2_1_8_1","unstructured":"Cited January 2023. NVIDIA Nsight Compute. https:\/\/developer.nvidia.com\/nsight-compute."},{"key":"e_1_2_1_9_1","unstructured":"Cited January 2023. NVIDIA TensorFlow User Guide. https:\/\/docs.nvidia.com\/deeplearning\/frameworks\/tensorflow-user-guide\/index.html."},{"key":"e_1_2_1_10_1","unstructured":"Cited January 2023. ONNX Runtime. https:\/\/onnxruntime.ai."},{"key":"e_1_2_1_11_1","unstructured":"Cited January 2023. TensorRT. https:\/\/developer.nvidia.com\/tensorrt."},{"key":"e_1_2_1_12_1","unstructured":"Cited January 2023. TensorRT Python API Reference. https:\/\/docs.nvidia.com\/deeplearning\/tensorrt\/api\/python_api\/."},{"key":"e_1_2_1_13_1","unstructured":"Cited January 2023. The Torch-MLIR Project. https:\/\/github.com\/llvm\/torch-mlir."},{"key":"e_1_2_1_14_1","unstructured":"Cited January 2023. XLA: Optimizing Compiler for Machine Learning. https:\/\/www.tensorflow.org\/xla."},{"key":"e_1_2_1_15_1","unstructured":"Cited June 2023. Effective Transformer. 
https:\/\/github.com\/bytedance\/effective_transformer."},{"key":"e_1_2_1_16_1","unstructured":"Cited March 2023. Introducing nvFuser a deep learning compiler for PyTorch. https:\/\/pytorch.org\/blog\/introducing-nvfuser-a-deep-learning-compiler-for-pytorch\/."},{"key":"e_1_2_1_17_1","unstructured":"Cited March 2023. NVIDIA FasterTransformer. https:\/\/github.com\/NVIDIA\/FasterTransformer."},{"key":"e_1_2_1_18_1","unstructured":"Cited March 2023. NVIDIA Triton Inference Server. https:\/\/developer.nvidia.com\/nvidia-triton-inference-server."},{"key":"e_1_2_1_19_1","unstructured":"Cited March 2023. PyTorch 2.0 Release. https:\/\/pytorch.org\/blog\/pytorch-2.0-release\/."},{"key":"e_1_2_1_20_1","unstructured":"Cited March 2023. TensorRT Dynamic Shape. https:\/\/docs.nvidia.com\/deeplearning\/tensorrt\/developer-guide\/index.html#work_dynamic_shapes."},{"key":"e_1_2_1_21_1","unstructured":"Cited March 2023. TorchInductor: a PyTorch-native Compiler with Define-by-Run IR and Symbolic Shapes. https:\/\/dev-discuss.pytorch.org\/t\/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes\/747."},{"key":"e_1_2_1_22_1","volume-title":"TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2--4, 2016, Kimberly Keeton and Timothy Roscoe (Eds.). 
USENIX Association, 265--283."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3485450.3485462"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3547305.3547313"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335420"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661197"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3229865"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732286.2732292"},{"key":"e_1_2_1_29_1","first-page":"52","article-title":"SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs","volume":"37","author":"B\u00f6hm Matthias","year":"2014","unstructured":"Matthias B\u00f6hm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, and Yuanyuan Tian. 2014. SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 37, 3 (2014), 52--62.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_30_1","volume-title":"TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Q. Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018, Carlsbad, CA, USA, October 8--10, 2018, Andrea C. Arpaci-Dusseau and Geoff Voelker (Eds.). 
USENIX Association, 578--594."},{"key":"e_1_2_1_31_1","volume-title":"Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Lianmin Zheng, Eddie Q. Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. Learning to Optimize Tensor Programs. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3--8, 2018, Montr\u00e9al, Canada, Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.). 3393--3404. https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/8b5700012be65c9da25f49408d959ca0-Abstract.html"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824045"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2205.14135"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Amol Deshpande Zachary Ives and Vijayshankar Raman. 2007. Adaptive query processing. Foundations and Trends\u00ae in Databases 1 1 1--140.","DOI":"10.1561\/1900000001"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2--7, 2019, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). 
Association for Computational Linguistics, 4171--4186."},{"key":"e_1_2_1_36_1","volume-title":"9th International Conference on Learning Representations, ICLR 2021","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3--7, 2021. OpenReview.net."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3489496.3489500"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183734"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389739"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3574245.3574266"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517895"},{"key":"e_1_2_1_42_1","volume-title":"Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27--30, 2016. IEEE Computer Society, 770--778."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517869"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2749432"},{"key":"e_1_2_1_45_1","volume-title":"Dissecting the NVidia Turing T4 GPU via Microbenchmarking. CoRR abs\/1903.07486","author":"Jia Zhe","year":"2019","unstructured":"Zhe Jia, Marco Maggioni, Jeffrey Smith, and Daniele Paolo Scarpazza. 2019. 
Dissecting the NVidia Turing T4 GPU via Microbenchmarking. CoRR abs\/1903.07486 (2019). arXiv:1903.07486 http:\/\/arxiv.org\/abs\/1903.07486"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359630"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3503585.3503597"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551801"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00027"},{"key":"e_1_2_1_50_1","volume-title":"ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In 8th International Conference on Learning Representations, ICLR 2020","author":"Lan Zhenzhong","year":"2020","unstructured":"Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26--30, 2020. OpenReview.net."},{"key":"e_1_2_1_51_1","volume-title":"MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. In IEEE\/ACM International Symposium on Code Generation and Optimization, CGO 2021","author":"Lattner Chris","year":"2021","unstructured":"Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques A. Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. 2021. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. In IEEE\/ACM International Symposium on Code Generation and Optimization, CGO 2021, Seoul, South Korea, February 27 - March 3, 2021, Jae W. Lee, Mary Lou Soffa, and Ayal Zaks (Eds.). 
IEEE, 2--14."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007328.3007331"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415530"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551828"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007642"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452773"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517902"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.14778\/3570690.3570697"},{"key":"e_1_2_1_59_1","volume-title":"Mixed Precision Training. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net.","author":"Micikevicius Paulius","year":"2018","unstructured":"Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory F. Diamos, Erich Elsen, David Garc\u00eda, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. 2018. Mixed Precision Training. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002940"},{"key":"e_1_2_1_61_1","volume-title":"GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU. In PACT '20: International Conference on Parallel Architectures and Compilation Techniques","author":"Oh Chanyoung","year":"2020","unstructured":"Chanyoung Oh, Zhen Zheng, Xipeng Shen, Jidong Zhai, and Youngmin Yi. 2020. GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU. 
In PACT '20: International Conference on Parallel Architectures and Compilation Techniques, Virtual Event, GA, USA, October 3--7, 2020, Vivek Sarkar and Hyesoon Kim (Eds.). ACM, 43--54."},{"key":"e_1_2_1_62_1","volume-title":"High-Performance ML Serving. CoRR abs\/1712.06139","author":"Olston Christopher","year":"2017","unstructured":"Christopher Olston, Noah Fiedel, Kiril Gorovoy, Jeremiah Harmsen, Li Lao, Fangwei Li, Vinu Rajashekhar, Sukriti Ramesh, and Jordan Soyke. 2017. TensorFlow-Serving: Flexible, High-Performance ML Serving. CoRR abs\/1712.06139 (2017). arXiv:1712.06139 http:\/\/arxiv.org\/abs\/1712.06139"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.14778\/3213880.3213890"},{"key":"e_1_2_1_64_1","volume-title":"High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Z. Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.). 8024--8035."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.14778\/3425879.3425890"},{"key":"e_1_2_1_66_1","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. 
(2018)."},{"key":"e_1_2_1_67_1","article-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 21 (2020), 140:1--140:67.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_68_1","volume-title":"Dali: Lazy Compilation of Dynamic Computation Graphs. In Workshop on Systems for Machine Learning and Open Source Software at NeurIPS","author":"Raiman Jonathan","year":"2018","unstructured":"Jonathan Raiman. 2018. Dali: Lazy Compilation of Dynamic Computation Graphs. In Workshop on Systems for Machine Learning and Open Source Software at NeurIPS 2018."},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517860"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/21.97458"},{"key":"e_1_2_1_71_1","first-page":"208","article-title":"Nimble: Efficiently compiling dynamic neural networks for model inference","volume":"3","author":"Shen Haichen","year":"2021","unstructured":"Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, and Yida Wang. 2021. Nimble: Efficiently compiling dynamic neural networks for model inference. Proceedings of Machine Learning and Systems 3 (2021), 208--222.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304072"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457244"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3315508.3329973"},{"key":"e_1_2_1_75_1","volume-title":"Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. 
CoRR abs\/1802.04730","author":"Vasilache Nicolas","year":"2018","unstructured":"Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. CoRR abs\/1802.04730 (2018)."},{"key":"e_1_2_1_76_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4--9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998--6008."},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2018.03.016"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526134"},{"key":"e_1_2_1_79_1","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP 2020 - Demos, Online, November 16--20","author":"Wolf Thomas","year":"2020","unstructured":"Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, R\u00e9mi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. 
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP 2020 - Demos, Online, November 16--20, 2020, Qun Liu and David Schlangen (Eds.). Association for Computational Linguistics, 38--45."},{"key":"e_1_2_1_80_1","volume-title":"Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation. In 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture. IEEE, 107--118","author":"Wu Haicheng","year":"2012","unstructured":"Haicheng Wu, Gregory Diamos, Srihari Cadambi, and Sudhakar Yalamanchili. 2012. Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation. In 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture. IEEE, 107--118."},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517905"},{"key":"e_1_2_1_82_1","volume-title":"Proceedings of Machine Learning and Systems 2022","author":"Xing Jiarong","year":"2022","unstructured":"Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, and Yibo Zhu. 2022. Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance. In Proceedings of Machine Learning and Systems 2022, MLSys 2022, Santa Clara, CA, USA, August 29 - September 1, 2022, Diana Marculescu, Yuejie Chi, and Carole-Jean Wu (Eds.). mlsys.org."},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517885"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3201531"},{"key":"e_1_2_1_85_1","volume-title":"Orca: A Distributed Serving System for Transformer-Based Generative Models. In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022","author":"Yu Gyeong-In","year":"2022","unstructured":"Gyeong-In Yu, Joo Seong Jeong, Geon-Woo Kim, Soojeong Kim, and Byung-Gon Chun. 2022. Orca: A Distributed Serving System for Transformer-Based Generative Models. 
In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022, Carlsbad, CA, USA, July 11--13, 2022, Marcos K. Aguilera and Hakim Weatherspoon (Eds.). USENIX Association, 521--538. https:\/\/www.usenix.org\/conference\/osdi22\/presentation\/yu"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994511"},{"key":"e_1_2_1_87_1","volume-title":"Proceedings of Machine Learning and Systems 2022","author":"Zheng Bojian","year":"2022","unstructured":"Bojian Zheng, Ziheng Jiang, Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, and Gennady Pekhimenko. 2022. DietCode: Automatic Optimization for Dynamic Tensor Programs. In Proceedings of Machine Learning and Systems 2022, MLSys 2022, Santa Clara, CA, USA, August 29 - September 1, 2022, Diana Marculescu, Yuejie Chi, and Carole-Jean Wu (Eds.). mlsys.org."},{"key":"e_1_2_1_88_1","volume-title":"Ansor: Generating High-Performance Tensor Programs for Deep Learning. In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020","author":"Zheng Lianmin","year":"2020","unstructured":"Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, and Ion Stoica. 2020. Ansor: Generating High-Performance Tensor Programs for Deep Learning. In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020, Virtual Event, November 4--6, 2020. USENIX Association, 863--879."},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123978"},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304032"},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507723"},{"key":"e_1_2_1_92_1","volume-title":"Fusionstitching: boosting memory intensive computations for deep learning workloads. 
arXiv preprint arXiv:2009.10924","author":"Zheng Zhen","year":"2020","unstructured":"Zhen Zheng, Pengzhan Zhao, Guoping Long, Feiwen Zhu, Kai Zhu, Wenyi Zhao, Lansong Diao, Jun Yang, and Wei Lin. 2020. Fusionstitching: boosting memory intensive computations for deep learning workloads. arXiv preprint arXiv:2009.10924 (2020)."},{"key":"e_1_2_1_93_1","volume-title":"ROLLER: Fast and Efficient Tensor Compilation for Deep Learning. In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022","author":"Zhu Hongyu","year":"2022","unstructured":"Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, and Gennady Pekhimenko. 2022. ROLLER: Fast and Efficient Tensor Compilation for Deep Learning. In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022, Carlsbad, CA, USA, July 11--13, 2022, Marcos K. Aguilera and Hakim Weatherspoon (Eds.). USENIX Association, 233--248."},{"key":"e_1_2_1_94_1","volume-title":"Proceedings of the 1st Workshop on Machine Learning and Systems, Virtual Event","author":"Zhu Kai","year":"2021","unstructured":"Kai Zhu, Wenyi Zhao, Zhen Zheng, Tianyou Guo, Pengzhan Zhao, Junjie Bai, Jun Yang, Xiaoyong Liu, Lansong Diao, and Wei Lin. 2021. DISC: A Dynamic Shape Compiler for Machine Learning Workloads. In EuroMLSys@EuroSys 2021, Proceedings of the 1st Workshop on Machine Learning and Systems, Virtual Event, Edinburgh, Scotland, UK, 26 April, 2021, Eiko Yoneki and Paul Patras (Eds.). 
ACM, 89--95."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617327","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3617327","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:15Z","timestamp":1750178775000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617327"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,13]]},"references-count":94,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,11,13]]}},"alternative-id":["10.1145\/3617327"],"URL":"https:\/\/doi.org\/10.1145\/3617327","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,13]]}}}