{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T11:44:09Z","timestamp":1766231049280,"version":"3.48.0"},"publisher-location":"New York, NY, USA","reference-count":18,"publisher":"ACM","funder":[{"name":"National Science and Technology Council, Taiwan","award":["NSTC 113-2221-E-194-039-MY3"],"award-info":[{"award-number":["NSTC 113-2221-E-194-039-MY3"]}]},{"name":"National Science and Technology Council, Taiwan","award":["NSTC 113-2640-E-194-001"],"award-info":[{"award-number":["NSTC 113-2640-E-194-001"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,9,8]]},"DOI":"10.1145\/3750720.3757289","type":"proceedings-article","created":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T11:42:38Z","timestamp":1766230958000},"page":"80-89","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Optimizing Large Language Model Deployment with TVM: Overcoming Dynamic Shape and Scheduling Challenges"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-5697-1396","authenticated-orcid":false,"given":"Kang Wei","family":"Kuo","sequence":"first","affiliation":[{"name":"Computer Science and Information Engineering, National Chung Cheng University, ChiaYi, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3875-4150","authenticated-orcid":false,"given":"Peng-Sheng","family":"Chen","sequence":"additional","affiliation":[{"name":"Computer Science and Information Engineering, National Chung Cheng University, ChiaYi, Taiwan"}]}],"member":"320","published-online":{"date-parts":[[2025,12,20]]},"reference":[{"key":"e_1_3_3_1_2_2","series-title":"(OSDI\u201916)","first-page":"265","volume-title":"Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek\u00a0G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: a system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (Savannah, GA, USA) (OSDI\u201916). USENIX Association, USA, 265\u2013283."},{"key":"e_1_3_3_1_3_2","series-title":"(OSDI\u201918)","first-page":"579","volume-title":"Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: an automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (Carlsbad, CA, USA) (OSDI\u201918). USENIX Association, USA, 579\u2013594."},{"key":"e_1_3_3_1_4_2","unstructured":"ONNX developers. 2025. ONNX: Open Neural Network Exchange. https:\/\/github.com\/onnx\/onnx."},{"key":"e_1_3_3_1_5_2","unstructured":"ONNX\u00a0Runtime developers. 2021. ONNX Runtime. https:\/\/onnxruntime.ai\/. Version: x.y.z."},{"key":"e_1_3_3_1_6_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arxiv:https:\/\/arXiv.org\/abs\/1810.04805\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3676641.3716249"},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362694"},{"key":"e_1_3_3_1_9_2","doi-asserted-by":"publisher","unstructured":"Pengyu Mu Yi Liu Rui Wang Guoxiang Liu Zhonghao Sun Hailong Yang Zhongzhi Luan and Depei Qian. 2023. HAOTuner: A Hardware Adaptive Operator Auto-Tuner for Dynamic Shape Tensor Compilers. IEEE Trans. Comput. 72 11 (2023) 3178\u20133190. 10.1109\/TC.2023.3288758","DOI":"10.1109\/TC.2023.3288758"},{"key":"e_1_3_3_1_10_2","unstructured":"Pengyu Mu Linquan Wei Yi Liu and Rui Wang. 2024. FTuner: A Fast Dynamic Shape Tensors Program Auto-Tuner for Deep Learning Compilers. CoRR abs\/2407.21418 (2024)."},{"key":"e_1_3_3_1_11_2","volume-title":"PyTorch: an imperative style, high-performance deep learning library","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: an imperative style, high-performance deep learning library. Curran Associates Inc., Red Hook, NY, USA."},{"key":"e_1_3_3_1_12_2","series-title":"(NIPS \u201922)","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"Shao Junru","year":"2022","unstructured":"Junru Shao, Xiyou Zhou, Siyuan Feng, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Masahiro Masuda, Cody\u00a0Hao Yu, and Tianqi Chen. 2022. Tensor program optimization with probabilistic programs. In Proceedings of the 36th International Conference on Neural Information Processing Systems (New Orleans, LA, USA) (NIPS \u201922). Curran Associates Inc., Red Hook, NY, USA, Article 2593, 14\u00a0pages."},{"key":"e_1_3_3_1_13_2","unstructured":"Haichen Shen Jared Roesch Zhi Chen Wei Chen Yong Wu Mu Li Vin Sharma Zachary Tatlock and Yida Wang. 2021. Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference. arxiv:https:\/\/arXiv.org\/abs\/2006.03031\u00a0[cs.PL]"},{"key":"e_1_3_3_1_14_2","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan\u00a0N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640390"},{"key":"e_1_3_3_1_16_2","first-page":"848","volume-title":"Proceedings of Machine Learning and Systems","volume":"4","author":"Zheng Bojian","year":"2022","unstructured":"Bojian Zheng, Ziheng Jiang, Cody\u00a0Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, and Gennady Pekhimenko. 2022. DietCode: Automatic Optimization for Dynamic Tensor Programs. In Proceedings of Machine Learning and Systems , D.\u00a0Marculescu, Y.\u00a0Chi, and C.\u00a0Wu (Eds.), Vol.\u00a04. 848\u2013863."},{"key":"e_1_3_3_1_17_2","series-title":"(OSDI\u201920)","volume-title":"Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation","author":"Zheng Lianmin","year":"2020","unstructured":"Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody\u00a0Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph\u00a0E. Gonzalez, and Ion Stoica. 2020. Ansor: generating high-performance tensor programs for deep learning. In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation(OSDI\u201920). USENIX Association, USA, Article 49, 17\u00a0pages."},{"key":"e_1_3_3_1_18_2","unstructured":"Yangjie Zhou Honglin Zhu Qian Qiu Weihao Cui Zihan Liu Cong Guo Siyuan Feng Jintao Meng Haidong Lan Jingwen Leng et\u00a0al. 2024. Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2409.01075 (2024)."},{"key":"e_1_3_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3437984.3458838"}],"event":{"name":"ICPP Workshops '25: The 54th International Conference on Parallel Processing Workshops","location":"San Diego CA USA","acronym":"ICPP Workshops '25"},"container-title":["Workshop Proceedings of the 54th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3750720.3757289","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T11:42:52Z","timestamp":1766230972000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3750720.3757289"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,8]]},"references-count":18,"alternative-id":["10.1145\/3750720.3757289","10.1145\/3750720"],"URL":"https:\/\/doi.org\/10.1145\/3750720.3757289","relation":{},"subject":[],"published":{"date-parts":[[2025,9,8]]},"assertion":[{"value":"2025-12-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}