{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T01:43:26Z","timestamp":1773193406469,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":70,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,1,28]]},"DOI":"10.1145\/3774934.3786420","type":"proceedings-article","created":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T15:25:57Z","timestamp":1769613957000},"page":"522-536","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["MixFusion: A Patch-Level Parallel Serving System for Mixed-Resolution Diffusion Models"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8630-7959","authenticated-orcid":false,"given":"Desen","family":"Sun","sequence":"first","affiliation":[{"name":"University of Waterloo, Waterloo, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5791-3172","authenticated-orcid":false,"given":"Zepeng","family":"Zhao","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1634-8549","authenticated-orcid":false,"given":"Yuke","family":"Wang","sequence":"additional","affiliation":[{"name":"Rice University, Houston, USA"}]}],"member":"320","published-online":{"date-parts":[[2026,1,28]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Adobe. 2023. Create with Adobe Firefly generative AI. https:\/\/www.adobe.com\/products\/firefly.html"},{"key":"e_1_3_2_1_2_1","volume-title":"Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)","author":"Agarwal Shubham","year":"2024","unstructured":"Shubham Agarwal, Subrata Mitra, Sarthak Chakraborty, Srikrishna Karanam, Koyel Mukherjee, and Shiv Kumar Saini. 2024. Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). USENIX Association, Santa Clara, CA. 1173\u20131189. isbn:978-1-939133-39-7 https:\/\/www.usenix.org\/conference\/nsdi24\/presentation\/agarwal-shubham"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3617232.3624849"},{"key":"e_1_3_2_1_4_1","volume-title":"DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling. In Eighth Conference on Machine Learning and Systems. https:\/\/openreview.net\/forum?id=1N3ShLfcTf","author":"Ahmad Sohaib","year":"2025","unstructured":"Sohaib Ahmad, Qizheng Yang, Haoliang Wang, Ramesh K. Sitaraman, and Hui Guan. 2025. DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling. In Eighth Conference on Machine Learning and Systems. https:\/\/openreview.net\/forum?id=1N3ShLfcTf"},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 628\u2013639","author":"Bokhovkin Aleksey","year":"2025","unstructured":"Aleksey Bokhovkin, Quan Meng, Shubham Tulsiani, and Angela Dai. 2025. SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 628\u2013639."},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 28694\u201328704","author":"Chen Haoyu","year":"2025","unstructured":"Haoyu Chen, Xiaojie Xu, Wenbo Li, Jingjing Ren, Tian Ye, Songhua Liu, Ying-Cong Chen, Lei Zhu, and Xinchao Wang. 2025. POSTA: A Go-to Framework for Customized Artistic Poster Generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 28694\u201328704."},{"key":"e_1_3_2_1_7_1","volume-title":"OTAS: An Elastic Transformer Serving System via Token Adaptation. arxiv:2401.05031.","author":"Chen Jinyu","year":"2024","unstructured":"Jinyu Chen, Wenchao Xu, Zicong Hong, Song Guo, Haozhao Wang, Jie Zhang, and Deze Zeng. 2024. OTAS: An Elastic Transformer Serving System via Token Adaptation. arxiv:2401.05031."},{"key":"e_1_3_2_1_8_1","unstructured":"Jiuhai Chen Zhiyang Xu Xichen Pan Yushi Hu Can Qin Tom Goldstein Lifu Huang Tianyi Zhou Saining Xie Silvio Savarese Le Xue Caiming Xiong and Ran Xu. 2025. BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture Training and Dataset. arxiv:2505.09568. arxiv:2505.09568"},{"key":"e_1_3_2_1_9_1","volume-title":"Piotr Dollar, and C. Lawrence Zitnick","author":"Chen Xinlei","year":"2015","unstructured":"Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, and C. Lawrence Zitnick. 2015. Microsoft COCO Captions: Data Collection and Evaluation Server. arxiv:1504.00325."},{"key":"e_1_3_2_1_10_1","unstructured":"Xinle Cheng Zhuoming Chen and Zhihao Jia. 2025. CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models. arxiv:2502.00433. arxiv:2502.00433"},{"key":"e_1_3_2_1_11_1","volume-title":"Advances in Neural Information Processing Systems","author":"Dhariwal Prafulla","year":"2021","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion Models Beat GANs on Image Synthesis. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.). 34, Curran Associates, Inc., 8780\u20138794. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/49ad23d1ec9fa4bd8d77d02681df5cfa-Paper.pdf"},{"key":"e_1_3_2_1_12_1","volume-title":"The Thirteenth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=gg6dPtdC1C","author":"Eldesokey Abdelrahman","year":"2025","unstructured":"Abdelrahman Eldesokey and Peter Wonka. 2025. Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation. In The Thirteenth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=gg6dPtdC1C"},{"key":"e_1_3_2_1_13_1","unstructured":"Patrick Esser Sumith Kulal Andreas Blattmann Rahim Entezari Jonas M\u00fcller Harry Saini Yam Levi Dominik Lorenz Axel Sauer Frederic Boesel Dustin Podell Tim Dockhorn Zion English Kyle Lacey Alex Goodwin Yannik Marek and Robin Rombach. 2024. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. arxiv:2403.03206. arxiv:2403.03206"},{"key":"e_1_3_2_1_14_1","unstructured":"Jiarui Fang Jinzhe Pan Xibo Sun Aoyu Li and Jiannan Wang. 2024. xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism. arxiv:2411.01738. arxiv:2411.01738"},{"key":"e_1_3_2_1_15_1","unstructured":"Jiarui Fang Jinzhe Pan Jiannan Wang Aoyu Li and Xibo Sun. 2024. PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference. arxiv:2405.14430. arxiv:2405.14430"},{"key":"e_1_3_2_1_16_1","volume-title":"Rodrigues","author":"Fardo Fernando A.","year":"2016","unstructured":"Fernando A. Fardo, Victor H. Conforto, Francisco C. de Oliveira, and Paulo S. Rodrigues. 2016. A Formal Evaluation of PSNR as Quality Measurement Parameter for Image Segmentation Algorithms. arxiv:1605.07116. arxiv:1605.07116"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8083\u20138093","author":"Gao Yifan","year":"2025","unstructured":"Yifan Gao, Zihang Lin, Chuanbin Liu, Min Zhou, Tiezheng Ge, Bo Zheng, and Hongtao Xie. 2025. PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8083\u20138093."},{"key":"e_1_3_2_1_18_1","volume-title":"Matryoshka Diffusion Models. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=tOzCcDdH9O","author":"Gu Jiatao","year":"2024","unstructured":"Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Joshua M. Susskind, and Navdeep Jaitly. 2024. Matryoshka Diffusion Models. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=tOzCcDdH9O"},{"key":"e_1_3_2_1_19_1","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Gujarati Arpan","year":"2020","unstructured":"Arpan Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, and Jonathan Mace. 2020. Serving DNNs like Clockwork: Performance Predictability from the Bottom Up. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 443\u2013462. isbn:978-1-939133-19-9 https:\/\/www.usenix.org\/conference\/osdi20\/presentation\/gujarati"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA61900.2025.00034"},{"key":"e_1_3_2_1_22_1","volume-title":"Advances in Neural Information Processing Systems","author":"Heusel Martin","year":"2017","unstructured":"Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 30, Curran Associates, Inc.. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/file\/8a1d694707eb0fefe65871369074926d-Paper.pdf"},{"key":"e_1_3_2_1_23_1","unstructured":"Runhui Huang Chunwei Wang Junwei Yang Guansong Lu Yunlong Yuan Jianhua Han Lu Hou Wei Zhang Lanqing Hong Hengshuang Zhao and Hang Xu. 2025. ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement. arxiv:2504.01934. arxiv:2504.01934"},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 23646\u201323657","author":"Huang Zehuan","year":"2025","unstructured":"Zehuan Huang, Yuan-Chen Guo, Xingqiao An, Yunhan Yang, Yangguang Li, Zi-Xin Zou, Ding Liang, Xihui Liu, Yan-Pei Cao, and Lu Sheng. 2025. MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 23646\u201323657."},{"key":"e_1_3_2_1_25_1","unstructured":"Xiaoxiao Jiang Suyi Li Lingyun Yang Tianyu Feng Zhipeng Di Weiyi Lu Guoxuan Zhu Xiu Lin Kan Liu Yinghao Yu Tao Lan Guodong Yang Lin Qu Liping Zhang and Wei Wang. 2025. InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling. arxiv:2505.20600. arxiv:2505.20600"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_3_2_1_27_1","unstructured":"Benjamin Lefaudeux Francisco Massa Diana Liskovich Wenhan Xiong Vittorio Caggiano Sean Naren Min Xu Jieru Hu Marta Tintore Susan Zhang Patrick Labatut Daniel Haziza Luca Wehrstedt Jeremy Reizenstein and Grigory Sizov. 2022. xFormers: A modular and hackable Transformer modelling library. https:\/\/github.com\/facebookresearch\/xformers"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00686"},{"key":"e_1_3_2_1_29_1","volume-title":"Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.). 35, Curran Associates","author":"Li Muyang","year":"2022","unstructured":"Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Song Han, and Jun-Yan Zhu. 2022. Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.). 35, Curran Associates, Inc., 28858\u201328873. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/b9603de9e49d0838e53b6c9cf9d06556-Paper-Conference.pdf"},{"key":"e_1_3_2_1_30_1","unstructured":"Zongming Li Tianheng Cheng Shoufa Chen Peize Sun Haocheng Shen Longjin Ran Xiaoxin Chen Wenyu Liu and Xinggang Wang. 2024. ControlAR: Controllable Image Generation with Autoregressive Models. arxiv:2410.02705. arxiv:2410.02705"},{"key":"e_1_3_2_1_31_1","unstructured":"Zhimin Li Jianwei Zhang Qin Lin Jiangfeng Xiong Yanxin Long Xinchi Deng Yingfang Zhang Xingchao Liu Minbin Huang Zedong Xiao Dayou Chen Jiajun He Jiahao Li Wenyue Li Chen Zhang Rongwei Quan Jianxiang Lu Jiabin Huang Xiaoyan Yuan Xiaoxiao Zheng Yixuan Li Jihong Zhang Chao Zhang Meng Chen Jie Liu Zheng Fang Weiyan Wang Jinbao Xue Yangyu Tao Jianchen Zhu Kai Liu Sihuan Lin Yifu Sun Yun Li Dongdong Wang Mingtao Chen Zhichao Hu Xiao Xiao Yan Chen Yuhong Liu Wei Liu Di Wang Yong Yang Jie Jiang and Qinglin Lu. 2024. Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding. arxiv:2405.08748. arxiv:2405.08748"},{"key":"e_1_3_2_1_32_1","volume-title":"AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)","author":"Li Zhuohan","year":"2023","unstructured":"Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. 2023. AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23). USENIX Association, Boston, MA. 663\u2013679. isbn:978-1-939133-34-2 https:\/\/www.usenix.org\/conference\/osdi23\/presentation\/li-zhouhan"},{"key":"e_1_3_2_1_33_1","unstructured":"Shiyu Liu Yucheng Han Peng Xing Fukun Yin Rui Wang Wei Cheng Jiaqi Liao Yingming Wang Honghao Fu Chunrui Han Guopeng Li Yuang Peng Quan Sun Jingwei Wu Yan Cai Zheng Ge Ranchen Ming Lei Xia Xianfang Zeng Yibo Zhu Binxing Jiao Xiangyu Zhang Gang Yu and Daxin Jiang. 2025. Step1X-Edit: A Practical Framework for General Image Editing. arxiv:2504.17761. arxiv:2504.17761"},{"key":"e_1_3_2_1_34_1","volume-title":"Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.). 35, Curran Associates","author":"Lu Cheng","year":"2022","unstructured":"Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan LI, and Jun Zhu. 2022. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.). 35, Curran Associates, Inc., 5775\u20135787. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/260a14acce2a89dad36adc8eefe7c59e-Paper-Conference.pdf"},{"key":"e_1_3_2_1_35_1","unstructured":"Cheng Lu Yuhao Zhou Fan Bao Jianfei Chen Chongxuan Li and Jun Zhu. 2023. DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models. arxiv:2211.01095."},{"key":"e_1_3_2_1_36_1","unstructured":"Simian Luo Yiqin Tan Longbo Huang Jian Li and Hang Zhao. 2024. Latent Consistency Models: Synthesizing High-Resolution Images with Few-step Inference. https:\/\/openreview.net\/forum?id=duBCwjb68o"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i6.32636"},{"key":"e_1_3_2_1_38_1","volume-title":"The Thirty-eighth Annual Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=ZupoMzMNrO","author":"Ma Xinyin","year":"2024","unstructured":"Xinyin Ma, Gongfan Fang, Michael Bi Mi, and Xinchao Wang. 2024. Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching. In The Thirty-eighth Annual Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=ZupoMzMNrO"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01492"},{"key":"e_1_3_2_1_40_1","unstructured":"Midjourney. 2023. midjourney. https:\/\/www.midjourney.com"},{"key":"e_1_3_2_1_41_1","unstructured":"Alexander Quinn Nichol Prafulla Dhariwal Aditya Ramesh Pranav Shyam Pamela Mishkin Bob Mcgrew Ilya Sutskever and Mark Chen. 2022. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. In Proceedings of the 39th International Conference on Machine Learning Kamalika Chaudhuri Stefanie Jegelka Le Song Csaba Szepesvari Gang Niu and Sivan Sabato (Eds.) (Proceedings of Machine Learning Research Vol. 162). PMLR 16784\u201316804. https:\/\/proceedings.mlr.press\/v162\/nichol22a.html"},{"key":"e_1_3_2_1_42_1","unstructured":"OpenAI. 2023. Dalle 3 System Card. https:\/\/cdn.openai.com\/papers\/DALL_E_3_System_Card.pdf"},{"key":"e_1_3_2_1_43_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d' Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.). 32, Curran Associates, Inc.. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2019\/file\/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf"},{"key":"e_1_3_2_1_44_1","volume-title":"Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res., 12","author":"Pedregosa Fabian","year":"2011","unstructured":"Fabian Pedregosa, Ga\u00ebl Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and \u00c9douard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res., 12 (2011), nov, 2825\u20132830. issn:1532-4435"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00387"},{"key":"e_1_3_2_1_46_1","volume-title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arxiv:2307.01952.","author":"Podell Dustin","year":"2023","unstructured":"Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas M\u00fcller, Joe Penna, and Robin Rombach. 2023. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arxiv:2307.01952."},{"key":"e_1_3_2_1_47_1","unstructured":"Junxiang Qiu Lin Liu Shuo Wang Jinda Lu Kezhou Chen and Yanbin Hao. 2025. Accelerating Diffusion Transformer via Gradient-Optimized Cache. arxiv:2503.05156. arxiv:2503.05156"},{"key":"e_1_3_2_1_48_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning, Marina Meila and Tong Zhang (Eds.) (Proceedings of Machine Learning Research","volume":"8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, Marina Meila and Tong Zhang (Eds.) (Proceedings of Machine Learning Research, Vol. 139). PMLR, 8748\u20138763. https:\/\/proceedings.mlr.press\/v139\/radford21a.html"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Sebastian Raschka Joshua Patterson and Corey Nolet. 2020. Machine Learning in Python: Main developments and technology trends in data science machine learning and artificial intelligence. arXiv preprint arXiv:2002.04803.","DOI":"10.3390\/info11040193"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_1_51_1","volume-title":"INFaaS: Automated Model-less Inference Serving. In 2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Romero Francisco","year":"2021","unstructured":"Francisco Romero, Qian Li, Neeraja J. Yadwadkar, and Christos Kozyrakis. 2021. INFaaS: Automated Model-less Inference Serving. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). USENIX Association, 397\u2013411. isbn:978-1-939133-23-6 https:\/\/www.usenix.org\/conference\/atc21\/presentation\/romero"},{"key":"e_1_3_2_1_52_1","volume-title":"Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi.","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.). 35, Curran Associates, Inc., 36479\u201336494. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/ec795aeadae0b7d230fa35cbaf04c041-Paper-Conference.pdf"},{"key":"e_1_3_2_1_53_1","unstructured":"Desen Sun Henry Tian Tim Lu and Sihang Liu. 2024. FlexCache: Flexible Approximate Cache System for Video Diffusion. arxiv:2501.04012. arxiv:2501.04012"},{"key":"e_1_3_2_1_54_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 21170\u201321180","author":"Sun Mingze","year":"2025","unstructured":"Mingze Sun, Junhao Chen, Junting Dong, Yurun Chen, Xinyu Jiang, Shiwei Mao, Puhua Jiang, Jingbo Wang, Bo Dai, and Ruqi Huang. 2025. DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 21170\u201321180."},{"key":"e_1_3_2_1_55_1","unstructured":"A Vaswani. 2017. Attention is all you need. Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_1_56_1","volume-title":"Diffusers: State-of-the-art diffusion models. https:\/\/github.com\/huggingface\/diffusers","author":"von Platen Patrick","year":"2022","unstructured":"Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, and Thomas Wolf. 2022. Diffusers: State-of-the-art diffusion models. https:\/\/github.com\/huggingface\/diffusers"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2023.3276759"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3681373"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.51"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00594"},{"key":"e_1_3_2_1_62_1","unstructured":"Yuchen Xia Divyam Sharma Yichao Yuan Souvik Kundu and Nishil Talati. 2025. MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models. arxiv:2503.11972. arxiv:2503.11972"},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00608"},{"key":"e_1_3_2_1_64_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2857\u20132869","author":"Yang Yuanbo","year":"2025","unstructured":"Yuanbo Yang, Jiahao Shao, Xinyang Li, Yujun Shen, Andreas Geiger, and Yiyi Liao. 2025. Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2857\u20132869."},{"key":"e_1_3_2_1_65_1","volume-title":"Orca: A Distributed Serving System for Transformer-Based Generative Models. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)","author":"Yu Gyeong-In","year":"2022","unstructured":"Gyeong-In Yu, Joo Seong Jeong, Geon-Woo Kim, Soojeong Kim, and Byung-Gon Chun. 2022. Orca: A Distributed Serving System for Transformer-Based Generative Models. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). USENIX Association, Carlsbad, CA. 521\u2013538. isbn:978-1-939133-28-1 https:\/\/www.usenix.org\/conference\/osdi22\/presentation\/yu"},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i15.29599"},{"key":"e_1_3_2_1_67_1","unstructured":"Zichao Yu Zhen Zou Guojiang Shao Chengwei Zhang Shengze Xu Jie Huang Feng Zhao Xiaodong Cun and Wenyi Zhang. 2025. AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse. arxiv:2504.10540. arxiv:2504.10540"},{"key":"e_1_3_2_1_68_1","volume-title":"SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention. arxiv:2509.24006. arxiv:2509.24006","author":"Zhang Jintao","year":"2025","unstructured":"Jintao Zhang, Haoxu Wang, Kai Jiang, Shuo Yang, Kaiwen Zheng, Haocheng Xi, Ziteng Wang, Hongzhou Zhu, Min Zhao, Ion Stoica, Joseph E. Gonzalez, Jun Zhu, and Jianfei Chen. 2025. SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention. arxiv:2509.24006. arxiv:2509.24006"},{"key":"e_1_3_2_1_69_1","unstructured":"Jintao Zhang Chendong Xiang Haofeng Huang Jia Wei Haocheng Xi Jun Zhu and Jianfei Chen. 2025. SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference. arxiv:2502.18137. arxiv:2502.18137"},{"key":"e_1_3_2_1_70_1","volume-title":"DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)","author":"Zhong Yinmin","year":"2024","unstructured":"Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, and Hao Zhang. 2024. DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). USENIX Association, Santa Clara, CA. 193\u2013210. isbn:978-1-939133-40-3 https:\/\/www.usenix.org\/conference\/osdi24\/presentation\/zhong-yinmin"}],"event":{"name":"PPoPP '26: 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming","location":"Sydney NSW Australia","acronym":"PPoPP '26","sponsor":["SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing","SIGPLAN ACM Special Interest Group on Programming Languages"]},"container-title":["Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3774934.3786420","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T15:30:00Z","timestamp":1769614200000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3774934.3786420"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,28]]},"references-count":70,"alternative-id":["10.1145\/3774934.3786420","10.1145\/3774934"],"URL":"https:\/\/doi.org\/10.1145\/3774934.3786420","relation":{},"subject":[],"published":{"date-parts":[[2026,1,28]]},"assertion":[{"value":"2026-01-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}