{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T16:42:50Z","timestamp":1773247370421,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":43,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,1,20]],"date-time":"2025-01-20T00:00:00Z","timestamp":1737331200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,1,20]]},"DOI":"10.1145\/3658617.3697692","type":"proceedings-article","created":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T14:32:21Z","timestamp":1741098741000},"page":"148-154","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["ViDA: Video Diffusion Transformer Acceleration with Differential Approximation and Adaptive Dataflow"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-0582-4503","authenticated-orcid":false,"given":"Li","family":"Ding","sequence":"first","affiliation":[{"name":"Qingyuan Research Institute, Shanghai Jiao Tong Univ., Shanghai, Minhang, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8280-9072","authenticated-orcid":false,"given":"Jun","family":"Liu","sequence":"additional","affiliation":[{"name":"Qingyuan Research Institute, Shanghai Jiao Tong Univ., Shanghai, Minhang, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-2012-8540","authenticated-orcid":false,"given":"Shan","family":"Huang","sequence":"additional","affiliation":[{"name":"Qingyuan Research Institute, Shanghai Jiao Tong Univ., Shanghai, Minhang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0849-3252","authenticated-orcid":false,"given":"Guohao","family":"Dai","sequence":"additional","affiliation":[{"name":"Qingyuan Research Institute, Shanghai Jiao Tong Univ., Shanghai, Minhang, China"}]}],"member":"320","published-online":{"date-parts":[[2025,3,4]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02171"},{"key":"e_1_3_2_1_2_1","unstructured":"Fan Bao Shen Nie Kaiwen Xue Chongxuan Li Shi Pu Yaole Wang Gang Yue Yue Cao Hang Su and Jun Zhu. 2023. One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale. (2023)."},{"key":"e_1_3_2_1_3_1","unstructured":"Andreas Blattmann Tim Dockhorn Sumith Kulal Daniel Mendelevitch Maciej Kilian Dominik Lorenz Yam Levi Zion English Vikram Voleti Adam Letts et al. 2023. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023)."},{"key":"e_1_3_2_1_4_1","unstructured":"Tim Brooks Bill Peebles Connor Holmes Will DePue Yufei Guo Li Jing David Schnurr Joe Taylor Troy Luhman Eric Luhman Clarence Ng Ricky Wang and Aditya Ramesh. 2024. Video generation models as world simulators. (2024). https:\/\/openai.com\/research\/video-generation-models-as-world-simulators"},{"key":"e_1_3_2_1_5_1","volume-title":"Videocrafter2: Overcoming data limitations for high-quality video diffusion models. arXiv preprint arXiv:2401.09047","author":"Chen Haoxin","year":"2024","unstructured":"Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, and Ying Shan. 2024. Videocrafter2: Overcoming data limitations for high-quality video diffusion models. arXiv preprint arXiv:2401.09047 (2024)."},{"key":"e_1_3_2_1_6_1","unstructured":"Junsong Chen Jincheng Yu Chongjian Ge Lewei Yao Enze Xie Yue Wu Zhongdao Wang James Kwok Ping Luo Huchuan Lu et al. 2023. PixArt-alpha: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis. arXiv preprint arXiv:2310.00426 (2023)."},{"key":"e_1_3_2_1_7_1","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)."},{"key":"e_1_3_2_1_8_1","unstructured":"Peng Gao Le Zhuo Chris Liu Ruoyi Du Xu Luo Longtian Qiu Yuhang Zhang et al. 2024. Lumina-T2X: Transforming Text into Any Modality Resolution and Duration via Flow-based Large Diffusion Transformers. arXiv preprint arXiv:2405.05945 (2024)."},{"key":"e_1_3_2_1_9_1","volume-title":"Model tells you what to discard: Adaptive kv cache compression for llms. arXiv preprint arXiv:2310.01801","author":"Ge Suyu","year":"2023","unstructured":"Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, and Jianfeng Gao. 2023. Model tells you what to discard: Adaptive kv cache compression for llms. arXiv preprint arXiv:2310.01801 (2023)."},{"key":"e_1_3_2_1_10_1","volume-title":"Gemmini: An agile systolic array generator enabling systematic evaluations of deep-learning architectures. arXiv preprint arXiv:1911.09925 3, 25","author":"Genc Hasan","year":"2019","unstructured":"Hasan Genc, Ameer Haj-Ali, Vighnesh Iyer, Alon Amid, Howard Mao, John Wright, Colin Schmidt, Jerry Zhao, Albert Ou, Max Banister, et al. 2019. Gemmini: An agile systolic array generator enabling systematic evaluations of deep-learning architectures. arXiv preprint arXiv:1911.09925 3, 25 (2019), 15--17."},{"key":"e_1_3_2_1_11_1","volume-title":"Denoising diffusion probabilistic models. Advances in neural information processing systems 33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020), 6840--6851."},{"key":"e_1_3_2_1_12_1","first-page":"8633","article-title":"Video diffusion models","volume":"35","author":"Ho Jonathan","year":"2022","unstructured":"Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. 2022. Video diffusion models. Advances in Neural Information Processing Systems 35 (2022), 8633--8646.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_13_1","first-page":"148","article-title":"FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics","volume":"6","author":"Hong Ke","year":"2024","unstructured":"Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Yuhan Dong, Yu Wang, et al. 2024. FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics. Proceedings of Machine Learning and Systems 6 (2024), 148--161.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_14_1","unstructured":"Inc. Intel. [n. d.]. Intel Xeon Platinum 8358P Processor. https:\/\/www.intel.com\/content\/www\/us\/en\/products\/sku\/212308\/intel-xeon-platinum-8358p-processor-48m-cache-2-60-ghz\/specifications.html."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575747"},{"key":"e_1_3_2_1_16_1","volume-title":"Cambricon-D: Full-Network Differential Acceleration for Diffusion Models. In 2024 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE.","author":"Kong Weihao","year":"2024","unstructured":"Weihao Kong, Hao Yifan, Qi Guo, Yongwei Zhao, Xinkai Song, Xiaqing Li, Mo Zou, Zidong Du, Rui Zhang, Chang Liu, Yuanbo Wen, Pengwei Jin, Xing Hu, Wei Li, Zhiwei Xu, and Tianshi Chen. 2024. Cambricon-D: Full-Network Differential Acceleration for Diffusion Models. In 2024 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE."},{"key":"e_1_3_2_1_17_1","volume-title":"d.]. kuaishou (kuaishou.com). https:\/\/klingai.org\/zh Accessed","author":"Inc. Kwai. [n.","year":"2024","unstructured":"Inc. Kwai. [n. d.]. kuaishou (kuaishou.com). https:\/\/klingai.org\/zh Accessed June 6, 2024."},{"key":"e_1_3_2_1_18_1","unstructured":"Steven Levy. [n. d.]. OpenAI's Sora Turns AI Prompts Into Photorealistic Videos. https:\/\/www.wired.com\/story\/openai-sora-generative-ai-video\/."},{"key":"e_1_3_2_1_19_1","volume-title":"MARCA: Mamba Accelerator with ReConfigurable Architecture. arXiv preprint arXiv:2409.11440","author":"Li Jinhao","year":"2024","unstructured":"Jinhao Li, Shan Huang, Jiaming Xu, Jun Liu, Li Ding, Ningyi Xu, and Guohao Dai. 2024. MARCA: Mamba Accelerator with ReConfigurable Architecture. arXiv preprint arXiv:2409.11440 (2024)."},{"key":"e_1_3_2_1_20_1","volume-title":"The Twelfth International Conference on Learning Representations.","author":"Lu Haoyu","year":"2023","unstructured":"Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, and Mingyu Ding. 2023. Vdt: General-purpose video diffusion transformers via mask modeling. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_1_21_1","volume-title":"Latte: Latent diffusion transformer for video generation. arXiv preprint arXiv:2401.03048","author":"Ma Xin","year":"2024","unstructured":"Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, and Yu Qiao. 2024. Latte: Latent diffusion transformer for video generation. arXiv preprint arXiv:2401.03048 (2024)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00020"},{"key":"e_1_3_2_1_23_1","volume-title":"d.]. HewlettPackard\/cacti. https:\/\/github.com\/HewlettPackard\/cacti Accessed","author":"Naveen Muralimanohar Ali Shafiee","year":"2023","unstructured":"Ali Shafiee Naveen Muralimanohar and Vaishnav Srinivas. [n. d.]. HewlettPackard\/cacti. https:\/\/github.com\/HewlettPackard\/cacti Accessed May 22, 2023."},{"key":"e_1_3_2_1_24_1","unstructured":"Inc. NVIDIA. [n. d.]. NVIDIA A100 Tensor Core GPU Architecture. https:\/\/images.nvidia.com\/aem-dam\/en-zz\/Solutions\/data-center\/nvidiaampere-architecture-whitepaper.pdf.."},{"key":"e_1_3_2_1_25_1","unstructured":"Inc. NVIDIA. 2024. CUDA Event API. https:\/\/docs.nvidia.com\/cuda\/cuda-runtime-api\/index.html. (2024)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01112"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00387"},{"key":"e_1_3_2_1_28_1","unstructured":"Python. [n. d.]. Python Time Library. https:\/\/docs.python.org\/3\/library\/time.html."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS51556.2021.9401196"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640393"},{"key":"e_1_3_2_1_32_1","volume-title":"Amir Roshan Zamir, and Mubarak Shah","author":"Soomro Khurram","year":"2012","unstructured":"Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.vlsi.2017.02.002"},{"key":"e_1_3_2_1_34_1","volume-title":"d.]. Design Compiler (synopsys.com). https:\/\/www.synopsys.com\/implementation-and-signoff\/rtl-synthesis-test\/dc-ultra.html Accessed","author":"Inc. Synopsys. [n.","year":"2023","unstructured":"Inc. Synopsys. [n. d.]. Design Compiler (synopsys.com). https:\/\/www.synopsys.com\/implementation-and-signoff\/rtl-synthesis-test\/dc-ultra.html Accessed May 22, 2023."},{"key":"e_1_3_2_1_35_1","volume-title":"FVD: A new metric for video generation.","author":"Unterthiner Thomas","year":"2019","unstructured":"Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Rapha\u00ebl Marinier, Marcin Michalski, and Sylvain Gelly. 2019. FVD: A new metric for video generation. (2019)."},{"key":"e_1_3_2_1_36_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_37_1","volume-title":"2024 61th ACM\/IEEE Design Automation Conference (DAC). IEEE.","author":"Wang Xuhang","year":"2024","unstructured":"Xuhang Wang, Zhuoran Song, and Xiaoyao Liang. 2024. InterArch: Video Transformer Acceleration via Inter-Feature Deduplication with Cube-based Dataflow. In 2024 61th ACM\/IEEE Design Automation Conference (DAC). IEEE."},{"key":"e_1_3_2_1_38_1","volume-title":"Godiva: Generating open-domain videos from natural descriptions. arXiv preprint arXiv:2104.14806","author":"Wu Chenfei","year":"2021","unstructured":"Chenfei Wu, Lun Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, and Nan Duan. 2021. Godiva: Generating open-domain videos from natural descriptions. arXiv preprint arXiv:2104.14806 (2021)."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.571"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071027"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626202.3637562"},{"key":"e_1_3_2_1_42_1","unstructured":"Zangwei Zheng Xiangyu Peng and Yang You. 2024. Open-Sora: Democratizing Efficient Video Production for All. https:\/\/github.com\/hpcaitech\/Open-Sora"},{"key":"e_1_3_2_1_43_1","unstructured":"Zixuan Zhou Xuefei Ning Ke Hong Tianyu Fu Jiaming Xu Shiyao Li Yuming Lou Luning Wang Zhihang Yuan Xiuhong Li et al. 2024. A survey on efficient inference for large language models. arXiv preprint arXiv:2404.14294 (2024)."}],"event":{"name":"ASPDAC '25: 30th Asia and South Pacific Design Automation Conference","location":"Tokyo Japan","acronym":"ASPDAC '25","sponsor":["SIGDA ACM Special Interest Group on Design Automation","IEICE","IPSJ","IEEE CAS","IEEE CEDA"]},"container-title":["Proceedings of the 30th Asia and South Pacific Design Automation Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3658617.3697692","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3658617.3697692","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:49Z","timestamp":1750295869000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3658617.3697692"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,20]]},"references-count":43,"alternative-id":["10.1145\/3658617.3697692","10.1145\/3658617"],"URL":"https:\/\/doi.org\/10.1145\/3658617.3697692","relation":{},"subject":[],"published":{"date-parts":[[2025,1,20]]},"assertion":[{"value":"2025-03-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}