{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T15:58:01Z","timestamp":1781539081627,"version":"3.54.5"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T00:00:00Z","timestamp":1781481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"funder":[{"name":"National Key R&D Program of China","award":["2024YFE0212000"],"award-info":[{"award-number":["2024YFE0212000"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,6,16]]},"DOI":"10.1145\/3805622.3810600","type":"proceedings-article","created":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T14:42:57Z","timestamp":1781534577000},"page":"1870-1878","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Toward Generation-Centric Coding: Compressing Latents representation for TI2V Synthesis"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5835-7913","authenticated-orcid":false,"given":"Jianran","family":"Liu","sequence":"first","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6895-3404","authenticated-orcid":false,"given":"Wen","family":"Ji","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China and Institute of AI for Industries, CAS, Nanjing, Jiangsu, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1348-1168","authenticated-orcid":false,"given":"Xiaokai","family":"Meng","sequence":"additional","affiliation":[{"name":"Electric Power Research Institute, State Grid Shanxi Electric Power Co., Ltd, Taiyuan, Shanxi, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8943-8139","authenticated-orcid":false,"given":"Wancai","family":"Zhang","sequence":"additional","affiliation":[{"name":"NARI Technology Co., Ltd., Nanjing, Jiangsu, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8022-6685","authenticated-orcid":false,"given":"Ying","family":"Wang","sequence":"additional","affiliation":[{"name":"LonganPi Intelligent Information Technology Co., Ltd., Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,6,15]]},"reference":[{"key":"e_1_3_3_1_2_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Ball\u00e9 Johannes","year":"2018","unstructured":"Johannes Ball\u00e9, David Minnen, Saurabh Singh, Sung\u00a0Jin Hwang, and Nick Johnston. 2018. Variational image compression with a scale hyperprior. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_3_1_3_2","unstructured":"Fabrice Bellard. 2014. BPG Image Format. https:\/\/bellard.org\/bpg\/."},{"key":"e_1_3_3_1_4_2","unstructured":"Gisle Bjontegaard. 2001. Calculation of average PSNR differences between RD-curves. ITU SG16 Doc. VCEG-M33 (2001)."},{"key":"e_1_3_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Benjamin Bross Ye-Kui Wang Yan Ye Shan Liu Jianle Chen Gary\u00a0J Sullivan and Jens-Rainer Ohm. 2021. Overview of the versatile video coding (VVC) standard and its applications. IEEE Transactions on Circuits and Systems for Video Technology 31 10 (2021) 3736\u20133764.","DOI":"10.1109\/TCSVT.2021.3101953"},{"key":"e_1_3_3_1_6_2","doi-asserted-by":"crossref","unstructured":"Hila Chefer Shiran Zada Roni Paiss Ariel Ephrat Omer Tov Michael Rubinstein Lior Wolf Tali Dekel Tomer Michaeli and Inbar Mosseri. 2024. Still-moving: Customized video generation without customized video data. ACM Transactions on Graphics 43 6 (2024) 1\u201311.","DOI":"10.1145\/3687945"},{"key":"e_1_3_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Hao Chen Yujin Han Diganta Misra Xiang Li Kai Hu Difan Zou Masashi Sugiyama Jindong Wang and Bhiksha Raj. 2024. Slight corruption in pre-training data makes better diffusion models. Advances in Neural Information Processing Systems 37 (2024) 126149\u2013126206.","DOI":"10.52202\/079017-4008"},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP40778.2020.9190843"},{"key":"e_1_3_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00796"},{"key":"e_1_3_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01268"},{"key":"e_1_3_3_1_11_2","doi-asserted-by":"crossref","unstructured":"Lin\u00a0Geng Foo Hossein Rahmani and Jun Liu. 2025. AI-generated content (AIGC) for various data modalities: A survey. Comput. Surveys 57 9 (2025) 1\u201366.","DOI":"10.1145\/3728633"},{"key":"e_1_3_3_1_12_2","doi-asserted-by":"crossref","unstructured":"Henry Gouk Eibe Frank Bernhard Pfahringer and Michael\u00a0J Cree. 2021. Regularisation of neural networks by enforcing lipschitz continuity. Machine Learning 110 2 (2021) 393\u2013416.","DOI":"10.1007\/s10994-020-05929-w"},{"key":"e_1_3_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00563"},{"key":"e_1_3_3_1_14_2","doi-asserted-by":"crossref","unstructured":"Yaosi Hu Chong Luo and Zhenzhong Chen. 2023. A benchmark for controllable text-image-to-video generation. IEEE Transactions on Multimedia 26 (2023) 1706\u20131719.","DOI":"10.1109\/TMM.2023.3284989"},{"key":"e_1_3_3_1_15_2","unstructured":"Joon-Young Jeong Hyun-Jeong Kim Jong-Seok Park and Hyun-Wook Park. 2025. Analysis of the feature compression for machines. IEEE Access 13 (2025) 15299\u201315314."},{"key":"e_1_3_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02465"},{"key":"e_1_3_3_1_17_2","doi-asserted-by":"crossref","unstructured":"Wei Jiang Yongqi Zhai Jiayu Yang Feng Gao and Ronggang Wang. 2025. MLICv2: Enhanced multi-reference entropy modeling for learned image compression. ACM Transactions on Multimedia Computing Communications and Applications (2025).","DOI":"10.1145\/3785671"},{"key":"e_1_3_3_1_18_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Li Jiachen","year":"2025","unstructured":"Jiachen Li, Qian Long, Jian Zheng, Xiaofeng Gao, Robinson Piramuthu, Wenhu Chen, and William\u00a0Yang Wang. 2025. T2V-Turbo-v2: Enhancing Video Model Post-Training through Data, Reward, and Conditional Guidance Design. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_3_1_19_2","doi-asserted-by":"crossref","unstructured":"Mingxuan Li and Wen Ji. 2023. Lightweight multiattention recursive residual CNN-based in-loop filter driven by neuron diversity. IEEE Transactions on Circuits and Systems for Video Technology 33 11 (2023) 6996\u20137008.","DOI":"10.1109\/TCSVT.2023.3270729"},{"key":"e_1_3_3_1_20_2","doi-asserted-by":"crossref","unstructured":"Mingxiang Liao Qixiang Ye Wangmeng Zuo Fang Wan Tianyu Wang Yuzhong Zhao Jingdong Wang Xinyu Zhang et\u00a0al. 2024. Evaluation of text-to-video generation models: A dynamics perspective. Advances in Neural Information Processing Systems 37 (2024) 109790\u2013109816.","DOI":"10.52202\/079017-3483"},{"key":"e_1_3_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01383"},{"key":"e_1_3_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP46576.2022.9897595"},{"key":"e_1_3_3_1_24_2","unstructured":"Chika Maduabuchi Hao Chen Yujin Han and Jindong Wang. 2025. Corruption-Aware training of latent video diffusion models for robust text-to-Video generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2505.21545 (2025)."},{"key":"e_1_3_3_1_25_2","unstructured":"David Minnen Johannes Ball\u00e9 and George\u00a0D Toderici. 2018. Joint autoregressive and hierarchical priors for learned image compression. Advances in Neural Information Processing Systems 31 (2018)."},{"key":"e_1_3_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i5.28226"},{"key":"e_1_3_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00861"},{"key":"e_1_3_3_1_28_2","doi-asserted-by":"crossref","unstructured":"Keqiang Sun Junting Pan Yuying Ge Hao Li Haodong Duan Xiaoshi Wu Renrui Zhang Aojun Zhou Zipeng Qin Yi Wang et\u00a0al. 2023. JourneyDB: A benchmark for generative image understanding. Advances in Neural Information Processing Systems 36 (2023) 49659\u201349678.","DOI":"10.52202\/075280-2161"},{"key":"e_1_3_3_1_29_2","unstructured":"Lucas Theis Wenzhe Shi Andrew Cunningham and Ferenc Husz\u00e1r. 2017. Lossy image compression with compressive autoencoders. Stat 1050 (2017) 1."},{"key":"e_1_3_3_1_30_2","volume-title":"Proceedings of the International Conference on Learning Representations Workshops","author":"Unterthiner Thomas","year":"2019","unstructured":"Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Rapha\u00ebl Marinier, Marcin Michalski, and Sylvain Gelly. 2019. FVD: A new metric for video generation. In Proceedings of the International Conference on Learning Representations Workshops."},{"key":"e_1_3_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i8.32861"},{"key":"e_1_3_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACSSC.2003.1292216"},{"key":"e_1_3_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICMEW53276.2021.9455944"},{"key":"e_1_3_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/VCIP56404.2022.10008806"},{"key":"e_1_3_3_1_35_2","first-page":"399","volume-title":"European Conference on Computer Vision","author":"Xing Jinbo","year":"2024","unstructured":"Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Gongye Liu, Xintao Wang, Ying Shan, and Tien-Tsin Wong. 2024. Dynamicrafter: Animating open-domain images with video diffusion priors. In European Conference on Computer Vision. Springer, 399\u2013417."},{"key":"e_1_3_3_1_36_2","unstructured":"Erdem Yildirim. 2022. Text-to-image generation AI in architecture. Art and Architecture: Theory Practice and Experience 97 (2022)."},{"key":"e_1_3_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.02138"},{"key":"e_1_3_3_1_38_2","doi-asserted-by":"crossref","unstructured":"Haotian Zhang Li Li and Dong Liu. 2025. Generalized Gaussian Model for Learned Image Compression. IEEE Transactions on Image Processing 34 (2025) 1950\u20131965.","DOI":"10.1109\/TIP.2025.3550013"},{"key":"e_1_3_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_3_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i10.33114"},{"key":"e_1_3_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01697"}],"event":{"name":"ICMR '26: International Conference on Multimedia Retrieval","location":"Amsterdam The Netherlands","acronym":"ICMR '26","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 2026 International Conference on Multimedia Retrieval"],"original-title":[],"deposited":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T15:51:45Z","timestamp":1781538705000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3805622.3810600"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,6,15]]},"references-count":40,"alternative-id":["10.1145\/3805622.3810600","10.1145\/3805622"],"URL":"https:\/\/doi.org\/10.1145\/3805622.3810600","relation":{},"subject":[],"published":{"date-parts":[[2026,6,15]]},"assertion":[{"value":"2026-06-15","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}