{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T14:59:29Z","timestamp":1776092369601,"version":"3.50.1"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T00:00:00Z","timestamp":1731974400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2024,12,19]]},"abstract":"<jats:p>We introduce ToonCrafter, a novel approach that transcends traditional correspondence-based cartoon video interpolation, paving the way for generative interpolation. Traditional methods, that implicitly assume linear motion and the absence of complicated phenomena like dis-occlusion, often struggle with the exaggerated non-linear and large motions with occlusion commonly found in cartoons, resulting in implausible or even failed interpolation results. To overcome these limitations, we explore the potential of adapting live-action video priors to better suit cartoon interpolation within a generative framework. ToonCrafter effectively addresses the challenges faced when applying live-action video motion priors to generative cartoon interpolation. First, we design a toon rectification learning strategy that seamlessly adapts live-action video priors to the cartoon domain, resolving the domain gap and content leakage issues. Next, we introduce a dual-reference-based 3D decoder to compensate for lost details due to the highly compressed latent prior spaces, ensuring the preservation of fine details in interpolation results. Finally, we design a flexible sketch encoder that empowers users with interactive control over the interpolation results. Experimental results demonstrate that our proposed method not only produces visually convincing and more natural dynamics, but also effectively handles dis-occlusion. The comparative evaluation demonstrates the notable superiority of our approach over existing competitors. Code and model weights are available at https:\/\/doubiiu.github.io\/projects\/ToonCrafter<\/jats:p>","DOI":"10.1145\/3687761","type":"journal-article","created":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T15:46:04Z","timestamp":1732031164000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":38,"title":["ToonCrafter: Generative Cartoon Interpolation"],"prefix":"10.1145","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2181-1879","authenticated-orcid":false,"given":"Jinbo","family":"Xing","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-3336-0686","authenticated-orcid":false,"given":"Hanyuan","family":"Liu","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9664-4967","authenticated-orcid":false,"given":"Menghan","family":"Xia","sequence":"additional","affiliation":[{"name":"AI Lab, Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0066-3448","authenticated-orcid":false,"given":"Yong","family":"Zhang","sequence":"additional","affiliation":[{"name":"AI Lab, Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6585-8604","authenticated-orcid":false,"given":"Xintao","family":"Wang","sequence":"additional","affiliation":[{"name":"AI Lab, Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7673-8325","authenticated-orcid":false,"given":"Ying","family":"Shan","sequence":"additional","affiliation":[{"name":"AI Lab, Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7792-9307","authenticated-orcid":false,"given":"Tien-Tsin","family":"Wong","sequence":"additional","affiliation":[{"name":"Monash University, Melbourne, Australia"},{"name":"The Chinese University of Hong Kong, Hong Kong, China"}]}],"member":"320","published-online":{"date-parts":[[2024,11,19]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"PySceneDetect Authors. 2023. PySceneDetect. Accessed October. 1 2023 [Online]. https:\/\/github.com\/Breakthrough\/PySceneDetect"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Youngmin Baek Bado Lee Dongyoon Han Sangdoo Yun and Hwalsuk Lee. 2019. Character Region Awareness for Text Detection. In CVPR.","DOI":"10.1109\/CVPR.2019.00959"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Max Bain Arsha Nagrani G\u00fcl Varol and Andrew Zisserman. 2021. Frozen in time: A joint video and image encoder for end-to-end retrieval. In ICCV.","DOI":"10.1109\/ICCV48922.2021.00175"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Wenbo Bao Wei-Sheng Lai Chao Ma Xiaoyun Zhang Zhiyong Gao and Ming-Hsuan Yang. 2019. Depth-aware video frame interpolation. In CVPR.","DOI":"10.1109\/CVPR.2019.00382"},{"key":"e_1_2_1_5_1","unstructured":"Andreas Blattmann Tim Dockhorn Sumith Kulal Daniel Mendelevitch Maciej Kilian Dominik Lorenz Yam Levi Zion English Vikram Voleti Adam Letts et al. 2023a. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023)."},{"key":"e_1_2_1_6_1","volume-title":"Sanja Fidler, and Karsten Kreis.","author":"Blattmann Andreas","year":"2023","unstructured":"Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, and Karsten Kreis. 2023b. Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models. In CVPR."},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Shuhong Chen and Matthias Zwicker. 2022. Improving the Perceptual Quality of 2D Animation Interpolation. In ECCV.","DOI":"10.1007\/978-3-031-19790-1_17"},{"key":"e_1_2_1_8_1","volume-title":"Seine: Short-to-long video diffusion model for generative transition and prediction. In ICLR.","author":"Chen Xinyuan","year":"2024","unstructured":"Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. 2024. Seine: Short-to-long video diffusion model for generative transition and prediction. In ICLR."},{"key":"e_1_2_1_9_1","volume-title":"Ldmvfi: Video frame interpolation with latent diffusion models. In AAAI.","author":"Danier Duolikun","year":"2024","unstructured":"Duolikun Danier, Fan Zhang, and David Bull. 2024. Ldmvfi: Video frame interpolation with latent diffusion models. In AAAI."},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3556544","article-title":"Video frame interpolation: A comprehensive survey","volume":"19","author":"Dong Jiong","year":"2023","unstructured":"Jiong Dong, Kaoru Ota, and Mianxiong Dong. 2023. Video frame interpolation: A comprehensive survey. ACM Transactions on Multimedia Computing, Communications and Applications 19, 2s (2023), 1--31.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Patrick Esser Robin Rombach and Bjorn Ommer. 2021. Taming transformers for high-resolution image synthesis. In CVPR.","DOI":"10.1109\/CVPR46437.2021.01268"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Rinon Gal Yael Vinker Yuval Alaluf Amit Bermano Daniel Cohen-Or Ariel Shamir and Gal Chechik. 2024. Breathing Life Into Sketches Using Text-to-Video Priors. In CVPR.","DOI":"10.1109\/CVPR52733.2024.00414"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1080\/02664769823151"},{"key":"e_1_2_1_14_1","volume-title":"Seer: Language Instructed Video Prediction with Latent Diffusion Models. In ICLR.","author":"Gu Xianfan","year":"2024","unstructured":"Xianfan Gu, Chuan Wen, Weirui Ye, Jiaming Song, and Yang Gao. 2024. Seer: Language Instructed Video Prediction with Latent Diffusion Models. In ICLR."},{"key":"e_1_2_1_15_1","volume-title":"Sparsectrl: Adding sparse controls to text-to-video diffusion models. arXiv preprint arXiv:2311.16933","author":"Guo Yuwei","year":"2023","unstructured":"Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai. 2023. Sparsectrl: Adding sparse controls to text-to-video diffusion models. arXiv preprint arXiv:2311.16933 (2023)."},{"key":"e_1_2_1_16_1","volume-title":"Latent Video Diffusion Models for High-Fidelity Video Generation with Arbitrary Lengths. arXiv preprint arXiv:2211.13221","author":"He Yingqing","year":"2022","unstructured":"Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, and Qifeng Chen. 2022. Latent Video Diffusion Models for High-Fidelity Video Generation with Arbitrary Lengths. arXiv preprint arXiv:2211.13221 (2022)."},{"key":"e_1_2_1_17_1","unstructured":"Jonathan Ho William Chan Chitwan Saharia Jay Whang Ruiqi Gao Alexey Gritsenko Diederik P Kingma Ben Poole Mohammad Norouzi David J Fleet et al. 2022. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303 (2022)."},{"key":"e_1_2_1_18_1","unstructured":"Jonathan Ho Ajay Jain and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. In NeurIPS."},{"key":"e_1_2_1_19_1","volume-title":"Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598","author":"Ho Jonathan","year":"2022","unstructured":"Jonathan Ho and Tim Salimans. 2022. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598 (2022)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Zhewei Huang Tianyuan Zhang Wen Heng Boxin Shi and Shuchang Zhou. 2022. Real-Time Intermediate Flow Estimation for Video Frame Interpolation. In ECCV.","DOI":"10.1007\/978-3-031-19781-9_36"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Siddhant Jain Daniel Watson Eric Tabellion Ben Poole Janne Kontkanen et al. 2024. Video interpolation with diffusion models. In CVPR.","DOI":"10.1109\/CVPR52733.2024.00701"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Huaizu Jiang Deqing Sun Varun Jampani Ming-Hsuan Yang Erik Learned-Miller and Jan Kautz. 2018. Super slomo: High quality estimation of multiple intermediate frames for video interpolation. In CVPR.","DOI":"10.1109\/CVPR.2018.00938"},{"key":"e_1_2_1_23_1","unstructured":"Junnan Li Dongxu Li Silvio Savarese and Steven Hoi. 2023. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In ICML."},{"key":"e_1_2_1_24_1","volume-title":"Chen Change Loy, and Ziwei Liu","author":"Li Siyao","year":"2021","unstructured":"Siyao Li, Shiyu Zhao, Weijiang Yu, Wenxiu Sun, Dimitris Metaxas, Chen Change Loy, and Ziwei Liu. 2021b. Deep Animation Video Interpolation in the Wild. In CVPR."},{"key":"e_1_2_1_25_1","first-page":"2938","article-title":"Deep sketch-guided cartoon video inbetweening","volume":"28","author":"Li Xiaoyu","year":"2021","unstructured":"Xiaoyu Li, Bo Zhang, Jing Liao, and Pedro V Sander. 2021a. Deep sketch-guided cartoon video inbetweening. IEEE TVCG 28, 8 (2021), 2938--2952.","journal-title":"IEEE TVCG"},{"key":"e_1_2_1_26_1","volume-title":"Geometric gan. arXiv preprint arXiv:1705.02894","author":"Lim Jae Hyun","year":"2017","unstructured":"Jae Hyun Lim and Jong Chul Ye. 2017. Geometric gan. arXiv preprint arXiv:1705.02894 (2017)."},{"key":"e_1_2_1_27_1","volume-title":"Stylecrafter: Enhancing stylized text-to-video generation with style adapter. arXiv preprint arXiv:2312.00330","author":"Liu Gongye","year":"2023","unstructured":"Gongye Liu, Menghan Xia, Yong Zhang, Haoxin Chen, Jinbo Xing, Xintao Wang, Yujiu Yang, and Ying Shan. 2023a. Stylecrafter: Enhancing stylized text-to-video generation with style adapter. arXiv preprint arXiv:2312.00330 (2023)."},{"key":"e_1_2_1_28_1","volume-title":"Video Colorization with Pre-trained Text-to-Image Diffusion Models. arXiv preprint arXiv:2306.01732","author":"Liu Hanyuan","year":"2023","unstructured":"Hanyuan Liu, Minshan Xie, Jinbo Xing, Chengze Li, and Tien-Tsin Wong. 2023b. Video Colorization with Pre-trained Text-to-Image Diffusion Models. arXiv preprint arXiv:2306.01732 (2023)."},{"key":"e_1_2_1_29_1","volume-title":"FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models. CVPR","author":"Luo Ao","year":"2024","unstructured":"Ao Luo, Xin Li, Fan Yang, Jiangyu Liu, Haoqiang Fan, and Shuaicheng Liu. 2024. FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models. CVPR (2024)."},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Simone Meyer Abdelaziz Djelouah Brian McWilliams Alexander Sorkine-Hornung Markus Gross and Christopher Schroers. 2018. Phasenet for video frame interpolation. In CVPR.","DOI":"10.1109\/CVPR.2018.00059"},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Simone Meyer Oliver Wang Henning Zimmer Max Grosse and Alexander Sorkine-Hornung. 2015. Phase-based frame interpolation for video. In CVPR.","DOI":"10.1109\/CVPR.2015.7298747"},{"key":"e_1_2_1_32_1","first-page":"2678","article-title":"A no-reference image blur metric based on the cumulative probability of blur detection (CPBD)","volume":"20","author":"Narvekar Niranjan D","year":"2011","unstructured":"Niranjan D Narvekar and Lina J Karam. 2011. A no-reference image blur metric based on the cumulative probability of blur detection (CPBD). IEEE TIP 20, 9 (2011), 2678--2683.","journal-title":"IEEE TIP"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Simon Niklaus and Feng Liu. 2020. Softmax splatting for video frame interpolation. In CVPR.","DOI":"10.1109\/CVPR42600.2020.00548"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Simon Niklaus Long Mai and Feng Liu. 2017a. Video frame interpolation via adaptive convolution. In CVPR.","DOI":"10.1109\/CVPR.2017.244"},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Simon Niklaus Long Mai and Feng Liu. 2017b. Video frame interpolation via adaptive separable convolution. In ICCV.","DOI":"10.1109\/ICCV.2017.37"},{"key":"e_1_2_1_36_1","volume-title":"One-step image translation with text-to-image models. arXiv preprint arXiv:2403.12036","author":"Parmar Gaurav","year":"2024","unstructured":"Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, and Jun-Yan Zhu. 2024. One-step image translation with text-to-image models. arXiv preprint arXiv:2403.12036 (2024)."},{"key":"e_1_2_1_37_1","unstructured":"Zhaofan Qiu Ting Yao and Tao Mei. 2017. Learning spatio-temporal representation with pseudo-3d residual networks. In ICCV."},{"key":"e_1_2_1_38_1","volume-title":"Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In ICML."},{"key":"e_1_2_1_39_1","volume-title":"Film: Frame interpolation for large motion. In ECCV.","author":"Reda Fitsum","year":"2022","unstructured":"Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru, and Brian Curless. 2022. Film: Frame interpolation for large motion. In ECCV."},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In CVPR.","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_1_41_1","volume-title":"Laion-400m: Open dataset of clip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114","author":"Schuhmann Christoph","year":"2021","unstructured":"Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, and Aran Komatsuzaki. 2021. Laion-400m: Open dataset of clip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114 (2021)."},{"key":"e_1_2_1_42_1","volume-title":"Make-a-video: Text-to-video generation without text-video data. In ICLR.","author":"Singer Uriel","year":"2023","unstructured":"Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, et al. 2023. Make-a-video: Text-to-video generation without text-video data. In ICLR."},{"key":"e_1_2_1_43_1","unstructured":"Jascha Sohl-Dickstein Eric A. Weiss Niru Maheswaranathan and Surya Ganguli. 2015. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In ICML."},{"key":"e_1_2_1_44_1","unstructured":"Jiaming Song Chenlin Meng and Stefano Ermon. 2021. Denoising diffusion implicit models. In ICLR."},{"key":"e_1_2_1_45_1","volume-title":"Raft: Recurrent all-pairs field transforms for optical flow. In ECCV.","author":"Teed Zachary","year":"2020","unstructured":"Zachary Teed and Jia Deng. 2020. Raft: Recurrent all-pairs field transforms for optical flow. In ECCV."},{"key":"e_1_2_1_46_1","volume-title":"ICLR workshop.","author":"Unterthiner Thomas","year":"2019","unstructured":"Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Rapha\u00ebl Marinier, Marcin Michalski, and Sylvain Gelly. 2019. FVD: A new metric for video generation. In ICLR workshop."},{"key":"e_1_2_1_47_1","volume-title":"Videocomposer: Compositional video synthesis with motion controllability. In NeurIPS.","author":"Wang Xiang","year":"2024","unstructured":"Xiang Wang, Hangjie Yuan, Shiwei Zhang, Dayou Chen, Jiuniu Wang, Yingya Zhang, Yujun Shen, Deli Zhao, and Jingren Zhou. 2024a. Videocomposer: Compositional video synthesis with motion controllability. In NeurIPS."},{"key":"e_1_2_1_48_1","volume-title":"Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, and Nong Sang.","author":"Wang Xiang","year":"2024","unstructured":"Xiang Wang, Shiwei Zhang, Hang jie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, and Nong Sang. 2024b. A Recipe for Scaling up Text-to-Video Generation with Text-free Videos. In CVPR."},{"key":"e_1_2_1_49_1","unstructured":"Guangyang Wu Xin Tao Changlin Li Wenyi Wang Xiaohong Liu and Qingqing Zheng. 2024. Perception-Oriented Video Frame Interpolation via Asymmetric Blending. In CVPR."},{"key":"e_1_2_1_50_1","unstructured":"Xiaoyu Xiang Ding Liu Xiao Yang Yiheng Zhu and Xiaohui Shen. 2021. Anime2Sketch: A Sketch Extractor for Anime Arts with Deep Networks. https:\/\/github.com\/Mukosame\/Anime2Sketch."},{"key":"e_1_2_1_51_1","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1007\/s41095-021-0208-x","article-title":"Flow-aware synthesis: A generic motion model for video frame interpolation","volume":"7","author":"Xing Jinbo","year":"2021","unstructured":"Jinbo Xing, Wenbo Hu, Yuechen Zhang, and Tien-Tsin Wong. 2021. Flow-aware synthesis: A generic motion model for video frame interpolation. Computational Visual Media 7 (2021), 393--405.","journal-title":"Computational Visual Media"},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Jinbo Xing Menghan Xia Yuxin Liu Yuechen Zhang Y He H Liu H Chen X Cun X Wang Y Shan et al. 2024. Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance. IEEE TVCG (2024).","DOI":"10.1109\/TVCG.2024.3365804"},{"key":"e_1_2_1_53_1","volume-title":"DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors. arXiv preprint arXiv:2310.12190","author":"Xing Jinbo","year":"2023","unstructured":"Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Xintao Wang, Tien-Tsin Wong, and Ying Shan. 2023. DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors. arXiv preprint arXiv:2310.12190 (2023)."},{"key":"e_1_2_1_54_1","volume-title":"Quadratic video interpolation. Advances in Neural Information Processing Systems 32","author":"Xu Xiangyu","year":"2019","unstructured":"Xiangyu Xu, Li Siyao, Wenxiu Sun, Qian Yin, and Ming-Hsuan Yang. 2019. Quadratic video interpolation. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Yan Zeng Guoqiang Wei Jiani Zheng Jiaxin Zou Yang Wei Yuchen Zhang and Hang Li. 2024. Make pixels dance: High-dynamic video generation. In CVPR.","DOI":"10.1109\/CVPR52733.2024.00845"},{"key":"e_1_2_1_56_1","doi-asserted-by":"crossref","unstructured":"Guozhen Zhang Yuhan Zhu Haonan Wang Youxin Chen Gangshan Wu and Limin Wang. 2023c. Extracting motion and appearance via inter-frame attention for efficient video frame interpolation. In CVPR.","DOI":"10.1109\/CVPR52729.2023.00550"},{"key":"e_1_2_1_57_1","doi-asserted-by":"crossref","unstructured":"Lvmin Zhang Anyi Rao and Maneesh Agrawala. 2023a. Adding conditional control to text-to-image diffusion models. In ICCV.","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_2_1_58_1","doi-asserted-by":"crossref","unstructured":"Richard Zhang Phillip Isola Alexei A Efros Eli Shechtman and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR.","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_2_1_59_1","volume-title":"I2vgen-xl: High-quality image-to-video synthesis via cascaded diffusion models. arXiv preprint arXiv:2311.04145","author":"Zhang Shiwei","year":"2023","unstructured":"Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, and Jingren Zhou. 2023b. I2vgen-xl: High-quality image-to-video synthesis via cascaded diffusion models. arXiv preprint arXiv:2311.04145 (2023)."},{"key":"e_1_2_1_60_1","volume-title":"Globally Optimal Toon Tracking. ACM TOG 35, 4","author":"Zhu Haichao","year":"2016","unstructured":"Haichao Zhu, Xueting Liu, Tien-Tsin Wong, and Pheng-Ann Heng. 2016. Globally Optimal Toon Tracking. ACM TOG 35, 4 (2016), 75:1--75:10."}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687761","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3687761","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:45Z","timestamp":1750295865000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687761"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,19]]},"references-count":60,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,19]]}},"alternative-id":["10.1145\/3687761"],"URL":"https:\/\/doi.org\/10.1145\/3687761","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,19]]},"assertion":[{"value":"2024-11-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}