{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T18:55:01Z","timestamp":1774637701117,"version":"3.50.1"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2025,8,1]]},"abstract":"<jats:p>\n                    We present a framework to generate past and future processes for drawing process videos. Given a canvas image uploaded by a user, the framework can generate both preceding and succeeding states of the drawing process, and the generated states can be reused as inputs for further state generation. We observe that the user queries typically have one-to-one or many-to-many states, and in many cases, involve non-contiguous states. This necessitates a backend that solves a set-to-set problem with arbitrary combinations of past or future states. To this end, we repurpose video diffusion models to learn the set-to-set mapping with pretrained video priors. We implement the system with strong diffusion transformer backbones (\n                    <jats:italic toggle=\"yes\">e.g.<\/jats:italic>\n                    , CogVideoX and LTXVideo) and high-quality data processing (\n                    <jats:italic toggle=\"yes\">e.g.<\/jats:italic>\n                    , sampling short shots from long videos of real drawing records). Experiments show that the generated states are diverse in drawing contexts and resemble human drawing processes. 
This capability may aid artists in visualizing potential outcomes, generating creative inspirations, or refining existing workflows.\n                  <\/jats:p>","DOI":"10.1145\/3731160","type":"journal-article","created":{"date-parts":[[2025,7,27]],"date-time":"2025-07-27T04:02:22Z","timestamp":1753588942000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Generating Past and Future in Digital Painting Processes"],"prefix":"10.1145","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3503-5791","authenticated-orcid":false,"given":"Lvmin","family":"Zhang","sequence":"first","affiliation":[{"name":"Stanford University, Stanford, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-2252-1185","authenticated-orcid":false,"given":"Chuan","family":"Yan","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-1516-4083","authenticated-orcid":false,"given":"Yuwei","family":"Guo","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2181-1879","authenticated-orcid":false,"given":"Jinbo","family":"Xing","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8996-7327","authenticated-orcid":false,"given":"Maneesh","family":"Agrawala","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,7,27]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"The Thirteenth International Conference on Learning Representations.","unstructured":"2024. Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imposing Consistent Light Transport. 
In The Thirteenth International Conference on Learning Representations."},{"key":"e_1_2_1_2_1","volume-title":"Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos. arXiv:2403.13044 [cs.CV] https:\/\/arxiv.org\/abs\/2403.13044","author":"Alzayer Hadi","year":"2024","unstructured":"Hadi Alzayer, Zhihao Xia, Xuaner Zhang, Eli Shechtman, Jia-Bin Huang, and Michael Gharbi. 2024. Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos. arXiv:2403.13044 [cs.CV] https:\/\/arxiv.org\/abs\/2403.13044"},{"key":"e_1_2_1_3_1","volume-title":"UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing. arXiv preprint arXiv:2402.13185","author":"Bai Jianhong","year":"2024","unstructured":"Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, and Jiang Bian. 2024. UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing. arXiv preprint arXiv:2402.13185 (2024)."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3680528.3687614"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2019627.2019639"},{"key":"e_1_2_1_6_1","unstructured":"Andreas Blattmann Tim Dockhorn Sumith Kulal Daniel Mendelevitch Maciej Kilian Dominik Lorenz Yam Levi Zion English Vikram Voleti Adam Letts et al. 2023. Stable video diffusion: Scaling latent video diffusion models to large datasets. 
arXiv preprint arXiv:2311.15127 (2023)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01764"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3680528.3687574"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00698"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2856400.2856417"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS '23)","author":"Chen Jingye","year":"2024","unstructured":"Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, and Furu Wei. 2024a. TextDiffuser: Diffusion Models as Text Painters. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS '23). Curran Associates Inc., Red Hook, NY, USA, 9353\u20139387."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392386"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings, Part XXVI 16","author":"Das Ayan","year":"2020","unstructured":"Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang, and Yi-Zhe Song. 2020. B\u00e9ziersketch: A generative model for scalable vector sketches. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXVI 16. Springer, 632\u2013647."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2461912.2461942"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766936"},{"key":"e_1_2_1_16_1","unstructured":"Patrick Esser Sumith Kulal Andreas Blattmann Rahim Entezari Jonas M\u00fcller Harry Saini Yam Levi Dominik Lorenz Axel Sauer Frederic Boesel Dustin Podell Tim Dockhorn Zion English Kyle Lacey Alex Goodwin Yannik Marek and Robin Rombach. 2024. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. 
arXiv:2403.03206 [cs.CV]"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00522"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531326.1531372"},{"key":"e_1_2_1_19_1","volume-title":"International Conference on Learning Representations","author":"Guo Yuwei","year":"2024","unstructured":"Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, and Bo Dai. 2024. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. International Conference on Learning Representations (2024)."},{"key":"e_1_2_1_20_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Ha D.","unstructured":"D. Ha and D. Eck. 2018. A neural representation of sketch drawings. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_21_1","unstructured":"Yoav HaCohen Nisan Chiprut Benny Brazowski Daniel Shalem Dudu Moshe Eitan Richardson Eran Levin Guy Shiran Nir Zabari Ori Gordon Poriya Panet Sapir Weissbuch Victor Kulikov Yaki Bitterman Zeev Melumian and Ofir Bibi. 2024. LTX-Video: Realtime Video Latent Diffusion. arXiv:2501.00103 [cs.CV] https:\/\/arxiv.org\/abs\/2501.00103"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/97880.97902"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/280814.280951"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2003.1210867"},{"key":"e_1_2_1_25_1","volume-title":"Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598","author":"Ho Jonathan","year":"2022","unstructured":"Jonathan Ho and Tim Salimans. 2022. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598 (2022)."},{"key":"e_1_2_1_26_1","volume-title":"Video diffusion models. 
arXiv:2204.03458","author":"Ho Jonathan","year":"2022","unstructured":"Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. 2022. Video diffusion models. arXiv:2204.03458 (2022)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00880"},{"key":"e_1_2_1_28_1","unstructured":"Yang Jin Zhicheng Sun Ningyuan Li Kun Xu Kun Xu Hao Jiang Nan Zhuang Quzhe Huang Yang Song Yadong Mu and Zhouchen Lin. 2024. Pyramidal Flow Matching for Efficient Video Generative Modeling. arXiv:2410.05954 [cs.CV]"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00582"},{"key":"e_1_2_1_30_1","volume-title":"Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, and Yi-Zhe Song.","author":"Koley Subhadeep","year":"2024","unstructured":"Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, and Yi-Zhe Song. 2024. It's All About Your Sketch: Democratising Sketch Control in Diffusion Models. In CVPR."},{"key":"e_1_2_1_31_1","unstructured":"Weijie Kong Qi Tian Zijian Zhang Rox Min Zuozhuo Dai Jin Zhou Jiangfeng Xiong Xin Li Bo Wu Jianwei Zhang Kathrina Wu Qin Lin Aladdin Wang Andong Wang Changlin Li Duojun Huang Fang Yang Hao Tan Hongmei Wang Jacob Song Jiawang Bai Jianbing Wu Jinbao Xue Joey Wang Junkun Yuan Kai Wang Mengyang Liu Pengyu Li Shuai Li Weiyan Wang Wenqing Yu Xinchi Deng Yang Li Yanxin Long Yi Chen Yutao Cui Yuanbo Peng Zhentao Yu Zhiyu He Zhiyong Xu Zixiang Zhou Zunnan Xu Yangyu Tao Qinglin Lu Songtao Liu Daquan Zhou Hongfa Wang Yong Yang Di Wang Yuhong Liu Jie Jiang and Caesar Zhong. 2024. HunyuanVideo: A Systematic Framework For Large Video Generative Models. 
https:\/\/arxiv.org\/abs\/2412.03603"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/258734.258893"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00653"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3612451"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00672"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459833"},{"key":"e_1_2_1_37_1","volume-title":"Neural Painters: A learned differentiable constraint for generating brushstroke paintings. ArXiv abs\/1904.08410","author":"Nakano Reiichiro","year":"2019","unstructured":"Reiichiro Nakano. 2019. Neural Painters: A learned differentiable constraint for generating brushstroke paintings. ArXiv abs\/1904.08410 (2019). https:\/\/api.semanticscholar.org\/CorpusID:120367960"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings, Part XXIV","author":"Nitzan Yotam","year":"2024","unstructured":"Yotam Nitzan, Zongze Wu, Richard Zhang, Eli Shechtman, Daniel Cohen-Or, Taesung Park, and Micha\u00ebl Gharbi. 2024. Lazy Diffusion Transformer for Interactive Image Editing. In Computer Vision - ECCV 2024: 18th European Conference, Milan, Italy, September 29\u2013October 4, 2024, Proceedings, Part XXIV (Milan, Italy). Springer-Verlag, Berlin, Heidelberg, 55\u201372."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591500"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00735"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00244"},{"key":"e_1_2_1_42_1","volume-title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arXiv:2307.01952 [cs.CV]","author":"Podell Dustin","year":"2023","unstructured":"Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas M\u00fcller, Joe Penna, and Robin Rombach. 2023. 
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arXiv:2307.01952 [cs.CV]"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2024.3403160"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","unstructured":"Prasun Roy Saumik Bhattacharya Subhankar Ghosh Umapada Pal and Michael Blumenstein. 2025. D-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining. In Pattern Recognition Apostolos Antonacopoulos Subhasis Chaudhuri Rama Chellappa Cheng-Lin Liu Saumik Bhattacharya and Umapada Pal (Eds.). Springer Nature Switzerland Cham 277\u2013292. 10.1007\/978-3-031-78389-0_19","DOI":"10.1007\/978-3-031-78389-0_19"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528233.3530757"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/192161.192185"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818110"},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning. PMLR, 4596\u20134604","author":"Shazeer Noam","year":"2018","unstructured":"Noam Shazeer and Mitchell Stern. 2018. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. In Proceedings of the 35th International Conference on Machine Learning. PMLR, 4596\u20134604."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657497"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00844"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132703"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201370"},{"key":"e_1_2_1_54_1","volume-title":"The Eleventh International Conference on Learning Representations. 
https:\/\/openreview.net\/forum?id=nJfylDvgzlq","author":"Singer Uriel","year":"2023","unstructured":"Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, and Yaniv Taigman. 2023. Make-A-Video: Text-to-Video Generation without Text-Video Data. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=nJfylDvgzlq"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19787-1_39"},{"key":"e_1_2_1_56_1","unstructured":"SmilingWolf. 2022. WD14 ViT Tagger. https:\/\/huggingface.co\/SmilingWolf\/wd-v1-4-vit-tagger-v2."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3680528.3687596"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766960"},{"key":"e_1_2_1_59_1","volume-title":"Generating videos with scene dynamics. Advances in neural information processing systems 29","author":"Vondrick Carl","year":"2016","unstructured":"Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. 2016. Generating videos with scene dynamics. Advances in neural information processing systems 29 (2016)."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591560"},{"key":"e_1_2_1_61_1","volume-title":"Video-to-Video Synthesis. In Conference on Neural Information Processing Systems (NeurIPS).","author":"Wang Ting-Chun","year":"2018","unstructured":"Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. 2018. Video-to-Video Synthesis. In Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3687761"},{"key":"e_1_2_1_63_1","unstructured":"Sihan Xu Yidong Huang Jiayi Pan Ziqiao Ma and Joyce Chai. 2024. Inversion-Free Image Editing with Natural Language. 
(2024)."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01763"},{"key":"e_1_2_1_65_1","unstructured":"Zhuoyi Yang Jiayan Teng Wendi Zheng Ming Ding Shiyu Huang Jiazheng Xu Yuanming Yang Wenyi Hong Xiaohan Zhang Guanyu Feng et al. 2024. CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer. arXiv preprint arXiv:2408.06072 (2024)."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3272127.3275090"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_2_1_68_1","article-title":"Real-Time User-Guided Image Colorization with Learned Deep Priors","volume":"36","author":"Zhang Richard","year":"2017","unstructured":"Richard Zhang, Jun-Yan Zhu, Phillip Isola, Xinyang Geng, Angela S Lin, Tianhe Yu, and Alexei A Efros. 2017. Real-Time User-Guided Image Colorization with Learned Deep Priors. ACM Transactions on Graphics (TOG) 36, 4 (2017).","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00846"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01543"}],"container-title":["ACM Transactions on 
Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3731160","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T17:55:17Z","timestamp":1774634117000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3731160"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,27]]},"references-count":70,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,8,1]]}},"alternative-id":["10.1145\/3731160"],"URL":"https:\/\/doi.org\/10.1145\/3731160","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,27]]},"assertion":[{"value":"2025-07-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}