{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T08:16:35Z","timestamp":1774685795449,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":69,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T00:00:00Z","timestamp":1733184000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,12,3]]},"DOI":"10.1145\/3680528.3687596","type":"proceedings-article","created":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T08:14:37Z","timestamp":1733213677000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["ProcessPainter: Learning to draw from sequence data"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7028-3347","authenticated-orcid":false,"given":"Yiren","family":"Song","sequence":"first","affiliation":[{"name":"National University of Singapore, Singapore, Singapore and Show Lab, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8957-0952","authenticated-orcid":false,"given":"Shijie","family":"Huang","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8374-1132","authenticated-orcid":false,"given":"Chen","family":"Yao","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7170-277X","authenticated-orcid":false,"given":"Hai","family":"Ci","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, 
Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2490-8247","authenticated-orcid":false,"given":"Xiaojun","family":"Ye","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4584-9388","authenticated-orcid":false,"given":"Jiaming","family":"Liu","sequence":"additional","affiliation":[{"name":"Tiamat, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-3255-0901","authenticated-orcid":false,"given":"Yuxuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7681-2166","authenticated-orcid":false,"given":"Mike Zheng","family":"Shou","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2024,12,3]]},"reference":[{"key":"e_1_3_3_1_2_1","unstructured":"Emre Aksan Thomas Deselaers Andrea Tagliasacchi and Otmar Hilliges. 2020. Cose: Compositional stroke embeddings. Advances in Neural Information Processing Systems 33 (2020) 10041\u201310052."},{"key":"e_1_3_3_1_3_1","unstructured":"Andreas Blattmann Tim Dockhorn Sumith Kulal Daniel Mendelevitch Maciej Kilian Dominik Lorenz Yam Levi Zion English Vikram Voleti Adam Letts et\u00a0al. 2023a. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023)."},{"key":"e_1_3_3_1_4_1","unstructured":"Andreas Blattmann Tim Dockhorn Sumith Kulal Daniel Mendelevitch Maciej Kilian Dominik Lorenz Yam Levi Zion English Vikram Voleti Adam Letts et\u00a0al. 2023b. Stable video diffusion: Scaling latent video diffusion models to large datasets. 
arXiv preprint arXiv:2311.15127 (2023)."},{"key":"e_1_3_3_1_5_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33012564"},{"key":"e_1_3_3_1_6_1","unstructured":"Jaemin Cho Abhay Zala and Mohit Bansal. 2024. Visual Programming for Step-by-Step Text-to-Image Generation and Evaluation. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_1_7_1","unstructured":"Zuozhuo Dai Zhenghao Zhang Yao Yao Bingxue Qiu Siyu Zhu Long Qin and Weizhi Wang. 2023. AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance. arXiv e-prints (2023) arXiv\u20132311."},{"key":"e_1_3_3_1_8_1","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_3_1_9_1","unstructured":"Yutong Feng Biao Gong Di Chen Yujun Shen Yu Liu and Jingren Zhou. 2023. Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following. arXiv preprint arXiv:2311.17002 (2023)."},{"key":"e_1_3_3_1_10_1","unstructured":"Kevin Frans Lisa Soros and Olaf Witkowski. 2022. Clipdraw: Exploring text-to-drawing synthesis through language-image encoders. Advances in Neural Information Processing Systems 35 (2022) 5207\u20135218."},{"key":"e_1_3_3_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19790-1_7"},{"key":"e_1_3_3_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02096"},{"key":"e_1_3_3_1_13_1","unstructured":"Yuwei Guo Ceyuan Yang Anyi Rao Yaohui Wang Yu Qiao Dahua Lin and Bo Dai. 2023. Animatediff: Animate your personalized text-to-image diffusion models without specific tuning. arXiv preprint arXiv:2307.04725 (2023)."},{"key":"e_1_3_3_1_14_1","unstructured":"David Ha and Douglas Eck. 2017. A neural representation of sketch drawings. 
arXiv preprint arXiv:1704.03477 (2017)."},{"key":"e_1_3_3_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/97879.97902"},{"key":"e_1_3_3_1_16_1","doi-asserted-by":"crossref","unstructured":"Aaron Hertzmann. 2003. A survey of stroke-based rendering. Institute of Electrical and Electronics Engineers.","DOI":"10.1109\/MCG.2003.1210867"},{"key":"e_1_3_3_1_17_1","unstructured":"Aaron Hertzmann. 2022. Toward modeling creative processes for algorithmic painting. arXiv preprint arXiv:2205.01605 (2022)."},{"key":"e_1_3_3_1_18_1","unstructured":"Jonathan Ho Ajay Jain and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020) 6840\u20136851."},{"key":"e_1_3_3_1_19_1","unstructured":"Wenyi Hong Ming Ding Wendi Zheng Xinghan Liu and Jie Tang. 2022. Cogvideo: Large-scale pretraining for text-to-video generation via transformers. arXiv preprint arXiv:2205.15868 (2022)."},{"key":"e_1_3_3_1_20_1","unstructured":"Edward\u00a0J Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)."},{"key":"e_1_3_3_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3611766"},{"key":"e_1_3_3_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00880"},{"key":"e_1_3_3_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01462"},{"key":"e_1_3_3_1_24_1","doi-asserted-by":"crossref","unstructured":"Alexander Kirillov Eric Mintun Nikhila Ravi Hanzi Mao Chloe Rolland Laura Gustafson Tete Xiao Spencer Whitehead Alexander\u00a0C. Berg Wan-Yen Lo Piotr Doll\u00e1r and Ross Girshick. 2023. Segment Anything. 
arXiv:2304.02643 (2023).","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"e_1_3_3_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01202"},{"key":"e_1_3_3_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00192"},{"key":"e_1_3_3_1_27_1","unstructured":"Guillaume Le\u00a0Moing Jean Ponce and Cordelia Schmid. 2021. Ccvs: Context-aware controllable video synthesis. Advances in Neural Information Processing Systems 34 (2021) 14042\u201314055."},{"key":"e_1_3_3_1_28_1","unstructured":"Dongxu Li Junnan Li and Steven Hoi. 2024. Blip-diffusion: Pre-trained subject representation for controllable text-to-image generation and editing. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02156"},{"key":"e_1_3_3_1_30_1","unstructured":"Shanchuan Lin and Xiao Yang. 2024. AnimateDiff-Lightning: Cross-Model Diffusion Distillation. arXiv preprint arXiv:2403.12706 (2024)."},{"key":"e_1_3_3_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/258734.258893"},{"key":"e_1_3_3_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00653"},{"key":"e_1_3_3_1_33_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i5.28206"},{"key":"e_1_3_3_1_34_1","unstructured":"Yue Ma Yingqing He Hongfa Wang Andong Wang Chenyang Qi Chengfei Cai Xiu Li Zhifeng Li Heung-Yeung Shum Wei Liu et\u00a0al. 2024b. Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts. arXiv preprint arXiv:2403.08268 (2024)."},{"key":"e_1_3_3_1_35_1","doi-asserted-by":"crossref","unstructured":"Masaaki Nagahara. 2023. Sparse control for continuous-time systems. International Journal of Robust and Nonlinear Control 33 1 (2023) 6\u201322.","DOI":"10.1002\/rnc.5858"},{"key":"e_1_3_3_1_36_1","unstructured":"Reiichiro Nakano. 2019. 
Neural painters: A learned differentiable constraint for generating brushstroke paintings. arXiv preprint arXiv:1904.08410 (2019)."},{"key":"e_1_3_3_1_37_1","unstructured":"Alex Nichol Prafulla Dhariwal Aditya Ramesh Pranav Shyam Pamela Mishkin Bob McGrew Ilya Sutskever and Mark Chen. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741 (2021)."},{"key":"e_1_3_3_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00387"},{"key":"e_1_3_3_1_39_1","unstructured":"Dustin Podell Zion English Kyle Lacey Andreas Blattmann Tim Dockhorn Jonas M\u00fcller Joe Penna and Robin Rombach. 2023. Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023)."},{"key":"e_1_3_3_1_40_1","first-page":"8748","volume-title":"International conference on machine learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748\u20138763."},{"key":"e_1_3_3_1_41_1","unstructured":"Aditya Ramesh Prafulla Dhariwal Alex Nichol Casey Chu and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. 
arXiv preprint arXiv:2204.06125 (2022)."},{"key":"e_1_3_3_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_3_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02155"},{"key":"e_1_3_3_1_44_1","unstructured":"Chitwan Saharia William Chan Saurabh Saxena Lala Li Jay Whang Emily\u00a0L Denton Kamyar Ghasemipour Raphael Gontijo\u00a0Lopes Burcu Karagol\u00a0Ayan Tim Salimans et\u00a0al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. Advances in neural information processing systems 35 (2022) 36479\u201336494."},{"key":"e_1_3_3_1_45_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i1.16128"},{"key":"e_1_3_3_1_46_1","unstructured":"Christoph Schuhmann Richard Vencu Romain Beaumont Robert Kaczmarczyk Clayton Mullis Aarush Katta Theo Coombes Jenia Jitsev and Aran Komatsuzaki. 2021. Laion-400m: Open dataset of clip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114 (2021)."},{"key":"e_1_3_3_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00547"},{"key":"e_1_3_3_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19787-1_39"},{"key":"e_1_3_3_1_49_1","unstructured":"Jiaming Song Chenlin Meng and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020)."},{"key":"e_1_3_3_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548146"},{"key":"e_1_3_3_1_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-19-5096-4_25"},{"key":"e_1_3_3_1_52_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i2.25326"},{"key":"e_1_3_3_1_53_1","first-page":"543","volume-title":"BMVC","author":"Song Yiren","year":"2022","unstructured":"Yiren Song and Yuxuan Zhang. 2022. CLIPFont: Text Guided Vector WordArt Generation. In BMVC. 
543."},{"key":"e_1_3_3_1_54_1","unstructured":"Yu Tian Jian Ren Menglei Chai Kyle Olszewski Xi Peng Dimitris\u00a0N Metaxas and Sergey Tulyakov. 2021. A good image generator is what you need for high-resolution video synthesis. arXiv preprint arXiv:2104.15069 (2021)."},{"key":"e_1_3_3_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547759"},{"key":"e_1_3_3_1_56_1","first-page":"3","volume-title":"Image and Video-Based Artistic Stylisation","author":"Vanderhaeghe David","year":"2012","unstructured":"David Vanderhaeghe and John Collomosse. 2012. Stroke based painterly rendering. In Image and Video-Based Artistic Stylisation. Springer, 3\u201321."},{"key":"e_1_3_3_1_57_1","unstructured":"Qixun Wang Xu Bai Haofan Wang Zekui Qin and Anthony Chen. 2024a. Instantid: Zero-shot identity-preserving generation in seconds. arXiv preprint arXiv:2401.07519 (2024)."},{"key":"e_1_3_3_1_58_1","unstructured":"Rui Wang Hailong Guo Jiaming Liu Huaxia Li Haibo Zhao Xu Tang Yao Hu Hao Tang and Peipei Li. 2024b. StableGarment: Garment-Centric Generation via Stable Diffusion. arXiv preprint arXiv:2403.10783 (2024)."},{"key":"e_1_3_3_1_59_1","unstructured":"Zhouxia Wang Ziyang Yuan Xintao Wang Tianshui Chen Menghan Xia Ping Luo and Ying Shan. 2023. Motionctrl: A unified and flexible motion controller for video generation. arXiv preprint arXiv:2312.03641 (2023)."},{"key":"e_1_3_3_1_60_1","doi-asserted-by":"crossref","unstructured":"Ning Xie Hirotaka Hachiya and Masashi Sugiyama. 2013. Artist agent: A reinforcement learning approach to automatic stroke generation in oriental ink painting. 
IEICE Transactions on Information and Systems 96 5 (2013) 1134\u20131144.","DOI":"10.1587\/transinf.E96.D.1134"},{"key":"e_1_3_3_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00987"},{"key":"e_1_3_3_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01369"},{"key":"e_1_3_3_1_63_1","unstructured":"Hu Ye Jun Zhang Sibo Liu Xiao Han and Wei Yang. 2023. Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721 (2023)."},{"key":"e_1_3_3_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_3_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00771"},{"key":"e_1_3_3_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP48485.2024.10447042"},{"key":"e_1_3_3_1_67_1","unstructured":"Yabo Zhang Yuxiang Wei Dongsheng Jiang Xiaopeng Zhang Wangmeng Zuo and Qi Tian. 2023b. Controlvideo: Training-free controllable text-to-video generation. arXiv preprint arXiv:2305.13077 (2023)."},{"key":"e_1_3_3_1_68_1","unstructured":"Tao Zhou Chen Fang Zhaowen Wang Jimei Yang Byungmoon Kim Zhili Chen Jonathan Brandt and Demetri Terzopoulos. 2018. Learning to sketch with deep q networks and demonstrated strokes. arXiv preprint arXiv:1810.05977 (2018)."},{"key":"e_1_3_3_1_69_1","unstructured":"Y Zhou R Zhang C Chen C Li C Tensmeyer T Yu J Gu J Xu and T Sun. 2021. Lafite: Towards language-free training for text-to-image generation. 
arXiv preprint arXiv:2111.13792 (2021)."},{"key":"e_1_3_3_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01543"}],"event":{"name":"SA '24: SIGGRAPH Asia 2024 Conference Papers","location":"Tokyo Japan","acronym":"SA '24","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"]},"container-title":["SIGGRAPH Asia 2024 Conference Papers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3680528.3687596","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3680528.3687596","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:58:26Z","timestamp":1750294706000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3680528.3687596"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,3]]},"references-count":69,"alternative-id":["10.1145\/3680528.3687596","10.1145\/3680528"],"URL":"https:\/\/doi.org\/10.1145\/3680528.3687596","relation":{},"subject":[],"published":{"date-parts":[[2024,12,3]]},"assertion":[{"value":"2024-12-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}