{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T03:23:44Z","timestamp":1774495424198,"version":"3.50.1"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"9","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62071155 and 92270116"],"award-info":[{"award-number":["62071155 and 92270116"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>\n            The goal of Arbitrary Style Transfer (AST) is injecting the artistic features of a style reference into a given image\/video. Existing methods usually pursue the balance between style and content by adjusting general coarse-level stylized strength, thereby leading to unsatisfactory results and hindering their practical application. To address this critical issue, a novel AST approach namely Flexibly Controllable Arbitrary Style Transfer (FAST) is proposed, which is capable of explicitly customizing the stylization results according to various sources of semantic clues. In the specific, our model is constructed based on Latent Diffusion Model (LDM) and elaborately designed to absorb content and style instances as conditions of LDM. It is characterized by introducing\n            <jats:italic toggle=\"yes\">Style-Adapter<\/jats:italic>\n            , which allows users to flexibly manipulate the stylization results via aligning multi-level style control information and intrinsic knowledge in LDM, meanwhile enhancing the model with improved capacity to harmonize content detail retention and stylization strength. Lastly, our model is extended to handle video AST task. A novel learning objective is leveraged for video diffusion model training, which considerably improves cross-frame temporal consistency on the premise of maintaining stylization strength. Qualitative and quantitative comparisons as well as user studies demonstrate our presented approach outperforms the existing SoTA methods in generating visually plausible stylization results. The project homepage for the article is available at:\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/fast-ldm.github.io\/\">https:\/\/fast-ldm.github.io\/<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3748655","type":"journal-article","created":{"date-parts":[[2025,7,29]],"date-time":"2025-07-29T13:59:31Z","timestamp":1753797571000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["FAST: Flexibly Controllable Arbitrary Style Transfer via Latent Diffusion Models"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-4088-5984","authenticated-orcid":false,"given":"Hanzhang","family":"Wang","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6098-4772","authenticated-orcid":false,"given":"Haoran","family":"Wang","sequence":"additional","affiliation":[{"name":"BGI Research, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1326-248X","authenticated-orcid":false,"given":"Zhongrui","family":"Yu","sequence":"additional","affiliation":[{"name":"ETH Zurich, Z\u00fcrich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1273-0505","authenticated-orcid":false,"given":"Mingming","family":"Sun","sequence":"additional","affiliation":[{"name":"AGI Lab, Beijing Institute of Mathematical Sciences and Applications, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5694-505X","authenticated-orcid":false,"given":"Junjun","family":"Jiang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0039-2281","authenticated-orcid":false,"given":"Xianming","family":"Liu","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8346-9127","authenticated-orcid":false,"given":"Deming","family":"Zhai","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,9,10]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"crossref","unstructured":"Yuval Alaluf Daniel Garibi Or Patashnik Hadar Averbuch-Elor and Daniel Cohen-Or. 2023. Cross-image attention for zero-shot appearance transfer. arXiv:2311.03335. Retrieved from https:\/\/arxiv.org\/abs\/2311.03335 [cs.CV]","DOI":"10.1145\/3641519.3657423"},{"key":"e_1_3_1_3_2","first-page":"862","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"An Jie","year":"2021","unstructured":"Jie An, Siyu Huang, Yibing Song, Dejing Dou, Wei Liu, and Jiebo Luo. 2021. Artflow: Unbiased image style transfer via reversible neural flows. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 862\u2013871."},{"key":"e_1_3_1_4_2","first-page":"611","volume-title":"European Conference on Computer Vision (ECCV)","author":"Butler Daniel J.","year":"2012","unstructured":"Daniel J. Butler, Jonas Wulff, Garrett B. Stanley, and Michael J. Black. 2012. A naturalistic open source movie for optical flow evaluation. In European Conference on Computer Vision (ECCV). Springer, 611\u2013625."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00132"},{"key":"e_1_3_1_6_2","first-page":"1105","volume-title":"IEEE International Conference on Computer Vision","author":"Chen Dongdong","year":"2017","unstructured":"Dongdong Chen, Jing Liao, Lu Yuan, Nenghai Yu, and Gang Hua. 2017. Coherent online video style transfer. In IEEE International Conference on Computer Vision. IEEE, 1105\u20131114."},{"key":"e_1_3_1_7_2","unstructured":"Dar-Yen Chen. 2023. ArtFusion: Arbitrary style transfer using dual conditional latent diffusion models. arXiv:2306.09330. Retrieved from https:\/\/arxiv.org\/abs\/2306.09330"},{"key":"e_1_3_1_8_2","first-page":"26561","article-title":"Artistic style transfer with internal-external learning and contrastive learning","volume":"34","author":"Chen Haibo","year":"2021","unstructured":"Haibo Chen, Lei Zhao, Zhizhong Wang, Huiming Zhang, Zhiwen Zuo, Ailin Li, Wei Xing, Dongming Lu, et al. 2021. Artistic style transfer with internal-external learning and contrastive learning. Advances in Neural Information Processing Systems, Vol. 34, 26561\u201326573.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_9_2","unstructured":"Weifeng Chen Jie Wu Pan Xie Hefeng Wu Jiashi Li Xin Xia Xuefeng Xiao and Liang Lin. 2023. Control-a-video: Controllable text-to-video generation with diffusion models. arXiv:2305.13840. Retrieved from https:\/\/arxiv.org\/abs\/2305.13840"},{"key":"e_1_3_1_10_2","first-page":"433","volume-title":"AAAI Conference on Artificial Intelligence","volume":"37","author":"Cheng Jiaxin","unstructured":"Jiaxin Cheng, Yue Wu, Ayush Jaiswal, Xu Zhang, Pradeep Natarajan, and Prem Natarajan. 2023. User-controllable arbitrary style transfer via entropy regularization. In AAAI Conference on Artificial Intelligence, Vol. 37. AAAI, 433\u2013441."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00840"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","unstructured":"Jia Deng Wei Dong Richard Socher Li-Jia Li Kai Li and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE 248\u2013255.","DOI":"10.1109\/CVPRW.2009.5206848"},{"key":"e_1_3_1_13_2","first-page":"11326","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Deng Yingying","year":"2022","unstructured":"Yingying Deng, Fan Tang, Weiming Dong, Chongyang Ma, Xingjia Pan, Lei Wang, and Changsheng Xu. 2022. Stytr2: Image style transfer with transformers. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 11326\u201311336."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3414015"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.265"},{"key":"e_1_3_1_16_2","first-page":"2672","article-title":"Generative adversarial networks","author":"Goodfellow Ian J.","year":"2014","unstructured":"Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial networks. Advances in Neural Information Processing Systems, 2672\u20132680.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_17_2","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, Vol. 33, 6840\u20136851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_18_2","unstructured":"Jonathan Ho and Tim Salimans. 2022. Classifier-free diffusion guidance. arXiv:2207.12598. Retrieved from https:\/\/arxiv.org\/abs\/2207.12598"},{"key":"e_1_3_1_19_2","doi-asserted-by":"crossref","unstructured":"Cong Hu Xiao-Zhong Wei and Xiao-Jun Wu. 2025. DIRformer: A novel image restoration approach based on U-shaped transformer and diffusion models. ACM Transactions on Multimedia Computing Communications and Applications (TOMM) 21 2 Article 57 (Feb. 2025) 1\u201323. DOI: https:\/\/doi.org\/10.1145\/3703632","DOI":"10.1145\/3703632"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.745"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.167"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2019.2921336"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_43"},{"key":"e_1_3_1_24_2","unstructured":"Yoni Kasten Ohad Rahamim and Gal Chechik. 2023. Point-cloud completion with pretrained text-to-image diffusion models. arXiv:2306.10533. Retrieved from https:\/\/arxiv.org\/abs\/2306.10533"},{"key":"e_1_3_1_25_2","unstructured":"Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv:1312.6114. Retrieved from https:\/\/arxiv.org\/abs\/1312.6114"},{"key":"e_1_3_1_26_2","unstructured":"Gihyun Kwon and Jong Chul Ye. 2022. Diffusion-based image translation using disentangled style and content representation. arXiv:2209.15264. Retrieved from https:\/\/arxiv.org\/abs\/2209.15264"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3572030"},{"key":"e_1_3_1_28_2","first-page":"1287","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"37","author":"Li Dongyang","year":"2023","unstructured":"Dongyang Li, Hao Luo, Pichao Wang, Zhibin Wang, Shang Liu, and Fan Wang. 2023. Frequency domain disentanglement for arbitrary neural style transfer. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. AAAI, 1287\u20131295."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00393"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-70139-4"},{"key":"e_1_3_1_31_2","unstructured":"Pengwei Liang Junjun Jiang Xianming Liu and Jiayi Ma. 2023. Image deblurring by exploring in-depth properties of transformer. arXiv:2303.15198. Retrieved from https:\/\/arxiv.org\/abs\/2303.15198"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_1_33_2","first-page":"6649","volume-title":"IEEE\/CVF International Conference on Computer Vision","author":"Liu Songhua","year":"2021","unstructured":"Songhua Liu, Tianwei Lin, Dongliang He, Fu Li, Meiling Wang, Xin Li, Zhengxing Sun, Qian Li, and Errui Ding. 2021. Adaattn: Revisit attention mechanism in arbitrary neural style transfer. In IEEE\/CVF International Conference on Computer Vision. IEEE, 6649\u20136658."},{"issue":"7","key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"3312","DOI":"10.1109\/TIP.2019.2895768","article-title":"Generative adversarial networks and perceptual losses for video super-resolution","volume":"28","author":"Lucas Alice","year":"2019","unstructured":"Alice Lucas, Santiago Lopez-Tapia, Rafael Molina, and Aggelos K. Katsaggelos. 2019. Generative adversarial networks and perceptual losses for video super-resolution. IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society 28, 7 (2019), 3312\u20133327.","journal-title":"IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society"},{"key":"e_1_3_1_35_2","first-page":"11461","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Lugmayr Andreas","year":"2022","unstructured":"Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. 2022. Repaint: Inpainting using denoising diffusion probabilistic models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 11461\u201311471."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3638770"},{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","unstructured":"Chong Mou Xintao Wang Liangbin Xie Jian Zhang Zhongang Qi Ying Shan and Xiaohu Qie. 2023. T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. arXiv:2302.08453. Retrieved from https:\/\/arxiv.org\/abs\/2302.08453","DOI":"10.1609\/aaai.v38i5.28226"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3686155"},{"key":"e_1_3_1_39_2","first-page":"5880","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Park Dae Young","year":"2019","unstructured":"Dae Young Park and Kwang Hee Lee. 2019. Arbitrary style transfer with style-attentional networks. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 5880\u20135888."},{"key":"e_1_3_1_40_2","first-page":"319","volume-title":"European Conference on Computer Vision (ECCV)","author":"Park Taesung","year":"2020","unstructured":"Taesung Park, Alexei A. Efros, Richard Zhang, and Jun-Yan Zhu. 2020. Contrastive learning for unpaired image-to-image translation. In European Conference on Computer Vision (ECCV). Springer, 319\u2013345."},{"issue":"2020","key":"e_1_3_1_41_2","first-page":"7198","article-title":"Swapping autoencoder for deep image manipulation","volume":"33","author":"Park Taesung","year":"2020","unstructured":"Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei Efros, and Richard Zhang. 2020. Swapping autoencoder for deep image manipulation. In Advances in Neural Information Processing Systems, Vol. 33, 2020, 7198\u20137211.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.2308\/iace-50038"},{"key":"e_1_3_1_43_2","first-page":"8693","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Qi Tianhao","year":"2024","unstructured":"Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, and Yongdong Zhang. 2024. DEADiff: An efficient stylization diffusion model with disentangled representations. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 8693\u20138702."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3019967"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_1_46_2","first-page":"26","volume-title":"38th German Conference on Pattern Recognition (GCPR \u201916)","author":"Ruder Manuel","year":"2016","unstructured":"Manuel Ruder, Alexey Dosovitskiy, and Thomas Brox. 2016. Artistic style transfer for videos. In 38th German Conference on Pattern Recognition (GCPR \u201916). Springer, 26\u201336."},{"key":"e_1_3_1_47_2","unstructured":"Dan Ruta Gemma Canet Tarr\u00e9s Andrew Gilbert Eli Shechtman Nicholas Kolkin and John Collomosse. 2023. DIFF-NST: Diffusion interleaving for deformable neural style transfer. arXiv:2307.04157. Retrieved from https:\/\/arxiv.org\/abs\/2307.04157"},{"issue":"4","key":"e_1_3_1_48_2","first-page":"4713","article-title":"Image super-resolution via iterative refinement","volume":"45","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi. 2022. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 4 (2022), 4713\u20134726.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_49_2","first-page":"25278","article-title":"Laion-5b: An open large-scale dataset for training next generation image-text models","volume":"35","author":"Schuhmann Christoph","year":"2022","unstructured":"Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. In Advances in Neural Information Processing Systems, Vol. 35, 25278\u201325294.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3311781"},{"key":"e_1_3_1_51_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. Retrieved from https:\/\/arxiv.org\/abs\/1409.1556"},{"key":"e_1_3_1_52_2","first-page":"2256","volume-title":"International Conference on Machine Learning","author":"Sohl-Dickstein Jascha","year":"2015","unstructured":"Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning. PMLR, 2256\u20132265."},{"key":"e_1_3_1_53_2","unstructured":"Jiaming Song Chenlin Meng and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv:2010.02502. Retrieved from https:\/\/arxiv.org\/abs\/2010.02502"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3463257"},{"key":"e_1_3_1_55_2","first-page":"8934","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Sun Deqing","year":"2018","unstructured":"Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. 2018. Pwc-net: CNNs for optical flow using pyramid, warping, and cost volume. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 8934\u20138943."},{"key":"e_1_3_1_56_2","unstructured":"Haofan Wang Qixun Wang Xu Bai Zekui Qin and Anthony Chen. 2024. InstantStyle: Free lunch towards style-preserving in text-to-image generation. arXiv:2404.02733. Retrieved from https:\/\/arxiv.org\/abs\/2404.02733"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2023.3293777"},{"issue":"2020","key":"e_1_3_1_58_2","doi-asserted-by":"crossref","first-page":"9125","DOI":"10.1109\/TIP.2020.3024018","article-title":"Consistent video style transfer via relaxation and regularization","volume":"29","author":"Wang Wenjing","year":"2020","unstructured":"Wenjing Wang, Shuai Yang, Jizheng Xu, and Jiaying Liu. 2020. Consistent video style transfer via relaxation and regularization. IEEE Transactions on Image Processing 29 (2020), 9125\u20139139.","journal-title":"IEEE Transactions on Image Processing"},{"key":"e_1_3_1_59_2","doi-asserted-by":"crossref","unstructured":"Yuanzhi Wang Yong Li Xiaoya Zhang Xin Liu Anbo Dai Antoni B. Chan and Zhen Cui. 2024. Edit temporal-consistent videos with image diffusion model. ACM Transactions on Multimedia Computing Communications and Applications (TOMM) 20 12 (2024) 1\u201316.","DOI":"10.1145\/3691344"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_1_61_2","first-page":"2830","volume-title":"AAAI Conference on Artificial Intelligence","volume":"37","author":"Wu Jingyu","year":"2023","unstructured":"Jingyu Wu, Lefan Hou, Zejian Li, Jun Liao, Li Liu, and Lingyun Sun. 2023. Preserving structural consistency in arbitrary artist and artwork style transfer. In AAAI Conference on Artificial Intelligence, Vol. 37. AAAI, 2830\u20132838."},{"key":"e_1_3_1_62_2","first-page":"14618","volume-title":"IEEE\/CVF International Conference on Computer Vision","author":"Wu Xiaolei","year":"2021","unstructured":"Xiaolei Wu, Zhihao Hu, Lu Sheng, and Dong Xu. 2021. Styleformer: Real-time arbitrary style transfer via parametric style composition. In IEEE\/CVF International Conference on Computer Vision. IEEE, 14618\u201314627."},{"key":"e_1_3_1_63_2","first-page":"189","volume-title":"European Conference on Computer Vision (ECCV)","author":"Wu Zijie","year":"2022","unstructured":"Zijie Wu, Zhen Zhu, Junping Du, and Xiang Bai. 2022. CCPL: Contrastive coherence preserving loss for versatile style transfer. In European Conference on Computer Vision (ECCV). Springer, Berlin, 189\u2013206."},{"key":"e_1_3_1_64_2","first-page":"1395","volume-title":"IEEE International Conference on Computer Vision","author":"Xie Saining","year":"2015","unstructured":"Saining Xie and Zhuowen Tu. 2015. Holistically-nested edge detection. In IEEE International Conference on Computer Vision. IEEE, 1395\u20131403."},{"key":"e_1_3_1_65_2","first-page":"2048","volume-title":"International Conference on Machine Learning","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning. PMLR, 2048\u20132057."},{"key":"e_1_3_1_66_2","unstructured":"Zhongrui Yu Haoran Wang Jinze Yang Hanzhang Wang Zeke Xie Yunfeng Cai Jiale Cao Zhong Ji and Mingming Sun. 2024. SGD: Street view synthesis with Gaussian splatting and diffusion prior. arXiv:2403.20079. Retrieved from https:\/\/arxiv.org\/abs\/2403.20079"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_3_1_69_2","first-page":"10146","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhang Yuxin","year":"2023","unstructured":"Yuxin Zhang, Nisha Huang, Fan Tang, Haibin Huang, Chongyang Ma, Weiming Dong, and Changsheng Xu. 2023. Inversion-based style transfer with diffusion models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 10146\u201310156."},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.244"},{"key":"e_1_3_1_71_2","first-page":"23109","volume-title":"IEEE\/CVF International Conference on Computer Vision","author":"Zhu Mingrui","year":"2023","unstructured":"Mingrui Zhu, Xiao He, Nannan Wang, Xiaoyu Wang, and Xinbo Gao. 2023. All-to-key attention for arbitrary style transfer. In IEEE\/CVF International Conference on Computer Vision. IEEE, 23109\u201323119."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3748655","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T16:02:14Z","timestamp":1757520134000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3748655"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,10]]},"references-count":70,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3748655"],"URL":"https:\/\/doi.org\/10.1145\/3748655","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,10]]},"assertion":[{"value":"2024-12-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-29","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}