{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T15:46:50Z","timestamp":1776786410110,"version":"3.51.2"},"reference-count":76,"publisher":"Association for Computing Machinery (ACM)","issue":"5","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,5,31]]},"abstract":"<jats:p>\n                    Face sketch synthesis is a technique aimed at converting face photos into sketches. Existing face sketch synthesis research mainly relies on training with numerous photo\u2013sketch sample pairs from existing datasets. However, these large-scale discriminative learning methods will have to face problems, such as data scarcity and high human labor costs. Once the training data become scarce, their generative performance significantly degrades. In this article, we propose a one-shot face sketch synthesis method based on diffusion models. We optimize text instructions on a diffusion model using face photo\u2013sketch image pairs. Then, the instructions derived through gradient-based optimization are used for inference. To simulate real-world scenarios more accurately and evaluate method effectiveness more comprehensively, we introduce a new benchmark named One-shot Face Sketch Dataset (OS-Sketch). The benchmark consists of 400 pairs of face photo\u2013sketch images, including sketches with different styles and photos with different backgrounds, ages, sexes, expressions, illumination, and so on. For a solid out-of-distribution evaluation, we select only one pair of images for training at each time, with the rest used for inference. Extensive experiments demonstrate that the proposed method can convert various photos into realistic and highly consistent sketches in a one-shot context. Compared to other methods, our approach offers greater convenience and broader applicability. The dataset will be available at:\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/HanWu3125\/OS-Sketch\">https:\/\/github.com\/HanWu3125\/OS-Sketch<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3803012","type":"journal-article","created":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T14:25:31Z","timestamp":1774448731000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["One-shot Face Sketch Synthesis in-the-Wild via Generative Diffusion Prior and Instruction Tuning"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-8371-9997","authenticated-orcid":false,"given":"Han","family":"Wu","sequence":"first","affiliation":[{"name":"Guangdong University of Technology, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4541-5114","authenticated-orcid":false,"given":"Junyao","family":"Li","sequence":"additional","affiliation":[{"name":"Guangdong University of Technology, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-7479-7757","authenticated-orcid":false,"given":"Kangbo","family":"Zhao","sequence":"additional","affiliation":[{"name":"Guangdong University of Technology, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8065-5095","authenticated-orcid":false,"given":"Sen","family":"Zhang","sequence":"additional","affiliation":[{"name":"TikTok, ByteDance, Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9413-6528","authenticated-orcid":false,"given":"Yukai","family":"Shi","sequence":"additional","affiliation":[{"name":"Guangdong University of Technology, Guangzhou, China and University of Sydney, Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5704-4168","authenticated-orcid":false,"given":"Liang","family":"Lin","sequence":"additional","affiliation":[{"name":"Peng Cheng Laboratory, Shenzhen, China and Sun Yat-sen University, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,4,21]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2890017"},{"issue":"4","key":"e_1_3_1_3_2","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1109\/TMM.2018.2871417","article-title":"Synthesis of realistic facial expressions using expression map","volume":"21","author":"Agarwal Swapna","year":"2018","unstructured":"Swapna Agarwal and Dipti Prasad Mukherjee. 2018. Synthesis of realistic facial expressions using expression map. IEEE Transactions on Multimedia 21, 4 (2018), 902\u2013914.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3475799"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1016\/j.neucom.2022.04.077","article-title":"Unconstrained face sketch synthesis via perception-adaptive network and a new benchmark","volume":"494","author":"Nie Lin","year":"2022","unstructured":"Lin Nie, Lingbo Liu, Zhengtao Wu, and Wenxiong Kang. 2022. Unconstrained face sketch synthesis via perception-adaptive network and a new benchmark. Neurocomputing 494 (2022), 192\u2013202.","journal-title":"Neurocomputing"},{"key":"e_1_3_1_6_2","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1109\/ICTC.2017.8191006","volume-title":"Proceedings of the 2017 International Conference on Information and Communication Technology Convergence (ICTC)","author":"Chikontwe Philip","year":"2017","unstructured":"Philip Chikontwe and Lee Hyo Jong. 2017. Face sketch synthesis using conditional adversarial networks. In Proceedings of the 2017 International Conference on Information and Communication Technology Convergence (ICTC). IEEE, 373\u2013378."},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","first-page":"5865","DOI":"10.1109\/TIP.2023.3326680","article-title":"HiFiSketch: High fidelity face photo-sketch synthesis and manipulation","author":"Peng Chunlei","year":"2023","unstructured":"Chunlei Peng, Congyu Zhang, Decheng Liu, Nannan Wang, and Xinbo Gao. 2023. HiFiSketch: High fidelity face photo-sketch synthesis and manipulation. IEEE Transactions on Image Processing 32 (2023), 5865\u20135876.","journal-title":"IEEE Transactions on Image Processing"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11633-022-1349-9"},{"key":"e_1_3_1_9_2","first-page":"7237","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Gao Fei","year":"2023","unstructured":"Fei Gao, Yifan Zhu, Chang Jiang, and Nannan Wang. 2023. Human-Inspired facial sketch synthesis with dynamic adaptation. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 7237\u20137247."},{"key":"e_1_3_1_10_2","first-page":"8780","article-title":"Diffusion models beat gans on image synthesis","volume":"34","author":"Dhariwal Prafulla","year":"2021","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems 34 (2021), 8780\u20138794.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_1_12_2","unstructured":"Alex Nichol Prafulla Dhariwal Aditya Ramesh Pranav Shyam Pamela Mishkin Bob McGrew Ilya Sutskever and Mark Chen. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv:2112.10741. Retrieved from https:\/\/arxiv.org\/abs\/2112.10741"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3528233.3530757"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00246"},{"key":"e_1_3_1_15_2","first-page":"28188","volume-title":"Proceedings of the Computer Vision and Pattern Recognition Conference","author":"Chen Junyang","year":"2025","unstructured":"Junyang Chen, Jinshan Pan, and Jiangxin Dong. 2025. Faithdiff: Unleashing diffusion priors for faithful image super-resolution. In Proceedings of the Computer Vision and Pattern Recognition Conference, 28188\u201328197."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2024.3378720"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2024.3408045"},{"issue":"10","key":"e_1_3_1_18_2","doi-asserted-by":"crossref","first-page":"10424","DOI":"10.1109\/TCSVT.2024.3409184","article-title":"Denoising diffusion probabilistic model for face sketch-to-photo synthesis","volume":"34","author":"Que Yue","year":"2024","unstructured":"Yue Que, Li Xiong, Weiguo Wan, Xue Xia, and Zhiwei Liu. 2024. Denoising diffusion probabilistic model for face sketch-to-photo synthesis. IEEE Transactions on Circuits and Systems for Video Technology 34, 10 (2024), 10424\u201310436.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_19_2","first-page":"9598","article-title":"Visual instruction inversion: Image editing via image prompting","volume":"36","author":"Nguyen Thao","year":"2023","unstructured":"Thao Nguyen, Yuheng Li, Utkarsh Ojha, and Yong Jae Lee. 2023. Visual instruction inversion: Image editing via image prompting. Advances in Neural Information Processing Systems 36 (2023), 9598\u20139613.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.52202\/068431-1813"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00660"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.222"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995324"},{"key":"e_1_3_1_24_2","first-page":"1005","volume-title":"Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR \u201905)","author":"Liu Qingshan","year":"2005","unstructured":"Qingshan Liu, Xiaoou Tang, Hongliang Jin, Hanqing Lu, and Songde Ma. 2005. A nonlinear approach for face sketch synthesis and recognition. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR \u201905). IEEE, 1005\u20131010."},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1016\/j.neucom.2016.07.071","article-title":"Data-driven vs. model-driven: Fast face sketch synthesis","volume":"257","author":"Wang Nannan","year":"2017","unstructured":"Nannan Wang, Mingrui Zhu, Jie Li, Bin Song, and Zan Li. 2017. Data-driven vs. model-driven: Fast face sketch synthesis. Neurocomputing 257 (2017), 214\u2013221.","journal-title":"Neurocomputing"},{"key":"e_1_3_1_26_2","first-page":"3574","volume-title":"Proceedings of the 26th International Joint Conference on Artificial Intelligence","author":"Zhu Mingrui","year":"2017","unstructured":"Mingrui Zhu, Nannan Wang, Xinbo Gao, and Jie Li. 2017. Deep graphical feature learning for face sketch synthesis. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, 3574\u20133580."},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1016\/j.neucom.2018.03.042","article-title":"Composite components-based face sketch recognition","volume":"302","author":"Liu Decheng","year":"2018","unstructured":"Decheng Liu, Jie Li, Nannan Wang, Chunlei Peng, and Xinbo Gao. 2018. Composite components-based face sketch recognition. Neurocomputing 302 (2018), 46\u201354.","journal-title":"Neurocomputing"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007669.3007679"},{"key":"e_1_3_1_29_2","first-page":"1125","volume-title":"Proceedings of the 2011 18th IEEE International Conference on Image Processing","author":"Zhang Jiewei","year":"2011","unstructured":"Jiewei Zhang, Nannan Wang, Xinbo Gao, Dacheng Tao, and Xuelong Li. 2011. Face sketch-photo synthesis based on support vector regression. In Proceedings of the 2011 18th IEEE International Conference on Image Processing. IEEE, 1125\u20131128."},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.neucom.2019.07.008","article-title":"FCN based preprocessing for exemplar-based face sketch synthesis","volume":"365","author":"Lu Dan","year":"2019","unstructured":"Dan Lu, Zhenxue Chen, Q. M. Jonathan Wu, and Xuetao Zhang. 2019. FCN based preprocessing for exemplar-based face sketch synthesis. Neurocomputing 365 (2019), 113\u2013124.","journal-title":"Neurocomputing"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2890018"},{"key":"e_1_3_1_32_2","first-page":"23109","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Zhu Mingrui","year":"2023","unstructured":"Mingrui Zhu, Xiao He, Nannan Wang, Xiaoyu Wang, and Xinbo Gao. 2023. All-to-key attention for arbitrary style transfer. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 23109\u201323119."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/2671188.2749321"},{"issue":"1","key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/TIP.2016.2623485","article-title":"Content-adaptive sketch portrait generation by decompositional representation learning","volume":"26","author":"Zhang Dongyu","year":"2016","unstructured":"Dongyu Zhang, Liang Lin, Tianshui Chen, Xian Wu, Wenwei Tan, and Ebroul Izquierdo. 2016. Content-adaptive sketch portrait generation by decompositional representation learning. IEEE Transactions on Image Processing 26, 1 (2016), 328\u2013339.","journal-title":"IEEE Transactions on Image Processing"},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/j.patcog.2017.10.025","article-title":"A modified convolutional neural network for face sketch synthesis","volume":"76","author":"Jiao Licheng","year":"2018","unstructured":"Licheng Jiao, Sibo Zhang, Lingling Li, Fang Liu, and Wenping Ma. 2018. A modified convolutional neural network for face sketch synthesis. Pattern Recognition 76 (2018), 125\u2013136.","journal-title":"Pattern Recognition"},{"key":"e_1_3_1_36_2","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1109\/WACV.2018.00059","volume-title":"Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV)","author":"Chen Chaofeng","year":"2018","unstructured":"Chaofeng Chen, Xiao Tan, and Kwan-Yee K. Wong. 2018. Face sketch synthesis with style transfer using pyramid column feature. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 485\u2013493."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-021-01442-2"},{"key":"e_1_3_1_38_2","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow Ian","year":"2014","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 27 (2014).","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"3","key":"e_1_3_1_39_2","doi-asserted-by":"crossref","first-page":"1132","DOI":"10.1109\/TCYB.2018.2886238","article-title":"Deep learning meets game theory: Bregman-based algorithms for interactive deep generative adversarial networks","volume":"50","author":"Tembine Hamidou","year":"2019","unstructured":"Hamidou Tembine. 2019. Deep learning meets game theory: Bregman-based algorithms for interactive deep generative adversarial networks. IEEE Transactions on Cybernetics 50, 3 (2019), 1132\u20131145.","journal-title":"IEEE Transactions on Cybernetics"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.23919\/TST.2017.8195348"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589002"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3672400"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.632"},{"key":"e_1_3_1_44_2","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.patrec.2017.06.012","article-title":"Back projection: An effective postprocessing method for GAN-based face sketch synthesis","volume":"107","author":"Wang Nannan","year":"2018","unstructured":"Nannan Wang, Wenjin Zha, Jie Li, and Xinbo Gao. 2018. Back projection: An effective postprocessing method for GAN-based face sketch synthesis. Pattern Recognition Letters 107 (2018), 59\u201365.","journal-title":"Pattern Recognition Letters"},{"key":"e_1_3_1_45_2","first-page":"10743","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yi Ran","year":"2019","unstructured":"Ran Yi, Yong-Jin Liu, Yu-Kun Lai, and Paul L. Rosin. 2019. Apdrawinggan: Generating artistic portrait drawings from face photos with hierarchical gans. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 10743\u201310752."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2020.2972944"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.3030536"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-022-03302-z"},{"issue":"2","key":"e_1_3_1_49_2","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1007\/s00138-024-01658-5","article-title":"Ipdm: Identity preserving diffusion model for face sketch and photo synthesis","volume":"36","author":"Tang Duoxun","year":"2025","unstructured":"Duoxun Tang, Xinhang Jiang, Ying Zhang, Yuhang Dai, and Ye Lin. 2025. Ipdm: Identity preserving diffusion model for face sketch and photo synthesis. Machine Vision and Applications 36, 2 (2025), 34.","journal-title":"Machine Vision and Applications"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591513"},{"key":"e_1_3_1_51_2","first-page":"1829","article-title":"Lightweight text-driven image editing with disentangled content and attributes","author":"Li Bo","year":"2023","unstructured":"Bo Li, Xiao Lin, Bin Liu, Zhi-Fen He, and Yu-Kun Lai. 2023. Lightweight text-driven image editing with disentangled content and attributes. IEEE Transactions on Multimedia 26 (2023), 1829\u20131841.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_1_52_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3730403","article-title":"StyleInject: Parameter efficient tuning of text-to-image diffusion models","author":"Zhou Mohan","year":"2025","unstructured":"Mohan Zhou, Yalong Bai, Qing Yang, and Tiejun Zhao. 2025. StyleInject: Parameter efficient tuning of text-to-image diffusion models. ACM Transactions on Multimedia Computing, Communications and Applications 21, 5 (2025), 1\u201322.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"issue":"1","key":"e_1_3_1_53_2","first-page":"1","article-title":"4d facial expression diffusion model","volume":"21","author":"Zou Kaifeng","year":"2024","unstructured":"Kaifeng Zou, Sylvain Faisan, Boyang Yu, S\u00e9bastien Valette, and Hyewon Seo. 2024. 4d facial expression diffusion model. ACM Transactions on Multimedia Computing, Communications and Applications 21, 1 (2024), 1\u201323.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_54_2","first-page":"1","article-title":"DiFace: Cross-Modal face recognition through controlled diffusion","author":"Sun Bowen","year":"2025","unstructured":"Bowen Sun, Guo Lu, and Shibao Zheng. 2025. DiFace: Cross-Modal face recognition through controlled diffusion. ACM Transactions on Multimedia Computing, Communications and Applications 21, 6 (2025), 1\u201322.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3712064"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00585"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01764"},{"key":"e_1_3_1_58_2","unstructured":"Amir Hertz Ron Mokady Jay Tenenbaum Kfir Aberman Yael Pritch and Daniel Cohen-Or. 2022. Prompt-to-prompt image editing with cross attention control. arXiv:2208.01626. Retrieved from https:\/\/arxiv.org\/abs\/2208.01626"},{"key":"e_1_3_1_59_2","first-page":"89","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Gafni Oran","year":"2022","unstructured":"Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, and Yaniv Taigman. 2022. Make-a-scene: Scene-based text-to-image generation with human priors. In Proceedings of the European Conference on Computer Vision. Springer, 89\u2013106."},{"key":"e_1_3_1_60_2","unstructured":"Ziqi Huang Tianxing Wu Yuming Jiang Kelvin C. K. Chan and Ziwei Liu. 2023. ReVersion: Diffusion-based relation inversion from images. arXiv:2303.13495. Retrieved from https:\/\/arxiv.org\/abs\/2303.13495"},{"key":"e_1_3_1_61_2","first-page":"108185","article-title":"Exploiting gaussian agnostic representation learning with diffusion priors for enhanced infrared small target detection","author":"Li Junyao","year":"2025","unstructured":"Junyao Li, Yahao Lu, Xingyuan Guo, Xiaoyu Xian, Tiantian Wang, and Yukai Shi. 2025. Exploiting gaussian agnostic representation learning with diffusion priors for enhanced infrared small target detection. Neural Networks (2025), 108185.","journal-title":"Neural Networks"},{"key":"e_1_3_1_62_2","unstructured":"Chenlin Meng Yang Song Jiaming Song Jiajun Wu Jun-Yan Zhu and Stefano Ermon. 2021. Sdedit: Image synthesis and editing with stochastic differential equations. arXiv:2108.01073. Retrieved from https:\/\/arxiv.org\/abs\/2108.01073"},{"key":"e_1_3_1_63_2","first-page":"8748","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning. PMLR, 8748\u20138763."},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00582"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00191"},{"key":"e_1_3_1_66_2","first-page":"8302","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Cho Hansam","year":"2024","unstructured":"Hansam Cho, Jonghyun Lee, Seunggyu Chang, and Yonghyun Jeong. 2024. One-shot structure-aware stylized image synthesis. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 8302\u20138311."},{"key":"e_1_3_1_67_2","first-page":"4625","volume-title":"Proceedings of the 39th AAAI Conference on Artificial Intelligence","author":"Li Bonan","year":"2025","unstructured":"Bonan Li, Zicheng Zhang, Xuecheng Nie, Congying Han, Yinhan Hu, Xinmin Qiu, and Tiande Guo. 2025. Styo: Stylize your face in only one-shot. In Proceedings of the 39th AAAI Conference on Artificial Intelligence, 4625\u20134633."},{"key":"e_1_3_1_68_2","first-page":"51008","article-title":"Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery","volume":"36","author":"Wen Yuxin","year":"2023","unstructured":"Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, and Tom Goldstein. 2023. Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery. Advances in Neural Information Processing Systems 36 (2023), 51008\u201351025.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_69_2","unstructured":"I. Loshchilov. 2017. Decoupled weight decay regularization. arXiv:1711.05101. Retrieved from https:\/\/arxiv.org\/abs\/1711.05101"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_3_1_72_2","article-title":"Gans trained by a two time-scale update rule converge to a local nash equilibrium","volume":"30","author":"Heusel Martin","year":"2017","unstructured":"Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00482"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3253773"},{"key":"e_1_3_1_75_2","first-page":"9192","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Nam Hyelin","year":"2024","unstructured":"Hyelin Nam, Gihyun Kwon, Geon Yeong Park, and Jong Chul Ye. 2024. Contrastive denoising score for text-guided latent diffusion image editing. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 9192\u20139201."},{"key":"e_1_3_1_76_2","first-page":"15298","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Chinchuthakun Worameth","year":"2025","unstructured":"Worameth Chinchuthakun, Tossaporn Saengja, Nontawat Tritrong, Pitchaporn Rewatbowornwong, Pramook Khungurn, and Supasorn Suwajanakorn. 2025. LUSD: Localized update score distillation for text-guided image editing. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 15298\u201315307."},{"key":"e_1_3_1_77_2","first-page":"7787","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhou Yang","year":"2024","unstructured":"Yang Zhou, Zichong Chen, and Hui Huang. 2024. Deformable one-shot face stylization via dino semantic guidance. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 7787\u20137796."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3803012","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T14:58:03Z","timestamp":1776783483000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3803012"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,21]]},"references-count":76,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2026,5,31]]}},"alternative-id":["10.1145\/3803012"],"URL":"https:\/\/doi.org\/10.1145\/3803012","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,21]]},"assertion":[{"value":"2025-06-18","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-03-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-04-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}