{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T17:56:56Z","timestamp":1768240616208,"version":"3.49.0"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"name":"Shenzhen Science and Technology Program","award":["CJGJZD20220517142402006"],"award-info":[{"award-number":["CJGJZD20220517142402006"]}]},{"name":"Major Key Project of Pengcheng Laboratory","award":["PCL2025A12"],"award-info":[{"award-number":["PCL2025A12"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>Pose-guided human image generation aims to render a source image in a specific pose. Current methods predominantly employ 2D-based signals, which exhibit inherent information deficits, as pose conditions. This leads to difficulty in establishing precise source-target appearance-pose correspondence and further causing uncertainty in predicting self-occluded regions\u2019 appearance. To address these issues, we propose a 3D Pose Conditional Diffusion model (3DPCD) that leverages a human parametric model to integrate comprehensive and adjustable 3D control into forward\u2013backward diffusion steps. Specifically, we employ Fourier-transformed SMPL-X as the 3D pose representation to facilitate precise source-target correspondence by understanding the complete pose information. Building on this, we further propose a hierarchical appearance-pose alignment method, which aligns appearance with the complete pose information at both global and local levels. Moreover, motivated by the fact that human pose transformation is a progressive process in 3D space and our 3D pose representation is adjustable, we integrate progressively interpolated 3D control into a series of sampling steps. This effectively mitigates uncertainties in pixel transfer between poses. It should be noted that the proposed explicit pose-guided strategy also supports flexible adjustment of pose, shape, and viewpoint. Both quantitative and qualitative evaluations demonstrate that our 3DPCD outperforms state-of-the-art methods on the widely used DeepFashion InShop benchmark and our newly constructed PoseWeb-33 dataset, which features richer appearance variations and more diverse conditional poses.<\/jats:p>","DOI":"10.1145\/3778044","type":"journal-article","created":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T14:49:34Z","timestamp":1765205374000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Enhancing Pose-Guided Human Image Generation with Comprehensive and Adjustable 3D Control"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-3099-4773","authenticated-orcid":false,"given":"Xin","family":"Dong","sequence":"first","affiliation":[{"name":"Shenzhen International Graduate School, Tsinghua University, Shenzhen, China and Peng Cheng Laboratory, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9221-2776","authenticated-orcid":false,"given":"Lihan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shenzhen International Graduate School, Tsinghua University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9824-7287","authenticated-orcid":false,"given":"Aoyang","family":"Liu","sequence":"additional","affiliation":[{"name":"Shenzhen International Graduate School, Tsinghua University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1712-7983","authenticated-orcid":false,"given":"Xiaojun","family":"Liang","sequence":"additional","affiliation":[{"name":"Peng Cheng Laboratory, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2168-1242","authenticated-orcid":false,"given":"Yutao","family":"Guo","sequence":"additional","affiliation":[{"name":"Shenzhen International Graduate School, Tsinghua University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1534-4549","authenticated-orcid":false,"given":"Yansong","family":"Tang","sequence":"additional","affiliation":[{"name":"Shenzhen International Graduate School, Tsinghua University, Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2026,1,12]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"5968","volume-title":"CVPR","author":"Bhunia Ankan Kumar","year":"2023","unstructured":"Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, and Fahad Shahbaz Khan. 2023. Person image synthesis via denoising diffusion model. In CVPR, 5968\u20135976."},{"key":"e_1_3_2_3_2","first-page":"1","article-title":"SMPler-X: Scaling up expressive human pose and shape estimation","volume":"36","author":"Cai Zhongang","year":"2024","unstructured":"Zhongang Cai, Wanqi Yin, Ailing Zeng, Chen Wei, Qingping Sun, Wang Yanjun, Hui En Pang, Haiyi Mei, Mingyuan Zhang, Lei Zhang, et al. 2024. SMPler-X: Scaling up expressive human pose and shape estimation. In NeurIPS, Vol. 36, 1\u201315.","journal-title":"NeurIPS"},{"key":"e_1_3_2_4_2","doi-asserted-by":"crossref","unstructured":"Weifeng Chen Tao Gu Yuhao Xu and Chengcai Chen. 2024. Magic clothing: Controllable garment-driven image synthesis. arXiv:2404.09512. Retrieved from https:\/\/arxiv.org\/abs\/2404.09512","DOI":"10.1145\/3664647.3680691"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW60793.2023.00451"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01391"},{"key":"e_1_3_2_7_2","first-page":"2672","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow Ian J.","year":"2014","unstructured":"Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In NeurIPS, Vol. 27, 2672\u20132680.","journal-title":"NeurIPS"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3612255"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02081"},{"key":"e_1_3_2_10_2","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. In NeurIPS, Vol. 33, 6840\u20136851.","journal-title":"NeurIPS"},{"key":"e_1_3_2_11_2","first-page":"8153","volume-title":"CVPR","author":"Hu Li","year":"2024","unstructured":"Li Hu. 2024. Animate anyone: Consistent and controllable image-to-video synthesis for character animation. In CVPR, 8153\u20138163."},{"key":"e_1_3_2_12_2","unstructured":"Zehuan Huang Hongxing Fan Lipeng Wang and Lu Sheng. 2024. From parts to whole: A unified reference framework for controllable human image generation. In CVPR."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2025.3559891"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01465"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00744"},{"key":"e_1_3_2_16_2","unstructured":"Diederik P. Kingma. 2013. Auto-encoding variational bayes. arXiv:1312.6114. Retrieved from https:\/\/arxiv.org\/abs\/1312.6114"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00381"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02156"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3115628"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.124"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00614"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01066"},{"key":"e_1_3_2_25_2","first-page":"406","article-title":"Pose guided person image generation","volume":"30","author":"Ma Liqian","year":"2017","unstructured":"Liqian Ma, Xu Jia, Qianru Sun, Bernt Schiele, Tinne Tuytelaars, and Luc Van Gool. 2017. Pose guided person image generation. In NeurIPS, Vol. 30, 406\u2013416.","journal-title":"NeurIPS"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00513"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i5.28226"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01219-9_8"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01123"},{"key":"e_1_3_2_30_2","volume-title":"ICML","author":"Pham Trung X.","year":"2024","unstructured":"Trung X. Pham, Kang Zhang, and Chang D. Yoo. 2024. Cross-view masked diffusion transformers for person image synthesis. In ICML."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3161985"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV57701.2024.00517"},{"key":"e_1_3_2_33_2","unstructured":"Tianhe Ren Shilong Liu Ailing Zeng Jing Lin Kunchang Li He Cao Jiayu Chen Xinyu Huang Yukang Chen Feng Yan et al. 2024. Grounded SAM: Assembling open-world models for diverse visual tasks. arXiv:2401.14159. Retrieved from https:\/\/arxiv.org\/abs\/2401.14159"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01317"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00771"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_37_2","unstructured":"Fei Shen Hu Ye Jun Zhang Cong Wang Xiao Han and Wei Yang. 2023. Advancing pose-guided image synthesis with progressive conditional diffusion models. arXiv:2310.06313. Retrieved from https:\/\/arxiv.org\/abs\/2310.06313"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00359"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3658221"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.00330"},{"key":"e_1_3_2_41_2","unstructured":"Shengeng Tang Jiayi He Dan Guo Yanyan Wei Feng Li and Richang Hong. 2025. Sign-IDD: Iconicity disentangled diffusion for sign language production. In AAAI Vol. 39 7266\u20137274."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00558"},{"key":"e_1_3_2_43_2","first-page":"65670","article-title":"Stable-pose: Leveraging transformers for pose-guided text-to-image generation","volume":"37","author":"Wang Jiajun","year":"2024","unstructured":"Jiajun Wang, Morteza Ghahremani Boozandani, Yitong Li, Bj\u00f6rn Ommer, and Christian Wachinger. 2024. Stable-pose: Leveraging transformers for pose-guided text-to-image generation. In NeurIPS, Vol. 37, 65670\u201365698.","journal-title":"NeurIPS"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-024-02056-0"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00755"},{"key":"e_1_3_2_46_2","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS, Vol. 35, 24824\u201324837.","journal-title":"NeurIPS"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00987"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00670"},{"key":"e_1_3_2_49_2","unstructured":"Hu Ye Jun Zhang Sibo Liu Xiao Han and Wei Yang. 2023. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv:2308.06721. Retrieved from https:\/\/arxiv.org\/abs\/2308.06721"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3054775"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00565"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00789"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3104166"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-024-02079-7"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00756"},{"key":"e_1_3_2_57_2","volume-title":"ICASSP","author":"Zhou Pengfei","year":"2024","unstructured":"Pengfei Zhou, Xukun Shen, and Yong Hu. 2024. Text-driven 3D human generation via contrastive preference optimization. In ICASSP."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19784-0_10"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01130"},{"key":"e_1_3_2_60_2","volume-title":"ECCV","author":"Zhu Shenhao","year":"2024","unstructured":"Shenhao Zhu, Junming Leo Chen, Zuozhuo Dai, Yinghui Xu, Xun Cao, Yao Yao, Hao Zhu, and Siyu Zhu. 2024. Champ: Controllable and consistent human image animation with 3D parametric guidance. In ECCV."},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00245"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3778044","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T14:28:41Z","timestamp":1768228121000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3778044"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,12]]},"references-count":60,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3778044"],"URL":"https:\/\/doi.org\/10.1145\/3778044","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,12]]},"assertion":[{"value":"2025-03-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-21","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}