{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T16:05:21Z","timestamp":1779379521426,"version":"3.53.1"},"reference-count":32,"publisher":"Wiley","issue":"3","license":[{"start":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T00:00:00Z","timestamp":1779321600000},"content-version":"vor","delay-in-days":20,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"},{"start":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T00:00:00Z","timestamp":1777593600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/doi.wiley.com\/10.1002\/tdm_license_1.1"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computer Animation & Virtual"],"published-print":{"date-parts":[[2026,5]]},"abstract":"<jats:title>ABSTRACT<\/jats:title>\n                  <jats:p>Compositing human figures into scene images has broad applications in areas such as entertainment and advertising. However, existing methods often cannot handle occlusion of the inserted person by foreground objects and unnaturally place the person in the frontmost layer. Moreover, they offer limited control over the inserted person's pose. To address these challenges, we propose two methods. Both allow explicit pose control via a 3D body model and leverage latent diffusion models to synthesize the person at a contextually appropriate depth, naturally handling occlusions without requiring occlusion masks. The first is a two\u2010stage approach: the model first learns a depth map of the scene with the person through supervised learning, and then synthesizes the person accordingly. The second method learns occlusion implicitly and synthesizes the person directly from input data without explicit depth supervision. 
Quantitative and qualitative evaluations show that both methods outperform existing approaches by better preserving scene consistency while accurately reflecting occlusions and user\u2010specified poses.<\/jats:p>","DOI":"10.1002\/cav.70119","type":"journal-article","created":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T15:12:02Z","timestamp":1779376322000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Person\u2010In\u2010Situ: Scene\u2010Consistent Human Image Insertion With Occlusion\u2010Aware Pose Control"],"prefix":"10.1002","volume":"37","author":[{"given":"Shun","family":"Masuda","sequence":"first","affiliation":[{"name":"University of Tsukuba  Ibaraki Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuki","family":"Endo","sequence":"additional","affiliation":[{"name":"University of Tsukuba  Ibaraki Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yoshihiro","family":"Kanamori","sequence":"additional","affiliation":[{"name":"University of Tsukuba  Ibaraki Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"311","published-online":{"date-parts":[[2026,5,21]]},"reference":[{"key":"e_1_2_11_2_1","doi-asserted-by":"crossref","unstructured":"S.Kulal T.Brooks A.Aiken et al. 
\u201cPutting People in Their Place: Affordance\u2010Aware Human Insertion Into Scenes \u201d In:CVPR 17089\u201317099(2023).","DOI":"10.1109\/CVPR52729.2023.01639"},{"issue":"6","key":"e_1_2_11_3_1","doi-asserted-by":"crossref","first-page":"248:1","DOI":"10.1145\/2816795.2818013","article-title":"SMPL: A Skinned Multi\u2010Person Linear Model","volume":"34","author":"Loper M.","year":"2015","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_11_4_1","doi-asserted-by":"crossref","unstructured":"R.Rombach A.Blattmann D.Lorenz P.Esser andB.Ommer \u201cHigh\u2010Resolution Image Synthesis With Latent Diffusion Models \u201d In:CVPR 10674\u201310685(2022).","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_11_5_1","doi-asserted-by":"crossref","unstructured":"A. K.Bhunia S. H.Khan H.Cholakkal et al. \u201cPerson Image Synthesis via Denoising Diffusion Model \u201d In:CVPR 5968\u20135976(2023).","DOI":"10.1109\/CVPR52729.2023.00578"},{"key":"e_1_2_11_6_1","doi-asserted-by":"crossref","unstructured":"X.Han X.Zhu J.Deng Y.Song andT.Xiang \u201cControllable Person Image Synthesis With Pose\u2010Constrained Latent Diffusion \u201d In:ICCV 22711\u201322720(2023).","DOI":"10.1109\/ICCV51070.2023.02081"},{"key":"e_1_2_11_7_1","doi-asserted-by":"crossref","unstructured":"Y.Okuyama Y.Endo andY.Kanamori \u201cDiffBody: Diffusion\u2010Based Pose and Shape Editing of Human Images. In:WACV 6333\u20136342\u201d(2024).","DOI":"10.1109\/WACV57701.2024.00621"},{"key":"e_1_2_11_8_1","doi-asserted-by":"crossref","unstructured":"L.Hu \u201cAnimate Anyone: Consistent and Controllable Image\u2010To\u2010Video Synthesis for Character Animation \u201d In:CVPR 8153\u20138163(2024).","DOI":"10.1109\/CVPR52733.2024.00779"},{"key":"e_1_2_11_9_1","doi-asserted-by":"crossref","unstructured":"Z.Xu J.Zhang J. H.Liew et al. 
\u201cMagicAnimate: Temporally Consistent Human Image Animation Using Diffusion Model \u201d In:CVPR 1481\u20131490(2024).","DOI":"10.1109\/CVPR52733.2024.00147"},{"key":"e_1_2_11_10_1","doi-asserted-by":"crossref","unstructured":"S.Zhu J. L.Chen Z.Dai et al. \u201cChamp: Controllable and Consistent Human Image Animation With 3D Parametric Guidance \u201d In:ECCV. 15113 ofLecture Notes in Computer Science 145\u2013162(2024).","DOI":"10.1007\/978-3-031-73001-6_9"},{"key":"e_1_2_11_11_1","doi-asserted-by":"crossref","unstructured":"B.Yang S.Gu B.Zhang et al. \u201cPaint by Example: Exemplar\u2010Based Image Editing With Diffusion Models \u201d In:CVPR 18381\u201318391(2023).","DOI":"10.1109\/CVPR52729.2023.01763"},{"key":"e_1_2_11_12_1","doi-asserted-by":"crossref","unstructured":"X.Chen L.Huang Y.Liu Y.Shen D.Zhao andH.Zhao \u201cAnyDoor: Zero\u2010Shot Object\u2010Level Image Customization \u201d In:CVPR 6593\u20136602(2024).","DOI":"10.1109\/CVPR52733.2024.00630"},{"key":"e_1_2_11_13_1","unstructured":"J.Lee H.Cho Y. J.Yoo S. B.Kim andY.Jeong \u201cCompose and Conquer: Diffusion\u2010Based 3D Depth Aware Composable Image Synthesis \u201d In: ICLR(2024)."},{"key":"e_1_2_11_14_1","unstructured":"L.Yang B.Kang Z.Huang et al. \u201cDepth Anything V2 \u201d In: NeurIPS(2024)."},{"key":"e_1_2_11_15_1","doi-asserted-by":"crossref","unstructured":"K.He G.Gkioxari P.Doll\u00e1r andR. B.Girshick \u201cMask R\u2010CNN \u201d In:ICCV 2980\u20132988(2017).","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_2_11_16_1","unstructured":"luca\u2010medeiros \u201cLanguage Segment\u2010Anything\u201d https:\/\/github.com\/luca\u2010medeiros\/lang\u2010segment\u2010anythingaccessed on 8\/19\/2024."},{"key":"e_1_2_11_17_1","doi-asserted-by":"crossref","unstructured":"S.Liu Z.Zeng T.Ren et al. \u201cGrounding DINO: Marrying DINO With Grounded Pre\u2010Training for Open\u2010Set Object Detection \u201d In:ECCV. 
15105 ofLecture Notes in Computer Science 38\u201355(2024).","DOI":"10.1007\/978-3-031-72970-6_3"},{"key":"e_1_2_11_18_1","doi-asserted-by":"crossref","unstructured":"A.Kirillov E.Mintun N.Ravi et al. \u201cSegment Anything \u201darXiv:2304.02643(2023).","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"e_1_2_11_19_1","doi-asserted-by":"crossref","unstructured":"Q.Fang K.Chen Y.Fan Q.Shuai J.Li andW.Zhang \u201cLearning Analytical Posterior Probability for Human Mesh Recovery \u201d In:CVPR 8781\u20138791(2023).","DOI":"10.1109\/CVPR52729.2023.00848"},{"key":"e_1_2_11_20_1","doi-asserted-by":"crossref","unstructured":"B.Ke A.Obukhov S.Huang N.Metzger R. C.Daudt andK.Schindler \u201cRepurposing Diffusion\u2010Based Image Generators for Monocular Depth Estimation \u201d In:CVPR 9492\u20139502(2024).","DOI":"10.1109\/CVPR52733.2024.00907"},{"key":"e_1_2_11_21_1","unstructured":"J.HoandT.Salimans \u201cClassifier\u2010Free Diffusion Guidance \u201darXiv:2207.12598(2022)."},{"key":"e_1_2_11_22_1","unstructured":"P.vonPlaten S.Patil A.Lozhkov et al. \u201cDiffusers: State\u2010of\u2010the\u2010Art Diffusion Models\u201d(2022) https:\/\/github.com\/huggingface\/diffusersaccessed on 10\/4\/2024."},{"key":"e_1_2_11_23_1","doi-asserted-by":"crossref","unstructured":"M.Andriluka L.Pishchulin P. V.Gehler andB.Schiele \u201c2D Human Pose Estimation: New Benchmark and State of the Art Analysis \u201d In:CVPR 3686\u20133693(2014).","DOI":"10.1109\/CVPR.2014.471"},{"key":"e_1_2_11_24_1","unstructured":"J.Carreira E.Noland C.Hillier andA.Zisserman \u201cA Short Note on the Kinetics\u2010700 Human Action Dataset \u201darXiv:1907.06987(2022)."},{"key":"e_1_2_11_25_1","doi-asserted-by":"crossref","unstructured":"A.Diba M.Fayyaz V.Sharma et al. \u201cLarge Scale Holistic Video Understanding \u201d In:European Conference on Computer Vision 593\u2013610(2020).","DOI":"10.1007\/978-3-030-58558-7_35"},{"key":"e_1_2_11_26_1","unstructured":"W.Kay J.Carreira K.Simonyan et al. 
\u201cThe Kinetics Human Action Video Dataset \u201darXiv:1705.06950(2017)."},{"key":"e_1_2_11_27_1","doi-asserted-by":"crossref","unstructured":"G. A.Sigurdsson G.Varol X.Wang A.Farhadi I.Laptev andA.Gupta \u201cHollywood in Homes: Crowdsourcing Data Collection for Activity Understanding \u201d In:ECCV (1). 9905 ofLecture Notes in Computer Science 510\u2013526(2016).","DOI":"10.1007\/978-3-319-46448-0_31"},{"key":"e_1_2_11_28_1","unstructured":"I.LoshchilovandF.Hutter \u201cDecoupled Weight Decay Regularization \u201d In:ICLR 2019OpenReview.net 2019."},{"key":"e_1_2_11_29_1","unstructured":"A.Radford J. W.Kim C.Hallacy et al. \u201cLearning Transferable Visual Models From Natural Language Supervision \u201d In:ICML. 139 ofProceedings of Machine Learning Research 8748\u20138763(2021)."},{"key":"e_1_2_11_30_1","unstructured":"M.Oquab T.Darcet T.Moutakanni et al. \u201cDINOv2: Learning Robust Visual Features Without Supervision \u201d Transactions on Machine Learning Research(2024)."},{"key":"e_1_2_11_31_1","unstructured":"\u201cPixabay\u201d https:\/\/pixabay.com\/accessed on 11\/3\/2025."},{"key":"e_1_2_11_32_1","doi-asserted-by":"crossref","unstructured":"R.Parihar H.Gupta V. S.Sachidanand andR. V.Babu \u201cText2Place: Affordance\u2010Aware Text Guided Human Placement \u201d In:ECCV. 15061 ofLecture Notes in Computer Science 57\u201377(2024).","DOI":"10.1007\/978-3-031-72646-0_4"},{"key":"e_1_2_11_33_1","doi-asserted-by":"crossref","unstructured":"R.Suvorov E.Logacheva A.Mashikhin et al. 
\u201cResolution\u2010Robust Large Mask Inpainting With Fourier Convolutions \u201d In:WACV 3172\u20133182(2022).","DOI":"10.1109\/WACV51458.2022.00323"}],"container-title":["Computer Animation and Virtual Worlds"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cav.70119","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/cav.70119","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cav.70119","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T15:12:20Z","timestamp":1779376340000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cav.70119"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5]]},"references-count":32,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,5]]}},"alternative-id":["10.1002\/cav.70119"],"URL":"https:\/\/doi.org\/10.1002\/cav.70119","archive":["Portico"],"relation":{},"ISSN":["1546-4261","1546-427X"],"issn-type":[{"value":"1546-4261","type":"print"},{"value":"1546-427X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5]]},"assertion":[{"value":"2025-06-20","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-16","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-05-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e70119"}}