{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,28]],"date-time":"2026-05-28T01:16:22Z","timestamp":1779930982395,"version":"3.53.1"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"11","funder":[{"name":"National Key Research and Development Program of China","award":["2023YFC3305600"],"award-info":[{"award-number":["2023YFC3305600"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U2336212"],"award-info":[{"award-number":["U2336212"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62472381"],"award-info":[{"award-number":["62472381"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["226-2024-00058"],"award-info":[{"award-number":["226-2024-00058"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Fundamental Research Funds for the Zhejiang Provincial Universities","award":["226-2024-00208"],"award-info":[{"award-number":["226-2024-00208"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. 
Appl."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>To achieve content-consistent results in text-conditioned image editing, existing methods typically employ a <jats:italic toggle=\"yes\">reconstruction branch<\/jats:italic> to capture the source image details via diffusion inversion and a <jats:italic toggle=\"yes\">generation branch<\/jats:italic> to synthesize the target image based on the given textual prompt and the masked source image details. However, accurately segmenting source details is challenging with the current fixed-threshold mask strategy. Additionally, the inadequacies in the inversion process can lead to insufficient retention of source details. In this article, we propose a method called <jats:italic toggle=\"yes\">SAMControl (Soft Attention Mask)<\/jats:italic> to adaptively control the pose and object details for image editing. SAMControl dynamically learns flexible attention masks for different images at various diffusion steps. Furthermore, in the reconstruction branch, we utilize a direct inversion technique to ensure the fidelity of source details within SAM. 
Extensive qualitative and quantitative results demonstrate the effectiveness of the proposed method.\n                  <\/jats:p>","DOI":"10.1145\/3702999","type":"journal-article","created":{"date-parts":[[2024,11,5]],"date-time":"2024-11-05T11:38:18Z","timestamp":1730806698000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["SAMControl: Controlling Pose and Object for Image Editing with Soft Attention Mask"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0431-6390","authenticated-orcid":false,"given":"Yue","family":"Zhang","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1297-768X","authenticated-orcid":false,"given":"Chao","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Technology Sydney, Sydney, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-9053-8972","authenticated-orcid":false,"given":"Fei","family":"Fang","sequence":"additional","affiliation":[{"name":"Qingdao University of Technology, Qingdao, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4288-4516","authenticated-orcid":false,"given":"Yunzhi","family":"Zhuge","sequence":"additional","affiliation":[{"name":"Dalian University of Technology, Dalian, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9572-2345","authenticated-orcid":false,"given":"Hehe","family":"Fan","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7778-8807","authenticated-orcid":false,"given":"Xiaojun","family":"Chang","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2620-3247","authenticated-orcid":false,"given":"Cheng","family":"Deng","sequence":"additional","affiliation":[{"name":"Xidian University, Xi\u2019an, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0512-880X","authenticated-orcid":false,"given":"Yi","family":"Yang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,11,10]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"8861","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Brack Manuel","year":"2024","unstructured":"Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, and Apolin\u00e1rio Passos. 2024. Ledits++: Limitless image editing using text-to-image models. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 8861\u20138870."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01764"},{"key":"e_1_3_1_4_2","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 
33, 1877\u20131901.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_5_2","first-page":"22503","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV \u201923)","author":"Cao Mingdeng","year":"2023","unstructured":"Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xiaohu Qie, and Yinqiang Zheng. 2023. MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV \u201923). IEEE, 22503\u201322513."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00574"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02078"},{"key":"e_1_3_1_8_2","first-page":"1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023 (NeurIPS \u201923)","volume":"36","author":"Clark Kevin","year":"2023","unstructured":"Kevin Clark and Priyank Jaini. 2023. Text-to-image diffusion models are zero shot classifiers. In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023 (NeurIPS \u201923).Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (Eds.), Vol. 36. IEEE, 1\u201313."},{"key":"e_1_3_1_9_2","first-page":"1","volume-title":"The Eleventh International Conference on Learning Representations, ICLR","author":"Couairon Guillaume","year":"2023","unstructured":"Guillaume Couairon, Jakob Verbeek, Holger Schwenk, and Matthieu Cord. 2023. DiffEdit: Diffusion-based semantic image editing with mask guidance. In The Eleventh International Conference on Learning Representations, ICLR. 
OpenReview.net, Kigali, Rwanda, 1\u201310."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589002"},{"key":"e_1_3_1_11_2","first-page":"8780","article-title":"Diffusion models beat gans on image synthesis","volume":"34","author":"Dhariwal Prafulla","year":"2021","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat gans on image synthesis. In Proceedings of the Advances in Neural Information Processing Systems 34 (2021), 8780\u20138794.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00683"},{"key":"e_1_3_1_13_2","first-page":"1","volume-title":"Proceedings of the 11th International Conference on Learning Representations (ICLR \u201923)","author":"Gal Rinon","year":"2023","unstructured":"Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit Haim Bermano, Gal Chechik, and Daniel Cohen-Or. 2023. An image is worth one word: Personalizing text-to-image generation using textual inversion. In Proceedings of the 11th International Conference on Learning Representations (ICLR \u201923). OpenReview.net, 1\u201318."},{"key":"e_1_3_1_14_2","first-page":"12709","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201924)","author":"Geng Zigang","year":"2024","unstructured":"Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Houqiang Li, Han Hu, Dong Chen, and Baining Guo. 2024. InstructDiffusion: A generalist modeling interface for vision tasks. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201924). 
IEEE, 12709\u201312720."},{"key":"e_1_3_1_15_2","first-page":"1","volume-title":"Proceedings of the 10th International Conference on Learning Representations (ICLR\u00a0\u201922)","volume":"1","author":"Gu Jiatao","year":"2022","unstructured":"Jiatao Gu, Lingjie Liu, Peng Wang, and Christian Theobalt. 2022. StyleNeRF: A style-based 3D aware generator for high-resolution image synthesis. In Proceedings of the 10th International Conference on Learning Representations (ICLR\u00a0\u201922), Vol. 1. OpenReview.net, Virtual Event, 1\u201312."},{"key":"e_1_3_1_16_2","first-page":"1","article-title":"Optimizing prompts for text-to-image generation","volume":"36","author":"Hao Yaru","year":"2024","unstructured":"Yaru Hao, Zewen Chi, Li Dong, and Furu Wei. 2024. Optimizing prompts for text-to-image generation. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36, 1\u201312.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_17_2","first-page":"1","volume-title":"Proceedings of the 11th International Conference on Learning Representations (ICLR \u201923)","author":"Hertz Amir","year":"2023","unstructured":"Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. 2023. Prompt-to-prompt image editing with cross-attention control. In Proceedings of the 11th International Conference on Learning Representations (ICLR \u201923). OpenReview.net, 1\u201313."},{"key":"e_1_3_1_18_2","first-page":"12469","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201924)","author":"Huberman-Spiegelglas Inbar","year":"2024","unstructured":"Inbar Huberman-Spiegelglas, Vladimir Kulikov, and Tomer Michaeli. 2024. An edit friendly DDPM noise space: Inversion and manipulations. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201924). 
IEEE, 12469\u201312478."},{"key":"e_1_3_1_19_2","unstructured":"Xuan Ju Ailing Zeng Yuxuan Bian Shaoteng Liu and Qiang Xu. 2023. Direct inversion: Boosting diffusion-based editing with 3 lines of code. arXiv:2310.01506 [cs.CV]."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00453"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00813"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00582"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00246"},{"key":"e_1_3_1_24_2","first-page":"852","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Kim Hyunsu","year":"2021","unstructured":"Hyunsu Kim, Yunjey Choi, Junho Kim, Sungjoo Yoo, and Youngjung Uh. 2021. Exploiting spatial dimensions of latent in gan for real-time image editing. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 852\u2013861."},{"key":"e_1_3_1_25_2","first-page":"1","article-title":"Blip-diffusion: Pre-trained subject representation for controllable text-to-image generation and editing","volume":"36","author":"Li Dongxu","year":"2024","unstructured":"Dongxu Li, Junnan Li, and Steven Hoi. 2024. Blip-diffusion: Pre-trained subject representation for controllable text-to-image generation and editing. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36, 1\u201313.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_26_2","first-page":"6254","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Shanglin","year":"2024","unstructured":"Shanglin Li, Bohan Zeng, Yutang Feng, Sicheng Gao, Xiuhui Liu, Jiaming Liu, Lin Li, Xu Tang, Yao Hu, Jianzhuang Liu, et al. 2024. Zone: Zero-shot instruction-guided local editing. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 6254\u20136263."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3614097"},{"key":"e_1_3_1_28_2","first-page":"1","volume-title":"Proceedings of the 10th International Conference on Learning Representations (ICLR \u201922)","author":"Meng Chenlin","year":"2022","unstructured":"Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. 2022. SDEdit: Guided image synthesis and editing with stochastic differential equations. In Proceedings of the 10th International Conference on Learning Representations (ICLR \u201922). OpenReview.net, Virtual Event, 1\u201313."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503250"},{"key":"e_1_3_1_30_2","first-page":"4906","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201924)","author":"Min Zhiyuan","year":"2024","unstructured":"Zhiyuan Min, Yawei Luo, Wei Yang, Yuesong Wang, and Yi Yang. 2024. Entangled view-epipolar information aggregation for generalizable neural radiance fields. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201924). IEEE, 4906\u20134916."},{"key":"e_1_3_1_31_2","unstructured":"Daiki Miyake Akihiro Iohara Yu Saito and Toshiyuki Tanaka. 2023. Negative-prompt Inversion: Fast image inversion for editing with text-guided diffusion models. arXiv:2305.16807. Retrieved from https:\/\/arxiv.org\/pdf\/2305.16807."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00585"},{"key":"e_1_3_1_33_2","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations (ICLR \u201924)","author":"Mou Chong","year":"2024","unstructured":"Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, and Jian Zhang. 2024. DragonDiffusion: Enabling drag-style manipulation on diffusion models. 
In Proceedings of the 12th International Conference on Learning Representations (ICLR \u201924). OpenReview.net, 1\u201313."},{"key":"e_1_3_1_34_2","first-page":"4296","volume-title":"Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI \u201924), Proceedings of the 36th Conference on Innovative Applications of Artificial Intelligence (IAAI \u201924), Proceedings of the 14th Symposium on Educational Advances in Artificial Intelligence","author":"Mou Chong","year":"2024","unstructured":"Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, and Ying Shan. 2024. T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI \u201924), Proceedings of the 36th Conference on Innovative Applications of Artificial Intelligence (IAAI \u201924), Proceedings of the 14th Symposium on Educational Advances in Artificial Intelligence. Michael J. Wooldridge, Jennifer G. Dy, and Sriraam Natarajan (Eds.), AAAI Press, 4296\u20134304."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591513"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00209"},{"key":"e_1_3_1_37_2","first-page":"15886","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV \u201923)","author":"Qi Chenyang","year":"2023","unstructured":"Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, and Qifeng Chen. 2023. FateZero: Fusing attentions for zero-shot text-based video editing. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV \u201923). 
IEEE, 15886\u201315896."},{"key":"e_1_3_1_38_2","first-page":"8748","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning. PMLR, IEEE, virtually online, 8748\u20138763."},{"key":"e_1_3_1_39_2","first-page":"8821","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ramesh Aditya","year":"2021","unstructured":"Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation. In Proceedings of the International Conference on Machine Learning. PMLR, IEEE, virtually online, 8821\u20138831."},{"key":"e_1_3_1_40_2","first-page":"1060","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Reed Scott","year":"2016","unstructured":"Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. 2016. Generative adversarial text to image synthesis. In Proceedings of the International Conference on Machine Learning. PMLR, IEEE, 1060\u20131069."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_1_42_2","first-page":"20154","article-title":"Graf: Generative radiance fields for 3d-aware image synthesis","volume":"33","author":"Schwarz Katja","year":"2020","unstructured":"Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. 2020. Graf: Generative radiance fields for 3d-aware image synthesis. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 
33, 20154\u201320166.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_43_2","first-page":"8839","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201924)","author":"Shi Yujun","year":"2024","unstructured":"Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, and Song Bai. 2024. DragDiffusion: Harnessing diffusion models for interactive point-based image editing. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201924). IEEE, 8839\u20138849."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01739"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00191"},{"key":"e_1_3_1_46_2","unstructured":"Andrey Voynov Qinghao Chu Daniel Cohen-Or and Kfir Aberman. 2023. P+: Extended textual conditioning in text-to-image generation. arXiv:abs\/2303.09522. Retrieved from https:\/\/arxiv.org\/pdf\/2303.09522"},{"key":"e_1_3_1_47_2","first-page":"22532","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wallace Bram","year":"2023","unstructured":"Bram Wallace, Akash Gokul, and Nikhil Naik. 2023. Edict: Exact diffusion inversion via coupled transformations. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 22532\u201322541."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_1_49_2","unstructured":"Chenfei Wu Lun Huang Qianxi Zhang Binyang Li Lei Ji Fan Yang Guillermo Sapiro and Nan Duan. 2021. GODIVA: Generating open-domain videos from natural descriptions. arXiv:abs\/2104.14806. 
Retrieved from https:\/\/arxiv.org\/pdf\/2104.14806"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3181070"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00143"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00455"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00584"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3702999","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,10]],"date-time":"2025-11-10T14:51:32Z","timestamp":1762786292000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3702999"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,10]]},"references-count":53,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3702999"],"URL":"https:\/\/doi.org\/10.1145\/3702999","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,10]]},"assertion":[{"value":"2024-06-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}