{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T19:41:16Z","timestamp":1765309276389,"version":"3.46.0"},"publisher-location":"New York, NY, USA","reference-count":53,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,27]]},"DOI":"10.1145\/3746027.3754869","type":"proceedings-article","created":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T06:56:44Z","timestamp":1761375404000},"page":"9434-9443","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-7229-1398","authenticated-orcid":false,"given":"Zhenghao","family":"Zhang","sequence":"first","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4282-0843","authenticated-orcid":false,"given":"Junchao","family":"Liao","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-6224-7979","authenticated-orcid":false,"given":"Xiangyu","family":"Meng","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-4712-8570","authenticated-orcid":false,"given":"Long","family":"Qin","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-7874-9468","authenticated-orcid":false,"given":"Weizhi","family":"Wang","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,27]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Fan Bao Chendong Xiang Gang Yue Guande He Hongzhou Zhu Kaiwen Zheng Min Zhao Shilong Liu Yaole Wang and Jun Zhu. 2024. Vidu: a Highly Consistent Dynamic and Skilled Text-to-Video Generator with Diffusion Models. eprint 2405.04233"},{"key":"e_1_3_2_1_2_1","unstructured":"Andreas Blattmann Tim Dockhorn Sumith Kulal Daniel Mendelevitch Maciej Kilian Dominik Lorenz Yam Levi Zion English Vikram Voleti Adam Letts Varun Jampani and Robin Rombach. 2023. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets. eprint 2311.15127"},{"key":"e_1_3_2_1_3_1","unstructured":"Tim Brooks Bill Peebles Connor Holmes Will DePue Yufei Guo Li Jing David Schnurr Joe Taylor Troy Luhman Eric Luhman Clarence Ng Ricky Wang and Aditya Ramesh. 2024. Video generation models as world simulators. https:\/\/openai.com\/research\/video-generation-models-as-world-simulators."},{"key":"e_1_3_2_1_4_1","unstructured":"Haoxin Chen Menghan Xia Yin-Yin He Yong Zhang Xiaodong Cun Shaoshu Yang Jinbo Xing Yaofang Liu Qifeng Chen Xintao Wang Chao-Liang Weng and Ying Shan. 2023. VideoCrafter1: Open Diffusion Models for High-Quality Video Generation. eprint 2310.19512"},{"key":"e_1_3_2_1_5_1","volume-title":"Ivan Skorokhodov, Kfir Aberman, Jun-Yan Zhu, Ming-Hsuan Yang, and Sergey Tulyakov.","author":"Chen Tsai-Shien","year":"2025","unstructured":"Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Yuwei Fang, Kwot Sin Lee, Ivan Skorokhodov, Kfir Aberman, Jun-Yan Zhu, Ming-Hsuan Yang, and Sergey Tulyakov. 2025. Multi-subject Open-set Personalization in Video Generation. 
arXiv:2501.06187 [cs.CV]"},{"key":"e_1_3_2_1_6_1","unstructured":"Zuozhuo Dai Zhenghao Zhang Yao Yao Bingxue Qiu Siyu Zhu Long Qin and Weizhi Wang. 2023. Fine-Grained Open Domain Image Animation with Motion Guidance. eprint 2311.12886"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00482"},{"key":"e_1_3_2_1_8_1","volume-title":"Animatediff: Animate your personalized text-to-image diffusion models without specific tuning. eprint 2307.04725","author":"Guo Yuwei","year":"2023","unstructured":"Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, and Bo Dai. 2023. Animatediff: Animate your personalized text-to-image diffusion models without specific tuning. eprint 2307.04725"},{"key":"e_1_3_2_1_9_1","unstructured":"Xuanhua He Quande Liu Shengju Qian Xin Wang Tao Hu Ke Cao Keyu Yan and Jie Zhang. 2024. ID-Animator: Zero-Shot Identity-Preserving Human Video Generation. arXiv:2404.15275 [cs.CV]"},{"key":"e_1_3_2_1_10_1","unstructured":"Jonathan Ho and Tim Salimans. 2022. Classifier-free diffusion guidance. eprint 2207.12598"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.167"},{"key":"e_1_3_2_1_12_1","unstructured":"Yuzhou Huang Ziyang Yuan Quande Liu Qiulin Wang Xintao Wang Ruimao Zhang Pengfei Wan Di Zhang and Kun Gai. 2025. ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning. arXiv:2501.04698 [cs.CV]"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00639"},{"key":"e_1_3_2_1_14_1","unstructured":"Nikita Karaev Iurii Makarov Jianyuan Wang Natalia Neverova Andrea Vedaldi and Christian Rupprecht. 2024. CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos. arXiv:2410.11831 [cs.CV]"},{"volume-title":"Adam: A Method for Stochastic Optimization. In Int. Conf. Learn. 
Represent., Yoshua Bengio and Yann LeCun (Eds.).","author":"Diederik","key":"e_1_3_2_1_15_1","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Int. Conf. Learn. Represent., Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_2_1_16_1","unstructured":"Weijie Kong Qi Tian Zijian Zhang Rox Min Zuozhuo Dai Jin Zhou Jiangfeng Xiong Xin Li Bo Wu Jianwei Zhang Kathrina Wu Qin Lin Junkun Yuan Yanxin Long Aladdin Wang Andong Wang Changlin Li Duojun Huang Fang Yang Hao Tan Hongmei Wang Jacob Song Jiawang Bai Jianbing Wu Jinbao Xue Joey Wang Kai Wang Mengyang Liu Pengyu Li Shuai Li Weiyan Wang Wenqing Yu Xinchi Deng Yang Li Yi Chen Yutao Cui Yuanbo Peng Zhentao Yu Zhiyu He Zhiyong Xu Zixiang Zhou Zunnan Xu Yangyu Tao Qinglin Lu Songtao Liu Dax Zhou Hongfa Wang Yong Yang Di Wang Yuhong Liu Jie Jiang and Caesar Zhong. 2025. HunyuanVideo: A Systematic Framework For Large Video Generative Models. arXiv:2412.03603 [cs.CV]"},{"key":"e_1_3_2_1_17_1","unstructured":"Shakker Labs. [n.d.]. Flux.1-dev-controlnet-union-pro. Accessed 2024 [Online]. https:\/\/huggingface.co\/Shakker-Labs\/FLUX.1-dev-ControlNet-Union-Pro"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00915"},{"key":"e_1_3_2_1_19_1","volume-title":"BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. In Int. Conf. Mach. Learn. (Proceedings of Machine Learning Research","volume":"12900","author":"Li Junnan","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven C. H. Hoi. 2022. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. In Int. Conf. Mach. Learn. (Proceedings of Machine Learning Research, Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesv\u00e1ri, Gang Niu, and Sivan Sabato (Eds.). PMLR, 12888-12900. 
https:\/\/proceedings.mlr.press\/v162\/li22n.html"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72970-6_3"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01846"},{"key":"e_1_3_2_1_22_1","volume-title":"Adv. Neural Inform. Process. Syst.","author":"Lu Cheng","year":"2022","unstructured":"Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. 2022. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps. In Adv. Neural Inform. Process. Syst., Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. 5775-5787. http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/260a14acce2a89dad36adc8eefe7c59e-Abstract-Conference.html"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","unstructured":"Wan-Duo Kurt Ma John P. Lewis and W. Bastiaan Kleijn. 2024. TrailBlazer: Trajectory Control for Diffusion-Based Video Generation. In SIGGRAPH Asia Takeo Igarashi Ariel Shamir and Hao (Richard) Zhang (Eds.). ACM 97:1-97:11. doi:10.1145\/3680528.3687652","DOI":"10.1145\/3680528.3687652"},{"key":"e_1_3_2_1_24_1","unstructured":"Arun Mallya Ting-Chun Wang and Ming-Yu Liu. 2022. Implicit Warping for Animation with Image Sets. In Adv. Neural Inform. Process. Syst. Sanmi Koyejo S. Mohamed A. Agarwal Danielle Belgrave K. Cho and A. Oh (Eds.). 22438-22450. http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/8cb31912235561112339f04903657f72-Abstract-Conference.html"},{"key":"e_1_3_2_1_25_1","article-title":"DINOv2: Learning Robust Visual Features without","volume":"2024","author":"Oquab Maxime","year":"2024","unstructured":"Maxime Oquab, Timoth\u00e9e Darcet, Th\u00e9o Moutakanni, Huy V. 
Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herv\u00e9 J\u00e9gou, Julien Mairal, Patrick Labatut, Armand Joulin, and Piotr Bojanowski. 2024. DINOv2: Learning Robust Visual Features without Supervision. Trans. Mach. Learn. Res., Vol. 2024 (2024). https:\/\/openreview.net\/forum?id=a68SUt6zFt","journal-title":"Supervision. Trans. Mach. Learn. Res."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00387"},{"key":"e_1_3_2_1_27_1","unstructured":"Adam Polyak Amit Zohar Andrew Brown Andros Tjandra Animesh Sinha Ann Lee Apoorv Vyas Bowen Shi Chih-Yao Ma Ching-Yao Chuang David Yan Dhruv Choudhary Dingkang Wang Geet Sethi Guan Pang Haoyu Ma Ishan Misra Ji Hou Jialiang Wang Kiran Jagadeesh Kunpeng Li Luxin Zhang Mannat Singh Mary Williamson Matt Le Matthew Yu Mitesh Kumar Singh Peizhao Zhang Peter Vajda Quentin Duval Rohit Girdhar Roshan Sumbaly Sai Saketh Rambhatla Sam S. Tsai Samaneh Azadi Samyak Datta Sanyuan Chen Sean Bell Sharadh Ramaswamy Shelly Sheynin Siddharth Bhattacharya Simran Motwani Tao Xu Tianhe Li Tingbo Hou Wei-Ning Hsu Xi Yin Xiaoliang Dai Yaniv Taigman Yaqiao Luo Yen-Cheng Liu Yi-Chiao Wu Yue Zhao Yuval Kirstain Zecheng He Zijian He Albert Pumarola Ali K. Thabet Artsiom Sanakoyeu Arun Mallya Baishan Guo Boris Araya Breena Kerr Carleigh Wood Ce Liu Cen Peng Dmitry Vengertsev Edgar Sch\u00f6nfeld Elliot Blanchard Felix Juefei-Xu Fraylie Nord Jeff Liang John Hoffman Jonas Kohler Kaolin Fire Karthik Sivakumar Lawrence Chen Licheng Yu Luya Gao Markos Georgopoulos Rashel Moritz Sara K. Sampson Shikai Li Simone Parmeggiani Steve Fine Tara Fowler Vladan Petrovic and Yuming Du. 2024. Movie Gen: A Cast of Media Foundation Models. 
arXiv:2410.13720"},{"key":"e_1_3_2_1_28_1","volume-title":"Learning Transferable Visual Models From Natural Language Supervision. In Int. Conf. Mach. Learn. (Proceedings of Machine Learning Research","volume":"8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Int. Conf. Mach. Learn. (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8748-8763. http:\/\/proceedings.mlr.press\/v139\/radford21a.html"},{"key":"e_1_3_2_1_29_1","unstructured":"Tianhe Ren Shilong Liu Ailing Zeng Jing Lin Kunchang Li He Cao Jiayu Chen Xinyu Huang Yukang Chen Feng Yan Zhaoyang Zeng Hao Zhang Feng Li Jie Yang Hongyang Li Qing Jiang and Lei Zhang. 2024. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks. arXiv:2401.14159 [cs.CV]"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02155"},{"key":"e_1_3_2_1_32_1","volume-title":"Attention is All you Need","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Adv. Neural Inform. Process. Syst., Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998-6008. 
https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72751-1_1"},{"key":"e_1_3_2_1_34_1","volume-title":"Qifeng Chen, Yujun Shen, and Limin Wang.","author":"Wang Hanlin","year":"2024","unstructured":"Hanlin Wang, Hao Ouyang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Qifeng Chen, Yujun Shen, and Limin Wang. 2024a. LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis. arXiv:2412.15214 [cs.CV]"},{"key":"e_1_3_2_1_35_1","unstructured":"Jiuniu Wang Hangjie Yuan Dayou Chen Yingya Zhang Xiang Wang and Shiwei Zhang. 2023a. Modelscope text-to-video technical report. eprint 2308.06571"},{"key":"e_1_3_2_1_36_1","volume-title":"Adv. Neural Inform. Process. Syst.","author":"Wang Xiang","year":"2023","unstructured":"Xiang Wang, Hangjie Yuan, Shiwei Zhang, Dayou Chen, Jiuniu Wang, Yingya Zhang, Yujun Shen, Deli Zhao, and Jingren Zhou. 2023c. VideoComposer: Compositional Video Synthesis with Motion Controllability. In Adv. Neural Inform. Process. Syst., Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (Eds.), Vol. 36. 7594-7611. http:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/180f6184a3458fa19c28c5483bc61877-Abstract-Conference.html"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00628"},{"key":"e_1_3_2_1_38_1","unstructured":"Zhouxia Wang Ziyang Yuan Xintao Wang Tianshui Chen Menghan Xia Ping Luo and Yin Shan. 2023b. MotionCtrl: A Unified and Flexible Motion Controller for Video Generation. eprint 2312.03641"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00625"},{"key":"e_1_3_2_1_40_1","unstructured":"Yujie Wei Shiwei Zhang Hangjie Yuan Xiang Wang Haonan Qiu Rui Zhao Yutong Feng Feng Liu Zhizhong Huang Jiaxin Ye Yingya Zhang and Hongming Shan. 2024b. 
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control. arXiv:2410.13830 [cs.CV]"},{"key":"e_1_3_2_1_41_1","unstructured":"Jianzong Wu Xiangtai Li Yanhong Zeng Jiangning Zhang Qianyu Zhou Yining Li Yunhai Tong and Kai Chen. 2024b. MotionBooth: Motion-Aware Customized Text-to-Video Generation. arXiv:2406.17758 [cs.CV]"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72670-5_19"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3298645"},{"key":"e_1_3_2_1_44_1","unstructured":"An Yang Baosong Yang Beichen Zhang Binyuan Hui Bo Zheng Bowen Yu Chengyuan Li Dayiheng Liu Fei Huang Haoran Wei et al. 2024b. Qwen2.5 technical report. eprint 2412.15115"},{"key":"e_1_3_2_1_45_1","unstructured":"Zhuoyi Yang Jiayan Teng Wendi Zheng Ming Ding Shiyu Huang Jiazheng Xu Yuanming Yang Wenyi Hong Xiaohan Zhang Guanyu Feng et al. 2024a. CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer. eprint 2408.06072"},{"key":"e_1_3_2_1_46_1","unstructured":"Hu Ye Jun Zhang Sibo Liu Xiao Han and Wei Yang. 2023. IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models. arXiv:2308.06721 [cs.CV]"},{"key":"e_1_3_2_1_47_1","unstructured":"Shengming Yin Chenfei Wu Jian Liang Jie Shi Houqiang Li Gong Ming and Nan Duan. 2023. DragNUWA: Fine-grained control in video generation by integrating text image and trajectory. eprint 2308.08089"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01008"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Shenghai Yuan Jinfa Huang Xianyi He Yunyuan Ge Yujun Shi Liuhan Chen Jiebo Luo and Li Yuan. 2024. Identity-Preserving Text-to-Video Generation by Frequency Decomposition. arXiv:2411.17440 [cs.CV]","DOI":"10.32388\/TZIID6"},{"key":"e_1_3_2_1_50_1","unstructured":"Shiwei Zhang Jiayu Wang Yingya Zhang Kang Zhao Hangjie Yuan Zhiwu Qin Xiang Wang Deli Zhao and Jingren Zhou. 2023. 
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. eprint 2311.04145"},{"key":"e_1_3_2_1_51_1","volume-title":"Tora: Trajectory-oriented Diffusion Transformer for Video Generation. arXiv:2407.21705 [cs.CV]","author":"Zhang Zhenghao","year":"2024","unstructured":"Zhenghao Zhang, Junchao Liao, Menghao Li, Zuozhuo Dai, Bingxue Qiu, Siyu Zhu, Long Qin, and Weizhi Wang. 2024. Tora: Trajectory-oriented Diffusion Transformer for Video Generation. arXiv:2407.21705 [cs.CV]"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00364"},{"key":"e_1_3_2_1_53_1","volume-title":"StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation","author":"Zhou Yupeng","year":"2024","unstructured":"Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, and Qibin Hou. 2024. StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation. In Adv. Neural Inform. Process. Syst., Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang (Eds.), Vol. 37. 110315-110340. 
http:\/\/papers.nips.cc\/paper_files\/paper\/2024\/hash\/c7138635035501eb71b0adf6ddc319d6-Abstract-Conference.html"}],"event":{"name":"MM '25: The 33rd ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Dublin Ireland","acronym":"MM '25"},"container-title":["Proceedings of the 33rd ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3746027.3754869","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T19:38:03Z","timestamp":1765309083000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3746027.3754869"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,27]]},"references-count":53,"alternative-id":["10.1145\/3746027.3754869","10.1145\/3746027"],"URL":"https:\/\/doi.org\/10.1145\/3746027.3754869","relation":{},"subject":[],"published":{"date-parts":[[2025,10,27]]},"assertion":[{"value":"2025-10-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}