{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T19:51:03Z","timestamp":1765309863684,"version":"3.46.0"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100006465","name":"Korea Creative Content Agency","doi-asserted-by":"publisher","award":["RS-2024-00398536"],"award-info":[{"award-number":["RS-2024-00398536"]}],"id":[{"id":"10.13039\/501100006465","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,27]]},"DOI":"10.1145\/3746027.3754808","type":"proceedings-article","created":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T06:54:17Z","timestamp":1761375257000},"page":"2419-2428","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["M2PE-Diff: Music-to-Pose Encoder for Dance Video Generation Leveraging Latent Diffusion Framework"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7971-9461","authenticated-orcid":false,"given":"Nokap Tony","family":"Park","sequence":"first","affiliation":[{"name":"SK Telecom, Seoul, Seoul, Republic of Korea"}]}],"member":"320","published-online":{"date-parts":[[2025,10,27]]},"reference":[{"volume-title":"European Conference on Computer Vision (ECCV). 561-578","author":"Bogo Federica","key":"e_1_3_2_2_1_1","unstructured":"Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter V. Gehler, Javier Romero, and Michael J. Black. 2016. Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image. In European Conference on Computer Vision (ECCV). 
561-578."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_2_3_1","first-page":"6263","article-title":"MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion. In Proceedings of the 41st International Conference on Machine Learning (ICML)","volume":"242","author":"Chang Di","year":"2024","unstructured":"Di Chang, Yichun Shi, Quankai Gao, Hongyi Xu, Jessica Fu, Guoxian Song, Qing Yan, Yizhe Zhu, Xiao Yang, and Mohammad Soleymani. 2024. MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion. In Proceedings of the 41st International Conference on Machine Learning (ICML). Article 242, 6263-6285 pages.","journal-title":"Article"},{"key":"e_1_3_2_2_4_1","volume-title":"NeurIPS","author":"Copet Jade","year":"2023","unstructured":"Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, and Alexandre D\u00e9fossez. 2024. Simple and Controllable Music Generation. In NeurIPS 2023. https:\/\/github.com\/facebookresearch\/audiocraft Last Modified: January 29, 2024."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","unstructured":"Zuozhuo Dai Zhenghao Zhang Yao Yao Bingxue Qiu Siyu Zhu Long Qin and Weizhi Wang. 2023. AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance. arXiv:2311.12886 [cs.CV] doi:10.48550\/arXiv.2311.12886","DOI":"10.48550\/arXiv.2311.12886"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2005.00341"},{"volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 6382-6391","author":"Dwivedi Sai Kumar","key":"e_1_3_2_2_7_1","unstructured":"Sai Kumar Dwivedi, Nikos Athanasiou, Muhammed Kocabas, and Michael J. Black. 2021. Learning to Regress Bodies from Images using Differentiable Semantic Rendering. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 
6382-6391."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV57701.2024.00129"},{"key":"e_1_3_2_2_9_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 7277-7288","author":"Ge Songwei","year":"2024","unstructured":"Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, and Jia-Bin Huang. 2024. On the Content Bias in Fr\u00e9chet Video Distance. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 7277-7288."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00760"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.02445"},{"key":"e_1_3_2_2_12_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8153-8163","author":"Hu Li","year":"2024","unstructured":"Li Hu, Xin Gao, Peng Zhang, Ke Sun, Bang Zhang, and Liefeng Bo. 2024. Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8153-8163."},{"key":"e_1_3_2_2_13_1","unstructured":"National Instruments. 2011. Peak Signal-to-Noise Ratio as an Image Quality Metric. https:\/\/www.ni.com\/en\/shop\/data-acquisition-and-control\/add-ons-for-data-acquisition-and-control\/what-is-vision-development-module\/peak-signal-to-noise-ratio-as-an-image-quality-metric.html NI Technical Documentation."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02608"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i2.20014"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2101.08779"},{"key":"e_1_3_2_2_17_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 
13798-13807","author":"Li Siyao","year":"2022","unstructured":"Siyao Li, Weijiang Yu, Tianpei Gu, Chunze Lin, Quan Wang, Chen Qian, Chen Change Loy, and Ziwei Liu. 2022a. Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 13798-13807."},{"key":"e_1_3_2_2_18_1","first-page":"1","article-title":"A Technique for the Measurement of Attitudes","volume":"22","author":"Likert Rensis","year":"1932","unstructured":"Rensis Likert. 1932. A Technique for the Measurement of Attitudes. Archives of Psychology, Vol. 22, 140 (1932), 1-55.","journal-title":"Archives of Psychology"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818013"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1711.05101"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02548"},{"key":"e_1_3_2_2_22_1","unstructured":"Brian McFee Colin Raffel Dawen Liang Daniel P. W. Ellis Matt McVicar Eric Battenberg Oriol Nieto Josh Moore Fabian-Robert St\u00f6ter Curtis Hawthorne Jesse Engel Justin Salamon Peter Li Harold Soh Jonathan M. P. Stoehr Andrew McCallum Jesse Engel and Matthias Mauch. 2025. librosa: Python library for audio and music analysis. https:\/\/librosa.org\/ Version 0.11.0."},{"key":"e_1_3_2_2_23_1","first-page":"8162","volume-title":"Proceedings of the 38th International Conference on Machine Learning (ICML)","volume":"139","author":"Nichol Alexander Quinn","year":"2021","unstructured":"Alexander Quinn Nichol and Prafulla Dhariwal. 2021. Improved Denoising Diffusion Probabilistic Models. In Proceedings of the 38th International Conference on Machine Learning (ICML), Vol. 139. 
PMLR, 8162-8171."},{"key":"e_1_3_2_2_24_1","first-page":"8748","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","volume":"139","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the International Conference on Machine Learning (ICML), Vol. 139. 8748-8763."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2503.09905"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00985"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2302.07944"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2211.10658"},{"key":"e_1_3_2_2_30_1","first-page":"150","volume-title":"Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019","author":"Tsuchida Shuhei","year":"2019","unstructured":"Shuhei Tsuchida, Satoru Fukayama, Masahiro Hamasaki, and Masataka Goto. 2019. AIST Dance Video Database: Multi-genre, Multi-dancer, and Multi-camera Database for Dance Information Processing. In Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019. Delft, Netherlands, 150-157."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01965"},{"key":"e_1_3_2_2_32_1","volume-title":"Advances in Neural Information Processing Systems 30 (NeurIPS","author":"van den Oord A\u00e4ron","year":"2017","unstructured":"A\u00e4ron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. 2017. Neural Discrete Representation Learning. 
In Advances in Neural Information Processing Systems 30 (NeurIPS 2017). 6306-6315."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV61041.2025.00502"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2406.01188"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00147"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2307.15880"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i7.28486"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00462"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-73001-6_9"}],"event":{"name":"MM '25: The 33rd ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Dublin Ireland","acronym":"MM '25"},"container-title":["Proceedings of the 33rd ACM International Conference on 
Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3746027.3754808","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T19:46:54Z","timestamp":1765309614000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3746027.3754808"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,27]]},"references-count":41,"alternative-id":["10.1145\/3746027.3754808","10.1145\/3746027"],"URL":"https:\/\/doi.org\/10.1145\/3746027.3754808","relation":{},"subject":[],"published":{"date-parts":[[2025,10,27]]},"assertion":[{"value":"2025-10-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}