{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,25]],"date-time":"2025-05-25T19:10:03Z","timestamp":1748200203996,"version":"3.41.0"},"publisher-location":"Cham","reference-count":29,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783031923869","type":"print"},{"value":"9783031923876","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,12]],"date-time":"2025-05-12T00:00:00Z","timestamp":1747008000000},"content-version":"vor","delay-in-days":131,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Motion retargeting aims at transferring a given motion from a source character to a target one. The task becomes increasingly challenging as the differences between the body shape and skeletal structure of input and target characters increase. We present a novel approach for motion retargeting between skeletons whose goal is to transfer the motion from a source skeleton to a target one in a different format. Our approach works when the two skeletons differ in scale, bone length, and number of joints. Surpassing previous approaches, our method can also retarget between skeletons that differ in hierarchy and topology, such as retargeting between animals and humans. We train our method as a transformer using a random masking strategy both in time and space, aiming at reconstructing the joints of the masked input skeleton to obtain a deep representation of the motion. At testing time, our proposal can retarget the input motion to different skeletons, reconstructing the disparities between the source and the target. Our method outperforms state-of-the-art results on the Mixamo dataset, which features a high variance between skeleton formats. Moreover, we show how our approach can effectively generalize to different domains by transferring between human motion and quadrupeds, and vice-versa. 
Our code is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/www.github.com\/mmlab-cv\/skeleton-aware-motion-retargeting\" ext-link-type=\"uri\">www.github.com\/mmlab-cv\/skeleton-aware-motion-retargeting<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/978-3-031-92387-6_21","type":"book-chapter","created":{"date-parts":[[2025,5,25]],"date-time":"2025-05-25T18:42:49Z","timestamp":1748198569000},"page":"287-303","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Skeleton-Aware Motion Retargeting Using Masked Pose Modeling"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3713-3053","authenticated-orcid":false,"given":"Giulia","family":"Martinelli","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7147-9109","authenticated-orcid":false,"given":"Nicola","family":"Garau","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5704-5785","authenticated-orcid":false,"given":"Niccol\u00f3","family":"Bisagno","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7858-0928","authenticated-orcid":false,"given":"Nicola","family":"Conci","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,12]]},"reference":[{"key":"21_CR1","doi-asserted-by":"crossref","unstructured":"Aberman, K., Li, P., Lischinski, D., Sorkine-Hornung, O., Cohen-Or, D., Chen, B.: Skeleton-aware networks for deep motion retargeting. ACM Trans. Graph. (TOG) 39(4), 62-1 (2020)","DOI":"10.1145\/3386569.3392462"},{"key":"21_CR2","doi-asserted-by":"crossref","unstructured":"Aberman, K., Weng, Y., Lischinski, D., Cohen-Or, D., Chen, B.: Unpaired motion style transfer from video to animation. ACM Trans. Graph. (TOG) 39(4), 64-1 (2020)","DOI":"10.1145\/3386569.3392469"},{"key":"21_CR3","doi-asserted-by":"crossref","unstructured":"Aberman, K., Wu, R., Lischinski, D., Chen, B., Cohen-Or, D.: Learning character-agnostic motion for motion retargeting in 2D. arXiv preprint arXiv:1905.01680 (2019)","DOI":"10.1145\/3306346.3322999"},{"key":"21_CR4","unstructured":"Adobe: Mixamo (2020)"},{"key":"21_CR5","unstructured":"Bao, H., Dong, L., Wei, F.: BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254 (2021)"},{"key":"21_CR6","doi-asserted-by":"crossref","unstructured":"Chan, C., Ginosar, S., Zhou, T., Efros, A.A.: Everybody dance now. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 5933\u20135942 (2019)","DOI":"10.1109\/ICCV.2019.00603"},{"key":"21_CR7","unstructured":"Chen, M., et al.: Generative pretraining from pixels. In: International Conference on Machine Learning, pp. 1691\u20131703. PMLR (2020)"},{"key":"21_CR8","unstructured":"Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)"},{"key":"21_CR9","unstructured":"Dosovitskiy, A., et\u00a0al.: An image is worth 16$$\\times $$16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)"},{"key":"21_CR10","unstructured":"Feichtenhofer, C., Fan, H., Li, Y., He, K.: Masked autoencoders as spatiotemporal learners. arXiv preprint arXiv:2205.09113 (2022)"},{"key":"21_CR11","doi-asserted-by":"crossref","unstructured":"Harvey, F.G., Yurick, M., Nowrouzezahrai, D., Pal, C.: Robust motion in-betweening. ACM Trans. Graph. 
(TOG) 39(4), 60\u20131 (2020)","DOI":"10.1145\/3386569.3392480"},{"key":"21_CR12","doi-asserted-by":"crossref","unstructured":"He, K., Chen, X., Xie, S., Li, Y., Doll\u00e1r, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 16000\u201316009 (2022)","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"21_CR13","doi-asserted-by":"crossref","unstructured":"Jiang, B., Zhang, Y., Wei, X., Xue, X., Fu, Y.: H4D: human 4D modeling by learning neural compositional representation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 19355\u201319365 (2022)","DOI":"10.1109\/CVPR52688.2022.01875"},{"key":"21_CR14","doi-asserted-by":"crossref","unstructured":"Kocabas, M., Athanasiou, N., Black, M.J.: Vibe: video inference for human body pose and shape estimation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 5253\u20135263 (2020)","DOI":"10.1109\/CVPR42600.2020.00530"},{"issue":"4","key":"21_CR15","first-page":"1","volume":"41","author":"P Li","year":"2022","unstructured":"Li, P., Aberman, K., Zhang, Z., Hanocka, R., Sorkine-Hornung, O.: GANimator: neural motion synthesis from a single sequence. ACM Trans. Graph. (TOG) 41(4), 1\u201312 (2022)","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"21_CR16","unstructured":"Liu, Y., et al.: RoBERTa: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)"},{"key":"21_CR17","doi-asserted-by":"crossref","unstructured":"Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10975\u201310985 (2019)","DOI":"10.1109\/CVPR.2019.01123"},{"issue":"6","key":"21_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3272127.3275014","volume":"37","author":"XB Peng","year":"2018","unstructured":"Peng, X.B., Kanazawa, A., Malik, J., Abbeel, P., Levine, S.: SFV: reinforcement learning of physical skills from videos. ACM Trans. Graph. (TOG) 37(6), 1\u201314 (2018)","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"21_CR19","doi-asserted-by":"crossref","unstructured":"Seol, Y., O\u2019Sullivan, C., Lee, J.: Creature features: online motion puppetry for non-human characters. In: Proceedings of the 12th ACM SIGGRAPH Symposium on Computer Animation, pp. 213\u2013221 (2013)","DOI":"10.1145\/2485895.2485903"},{"key":"21_CR20","unstructured":"Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: masked autoencoders are data-efficient learners for self-supervised video pre-training. arXiv preprint arXiv:2203.12602 (2022)"},{"key":"21_CR21","unstructured":"Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)"},{"key":"21_CR22","doi-asserted-by":"crossref","unstructured":"Villegas, R., Ceylan, D., Hertzmann, A., Yang, J., Saito, J.: Contact-aware retargeting of skinned motion. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 9720\u20139729 (2021)","DOI":"10.1109\/ICCV48922.2021.00958"},{"key":"21_CR23","doi-asserted-by":"crossref","unstructured":"Villegas, R., Yang, J., Ceylan, D., Lee, H.: Neural kinematic networks for unsupervised motion retargetting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
8639\u20138648 (2018)","DOI":"10.1109\/CVPR.2018.00901"},{"key":"21_CR24","doi-asserted-by":"crossref","unstructured":"Xie, Z., et al.: SimMIM: a simple framework for masked image modeling. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 9653\u20139663 (2022)","DOI":"10.1109\/CVPR52688.2022.00943"},{"key":"21_CR25","unstructured":"Yamane, K., Ariki, Y., Hodgins, J.: Animating non-humanoid characters with human motion data. In: Proceedings of the 2010 ACM SIGGRAPH\/Eurographics Symposium on Computer Animation, pp. 169\u2013178 (2010)"},{"key":"21_CR26","doi-asserted-by":"crossref","unstructured":"Yang, Z., et al.: TransMoMo: invariance-driven unsupervised video motion retargeting. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 5306\u20135315 (2020)","DOI":"10.1109\/CVPR42600.2020.00535"},{"issue":"4","key":"21_CR27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3197517.3201366","volume":"37","author":"H Zhang","year":"2018","unstructured":"Zhang, H., Starke, S., Komura, T., Saito, J.: Mode-adaptive neural networks for quadruped motion control. ACM Trans. Graph. (TOG) 37(4), 1\u201311 (2018)","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"21_CR28","doi-asserted-by":"crossref","unstructured":"Zhang, J., et al.: Skinned motion retargeting with residual perception of motion semantics & geometry. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 13864\u201313872 (2023)","DOI":"10.1109\/CVPR52729.2023.01332"},{"key":"21_CR29","doi-asserted-by":"crossref","unstructured":"Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 
2223\u20132232 (2017)","DOI":"10.1109\/ICCV.2017.244"}],"container-title":["Lecture Notes in Computer Science","Computer Vision \u2013 ECCV 2024 Workshops"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-92387-6_21","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,25]],"date-time":"2025-05-25T18:42:56Z","timestamp":1748198576000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-92387-6_21"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"ISBN":["9783031923869","9783031923876"],"references-count":29,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-92387-6_21","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025]]},"assertion":[{"value":"12 May 2025","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ECCV","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"European Conference on Computer Vision","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Milan","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Italy","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"29 September 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"4 October 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"18","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"eccv2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/eccv2024.ecva.net\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}