{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T23:54:23Z","timestamp":1780444463716,"version":"3.54.1"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"8","funder":[{"name":"Zhejiang Key Research and Development Program","award":["2023C03196"],"award-info":[{"award-number":["2023C03196"]}]},{"name":"Natural Science Foundation of Shanghai","award":["24ZR1425600"],"award-info":[{"award-number":["24ZR1425600"]}]},{"name":"NSFC, \u201cPioneer\u201d and \u201cLeading Goose\u201d R&D Program of Zhejiang","award":["2025C02014"],"award-info":[{"award-number":["2025C02014"]}]},{"name":"Ningbo Science and Technology Special Projects","award":["2025Z028"],"award-info":[{"award-number":["2025Z028"]}]},{"name":"Fundamental Research Funds for the Central Universities, and the Chenguang Program of Shanghai Education Development Foundation and Shanghai Municipal Education Commission","award":["24CGA73"],"award-info":[{"award-number":["24CGA73"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,8,31]]},"abstract":"<jats:p>Monocular 3D human pose estimation presents a considerable challenge owing to the intrinsic depth ambiguity associated with single-camera observations. Existing methods primarily rely on mean per joint position error (MPJPE) loss to train models for the conversion from 2D to 3D coordinates. However, empirical analysis reveals that models trained solely with point-based supervision may produce biomechanically implausible poses or exhibit significant depth ambiguity, even when achieving low MPJPE. This limitation arises from the fact that point-based loss only considers individual joint locations without accounting for inter-joint relationships. Fortunately, edges of human pose encode critical prior knowledge, including skeleton connectivity and biomechanical distributions. Explicitly modeling edge representations enables the model to overcome the constraints associated with point-only approaches, reducing the uncertainty in the optimization process of the 2D-3D inverse mapping and directly constraining depth ambiguity. Therefore, we propose the Graph-Aware Multi-Representation Aggregation (GAMA-Pose) framework that jointly predicts points and edges, with their fusion serving as the final output. To ensure the accuracy of edge predictions and mitigate depth ambiguity, Anti-Depth-Ambiguity Loss (ADA-Loss) is introduced to supervise the properties of edges and give direct supervision on depth ambiguity. Correspondingly, edge-based metrics are proposed to quantify the error of predicted edges. Experiments conducted on Human3.6M and MPI-INF-3DHP datasets demonstrate that GAMA-Pose effectively addresses the limitations of models relying solely on point constraints, mitigates depth ambiguity, enhances the accuracy of both point and edge predictions, and achieves state-of-the-art (SOTA) performance on both datasets.<\/jats:p>","DOI":"10.1145\/3737647","type":"journal-article","created":{"date-parts":[[2025,6,3]],"date-time":"2025-06-03T10:40:41Z","timestamp":1748947241000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["GAMA-Pose: Graph-Aware Multi-Representation Aggregation for 3D Human Pose Estimation"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-4910-3389","authenticated-orcid":false,"given":"Songran","family":"Zhou","sequence":"first","affiliation":[{"name":"Polytechnic Institute, Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7125-3687","authenticated-orcid":false,"given":"Tao","family":"Wu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3414-8754","authenticated-orcid":false,"given":"Xuewei","family":"Li","sequence":"additional","affiliation":[{"name":"School of Electronic and Information Engineering, Shanghai Dianji University, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4749-5552","authenticated-orcid":false,"given":"Xiubo","family":"Liang","sequence":"additional","affiliation":[{"name":"School of Software Technology, Zhejiang University, Ningbo, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6986-3766","authenticated-orcid":false,"given":"Naye","family":"Ji","sequence":"additional","affiliation":[{"name":"College of Media Engineering, Communication University of Zhejiang, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3023-1662","authenticated-orcid":false,"given":"Xi","family":"Li","sequence":"additional","affiliation":[{"name":"College of Computer Science, Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,8,12]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00236"},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","unstructured":"Hanyuan Chen Jun-Yan He Wangmeng Xiang Zhi-Qi Cheng Wei Liu Hanbing Liu Bin Luo Yifeng Geng and Xuansong Xie. 2023. Hdformer: High-order directed transformer for 3d human pose estimation. arXiv:2302.01825. Retrieved from https:\/\/arxiv.org\/abs\/2302.01825","DOI":"10.24963\/ijcai.2023\/65"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01311"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00235"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00465"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00132"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV56688.2023.00292"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01251"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01253"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2023.3275914"},{"key":"e_1_3_1_12_2","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, Vol. 33, 6840\u20136851.","journal-title":"Advances in Neural Information Processing Systems, Vol"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01464"},{"key":"e_1_3_1_14_2","first-page":"448","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning. PMLR, 448\u2013456."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_3_1_16_2","unstructured":"Hongbo Kang Yong Wang Mengyuan Liu Doudou Wu Peng Liu and Wenming Yang. 2023. Double-chain constraints for 3d human pose estimation in images and videos. arXiv:2308.05298. Retrieved from https:\/\/arxiv.org\/abs\/2308.05298"},{"key":"e_1_3_1_17_2","unstructured":"Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907. Retrieved from https:\/\/arxiv.org\/abs\/1609.02907"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00958"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3141231"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01280"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW63382.2024.00467"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58607-2_19"},{"key":"e_1_3_1_23_2","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Liu Kenkun","year":"2020","unstructured":"Kenkun Liu, Zhiming Zou, and Wei Tang. 2020. Learning global pose features in graph convolutional networks for 3d human pose estimation. In Proceedings of the Asian Conference on Computer Vision."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-60639-8_40"},{"key":"e_1_3_1_25_2","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv:1711.05101. Retrieved from https:\/\/arxiv.org\/abs\/1711.05101"},{"key":"e_1_3_1_26_2","doi-asserted-by":"crossref","unstructured":"Cheng Luo Siyang Song Weicheng Xie Linlin Shen and Hatice Gunes. 2022. Learning multi-dimensional edge feature-based au relation graph for facial action unit recognition. arXiv:2205.01782. Retrieved from https:\/\/arxiv.org\/abs\/2205.01782","DOI":"10.24963\/ijcai.2022\/173"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV57701.2024.00677"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/3DV.2017.00064"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00763"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00113"},{"key":"e_1_3_1_32_2","unstructured":"Xiaoye Qian Youbao Tang Ning Zhang Mei Han Jing Xiao Ming-Chun Huang and Ruei-Sung Lin. 2023. HSTFormer: Hierarchical spatial-temporal transformers for 3D human pose estimation. arXiv:2301.07322 Retrieved from https:\/\/arxiv.org\/abs\/2301.07322"},{"key":"e_1_3_1_33_2","doi-asserted-by":"crossref","unstructured":"Helge Rhodin Mathieu Salzmann and Pascal Fua. 2018. Unsupervised geometry-aware representation for 3d human pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV) 750\u2013767.","DOI":"10.1007\/978-3-030-01249-6_46"},{"key":"e_1_3_1_34_2","unstructured":"C\u00e9dric Rommel Victor Letzelter Nermin Samet Renaud Marlet Matthieu Cord Patrick P\u00e9rez and Eduardo Valle. 2023. ManiPose: Manifold-constrained multi-hypothesis 3D human pose estimation. arXiv:2312.06386. Retrieved from https:\/\/arxiv.org\/abs\/2312.06386"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20065-6_27"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01356"},{"key":"e_1_3_1_37_2","unstructured":"Jiaming Song Chenlin Meng and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv:2010.02502. Retrieved from https:\/\/arxiv.org\/abs\/2010.02502"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00464"},{"key":"e_1_3_1_39_2","first-page":"30","article-title":"Attention is all you need. In","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_40_2","unstructured":"Tao Wu Yong Zhang Xiaodong Cun Zhongang Qi Junfu Pu Huanzhang Dou Guangcong Zheng Ying Shan and Xi Li. 2024. Videomaker: Zero-shot customized video generation with the inherent force of video diffusion models. arXiv:2412.19645. Retrieved from https:\/\/arxiv.org\/abs\/2412.19645"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i8.32914"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00060"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3182269"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00810"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00853"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01288"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3109517"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.3390\/app122010591"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00857"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01145"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2972104"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-96530-3"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01128"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3737647","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T20:36:44Z","timestamp":1755031004000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3737647"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,12]]},"references-count":52,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,8,31]]}},"alternative-id":["10.1145\/3737647"],"URL":"https:\/\/doi.org\/10.1145\/3737647","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,12]]},"assertion":[{"value":"2024-11-30","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}