{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T01:33:13Z","timestamp":1768786393397,"version":"3.49.0"},"reference-count":63,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2024,7,8]],"date-time":"2024-07-08T00:00:00Z","timestamp":1720396800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"2023 Innovation Fund of Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education","award":["1311021"],"award-info":[{"award-number":["1311021"]}]},{"name":"2023 Innovation Fund of Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education","award":["62201365"],"award-info":[{"award-number":["62201365"]}]},{"name":"National Natural Science Foundation of China","award":["1311021"],"award-info":[{"award-number":["1311021"]}]},{"name":"National Natural Science Foundation of China","award":["62201365"],"award-info":[{"award-number":["62201365"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Three-dimensional human pose estimation focuses on generating 3D pose sequences from 2D videos. It has enormous potential in the fields of human\u2013robot interaction, remote sensing, virtual reality, and computer vision. Existing excellent methods primarily focus on exploring spatial or temporal encoding to achieve 3D pose inference. However, various architectures exploit the independent effects of spatial and temporal cues on 3D pose estimation, while neglecting the spatial\u2013temporal synergistic influence. To address this issue, this paper proposes a novel 3D pose estimation method with a dual-adaptive spatial\u2013temporal former (DASTFormer) and additional supervised training. 
The DASTFormer contains attention-adaptive (AtA) and pure-adaptive (PuA) modes, which will enhance pose inference from 2D to 3D by adaptively learning spatial\u2013temporal effects, considering both their cooperative and independent influences. In addition, an additional supervised training with batch variance loss is proposed in this work. Different from common training strategy, a two-round parameter update is conducted on the same batch data. Not only can it better explore the potential relationship between spatial\u2013temporal encoding and 3D poses, but it can also alleviate the batch size limitations imposed by graphics cards on transformer-based frameworks. Extensive experimental results show that the proposed method significantly outperforms most state-of-the-art approaches on Human3.6 and HumanEVA datasets.<\/jats:p>","DOI":"10.3390\/s24134422","type":"journal-article","created":{"date-parts":[[2024,7,8]],"date-time":"2024-07-08T12:21:03Z","timestamp":1720441263000},"page":"4422","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Learning Temporal\u2013Spatial Contextual Adaptation for Three-Dimensional Human Pose Estimation"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-0584-4177","authenticated-orcid":false,"given":"Hexin","family":"Wang","sequence":"first","affiliation":[{"name":"College of Information Engineering, Capital Normal University, Beijing 100048, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei","family":"Quan","sequence":"additional","affiliation":[{"name":"College of Information Engineering, Capital Normal University, Beijing 100048, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Runjing","family":"Zhao","sequence":"additional","affiliation":[{"name":"College of Information Engineering, Capital Normal University, Beijing 100048, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Miaomiao","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Information Engineering, Capital Normal University, Beijing 100048, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2239-1121","authenticated-orcid":false,"given":"Na","family":"Jiang","sequence":"additional","affiliation":[{"name":"College of Information Engineering, Capital Normal University, Beijing 100048, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,7,8]]},"reference":[{"key":"ref_1","first-page":"1","article-title":"Recent advances of monocular 2D and 3D human pose estimation: A deep learning perspective","volume":"55","author":"Liu","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"H\u00f6ll, M., Oberweger, M., Arth, C., and Lepetit, V. (2018, January 18\u201322). Efficient physics-based implementation for realistic hand-object interaction in virtual reality. Proceedings of the 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), Tuebingen\/Reutlingen, Germany.","DOI":"10.1109\/VR.2018.8448284"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ying, J., and Zhao, X. (2021, January 19\u201322). RGB-D fusion for point-cloud-based 3D human pose estimation. Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA.","DOI":"10.1109\/ICIP42928.2021.9506588"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Gong, J., Fan, Z., Ke, Q., Rahmani, H., and Liu, J. (2022, January 19\u201324). Meta agent teaming active learning for pose estimation. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01080"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Svenstrup, M., Tranberg, S., Andersen, H.J., and Bak, T. (2009, January 12\u201317). Pose estimation and adaptive robot behaviour for human-robot interaction. Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan.","DOI":"10.1109\/ROBOT.2009.5152690"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1109\/TIP.2021.3131937","article-title":"Collaborative refining for person re-identification with label noise","volume":"31","author":"Ye","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Liu, J., and Gao, Y. (2020). 3D pose estimation for object detection in remote sensing images. Sensors, 20.","DOI":"10.3390\/s20051240"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1109\/JRFID.2022.3140256","article-title":"Environment Adaptive RFID-Based 3D Human Pose Tracking With a Meta-Learning Approach","volume":"6","author":"Yang","year":"2022","journal-title":"IEEE J. Radio Freq. Identif."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1109\/TCSVT.2021.3057267","article-title":"Anatomy-aware 3D human pose estimation with bone-based pose decomposition","volume":"32","author":"Chen","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_10","unstructured":"Cai, Y., Ge, L., Liu, J., Cai, J., Cham, T.J., Yuan, J., and Thalmann, N.M. (November, January 27). Exploiting spatial-temporal relationships for 3D pose estimation via graph convolutional networks. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Sun, X., Xiao, B., Wei, F., Liang, S., and Wei, Y. (2018, January 8\u201314). Integral human pose regression. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01231-1_33"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kocabas, M., Karagoz, S., and Akbas, E. (2018, January 8\u201314). Multiposenet: Fast multi-person pose estimation using pose residual network. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01252-6_26"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., and Sun, J. (2018, January 18\u201322). Cascaded pyramid network for multi-person pose estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00742"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Saito, S., Simon, T., Saragih, J., and Joo, H. (2020, January 13\u201319). Pifuhd: Multi-level pixel-aligned implicit function for high-resolution 3D human digitization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR42600.2020.00016"},{"key":"ref_15","unstructured":"Zheng, Z., Yu, T., Wei, Y., Dai, Q., and Liu, Y. (November, January 27). Deephuman: 3D human reconstruction from a single image. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Pavllo, D., Feichtenhofer, C., Grangier, D., and Auli, M. (2019, January 16\u201320). 3D human pose estimation in video with temporal convolutions and semi-supervised training. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00794"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"3000","DOI":"10.1109\/TPAMI.2021.3051173","article-title":"Hemlets posh: Learning part-centric heatmap triplets for 3d human pose and shape estimation","volume":"44","author":"Zhou","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Li, H., Shi, B., Dai, W., Zheng, H., Wang, B., Sun, Y., Guo, M., Li, C., Zou, J., and Xiong, H. (2023, January 7\u201314). Pose-oriented transformer with uncertainty-guided refinement for 2D-to-3D human pose estimation. Proceedings of the AAAI Conference on Artificial Intelligence, Washington DC, USA.","DOI":"10.1609\/aaai.v37i1.25213"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Von Marcard, T., Henschel, R., Black, M.J., Rosenhahn, B., and Pons-Moll, G. (2018, January 8\u201314). Recovering accurate 3D human pose in the wild using imus and a moving camera. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_37"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zheng, C., Zhu, S., Mendieta, M., Yang, T., Chen, C., and Ding, Z. (2021, January 11\u201317). 3D human pose estimation with spatial and temporal transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Virtual.","DOI":"10.1109\/ICCV48922.2021.01145"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Shan, W., Liu, Z., Zhang, X., Wang, S., Ma, S., and Gao, W. (2022, January 23\u201327). P-stmo: Pre-trained spatial temporal many-to-one model for 3D human pose estimation. 
Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20065-6_27"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Martinez, J., Hossain, R., Romero, J., and Little, J.J. (2017, January 22\u201329). A simple yet effective baseline for 3D human pose estimation. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.288"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Xu, T., and Takano, W. (2021, January 19\u201325). Graph stacked hourglass networks for 3d human pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPR46437.2021.01584"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Yang, C.Y., Luo, J., Xia, L., Sun, Y., Qiao, N., Zhang, K., Jiang, Z., Hwang, J.N., and Kuo, C.H. (2023, January 2\u20137). CameraPose: Weakly-Supervised Monocular 3D Human Pose Estimation by Leveraging In-the-wild 2D Annotations. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00294"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chai, W., Jiang, Z., Hwang, J.N., and Wang, G. (2023). Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation. arXiv.","DOI":"10.1109\/ICCV51070.2023.01347"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1819","DOI":"10.1109\/TMM.2022.3168137","article-title":"Joint-bone fusion graph convolutional network for semi-supervised skeleton action recognition","volume":"25","author":"Tu","year":"2022","journal-title":"IEEE Trans. Multimed."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhao, Q., Zheng, C., Liu, M., Wang, P., and Chen, C. (2023, January 18\u201322). PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00857"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhang, J., Tu, Z., Yang, J., Chen, Y., and Yuan, J. (2022, January 19\u201324). Mixste: Seq2seq mixed spatio-temporal encoder for 3d human pose estimation in video. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01288"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Tang, Z., Qiu, Z., Hao, Y., Hong, R., and Yao, T. (2023, January 18\u201322). 3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00464"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wan, Z., Li, Z., Tian, M., Liu, J., Yi, S., and Li, H. (2021, January 11\u201317). Encoder-decoder with multi-level attention for 3D human shape and pose estimation. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Virtual.","DOI":"10.1109\/ICCV48922.2021.01279"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhu, W., Ma, X., Liu, Z., Liu, L., Wu, W., and Wang, Y. (2023, January 2\u20136). MotionBERT: A Unified Perspective on Learning Human Motion Representations. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Paris, France.","DOI":"10.1109\/ICCV51070.2023.01385"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Dabral, R., Mundhada, A., Kusupati, U., Afaque, S., Sharma, A., and Jain, A. (2018, January 8\u201314). Learning 3D human pose from structure and motion. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_41"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Pavlakos, G., Zhou, X., and Daniilidis, K. (2018, January 18\u201322). Ordinal depth supervision for 3D human pose estimation. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00763"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zhou, X., Huang, Q., Sun, X., Xue, X., and Wei, Y. (2017, January 22\u201329). Towards 3D human pose estimation in the wild: A weakly-supervised approach. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.51"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Hossain, M.R.I., and Little, J.J. (2018, January 8\u201314). Exploiting temporal information for 3D human pose estimation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_5"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Si, C., Jing, Y., Wang, W., Wang, L., and Tan, T. (2018, January 8\u201314). Skeleton-based action recognition with spatial reasoning and temporal stack learning. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01246-5_7"},{"key":"ref_37","unstructured":"Li, C., Wang, P., Wang, S., Hou, Y., and Li, W. (2017, January 10\u201314). Skeleton-based action recognition using LSTM and CNN. Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, China."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.patrec.2021.03.028","article-title":"Animepose: Multi-person 3D pose estimation and animation","volume":"147","author":"Kumarapu","year":"2021","journal-title":"Pattern Recognit. 
Lett."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Lee, K., Lee, I., and Lee, S. (2018, January 8\u201314). Propagating lstm: 3d pose estimation based on joint interdependency. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_8"},{"key":"ref_40","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_41","unstructured":"Yang, S., Quan, Z., Nie, M., and Yang, W. (2020). Transpose: Towards explainable human pose estimation by transformer. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lin, K., Wang, L., and Liu, Z. (2021, January 19\u201325). End-to-end human pose and mesh reconstruction with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR46437.2021.00199"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Li, W., Liu, H., Tang, H., Wang, P., and Van Gool, L. (2022, January 19\u201324). Mhformer: Multi-hypothesis transformer for 3D human pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01280"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Shen, X., Yang, Z., Wang, X., Ma, J., Zhou, C., and Yang, Y. (2023, January 18\u201322). Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00858"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Einfalt, M., Ludwig, K., and Lienhart, R. (2023, January 2\u20137). 
Uplift and upsample: Efficient 3D human pose estimation with uplifting transformers. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00292"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1109\/TMM.2022.3141231","article-title":"Exploiting temporal contexts with strided transformer for 3d human pose estimation","volume":"25","author":"Li","year":"2022","journal-title":"IEEE Trans. Multimed."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Zhang, T., Huang, B., and Wang, Y. (2020, January 13\u201319). Object-occluded human shape and pose estimation from a single color image. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR42600.2020.00740"},{"key":"ref_48","first-page":"2752","article-title":"Multi-task deep learning for real-time 3D human pose estimation and action recognition","volume":"43","author":"Luvizon","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Pavlakos, G., Zhou, X., Derpanis, K.G., and Daniilidis, K. (2017, January 21\u201326). Coarse-to-fine volumetric prediction for single-image 3D human pose. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.139"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Tekin, B., M\u00e1rquez-Neila, P., Salzmann, M., and Fua, P. (2017, January 22\u201329). Learning to fuse 2D and 3D image cues for monocular body pose estimation. Proceedings of the IEEE international Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.425"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Rayat Imtiaz Hossain, M., and Little, J.J. (2017). Exploiting temporal information for 3D pose estimation. 
arXiv.","DOI":"10.1007\/978-3-030-01249-6_5"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"\u0160ajina, R., and Iva\u0161i\u0107-Kos, M. (2022). 3D Pose Estimation and Tracking in Handball Actions Using a Monocular Camera. J. Imaging, 8.","DOI":"10.3390\/jimaging8110308"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Liu, J., Rojas, J., Li, Y., Liang, Z., Guan, Y., Xi, N., and Zhu, H. (June, January 30). A graph attention spatio-temporal convolutional network for 3D human pose estimation in video. Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9561605"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Fang, H.S., Xie, S., Tai, Y.W., and Lu, C. (2017, January 22\u201329). Rmpe: Regional multi-person pose estimation. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.256"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, January 16\u201320). Deep high-resolution representation learning for human pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1109\/TPAMI.2013.248","article-title":"Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments","volume":"36","author":"Ionescu","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1007\/s11263-009-0273-6","article-title":"Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion","volume":"87","author":"Sigal","year":"2010","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Liu, R., Shen, J., Wang, H., Chen, C., Cheung, S.c., and Asari, V. (2020, January 13\u201319). Attention mechanism exploits temporal contexts: Real-time 3D human pose reconstruction. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPR42600.2020.00511"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Gong, J., Foo, L.G., Fan, Z., Ke, Q., Rahmani, H., and Liu, J. (2023, January 18\u201322). Diffpose: Toward more reliable 3D pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01253"},{"key":"ref_60","unstructured":"Ci, H., Wang, C., Ma, X., and Wang, Y. (November, January 27). Optimizing network structure for 3d human pose estimation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_61","unstructured":"Xu, Y., Zhu, S.C., and Tung, T. (November, January 27). Denserac: Joint 3d pose and shape estimation by dense render-and-compare. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Wehrbein, T., Rudolph, M., Rosenhahn, B., and Wandt, B. (2021, January 11\u201317). Probabilistic monocular 3D human pose estimation with normalizing flows. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Virtual.","DOI":"10.1109\/ICCV48922.2021.01101"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Yu, B.X., Zhang, Z., Liu, Y., Zhong, S.h., Liu, Y., and Chen, C.W. (2023, January 2\u20136). GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Paris, France.","DOI":"10.1109\/ICCV51070.2023.00810"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/13\/4422\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:11:53Z","timestamp":1760109113000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/13\/4422"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,8]]},"references-count":63,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2024,7]]}},"alternative-id":["s24134422"],"URL":"https:\/\/doi.org\/10.3390\/s24134422","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,8]]}}}