{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T10:51:00Z","timestamp":1761821460140,"version":"build-2065373602"},"reference-count":53,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2021,4,26]],"date-time":"2021-04-26T00:00:00Z","timestamp":1619395200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2019YFC1511200"],"award-info":[{"award-number":["2019YFC1511200"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>High-quality and complete human motion 4D reconstruction is of great significance for immersive VR and even human operation. However, it has inevitable self-scanning constraints, and tracking under monocular settings also has strict restrictions. In this paper, we propose a human motion capture system combined with human priors and performance capture that only uses a single RGB-D sensor. To break the self-scanning constraint, we generated a complete mesh only using the front view input to initialize the geometric capture. In order to construct a correct warping field, most previous methods initialize their systems in a strict way. To maintain high fidelity while increasing the easiness of the system, we updated the model while capturing motion. Additionally, we blended in human priors in order to improve the reliability of model warping. 
Extensive experiments demonstrated that our method can be used more comfortably while maintaining credible geometric warping and remaining free of self-scanning constraints.<\/jats:p>","DOI":"10.3390\/s21093029","type":"journal-article","created":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T06:19:11Z","timestamp":1619504351000},"page":"3029","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Human Motion Tracking with Less Constraint of Initial Posture from a Single RGB-D Sensor"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2819-704X","authenticated-orcid":false,"given":"Chen","family":"Liu","sequence":"first","affiliation":[{"name":"College of Information Science and Engineering, Northeastern University, Shenyang 110819, China"},{"name":"State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China"},{"name":"Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China"}]},{"given":"Anna","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Northeastern University, Shenyang 110819, China"}]},{"given":"Chunguang","family":"Bu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China"},{"name":"Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China"}]},{"given":"Wenhui","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Northeastern University, Shenyang 110819, China"}]},{"given":"Haijing","family":"Sun","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Northeastern University, Shenyang 110819, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R.A., Kohli, P., Shotton, J., Hodges, S., Freeman, D., and Davison, A.J. (2011, January 16\u201319). KinectFusion: Real-Time 3D Reconstruction and Interaction Using a Moving Depth Camera. Proceedings of the 24th ACM Symposium on User Interface Software & Technology, Santa Barbara, CA, USA.","DOI":"10.1145\/2047196.2047270"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Newcombe, R.A., Davison, A.J., Izadi, S., Kohli, P., Hilliges, O., Shotton, J., Molyneaux, D., Hodges, S., Kim, D., and Fitzgibbon, A. (2011, January 26\u201329). KinectFusion: Real-time dense surface mapping and tracking. Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland.","DOI":"10.1109\/ISMAR.2011.6162880"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Newcombe, R.A., Fox, D., and Seitz, S.M. (2015, January 7\u201313). DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298631"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Dou, M.S., Taylor, J., Fuchs, H., Fitzgibbon, A., and Izadi, S. (2015, January 7\u201313). 3D Scanning Deformable Objects with a Single RGBD Sensor. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298647"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"362","DOI":"10.1007\/978-3-319-46484-8_22","article-title":"VolumeDeform: Real-Time Volumetric Non-rigid Reconstruction","volume":"9912","author":"Innmann","year":"2016","journal-title":"Lect. Notes Comput. 
Sci."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2897824.2925969","article-title":"Fusion4D","volume":"35","author":"Dou","year":"2016","journal-title":"ACM Trans. Graph."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Slavcheva, M., Baust, M., Cremers, D., and Ilic, S. (2017, January 21\u201326). KillingFusion: Non-rigid 3D Reconstruction without Correspondences. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.581"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/j.patcog.2016.05.019","article-title":"RGB-D-based action recognition datasets: A survey","volume":"60","author":"Zhang","year":"2016","journal-title":"Pattern Recognit."},{"key":"ref_9","unstructured":"Hao, L., Adams, B., Guibas, L.J., and Pauly, M. (2009, January 16\u201319). Robust Single-View Geometry and Motion Reconstruction. Proceedings of the ACM Siggraph Asia, Yokohama, Japan."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1360612.1360696","article-title":"Articulated mesh animation from multi-view silhouettes","volume":"27","author":"Vlasic","year":"2008","journal-title":"ACM Trans. Graph."},{"key":"ref_11","unstructured":"Dou, M., Fuchs, H., and Frahm, J.-M. (2013, January 1\u20134). Scanning and Tracking Dynamic Objects with Commodity Depth Cameras. Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Adelaide, Australia."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1109\/TVCG.2012.56","article-title":"Scanning 3d full human bodies using kinects","volume":"18","author":"Tong","year":"2012","journal-title":"IEEE Trans. Vis. Comput. 
Graph."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1109\/TMM.2012.2229264","article-title":"Real-time, full 3-D reconstruction of moving foreground objects from multiple consumer depth cameras","volume":"15","author":"Alexiadis","year":"2012","journal-title":"IEEE Trans. Multimed."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3130800.3130801","article-title":"Motion2Fusion: Real-time Volumetric Performance Capture","volume":"36","author":"Dou","year":"2017","journal-title":"ACM Trans. Graph."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Joo, H., Simon, T., and Sheikh, Y. (2018, January 18\u201323). Total Capture: A 3d Deformation Model for Tracking Faces, Hands, and Bodies. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00868"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2508","DOI":"10.1109\/TPAMI.2019.2915229","article-title":"UnstructuredFusion: Realtime 4D Geometry and Texture Reconstruction using Commercial RGBD Cameras","volume":"42","author":"Xu","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Guo, K., Taylor, J., Fanello, S., Tagliasacchi, A., Dou, M., Davidson, P., Kowdle, A., and Izadi, S. (2018, January 5\u20138). TwinFusion: High Framerate Non-rigid Fusion through Fast Correspondence Tracking. Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy.","DOI":"10.1109\/3DV.2018.00074"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1109\/TVCG.2019.2930691","article-title":"Flyfusion: Realtime dynamic scene reconstruction using a flying depth camera","volume":"27","author":"Xu","year":"2019","journal-title":"IEEE Trans. Vis. Comput. 
Graph."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Yu, T., Zheng, Z.R., Guo, K.W., Zhao, J.H., Dai, Q.H., Li, H., Pons-Moll, G., and Liu, Y.B. (2018, January 18\u201323). DoubleFusion: Real-time Capture of Human Performances with Inner Body Shapes from a Single Depth Sensor. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00761"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Yu, T., Li, H., Guo, K., Dai, Q., Fang, L., and Liu, Y. (2018, January 8\u201314). HybridFusion: Real-Time Performance Capture Using a Single Depth Sensor and Sparse IMUs. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_24"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Varol, G., Ceylan, D., Russell, B., Yang, J., Yumer, E., Laptev, I., and Schmid, C. (2018, January 8\u201314). Bodynet: Volumetric Inference of 3d Human Body Shapes. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_2"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Saito, S., Simon, T., Saragih, J., and Joo, H. (2020, January 13\u201319). PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00016"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Yu, T., Wei, Y., Dai, Q., and Liu, Y. (2019). DeepHuman: 3D Human Reconstruction from a Single Image. arXiv.","DOI":"10.1109\/ICCV.2019.00783"},{"key":"ref_24","unstructured":"Ma, Q., Tang, S., Pujades, S., Pons-Moll, G., Ranjan, A., and Black, M.J. (2019). Dressing 3D Humans using a Conditional Mesh-VAE-GAN. 
arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Alldieck, T., Pons-Moll, G., Theobalt, C., and Magnor, M. (2019). Tex2Shape: Detailed Full Human Body Geometry from a Single Image. arXiv.","DOI":"10.1109\/ICCV.2019.00238"},{"key":"ref_26","unstructured":"Zheng, Z., Yu, T., Liu, Y., and Dai, Q. (2020). PaMIR: Parametric Model-Conditioned Implicit Representation for Image-based Human Reconstruction. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Onizuka, H., Hayirci, Z., Thomas, D., Sugimoto, A., Uchiyama, H., and Taniguchi, R.-i. (2020). TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell. arXiv.","DOI":"10.1109\/CVPR42600.2020.00605"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Huang, Z., Xu, Y., Lassner, C., Li, H., and Tung, T. (2020). ARCH: Animatable Reconstruction of Clothed Humans. arXiv.","DOI":"10.1109\/CVPR42600.2020.00316"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Habermann, M., Xu, W., Zollhoefer, M., Pons-Moll, G., and Theobalt, C. (2020). DeepCap: Monocular Human Performance Capture Using Weak Supervision. arXiv.","DOI":"10.1109\/CVPR42600.2020.00510"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, L., Zhao, X., Yu, T., Wang, S., and Liu, Y. (2020). NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image. arXiv.","DOI":"10.1007\/978-3-030-58565-5_26"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Chibane, J., Alldieck, T., and Pons-Moll, G. (2020, January 13\u201319). Implicit Functions in Feature Space for 3d Shape Reconstruction and Completion. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00700"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1007\/978-3-319-46454-1_34","article-title":"Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image","volume":"9909","author":"Bogo","year":"2016","journal-title":"Lect. Notes Comput. Sci."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Kanazawa, A., Black, M.J., Jacobs, D.W., and Malik, J. (2018, January 18\u201323). End-to-End Recovery of Human Shape and Pose. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00744"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Pavlakos, G., Kolotouros, N., and Daniilidis, K. (2019, January 27\u201328). TexturePose: Supervising Human Mesh Estimation with Texture Consistency. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00089"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Kocabas, M., Athanasiou, N., and Black, M.J. (2020, January 13\u201319). VIBE: Video Inference for Human Body Pose and Shape Estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00530"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Choi, H., Moon, G., and Lee, K.M. (2020, January 23\u201328). Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose. Proceedings of the ECCV, Glasgow, UK.","DOI":"10.1007\/978-3-030-58571-6_45"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhu, H., Zuo, X., Wang, S., Cao, X., and Yang, R. (2019, January 27\u201328). Detailed Human Shape Estimation from a Single Image by Hierarchical Mesh Deformation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seoul, Korea.","DOI":"10.1109\/CVPR.2019.00462"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Kolotouros, N., Pavlakos, G., Black, M.J., and Daniilidis, K. (2019). Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop. arXiv.","DOI":"10.1109\/ICCV.2019.00234"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Omran, M., Lassner, C., Pons-Moll, G., Gehler, P., and Schiele, B. (2018, January 5\u20138). Neural body fitting: Unifying deep learning and model based human pose and shape estimation. Proceedings of the International Conference on 3D Vision (3DV), Verona, Italy.","DOI":"10.1109\/3DV.2018.00062"},{"key":"ref_40","unstructured":"Yoshiyasu, Y., and Gamez, L. (2019). Learning Body Shape and Pose from Dense Correspondences. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A.A., Tzionas, D., and Black, M.J. (2019, January 27\u201328). Expressive body capture: 3d hands, face, and body from a single image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seoul, Korea.","DOI":"10.1109\/CVPR.2019.01123"},{"key":"ref_42","first-page":"1","article-title":"3D Self-Portraits","volume":"32","author":"Li","year":"2013","journal-title":"Acm Trans. Graph."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Fu, B., Ye, M., and Yang, R.G. (2014, January 23\u201328). Quality Dynamic Human Body Modeling Using a Single Low-cost Depth Camera. 
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.92"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1770","DOI":"10.1109\/TVCG.2017.2688331","article-title":"Robust Non-Rigid Motion Tracking and Surface Reconstruction Using $ L_0 $ Regularization","volume":"24","author":"Guo","year":"2018","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Slavcheva, M., Baust, M., and Ilic, S. (2018). SobolevFusion: 3D Reconstruction of Scenes Undergoing Free Non-rigid Motion. Proc. Cvpr. IEEE, 2646\u20132655.","DOI":"10.1109\/CVPR.2018.00280"},{"key":"ref_46","unstructured":"Zhuo, S.L.X., Zerong, Z., Tao, Y., Yebin, L., and Lu, F. (2020, January 23\u201328). RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera. Proceedings of the ECCV, Glasgow, UK."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Bogo, F., Black, M.J., Loper, M., and Romero, J. (2016, January 27\u201330). Detailed Full-Body Reconstructions of Moving People from Monocular RGB-D Sequences. Proceedings of the IEEE International Conference on Computer Vision, Las Vegas, NV, USA.","DOI":"10.1109\/ICCV.2015.265"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Sun, S., Li, C., Guo, Z., and Tai, Y. (2019, January 27\u201328). Parametric Human Shape Reconstruction via Bidirectional Silhouette Guidance. Proceedings of the IEEE International Conference on Computer Vision Workshops, Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00495"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1145\/37402.37422","article-title":"Marching cubes: A high resolution 3D surface construction algorithm","volume":"21","author":"Lorensen","year":"1987","journal-title":"ACM Siggraph Comput. 
Graph."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2816795.2818013","article-title":"SMPL: A skinned multi-person linear model","volume":"34","author":"Loper","year":"2015","journal-title":"ACM Trans. on Graph. (TOG)"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Rong, Y., Shiratori, T., and Joo, H. (2020). FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration. arXiv.","DOI":"10.1109\/ICCVW54120.2021.00201"},{"key":"ref_52","unstructured":"Ravi, N., Reizenstein, J., Novotny, D., Gordon, T., Lo, W.-Y., Johnson, J., and Gkioxari, G. (2020). Accelerating 3d deep learning with pytorch3d. arXiv."},{"key":"ref_53","unstructured":"Lassner, C. (2020). Fast Differentiable Raycasting for Neural Rendering using Sphere-based Representations. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/9\/3029\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:52:50Z","timestamp":1760161970000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/9\/3029"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,26]]},"references-count":53,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["s21093029"],"URL":"https:\/\/doi.org\/10.3390\/s21093029","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,4,26]]}}}