{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T14:18:03Z","timestamp":1781187483720,"version":"3.54.1"},"reference-count":23,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2015,5,26]],"date-time":"2015-05-26T00:00:00Z","timestamp":1432598400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we present human pose estimation and gesture recognition algorithms that use only depth information. The proposed methods are designed to be operated with only a CPU (central processing unit), so that the algorithm can be operated on a low-cost platform, such as an embedded board. The human pose estimation method is based on an SVM (support vector machine) and superpixels without prior knowledge of a human body model. In the gesture recognition method, gestures are recognized from the pose information of a human body. To recognize gestures regardless of motion speed, the proposed method utilizes the keyframe extraction method. Gesture recognition is performed by comparing input keyframes with keyframes in registered gestures. The gesture yielding the smallest comparison error is chosen as a recognized gesture. To prevent recognition of gestures when a person performs a gesture that is not registered, we derive the maximum allowable comparison errors by comparing each registered gesture with the other gestures. We evaluated our method using a dataset that we generated. The experiment results show that our method performs fairly well and is applicable in real environments.<\/jats:p>","DOI":"10.3390\/s150612410","type":"journal-article","created":{"date-parts":[[2015,5,26]],"date-time":"2015-05-26T11:07:05Z","timestamp":1432638425000},"page":"12410-12427","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":34,"title":["Real-Time Human Pose Estimation and Gesture Recognition from Depth Images Using Superpixels and SVM Classifier"],"prefix":"10.3390","volume":"15","author":[{"given":"Hanguen","family":"Kim","sequence":"first","affiliation":[{"name":"Urban Robotics Laboratory (URL), Dept. Civil and Environmental Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 305-338, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sangwon","family":"Lee","sequence":"additional","affiliation":[{"name":"Urban Robotics Laboratory (URL), Dept. Civil and Environmental Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 305-338, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dongsung","family":"Lee","sequence":"additional","affiliation":[{"name":"Image & Video Research Group, Samsung S1 Cooperation, 168 S1 Building, Soonhwa-dong,Joong-gu, Seoul 100-773, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Soonmin","family":"Choi","sequence":"additional","affiliation":[{"name":"Image & Video Research Group, Samsung S1 Cooperation, 168 S1 Building, Soonhwa-dong,Joong-gu, Seoul 100-773, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jinsun","family":"Ju","sequence":"additional","affiliation":[{"name":"Image & Video Research Group, Samsung S1 Cooperation, 168 S1 Building, Soonhwa-dong,Joong-gu, Seoul 100-773, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5799-2026","authenticated-orcid":false,"given":"Hyun","family":"Myung","sequence":"additional","affiliation":[{"name":"Urban Robotics Laboratory (URL), Dept. Civil and Environmental Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 305-338, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2015,5,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/MMUL.2012.24","article-title":"Microsoft Kinect sensor and its effect","volume":"19","author":"Zhang","year":"2012","journal-title":"IEEE Multimed."},{"key":"ref_2","unstructured":"Arieli, Y., Freedman, B., Machline, M., and Shpunt, A. (2012). Depth Mapping Using Projected Patterns. (U.S. Patent 8,150,142)."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1109\/TMM.2012.2225040","article-title":"GPU-accelerated real-time tracking of full-body motion with multi-layer search","volume":"15","author":"Zhang","year":"2013","journal-title":"IEEE Trans. Multimed."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2821","DOI":"10.1109\/TPAMI.2012.241","article-title":"Efficient human pose estimation from single depth images","volume":"35","author":"Shotton","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. (PAMI)"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Hern\u00e1ndez-Vela, A., Zlateva, N., Marinov, A., Reyes, M., Radeva, P., Dimov, D., and Escalera, S. (2012, January 14\u201318). Graph Cuts Optimization for Multi-limb Human Segmentation in Depth Maps. St. Paul, MN, USA.","DOI":"10.1109\/CVPR.2012.6247742"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ganapathi, V., Plagemann, C., Koller, D., and Thrun, S. (2010, January 13\u201318). Real Time Motion Capture Using a Single Time-of-flight Camera. San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540141"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhu, Y., and Fujimura, K. (2007, January 1\u201322). Constrained Optimization for Human Pose Estimation from Depth Sequences. Tokyo, Japan.","DOI":"10.1007\/978-3-540-76386-4_38"},{"key":"ref_8","unstructured":"Grest, D., Woetzel, J., and Koch, R. (2005). Pattern Recognition, Springer."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ye, M., Wang, X., Yang, R., Ren, L., and Pollefeys, M. (2011, January 6\u201313). Accurate 3D Pose Estimation from a Single Depth Image. Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126310"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Siddiqui, M., and Medioni, G. (2010, January 13\u201318). Human Pose Estimation from a Single View Point, Real-time Range Sensor. San Francisco, CA, USA.","DOI":"10.1109\/CVPRW.2010.5543618"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1109\/MPRV.2010.30","article-title":"Human-display interaction technology: Emerging remote interfaces for pervasive display environments","volume":"9","author":"Bellucci","year":"2010","journal-title":"IEEE Pervasive Comput."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wu, J., Konrad, J., and Ishwar, P. (2013, January 26\u201331). Dynamic Time Warping for Gesture-based User Identification and Authentication with Kinect. Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6638079"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Biswas, K., and Basu, S.K. (2011, January 6\u20138). Gesture Recognition Using Microsoft Kinect\u00ae. Wellington, New Zealand.","DOI":"10.1109\/ICARA.2011.6144864"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Megavannan, V., Agarwal, B., and Babu, R.V. (2012, January 22\u201325). Human Action Recognition Using Depth Maps. Bangalore, India.","DOI":"10.1109\/SPCOM.2012.6290032"},{"key":"ref_15","unstructured":"Kim, H., Hong, S.H., and Myung, H. (2013, January 26\u201329). Gesture recognition algorithm for moving Kinect sensor. Gyeongju, Korea."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Xia, L., Chen, C.C., and Aggarwal, J. (2012, January 16\u201321). View Invariant Human Action Recognition Using Histograms of 3D Joints. Providence, RI, USA.","DOI":"10.1109\/CVPRW.2012.6239233"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/j.imavis.2009.11.014","article-title":"A survey on vision-based human action recognition","volume":"28","author":"Poppe","year":"2010","journal-title":"Image Vision Comput."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Sigalas, M., Baltzakis, H., and Trahanias, P. (2010, January 18\u201322). Gesture Recognition based on Arm Tracking for Human-robot Interaction. Taipei, Taiwan.","DOI":"10.1109\/IROS.2010.5648870"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0169-7439(99)00047-7","article-title":"The mahalanobis distance","volume":"50","author":"Massart","year":"2000","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2274","DOI":"10.1109\/TPAMI.2012.120","article-title":"SLIC superpixels compared to state-of-the-art superpixel methods","volume":"34","author":"Achanta","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. (PAMI)"},{"key":"ref_21","unstructured":"ASUS Xtion pro. Available online: http:\/\/www.asus.com\/Multimedia\/Xtion_PRO\/."},{"key":"ref_22","unstructured":"OpenNI NiTE. Available online: http:\/\/www.openni.ru\/files\/nite\/."},{"key":"ref_23","unstructured":"IPi Mocap Studio. Available online: http:\/\/ipisoft.com\/."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/15\/6\/12410\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T20:46:57Z","timestamp":1760215617000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/15\/6\/12410"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,5,26]]},"references-count":23,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2015,6]]}},"alternative-id":["s150612410"],"URL":"https:\/\/doi.org\/10.3390\/s150612410","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,5,26]]}}}