{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T00:19:02Z","timestamp":1778631542638,"version":"3.51.4"},"reference-count":46,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2015,3,3]],"date-time":"2015-03-03T00:00:00Z","timestamp":1425340800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we propose a new method for spotting and recognizing continuous human actions using a vision sensor. The method is comprised of depth-MHI-HOG (DMH), action modeling, action spotting, and recognition. First, to effectively separate the foreground from background, we propose a method called DMH. It includes a standard structure for segmenting images and extracting features by using depth information, MHI, and HOG. Second, action modeling is performed to model various actions using extracted features. The modeling of actions is performed by creating sequences of actions through  k-means clustering; these sequences constitute HMM input. Third, a method of action spotting is proposed to filter meaningless actions from continuous actions and to identify precise start and end points of actions. By employing the spotter model, the proposed method improves action recognition performance. Finally, the proposed method recognizes actions based on start and end points. We evaluate recognition performance by employing the proposed method to obtain and compare probabilities by applying input sequences  in action models and the spotter model. 
Through various experiments, we demonstrate that the proposed method is efficient for recognizing continuous human actions in real environments.<\/jats:p>","DOI":"10.3390\/s150305197","type":"journal-article","created":{"date-parts":[[2015,3,3]],"date-time":"2015-03-03T10:10:05Z","timestamp":1425377405000},"page":"5197-5227","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":34,"title":["Continuous Human Action Recognition Using Depth-MHI-HOG and a Spotter Model"],"prefix":"10.3390","volume":"15","author":[{"given":"Hyukmin","family":"Eum","sequence":"first","affiliation":[{"name":"School of Electrical and Electronic Engineering, Yonsei University, 134 Shinchon-Dong, Seodaemun-Gu, Seoul 120-749, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Changyong","family":"Yoon","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Suwon Science College, Hwaseong 445-742, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Heejin","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Electrical, Electronic and Control Engineering, Hankyong National University, Anseong 456-749, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mignon","family":"Park","sequence":"additional","affiliation":[{"name":"School of Electrical and Electronic Engineering, Yonsei University, 134 Shinchon-Dong, Seodaemun-Gu, Seoul 120-749, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2015,3,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Mitra, V., Franco, H., Graciarena, M., and Vergyri, D. (2014, January 4\u20139). Medium duration modulation cepstral feature for robust speech recognition. 
Proceedings of the IEEE ICASSP, Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6853898"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"8895","DOI":"10.3390\/s140508895","article-title":"A Vision-Based System for Intelligent Monitoring: Human Behaviour Analysis and Privacy by Context","volume":"14","author":"Chaaraoui","year":"2014","journal-title":"Sensors"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"16682","DOI":"10.3390\/s131216682","article-title":"Hierarchical recognition scheme for human facial expression recognition systems","volume":"13","author":"Siddiqi","year":"2013","journal-title":"Sensors"},{"key":"ref_4","unstructured":"Lee, L., and Grimson, W.E.L. (2002, January 20\u201321). Gait analysis for recognition and classification. Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, Washington, DC, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1700","DOI":"10.1109\/TPAMI.2007.1096","article-title":"General tensor discriminant analysis and gabor features for gait recognition","volume":"29","author":"Tao","year":"2007","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"8363","DOI":"10.3390\/s140508363","article-title":"An Investigation on the Feasibility of Uncalibrated and Unconstrained Gaze Tracking for Human Assistive Applications by Using Head Pose Estimation","volume":"14","author":"Cazzato","year":"2014","journal-title":"Sensors"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1110","DOI":"10.1109\/TMM.2013.2246148","article-title":"Robust part-based hand gesture recognition using kinect sensor","volume":"15","author":"Ren","year":"2013","journal-title":"IEEE Trans. 
Multimed."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.imavis.2012.06.014","article-title":"Recognizing expressions from face and body gesture by temporal normalized motion and appearance features","volume":"31","author":"Chen","year":"2013","journal-title":"Image Vis. Comput."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Kern, N., Schiele, B., and Schmidt, A. (2003, January 3\u20134). Multi-sensor activity context detection for wearable computing. Proceedings of the First European Symposium (EUSAI 2003), Veldhoven, The Netherlands.","DOI":"10.1007\/978-3-540-39863-9_17"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"3012","DOI":"10.1016\/j.patcog.2007.02.010","article-title":"Simultaneous gesture segmentation and recognition based on forward spotting accumulative HMMs","volume":"40","author":"Kim","year":"2007","journal-title":"Pattern Recognit."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1109\/TCSVT.2013.2244794","article-title":"Action Recognition Using Multilevel Features and Latent Structural SVM","volume":"23","author":"Wu","year":"2013","journal-title":"IEEE Trans. Circuits Syst. Video Techn."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/j.imavis.2009.11.014","article-title":"A survey on vision-based human action recognition","volume":"28","author":"Poppe","year":"2010","journal-title":"Image Vis. Comput."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ahad, M.A. R., Tan, J., Kim, H., and Ishikawa, S. (2010, January 13\u201318). Action recognition by employing combined directional motion history and energy images. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), San Francisco, CA, USA.","DOI":"10.1109\/CVPRW.2010.5543160"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1109\/MPRV.2010.7","article-title":"Human activity recognition and pattern discovery","volume":"9","author":"Kim","year":"2010","journal-title":"IEEE Pervasive Comput."},{"key":"ref_15","unstructured":"Weinland, D., Ronfard, R., and Boyer, E. (2005, January 15). Motion history volumes for free viewpoint action recognition. Proceedings of the Workshop on Modeling People and Human Interaction (PHI), Beijing, China."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ahad, M.A. (2013). Motion History Images for Action Recognition and Understanding, Springer.","DOI":"10.1007\/978-1-4471-4730-5"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/34.910878","article-title":"The recognition of human movement using temporal templates","volume":"23","author":"Bobick","year":"2001","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/5.18626","article-title":"A tutorial on hidden Markov models and selected applications in speech recognition","volume":"77","author":"Rabiner","year":"1989","journal-title":"IEEE Proc."},{"key":"ref_19","unstructured":"Dugad, R., and Desai, U. (1996). A Tutorial on Hidden Markov Models, Indian Institute of Technology. Technical Report No. SPANN-96.1."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"560","DOI":"10.1049\/el:20070027","article-title":"HMM based action recognition using oriented histograms of optical flow field","volume":"43","author":"Li","year":"2007","journal-title":"Electron. Lett."},{"key":"ref_21","unstructured":"Ali, A., and Aggarwal, J. (2001, January 8). Segmentation and recognition of continuous human activity. 
Proceedings of the IEEE Workshop on Detection and Recognition of Events in Video, Vancouver, BC, Canada."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Morency, L., Quattoni, A., and Darrell, T. (2007, January 17\u201322). Latent-dynamic discriminative models for continuous gesture recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR \u201907), Minneapolis, MN, USA.","DOI":"10.1109\/CVPR.2007.383299"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ning, H., Xu, W., Gong, Y., and Huang, T. (2008, January 12\u201318). Latent pose estimator for continuous action recognition. Proceedings of the 10th European Conference on Computer Vision, Marseille, France.","DOI":"10.1007\/978-3-540-88688-4_31"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Singh, V.K., and Nevatia, R. (2011, January 6\u201313). Action recognition in cluttered dynamic scenes using pose-specific part models. Proceedings of the IEEE International Conference on Computer Vision (ICCV \u201911), Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126232"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chaudhry, R., Ofli, F., Kurillo, G., Bajcsy, R., and Vidal, R. (2013, January 23\u201328). Bio-inspired dynamic 3D discriminative skeletal features for human action recognition. Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW \u201913), Portland, OR, USA.","DOI":"10.1109\/CVPRW.2013.153"},{"key":"ref_26","unstructured":"Yu, G., Liu, Z., and Yuan, J. (2014, January 1\u20135). Discriminative orderlet mining for real-time recognition of human-object interaction. Proceedings of the Asian Conference on Computer Vision (ACCV \u201914), Singapore."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, J., Liu, Z., and Wu, Y. (2014). 
Human Action Recognition with Depth Cameras, Springer.","DOI":"10.1007\/978-3-319-04561-0"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1499","DOI":"10.1109\/TCSVT.2008.2005597","article-title":"Expandable data-driven graphical modeling of human actions based on salient postures","volume":"18","author":"Li","year":"2008","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1505","DOI":"10.1109\/TPAMI.2003.1251144","article-title":"Silhouette analysis-based gait recognition for human identification","volume":"25","author":"Wang","year":"2003","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lv, F., and Nevatia, R. (2007, January 17\u201322). Single view human action recognition using key pose matching and viterbi path searching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR \u201907), Minneapolis, MN, USA.","DOI":"10.1109\/CVPR.2007.383131"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1109\/TCE.2012.6311329","article-title":"Depth video-based human activity recognition system using translation and scaling invariant features for life logging at smart home","volume":"58","author":"Jalal","year":"2012","journal-title":"IEEE Trans. Consum. Electron."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Jalal, A., Uddin, M.Z., Kim, J.T., and Kim, T.-S. (2011). Recognition of Human Home Activities via Depth Silhouettes and \u211c Transformation for Smart Homes. 
Indoor Built Environ.","DOI":"10.1177\/1420326X11423163"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1073","DOI":"10.1016\/j.patcog.2013.06.030","article-title":"Keyword spotting for self-training of BLSTM NN based handwriting recognition systems","volume":"47","author":"Frinken","year":"2014","journal-title":"Pattern Recognit."},{"key":"ref_34","first-page":"65","article-title":"Real-time capable system for hand gesture recognition using hidden markov models in stereo color image sequences","volume":"16","author":"Elmezain","year":"2008","journal-title":"J. WSCG"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Elmezain, M., Al-Hamadi, A., and Michaelis, B. (2009, January 7\u201310). Hand trajectory-based gesture spotting and recognition using HMM. Proceedings of the 2009 16th IEEE International Conference on Image Processing (ICIP \u201909), Cairo, Egypt.","DOI":"10.1109\/ICIP.2009.5414322"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1109\/TSMCC.2011.2149519","article-title":"Hierarchical filtered motion for action recognition in crowded videos","volume":"42","author":"Tian","year":"2012","journal-title":"IEEE Trans. Syst. Man Cybern. Part C Appl. Rev."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Xia, L., Chen, C.-C., and Aggarwal, J. (2011, January 20\u201325). Human detection using depth information by kinect. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW \u201911), Colorado Springs, CO, USA.","DOI":"10.1109\/CVPRW.2011.5981811"},{"key":"ref_38","unstructured":"Dalal, N., and Triggs, B. (2005, January 25). Histograms of oriented gradients for human detection. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Ordonez, C. (2003, January 13). Clustering binary data streams with K-means. 
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, San Diego, CA, USA.","DOI":"10.1145\/882082.882087"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1109\/34.799904","article-title":"An HMM-based threshold model approach for gesture recognition","volume":"21","author":"Lee","year":"1999","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Park, A.-Y., and Lee, S.-W. (2005, January 18\u201320). Gesture spotting in continuous whole body action sequences using discrete hidden markov models. Proceedings of Gesture in Human-Computer Interaction and Simulation, Berder Island, France.","DOI":"10.1007\/11678816_12"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1109\/TRO.2006.889491","article-title":"Gesture spotting and recognition for human\u2013robot interaction","volume":"23","author":"Yang","year":"2007","journal-title":"IEEE Trans. Robot."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1007\/s001380100064","article-title":"Motion segmentation and pose recognition with motion history gradients","volume":"13","author":"Bradski","year":"2002","journal-title":"Mach. Vis. Appl."},{"key":"ref_44","unstructured":"Yang, X., Zhang, C., and Tian, Y. (November, January 29). Recognizing actions using depth motion maps-based histograms of oriented gradients. Proceedings of the 20th ACM international conference on Multimedia, Nara Japan."},{"key":"ref_45","unstructured":"Danafar, S., and Gheissari, N. (2007, January 18\u201322). Action recognition for surveillance applications using optic flow and SVM. 
Proceedings of the Computer Vision-ACCV 2007, Tokyo, Japan."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"12406","DOI":"10.3390\/s130912406","article-title":"Teaching Human Poses Interactively to a Social Robot","volume":"13","author":"Malfaz","year":"2013","journal-title":"Sensors"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/15\/3\/5197\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T20:43:05Z","timestamp":1760215385000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/15\/3\/5197"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,3,3]]},"references-count":46,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2015,3]]}},"alternative-id":["s150305197"],"URL":"https:\/\/doi.org\/10.3390\/s150305197","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,3,3]]}}}