{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T06:32:18Z","timestamp":1772865138886,"version":"3.50.1"},"reference-count":51,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2019,10,1]],"date-time":"2019-10-01T00:00:00Z","timestamp":1569888000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Objective monitoring and assessment of human motor behavior can improve the diagnosis and management of several medical conditions. Over the past decade, significant advances have been made in the use of wearable technology for continuously monitoring human motor behavior in free-living conditions. However, wearable technology remains ill-suited for applications that require monitoring and interpretation of complex motor behaviors (e.g., involving interactions with the environment). Recent advances in computer vision and deep learning have opened up new possibilities for extracting information from video recordings. In this paper, we present a hierarchical vision-based behavior phenotyping method for classification of basic human actions in video recordings captured using a single RGB camera. Our method addresses challenges associated with tracking multiple human actors and classification of actions in videos recorded in changing environments with different fields of view. We implement a cascaded pose tracker that uses temporal relationships between detections for short-term tracking and appearance-based tracklet fusion for long-term tracking. Furthermore, for action classification, we use pose evolution maps derived from the cascaded pose tracker as low-dimensional and interpretable representations of the movement sequences for training a convolutional neural network. 
The cascaded pose tracker achieves an average accuracy of 88% in tracking the target human actor in our video recordings, and the overall system achieves an average test accuracy of 84% for target-specific action classification in untrimmed video recordings.<\/jats:p>","DOI":"10.3390\/s19194266","type":"journal-article","created":{"date-parts":[[2019,10,1]],"date-time":"2019-10-01T11:11:16Z","timestamp":1569928276000},"page":"4266","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Target-Specific Action Classification for Automated Assessment of Human Motor Behavior from Video"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5892-5001","authenticated-orcid":false,"given":"Behnaz","family":"Rezaei","sequence":"first","affiliation":[{"name":"Augmented Cognition Lab (ACLab), Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA"}]},{"given":"Yiorgos","family":"Christakis","sequence":"additional","affiliation":[{"name":"Digital Medicine &amp; Translational Imaging Group, Pfizer, Cambridge, MA 02139, USA"}]},{"given":"Bryan","family":"Ho","sequence":"additional","affiliation":[{"name":"Neurology Department, Tufts University School of Medicine, Boston, MA 02111, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3463-5832","authenticated-orcid":false,"given":"Kevin","family":"Thomas","sequence":"additional","affiliation":[{"name":"Department of Anatomy &amp; Neurobiology, Boston University School of Medicine, Boston, MA 02118, USA"}]},{"given":"Kelley","family":"Erb","sequence":"additional","affiliation":[{"name":"Digital Medicine &amp; Translational Imaging Group, Pfizer, Cambridge, MA 02139, USA"}]},{"given":"Sarah","family":"Ostadabbas","sequence":"additional","affiliation":[{"name":"Augmented Cognition Lab (ACLab), Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4369-3033","authenticated-orcid":false,"given":"Shyamal","family":"Patel","sequence":"additional","affiliation":[{"name":"Digital Medicine &amp; Translational Imaging Group, Pfizer, Cambridge, MA 02139, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,10,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1577","DOI":"10.1002\/mds.20640","article-title":"Unified Parkinson\u2019s disease rating scale motor examination: Are ratings of nurses, residents in neurology, and movement disorders specialists interchangeable?","volume":"20","author":"Post","year":"2005","journal-title":"Mov. Disord. Off. J. Mov. Disord. Soc."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1272","DOI":"10.1002\/mds.26642","article-title":"Movement Disorders Society Task Force on Technology. Technology in Parkinson\u2019s disease: Challenges and opportunities","volume":"31","author":"Espay","year":"2016","journal-title":"Mov. Disord."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1036","DOI":"10.3389\/fneur.2018.01036","article-title":"Monitoring Motor Symptoms During Activities of Daily Living in Individuals With Parkinson\u2019s Disease","volume":"9","author":"Thorp","year":"2018","journal-title":"Front. Neurol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1192","DOI":"10.1109\/SURV.2012.110112.00192","article-title":"A survey on human activity recognition using wearable sensors","volume":"15","author":"Lara","year":"2013","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2214","DOI":"10.1007\/s00415-011-6097-7","article-title":"Physical inactivity in Parkinson\u2019s disease","volume":"258","author":"Speelman","year":"2011","journal-title":"J. 
Neurol."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"10873","DOI":"10.1016\/j.eswa.2012.03.005","article-title":"A review on vision techniques applied to human behaviour analysis for ambient-assisted living","volume":"39","author":"Chaaraoui","year":"2012","journal-title":"Expert Syst. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"28","DOI":"10.3389\/frobt.2015.00028","article-title":"A review of human activity recognition methods","volume":"2","author":"Vrigkas","year":"2015","journal-title":"Front. Robot. AI"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1109\/JBHI.2018.2819182","article-title":"Robust Activity Recognition for Aging Society","volume":"22","author":"Chen","year":"2018","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1186\/s12984-018-0446-z","article-title":"Vision-based assessment of parkinsonism and levodopa-induced dyskinesia with pose estimation","volume":"15","author":"Li","year":"2018","journal-title":"J. Neuroeng. Rehabil."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Brattoli, B., Buchler, U., Wahl, A.S., Schwab, M.E., and Ommer, B. (2017, January 21\u201326). LSTM Self-Supervision for Detailed Behavior Analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.399"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Song, S., Shen, L., and Valstar, M. (2018, January 15\u201319). Human behaviour-based automatic depression analysis using hand-crafted statistics and deep learned spectral features. Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi\u2019an, China.","DOI":"10.1109\/FG.2018.00032"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Schmitt, F., Bieg, H.J., Herman, M., and Rothkopf, C.A. (2017, January 4\u20139). 
I see what you see: Inferring sensor and policy models of human real-world motor behavior. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11049"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chen, A.T., Biglari-Abhari, M., and Wang, K.I. (2017, January 21\u201326). Trusting the Computer in Computer Vision: A Privacy-Affirming Framework. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.178"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Rezaei, B., and Ostadabbas, S. (2017, January 22\u201329). Background Subtraction via Fast Robust Matrix Completion. Proceedings of the 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), Venice, Italy.","DOI":"10.1109\/ICCVW.2017.221"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Rezaei, B., Huang, X., Yee, J.R., and Ostadabbas, S. (2017, January 5\u20139). Long-term non-contact tracking of caged rodents. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952497"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1313","DOI":"10.1109\/JSTSP.2018.2869111","article-title":"Moving Object Detection through Robust Matrix Completion Augmented with Objectness","volume":"12","author":"Rezaei","year":"2018","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1016\/j.imavis.2017.01.010","article-title":"Going deeper into action recognition: A survey","volume":"60","author":"Herath","year":"2017","journal-title":"Image Vis. 
Comput."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/LSENS.2018.2878572","article-title":"Data Augmentation in Deep Learning-Based Fusion of Depth and Inertial Sensing for Action Recognition","volume":"3","author":"Dawar","year":"2018","journal-title":"IEEE Sens. Lett."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Girdhar, R., Carreira, J., Doersch, C., and Zisserman, A. (2019, January 16\u201321). Video action transformer network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00033"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhang, H.B., Zhang, Y.X., Zhong, B., Lei, Q., Yang, L., Du, J.X., and Chen, D.S. (2019). A comprehensive survey of vision-based human action recognition methods. Sensors, 19.","DOI":"10.3390\/s19051005"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.neucom.2018.05.033","article-title":"Detecting action tubes via spatial action estimation and temporal path inference","volume":"311","author":"Li","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_22","unstructured":"Simonyan, K., and Zisserman, A. (2014, January 8\u201313). Two-stream convolutional networks for action recognition in videos. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Sun, X., Zha, Z.J., and Zeng, W. (2018, January 18\u201322). MiCT: Mixed 3D\/2D Convolutional Tube for Human Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00054"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., and Paluri, M. (2018, January 18\u201322). A Closer Look at Spatiotemporal Convolutions for Action Recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00675"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201313). Learning spatiotemporal features with 3d convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Liu, M., and Yuan, J. (2018, January 18\u201322). Recognizing Human Actions as the Evolution of Pose Estimation Maps. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00127"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Choutas, V., Weinzaepfel, P., Revaud, J., and Schmid, C. (2018, January 18\u201322). PoTion: Pose MoTion Representation for Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00734"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Cherian, A., Sra, S., Gould, S., and Hartley, R. (2018, January 18\u201322). Non-Linear Temporal Subspace Representations for Activity Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00234"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zolfaghari, M., Oliveira, G.L., Sedaghat, N., and Brox, T. (2017, January 22\u201329). Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.316"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Girdhar, R., Gkioxari, G., Torresani, L., Paluri, M., and Tran, D. 
(2018, January 18\u201322). Detect-and-Track: Efficient Pose Estimation in Videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00044"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). ImageNet: A large-scale hierarchical image database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft COCO: Common objects in context. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Andriluka, M., Iqbal, U., Milan, A., Insafutdinov, E., Pishchulin, L., Gall, J., and Schiele, B. (2018, January 18\u201322). PoseTrack: A benchmark for human pose estimation and tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00542"},{"key":"ref_35","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, June 26\u2013July 1). Deep residual learning for image recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","first-page":"523","article-title":"A systematic evaluation and benchmark for person re-identification: Features, metrics, and datasets","volume":"41","author":"Gou","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Gou, M., Camps, O., and Sznaier, M. (2017, January 22\u201329). Mom: Mean of moments feature for person re-identification. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.154"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Liao, S., Hu, Y., Zhu, X., and Li, S.Z. (2015, January 7\u201312). Person re-identification by local maximal occurrence representation and metric learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298832"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Ahmed, E., Jones, M., and Marks, T.K. (2015, January 7\u201312). An improved deep learning architecture for person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299016"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Li, M., Zhu, X., and Gong, S. (2018, January 8\u201314). Unsupervised person re-identification by deep learning tracklet association. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01225-0_45"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lv, J., Chen, W., Li, Q., and Yang, C. (2018, January 18\u201322). Unsupervised cross-dataset person re-identification by transfer learning of spatial-temporal patterns. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00829"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Pirsiavash, H., Ramanan, D., and Fowlkes, C.C. (2011, January 20\u201325). Globally-optimal greedy algorithms for tracking a variable number of objects. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995604"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1002\/nav.20053","article-title":"The Hungarian method for the assignment problem","volume":"52","author":"Kuhn","year":"2005","journal-title":"Nav. Res. Logist."},{"key":"ref_45","unstructured":"Erb, K., Daneault, J., Amato, S., Bergethon, P., Demanuele, C., Kangarloo, T., Patel, S., Ramos, V., Volfson, D., and Wacnik, P. (2018, January 5\u20139). The BlueSky Project: Monitoring motor and non-motor characteristics of people with Parkinson\u2019s disease in the laboratory, a simulated apartment, and home and community settings. Proceedings of the 2018 International Congress, Hong Kong, China."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2129","DOI":"10.1002\/mds.22340","article-title":"Movement Disorder Society-sponsored revision of the Unified Parkinson\u2019s Disease Rating Scale (MDS-UPDRS): Scale presentation and clinimetric testing results","volume":"23","author":"Goetz","year":"2008","journal-title":"Mov. Disord. Off. J. Mov. Disord. 
Soc."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1016\/j.jocn.2018.10.043","article-title":"Quantification of discrete behavioral components of the MDS-UPDRS","volume":"61","author":"Brooks","year":"2019","journal-title":"J. Clin. Neurosci."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1037\/0096-3445.133.1.83","article-title":"Time constraints and resource sharing in adults\u2019 working memory spans","volume":"133","author":"Barrouillet","year":"2004","journal-title":"J. Exp. Psychol. Gen."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"1215","DOI":"10.1001\/jama.2017.11295","article-title":"Digital Phenotyping: Technology for a New Science of Behavior","volume":"318","author":"Insel","year":"2017","journal-title":"JAMA"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1007\/s10865-018-9966-z","article-title":"The history and future of digital health in the field of behavioral medicine","volume":"42","author":"Arigo","year":"2019","journal-title":"J. Behav. 
Med."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"31314","DOI":"10.3390\/s151229858","article-title":"Physical Human Activity Recognition Using Wearable Sensors","volume":"15","author":"Attal","year":"2015","journal-title":"Sensors"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/19\/4266\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:26:31Z","timestamp":1760189191000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/19\/4266"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,1]]},"references-count":51,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2019,10]]}},"alternative-id":["s19194266"],"URL":"https:\/\/doi.org\/10.3390\/s19194266","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,1]]}}}