{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T14:17:54Z","timestamp":1761401874446,"version":"build-2065373602"},"reference-count":52,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2016,12,17]],"date-time":"2016-12-17T00:00:00Z","timestamp":1481932800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["No. 61375086"],"award-info":[{"award-number":["No. 61375086"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Key Project of S&T Plan of Beijing Municipal Commission of Education","award":["Grant No. KZ201610005010"],"award-info":[{"award-number":["Grant No. KZ201610005010"]}]},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Noise and constant empirical motion constraints affect the extraction of distinctive spatiotemporal features from one or a few samples per gesture class. To tackle these problems, an adaptive local spatiotemporal feature (ALSTF) using fused RGB-D data is proposed. First, motion regions of interest (MRoIs) are adaptively extracted using grayscale and depth velocity variance information to greatly reduce the impact of noise. Then, corners are used as keypoints if their depth, and velocities of grayscale and of depth meet several adaptive local constraints in each MRoI. With further filtering of noise, an accurate and sufficient number of keypoints is obtained within the desired moving body parts (MBPs). 
Finally, four kinds of multiple descriptors are calculated and combined in extended gradient and motion spaces to represent the appearance and motion features of gestures. The experimental results on the ChaLearn gesture, CAD-60 and MSRDailyActivity3D datasets demonstrate that the proposed feature achieves higher performance compared with published state-of-the-art approaches under the one-shot learning setting and comparable accuracy under the leave-one-out cross validation.<\/jats:p>","DOI":"10.3390\/s16122171","type":"journal-article","created":{"date-parts":[[2016,12,23]],"date-time":"2016-12-23T04:09:09Z","timestamp":1482466149000},"page":"2171","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Adaptive Local Spatiotemporal Features from RGB-D Data for One-Shot Learning Gesture Recognition"],"prefix":"10.3390","volume":"16","author":[{"given":"Jia","family":"Lin","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"},{"name":"Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China"}]},{"given":"Xiaogang","family":"Ruan","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"},{"name":"Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China"}]},{"given":"Naigong","family":"Yu","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"},{"name":"Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China"}]},{"given":"Yee-Hong","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Computing Science, University of Alberta, Edmonton, AB T6G2E8, 
Canada"}]}],"member":"1968","published-online":{"date-parts":[[2016,12,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Wan, J., Ruan, Q., Li, W., An, G., and Zhao, R. (2014). 3D SMoSIFT: Three-dimensional Sparse Motion Scale Invariant Feature Transform for Activity Recognition from RGB-D Videos. J. Electron. Imaging, 23.","DOI":"10.1117\/1.JEI.23.2.023017"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"31268","DOI":"10.3390\/s151229853","article-title":"Control and Guidance of Low-Cost Robots via Gesture Perception for Monitoring Activities in the Home","volume":"15","author":"Sempere","year":"2015","journal-title":"Sensors"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"28646","DOI":"10.3390\/s151128646","article-title":"HAGR-D: A Novel Approach for Gesture Recognition with Depth Map","volume":"15","author":"Santos","year":"2015","journal-title":"Sensors"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1332","DOI":"10.1126\/science.aab3050","article-title":"Human-level Concept Learning through Probabilistic Program Induction","volume":"350","author":"Lake","year":"2015","journal-title":"Science"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1302","DOI":"10.1016\/j.patcog.2014.10.026","article-title":"Conditional Distance Based Matching for One-shot Gesture Recognition","volume":"48","author":"Krishnan","year":"2015","journal-title":"Pattern Recognit."},{"key":"ref_6","first-page":"2549","article-title":"One-shot Learning Gesture Recognition from RGB-D Data using Bag of Features","volume":"14","author":"Wan","year":"2013","journal-title":"J. Mach. Learn. Res."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Rattani, A., Roli, F., and Granger, E. (2015). Adaptive Biometric Systems, Springer. 
[1st ed.].","DOI":"10.1007\/978-3-319-24865-3"},{"key":"ref_8","first-page":"227","article-title":"Multi-layered Gesture Recognition with Kinect","volume":"16","author":"Jiang","year":"2015","journal-title":"J. Mach. Learn. Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1626","DOI":"10.1109\/TPAMI.2015.2513479","article-title":"Explore Efficient Local Features from RGB-D Data for One-Shot Learning Gesture Recognition","volume":"38","author":"Wan","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_10","unstructured":"Hernandez-Vela, A., Bautista, M.A., Perez-Sala, X., Baro, X., Pujol, O., Angulo, C., and Escalera, S. (2012, January 11\u201315). BoVDW: Bag-of-visual-and-depth-words for gesture recognition. Proceedings of the IEEE International Conference on Pattern Recognition, Tsukuba, Japan."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1016\/j.patrec.2013.09.009","article-title":"Probability-based Dynamic Time Warping and Bag-Of-Visual-And-Depth-Words for Human Gesture Recognition in RGB-D","volume":"50","author":"Bautista","year":"2014","journal-title":"Pattern Recognit. Lett."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Laptev, I., and Lindeberg, T. (2003, January 13\u201316). Space-time Interest Points. Proceedings of the IEEE International Conference on Computer Vision, Beijing, China.","DOI":"10.1109\/ICCV.2003.1238378"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1007\/s11263-005-1838-7","article-title":"On Space-time Interest Points","volume":"64","author":"Laptev","year":"2005","journal-title":"Int. J. Comput. Vis."},{"key":"ref_14","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of Oriented Gradients for Human Detection. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Laptev, I., Marsza\u0142ek, M., Schmid, C., and Rozenfeld, B. (2008, January 24\u201326). Learning Realistic Human Actions from Movies. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587756"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ming, Y., Ruan, Q., and Hauptmann, A.G. (2012, January 9\u201313). Activity Recognition from RGB-D Camera with 3D Local Spatio-Temporal Features. Proceedings of the International Conference on Multimedia and Expo, Melbourne, Australia.","DOI":"10.1109\/ICME.2012.8"},{"key":"ref_17","unstructured":"Chen, M., and Hauptmann, A. (2009). MoSIFT: Recognizing Human Actions in Surveillance Videos, Carnegie Mellon University. Research Report."},{"key":"ref_18","first-page":"404","article-title":"SURF: Speeded Up Robust Features","volume":"110","author":"Bay","year":"2006","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_19","unstructured":"Shi, J., and Tomasi, C. (1994, January 21\u201323). Good Features to Track. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Dalal, N., Triggs, B., and Schmid, C. (2006, January 7\u201313). Human Detection Using Oriented Histograms of Flow and Appearance. Proceedings of the European Conference on Computer Vision, Graz, Austria.","DOI":"10.1007\/11744047_33"},{"key":"ref_21","unstructured":"ChaLearn ChaLearn Gesture Dataset. 
Available online: http:\/\/gesture.chalearn.org\/data."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1023\/B:VISI.0000027790.02288.f2","article-title":"Scale & Affine Invariant Interest Point Detectors","volume":"60","author":"Mikolajczyk","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Aggarwal, J.K., and Ryoo, M.S. (2011). Human Activity Analysis: A Review. ACM Comput. Surv., 43.","DOI":"10.1145\/1922649.1922653"},{"key":"ref_24","unstructured":"Harris, C., and Stephens, M. (September, January 31). A Combined Corner and Edge Detector. Proceedings of the Alvey Vision Conference, Manchester, UK."},{"key":"ref_25","unstructured":"Dollar, P., Rabaud, V., Cottrell, G., and Belongie, S. (2005, January 15\u201316). Behavior Recognition via Sparse Spatiotemporal Features. Proceedings of the Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China."},{"key":"ref_26","unstructured":"Xia, L., and Aggarwal, J.K. (2013, January 25\u201327). Spatio-Temporal Depth Cuboid Similarity Feature for Activity Recognition using Depth Camera. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Willems, G., Tuytelaars, T., and Van Gool, L. (2008, January 12\u201318). An Efficient Dense and Scale-invariant Spatio-temporal Interest Point Detector. Proceedings of the European Conference on Computer Vision, Marseille, France.","DOI":"10.1007\/978-3-540-88688-4_48"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Schuldt, C., Laptev, I., and Caputo, B. (2004, January 23\u201326). Recognizing Human Actions: A Local SVM Approach. Proceedings of the International Conference on Pattern Recognition, Cambridge, UK.","DOI":"10.1109\/ICPR.2004.1334462"},{"key":"ref_29","unstructured":"Laptev, I., and Lindeberg, T. 
(2004, January 15). Local Descriptors for Spatio-Temporal Recognition. Proceedings of the International Conference on Spatial Coherence for Visual Motion Analysis, Prague, Czech Republic."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1007\/s11263-012-0594-8","article-title":"Dense Trajectories and Motion Boundary Descriptors for Action Recognition","volume":"103","author":"Wang","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive Image Features from Scale-invariant Keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_32","unstructured":"Farneback, G. (July, January 29). Two-frame Motion Estimation Based on Polynomial Expansion. Proceedings of the 13th Scandinavian Conference on Image Analysis, Halmstad, Sweden."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1109\/TSMC.1979.4310076","article-title":"A Threshold Selection Method from Gray-level Histograms","volume":"9","author":"Otsu","year":"1979","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_34","unstructured":"Lucas, B.D., and Kanade, T. (1981, January 24\u201328). An Iterative Image Registration Technique with an Application to Stereo Vision. Proceedings of the International Joint Conference on Artificial Intelligence, Vancouver, BC, Canada."},{"key":"ref_35","first-page":"2513","article-title":"One-shot-learning Gesture Recognition Using HOG-HOF Features","volume":"15","author":"Konecny","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Malgireddy, M.R., Nwogu, I., and Govindaraju, V. (2012, January 16\u201321). A Temporal Bayesian Model for Classifying, Detecting and Localizing Activities in Video Sequences. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA.","DOI":"10.1109\/CVPRW.2012.6239185"},{"key":"ref_37","first-page":"2189","article-title":"Language-motivated Approaches to Action Recognition","volume":"14","author":"Malgireddy","year":"2013","journal-title":"J. Mach. Learn. Res."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, H., Ullah, M.M., Klaser, A., Laptev, I., and Schmid, C. (2009, January 7\u201310). Evaluation of Local Spatio-temporal Features for Action Recognition. Proceedings of the British Machine Vision Conference, London, UK.","DOI":"10.5244\/C.23.124"},{"key":"ref_39","unstructured":"Sung, J., Ponce, C., Selman, B., and Saxena, A. (2012, January 14\u201318). Unstructured Human Activity Detection from RGBD Images. Proceedings of the IEEE Conference on Robotics and Automation, Saint Paul, MN, USA."},{"key":"ref_40","unstructured":"Wang, J., Liu, Z., Wu, Y., and Yuan, J. (2012, January 16\u201321). Mining Actionlet Ensemble for Action Recognition with Depth Cameras. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1929","DOI":"10.1007\/s00138-014-0596-3","article-title":"The ChaLearn Gesture Dataset (CGD 2011)","volume":"25","author":"Guyon","year":"2014","journal-title":"Mach. Vis. Appl."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Guyon, I., Athitsos, V., Jangyodsuk, P., Escalante, H.J., and Hamner, B. (2012, January 11). Results and Analysis of the ChaLearn Gesture Challenge 2012. 
Proceedings of the International Workshop on Depth Image Analysis, Tsukuba, Japan.","DOI":"10.1007\/978-3-642-40303-3_19"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1016\/j.imavis.2014.04.005","article-title":"Evaluating Spatiotemporal Interest Point Features for Depth-based Action Recognition","volume":"32","author":"Zhu","year":"2014","journal-title":"Image Vis. Comput."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Parisi, G.I., Weber, C., and Wermter, S. (2015). Self-organizing Neural Integration of Pose-motion Features for Human Action Recognition. Front. Neurorobot., 9.","DOI":"10.3389\/fnbot.2015.00003"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Faria, D.R., Premebida, C., and Nunes, U. (2014, January 25\u201329). A Probabilistic Approach for Human Everyday Activities Recognition Using Body Motion from RGB-D Images. Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication, Edinburgh, UK.","DOI":"10.1109\/ROMAN.2014.6926340"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1383","DOI":"10.1109\/TCYB.2013.2276433","article-title":"Multilevel Depth and Image Fusion for Human Activity Detection","volume":"43","author":"Ni","year":"2013","journal-title":"IEEE Trans. Cybern."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Gupta, R., Chia, Y.-S.A., and Rajan, D. (2013, January 21\u201325). Human Activities Recognition Using Depth Images. Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Catalunya, Spain.","DOI":"10.1145\/2502081.2502099"},{"key":"ref_48","unstructured":"Zhang, C., and Tian, Y. (2012, January 3\u20136). RGB-D Camera-based Daily Living Activity Analysis. Proceedings of the 4th Asia-Pacific Signal & Information Processing Association Annual Summit and Conference, Hollywood, CA, USA."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Oreifej, O., and Liu, Z. 
(2013, January 25\u201327). HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.98"},{"key":"ref_50","unstructured":"Liu, L., and Shao, L. (2013, January 3\u20139). Learning Discriminative Representations from RGB-D Video Data. Proceedings of the International Joint Conference on Artificial Intelligence, Beijing, China."},{"key":"ref_51","unstructured":"He, H., and Tan, J. (June, January 31). Ambient Motion Estimation in Dynamic Scenes using Wearable Visual-inertial Sensors. Proceedings of the IEEE International Conference on Robotics and Automation, Hong Kong, China."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Herbst, E., Ren, X., and Fox, D. (2013, January 6\u201310). RGB-D Flow: Dense 3-D Motion Estimation using Color and Depth. Proceedings of the IEEE International Conference on Robotics and Automation, Karlsruhe, Germany.","DOI":"10.1109\/ICRA.2013.6630885"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/16\/12\/2171\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T19:28:50Z","timestamp":1760210930000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/16\/12\/2171"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,12,17]]},"references-count":52,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2016,12]]}},"alternative-id":["s16122171"],"URL":"https:\/\/doi.org\/10.3390\/s16122171","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2016,12,17]]}}}