{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T18:55:54Z","timestamp":1770749754850,"version":"3.50.0"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,3,10]],"date-time":"2021-03-10T00:00:00Z","timestamp":1615334400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,3,10]],"date-time":"2021-03-10T00:00:00Z","timestamp":1615334400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003407","name":"MIUR","doi-asserted-by":"crossref","award":["2015KBL78T"],"award-info":[{"award-number":["2015KBL78T"]}],"id":[{"id":"10.13039\/501100003407","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Intel Serv Robotics"],"published-print":{"date-parts":[[2021,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In socially assistive robotics, human activity recognition plays a central role when the adaptation of the robot behavior to the human one is required. In this paper, we present an activity recognition approach for activities of daily living based on deep learning and skeleton data. In the literature, ad hoc features extraction\/selection algorithms with supervised classification methods have been deployed, reaching an excellent classification performance. Here, we propose a deep learning approach, combining CNN and LSTM, that exploits both the learning of spatial dependencies correlating the limbs in a skeleton 3D grid representation and the learning of temporal dependencies from instances with a periodic pattern that works on raw data and so without requiring an explicit feature extraction process. These models are proposed for real-time activity recognition, and they are tested on the CAD-60 dataset. Results show that the proposed model behaves better than an LSTM model thanks to the automatic features extraction of the limbs\u2019 correlation. \u201cNew Person\u201d results show that the CNN-LSTM model achieves<jats:inline-formula><jats:alternatives><jats:tex-math>$$95.4\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mn>95.4<\/mml:mn><mml:mo>%<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>of precision and<jats:inline-formula><jats:alternatives><jats:tex-math>$$94.4\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mn>94.4<\/mml:mn><mml:mo>%<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>of recall, while the \u201cHave Seen\u201d results are<jats:inline-formula><jats:alternatives><jats:tex-math>$$96.1\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mn>96.1<\/mml:mn><mml:mo>%<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>of precision and<jats:inline-formula><jats:alternatives><jats:tex-math>$$94.7\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mn>94.7<\/mml:mn><mml:mo>%<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>of recall.<\/jats:p>","DOI":"10.1007\/s11370-021-00358-7","type":"journal-article","created":{"date-parts":[[2021,3,10]],"date-time":"2021-03-10T21:03:23Z","timestamp":1615410203000},"page":"175-185","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["Combining CNN and LSTM for activity of daily living recognition with a 3D matrix skeleton representation"],"prefix":"10.1007","volume":"14","author":[{"given":"Giovanni","family":"Ercolano","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3379-1756","authenticated-orcid":false,"given":"Silvia","family":"Rossi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,3,10]]},"reference":[{"key":"358_CR1","doi-asserted-by":"crossref","unstructured":"Baccouche M, Mamalet F, Wolf C (2011) Sequential deep learning for human action recognition. In: International workshop on human behavior understanding, pp 29\u201339","DOI":"10.1007\/978-3-642-25446-8_4"},{"key":"358_CR2","doi-asserted-by":"crossref","unstructured":"Busetta P, Kuflik T, Merzi M, Rossi S (2004) Service delivery in smart environments by implicit organizations. In: The first annual international conference on mobile and ubiquitous systems: networking and services, MOBIQUITOUS, pp 356\u2013363","DOI":"10.1109\/MOBIQ.2004.1331742"},{"key":"358_CR3","doi-asserted-by":"crossref","unstructured":"Choutas V, Weinzaepfel P, Revaud J, Schmid C (2018) Potion: Pose motion representation for action recognition. In: CVPR 2018","DOI":"10.1109\/CVPR.2018.00734"},{"key":"358_CR4","doi-asserted-by":"crossref","unstructured":"Cippitelli E, Gasparrini S, Gambi E, Spinsante S (2016) A human activity recognition system using skeleton data from RGBD sensors. Comput Intell Neurosci 2016:4351435","DOI":"10.1155\/2016\/4351435"},{"key":"358_CR5","doi-asserted-by":"crossref","unstructured":"Di Napoli C, Rossi S (2019) A layered architecture for socially assistive robotics as a service. In: 2019 IEEE international conference on systems, man and cybernetics (SMC), pp 352\u2013357","DOI":"10.1109\/SMC.2019.8914532"},{"key":"358_CR6","doi-asserted-by":"crossref","unstructured":"Donahue J, Anne\u00a0Hendricks L, Guadarrama S (2015) Long-term recurrent convolutional networks for visual recognition and description. In: IEEE conference on computer vision and pattern recognition, pp 2625\u20132634","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"358_CR7","doi-asserted-by":"crossref","unstructured":"Du Y, Fu Y, Wang L (2015) Skeleton based action recognition with convolutional neural network. In: 3rd IAPR Asian conference on pattern recognition (ACPR), pp 579\u2013583","DOI":"10.1109\/ACPR.2015.7486569"},{"key":"358_CR8","unstructured":"Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: IEEE conference on computer vision and pattern recognition, pp 1110\u20131118"},{"key":"358_CR9","doi-asserted-by":"crossref","unstructured":"Ercolano G, Riccio D, Rossi S (2017) Two deep approaches for ADL recognition: a multi-scale LSTM and a CNN-LSTM with a 3d matrix skeleton representation. In: 2017 26th IEEE international symposium on robot and human interactive communication (RO-MAN). IEEE, pp 877\u2013882","DOI":"10.1109\/ROMAN.2017.8172406"},{"key":"358_CR10","doi-asserted-by":"crossref","unstructured":"Faria DR, Premebida C, Nunes U (2014) A probabilistic approach for human everyday activities recognition using body motion from rgb-d images. In: The 23rd IEEE intern. symp. on robot and human interactive communication, RO-MAN. IEEE, pp 732\u2013737","DOI":"10.1109\/ROMAN.2014.6926340"},{"issue":"8","key":"358_CR11","doi-asserted-by":"publisher","first-page":"114","DOI":"10.5772\/59230","volume":"12","author":"M Hersh","year":"2015","unstructured":"Hersh M (2015) Overcoming barriers and increasing independence service robots for elderly and disabled people. Int J Adv Robot Syst 12(8):114. https:\/\/doi.org\/10.5772\/59230","journal-title":"Int J Adv Robot Syst"},{"issue":"1","key":"358_CR12","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","volume":"35","author":"S Ji","year":"2013","unstructured":"Ji S, Xu W, Yang M, Yu K (2013) 3d convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221\u2013231","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"358_CR13","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1016\/j.patrec.2018.04.035","volume":"115","author":"P Khaire","year":"2018","unstructured":"Khaire P, Kumar P, Imran J (2018) Combining CNN streams of RGB-D and skeletal data for human activity recognition. Pattern Recognit Lett 115:107\u2013116","journal-title":"Pattern Recognit Lett"},{"key":"358_CR14","unstructured":"Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907"},{"key":"358_CR15","unstructured":"Li C, Wang P, Wang S, Hou Y, Li W (2017) Skeleton-based action recognition using LSTM and CNN. In: 2017 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE, pp 585\u2013590"},{"key":"358_CR16","doi-asserted-by":"crossref","unstructured":"Li Y, Lan C, Xing J, Zeng W, Yuan C, Liu J (2016) Online human action detection using joint classification-regression recurrent neural networks. In: 14th European conference on computer vision \u2013 ECCV, Part VII. Springer, pp 203\u2013220","DOI":"10.1007\/978-3-319-46478-7_13"},{"issue":"2","key":"358_CR17","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1007\/s12369-018-0498-z","volume":"11","author":"T Liu","year":"2019","unstructured":"Liu T, Wang J, Hutchinson S, Meng MQH (2019) Skeleton-based human action recognition by pose specificity and weighted voting. Int J Soc Robot 11(2):219\u2013234","journal-title":"Int J Soc Robot"},{"key":"358_CR18","doi-asserted-by":"crossref","unstructured":"Luvizon DC, Picard D, Tabia H (2018) 2d\/3d pose estimation and action recognition using multitask deep learning. arXiv preprint arXiv:1802.09232","DOI":"10.1109\/CVPR.2018.00539"},{"key":"358_CR19","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1016\/j.patcog.2017.10.033","volume":"76","author":"JC Nunez","year":"2018","unstructured":"Nunez JC, Cabido R, Pantrigo JJ, Montemayor AS, Velez JF (2018) Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition. Pattern Recognit 76:80\u201394","journal-title":"Pattern Recognit"},{"key":"358_CR20","doi-asserted-by":"crossref","unstructured":"Ord\u00f3\u00f1ez FJ, Roggen D (2016) Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 16(1):115","DOI":"10.3390\/s16010115"},{"key":"358_CR21","doi-asserted-by":"publisher","first-page":"3","DOI":"10.3389\/fnbot.2015.00003","volume":"9","author":"GI Parisi","year":"2015","unstructured":"Parisi GI, Weber C, Wermter S (2015) Self-organizing neural integration of pose-motion features for human action recognition. Front Neurorobotics 9:3","journal-title":"Front Neurorobotics"},{"key":"358_CR22","doi-asserted-by":"publisher","first-page":"1265","DOI":"10.1007\/s12369-020-00650-z","volume":"12","author":"S Rossi","year":"2020","unstructured":"Rossi S, Rossi A, Dautenhahn K (2020) The secret life of robots: perspectives and challenges for robot\u2019s behaviours during non-interactive tasks. Int J Soc Robot 12:1265\u20131278","journal-title":"Int J Soc Robot"},{"key":"358_CR23","doi-asserted-by":"crossref","unstructured":"Rossi S, Staffa M, Bove L, Capasso R, Ercolano G (2017) User\u2019s personality and activity influence on hri comfortable distances. Social Robotics: 9th international conference, ICSR 2017, Tsukuba, Japan, November 22\u201324, 2017, proceedings. Springer International Publishing, Cham, pp 167\u2013177","DOI":"10.1007\/978-3-319-70022-9_17"},{"key":"358_CR24","doi-asserted-by":"crossref","unstructured":"Sasabuchi K, Ikeuchi K, Inaba M (2018) Agreeing to interact: understanding interaction as human-robot goal conflicts. Companion of the 2018 ACM\/IEEE international conference on human-robot interaction, HRI \u201918. Association for computing machinery, New York, NY, USA, pp 21\u201328","DOI":"10.1145\/3173386.3173390"},{"key":"358_CR25","doi-asserted-by":"crossref","unstructured":"Shan J, Akella S (2014) 3d human action segmentation and recognition using pose kinetic energy. In: IEEE international workshop on advanced robotics and its social impacts. IEEE, pp 69\u201375","DOI":"10.1109\/ARSO.2014.7020983"},{"key":"358_CR26","unstructured":"Staffa M, De Gregorio M, Giordano M, Rossi S (2014) Can you follow that guy? In: 22th European symposium on artificial neural networks, ESANN 2014, Bruges, Belgium, April 23-25, 2014, pp 511\u2013516"},{"key":"358_CR27","doi-asserted-by":"crossref","unstructured":"Sung J, Ponce C, Selman B, Saxena A (2012) Unstructured human activity detection from rgbd images. In: 2012 IEEE international conference on robotics and automation, pp 842\u2013849","DOI":"10.1109\/ICRA.2012.6224591"},{"key":"358_CR28","unstructured":"Sung J, Ponce C, Selman Bea. CAD-60 and CAD-120. http:\/\/pr.cs.cornell.edu\/humanactivities\/data.php"},{"key":"358_CR29","doi-asserted-by":"publisher","first-page":"1155","DOI":"10.1109\/ACCESS.2017.2778011","volume":"6","author":"A Ullah","year":"2017","unstructured":"Ullah A, Ahmad J, Muhammad K, Sajjad M, Baik SW (2017) Action recognition in video sequences using deep bi-directional lSTM with CNN features. IEEE Access 6:1155\u20131166","journal-title":"IEEE Access"},{"key":"358_CR30","doi-asserted-by":"crossref","unstructured":"Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. arXiv preprint arXiv:1801.07455","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"358_CR31","doi-asserted-by":"crossref","unstructured":"Zhang S, Yang Y, Xiao J, Liu X, Yang Y, Xie D, Zhuang Y (2018) Fusing geometric features for skeleton-based action recognition using multilayer LSTM networks. IEEE Trans Multimedia","DOI":"10.1109\/WACV.2017.24"},{"key":"358_CR32","doi-asserted-by":"crossref","unstructured":"Zhu W, Lan C, Xing J, Zeng W, Li Y, Shen L, Xie X (2016) Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In: Proceedings of the AAAI conference on artificial intelligence, pp 3697\u20133703","DOI":"10.1609\/aaai.v30i1.10451"},{"issue":"8","key":"358_CR33","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1016\/j.imavis.2014.04.005","volume":"32","author":"Y Zhu","year":"2014","unstructured":"Zhu Y, Chen W, Guo G (2014) Evaluating spatiotemporal interest point features for depth-based action recognition. Image Vision Comput 32(8):453\u2013464","journal-title":"Image Vision Comput"}],"container-title":["Intelligent Service Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11370-021-00358-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11370-021-00358-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11370-021-00358-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,21]],"date-time":"2022-12-21T01:30:51Z","timestamp":1671586251000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11370-021-00358-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,10]]},"references-count":33,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,4]]}},"alternative-id":["358"],"URL":"https:\/\/doi.org\/10.1007\/s11370-021-00358-7","relation":{},"ISSN":["1861-2776","1861-2784"],"issn-type":[{"value":"1861-2776","type":"print"},{"value":"1861-2784","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,10]]},"assertion":[{"value":"3 April 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 January 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 March 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}