{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T09:21:59Z","timestamp":1780392119561,"version":"3.54.1"},"reference-count":34,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2021,3,18]],"date-time":"2021-03-18T00:00:00Z","timestamp":1616025600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Human activity recognition (HAR) remains a challenging yet crucial problem to address in computer vision. HAR is primarily intended to be used with other technologies, such as the Internet of Things, to assist in healthcare and eldercare. With the development of deep learning, automatic high-level feature extraction has become a possibility and has been used to optimize HAR performance. Furthermore, deep-learning techniques have been applied in various fields for sensor-based HAR. This study introduces a new methodology using convolution neural networks (CNN) with varying kernel dimensions along with bi-directional long short-term memory (BiLSTM) to capture features at various resolutions. The novelty of this research lies in the effective selection of the optimal video representation and in the effective extraction of spatial and temporal features from sensor data using traditional CNN and BiLSTM. Wireless sensor data mining (WISDM) and UCI datasets are used for this proposed methodology in which data are collected through diverse methods, including accelerometers, sensors, and gyroscopes. The results indicate that the proposed scheme is efficient in improving HAR. It was thus found that unlike other available methods, the proposed method improved accuracy, attaining a higher score in the WISDM dataset compared to the UCI dataset (98.53% vs. 97.05%).<\/jats:p>","DOI":"10.3390\/s21062141","type":"journal-article","created":{"date-parts":[[2021,3,18]],"date-time":"2021-03-18T22:19:36Z","timestamp":1616105976000},"page":"2141","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":148,"title":["Sensor-Based Human Activity Recognition with Spatio-Temporal Deep Learning"],"prefix":"10.3390","volume":"21","author":[{"given":"Ohoud","family":"Nafea","sequence":"first","affiliation":[{"name":"Department of Computer Science, College of Computer Science and Engineering, Taibah University, Medina 42353, Saudi Arabia"},{"name":"Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6871-6633","authenticated-orcid":false,"given":"Wadood","family":"Abdul","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia"},{"name":"Center of Smart Robotics Research, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9781-3969","authenticated-orcid":false,"given":"Ghulam","family":"Muhammad","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia"},{"name":"Center of Smart Robotics Research, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mansour","family":"Alsulaiman","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia"},{"name":"Center of Smart Robotics Research, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,3,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"56855","DOI":"10.1109\/ACCESS.2020.2982225","article-title":"LSTM-CNN architecture for human activity recognition","volume":"8","author":"Xia","year":"2020","journal-title":"IEEE Access"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Tang, Y., Teng, Q., Zhang, L., Min, F., and He, J. (2020). Efficient convolutional neural networks with smaller filters for human activity recognition using wearable sensors. arXiv.","DOI":"10.1109\/JSEN.2020.3015521"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1016\/j.inffus.2019.06.014","article-title":"Imaging and fusing time series for wearable sensor-based human activity recognition","volume":"53","author":"Qin","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.patrec.2018.02.010","article-title":"Deep learning for sensor-based activity recognition: A survey","volume":"119","author":"Wang","year":"2019","journal-title":"Pattern Recognit. Lett."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Abbaspour, S., Fotouhi, F., Sedaghatbaf, A., Fotouhi, H., Vahabi, M., and Linden, M. (2020). A comparative analysis of hybrid deep learning models for human activity recognition. Sensors, 20.","DOI":"10.3390\/s20195707"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/j.eswa.2019.04.057","article-title":"A survey on wearable sensor modality centred human activity recognition in health care","volume":"137","author":"Wang","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1527","DOI":"10.1162\/neco.2006.18.7.1527","article-title":"A fast learning algorithm for deep belief nets","volume":"18","author":"Hinton","year":"2006","journal-title":"Neural Comput."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Fang, H., and Hu, C. (2014, January 28\u201330). Recognizing human activity in smart home using deep learning algorithm. Proceedings of the 33rd Chinese Control Conference, Nanjing, China.","DOI":"10.1109\/ChiCC.2014.6895735"},{"key":"ref_9","first-page":"577","article-title":"Comparative study of machine learning and deep learning architecture for human activity recognition using accelerometer data","volume":"8","author":"Shakya","year":"2018","journal-title":"Int. J. Mach. Learn. Comput"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"107140","DOI":"10.1016\/j.patcog.2019.107140","article-title":"Human activity recognition from UAV-captured video sequences","volume":"100","author":"Mliki","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_11","unstructured":"Alsheikh, M.A., Selim, A., Niyato, D., Doyle, L., Lin, S., and Tan, H.P. (2015). Deep activity recognition models with triaxial accelerometers. arXiv."},{"key":"ref_12","unstructured":"Brownlee, J. (2018). Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python, Machine Learning Mastery."},{"key":"ref_13","unstructured":"Wang, L., Xiong, Y., Wang, Z., and Qiao, Y. (2015). Towards good practices for very deep two-stream convnets. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Bilen, H., Fernando, B., Gavves, E., Vedaldi, A., and Gould, S. (2016, January 27\u201330). Dynamic image networks for action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.331"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1016\/j.eswa.2016.04.032","article-title":"Human activity recognition with smartphone sensors using deep learning neural networks","volume":"59","author":"Ronao","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Pinz, A., and Zisserman, A. (2016, January 27\u201330). Convolutional two-stream network fusion for video action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.213"},{"key":"ref_17","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Van Gool, L. Temporal segment networks: Towards good practices for deep action recognition. Proceedings of the European Conference on Computer Vision."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1715","DOI":"10.1109\/JSEN.2020.3015781","article-title":"A Novel Multi-Stage Training Approach for Human Activity Recognition From Multimodal Wearable Sensor Data Using Deep Neural Network","volume":"21","author":"Mahmud","year":"2020","journal-title":"IEEE Sens. J."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, L., Qiao, Y., and Tang, X. (2015, January 7\u201312). Action recognition with trajectory-pooled deep-convolutional descriptors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299059"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"915","DOI":"10.1016\/j.asoc.2017.09.027","article-title":"Real-time human activity recognition from accelerometer data using Convolutional Neural Networks","volume":"62","author":"Ignatov","year":"2018","journal-title":"Appl. Soft Comput."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"103177","DOI":"10.1016\/j.autcon.2020.103177","article-title":"Human activity classification based on sound recognition and residual convolutional neural network","volume":"114","author":"Jung","year":"2020","journal-title":"Autom. Constr."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201313). Learning spatiotemporal features with 3d convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Carreira, J., and Zisserman, A. (2017, January 21\u201327). Quo vadis, action recognition? A new model and the kinetics dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.502"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"17913","DOI":"10.1109\/ACCESS.2018.2817253","article-title":"Human action recognition by learning spatio-temporal features with deep neural networks","volume":"6","author":"Wang","year":"2018","journal-title":"IEEE Access"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Nan, Y., Lovell, N.H., Redmond, S.J., Wang, K., Delbaere, K., and van Schooten, K.S. (2020). Deep Learning for Activity Recognition in Older People Using a Pocket-Worn Smartphone. Sensors, 20.","DOI":"10.3390\/s20247195"},{"key":"ref_26","unstructured":"Van, J. (2014, January 23). Analysis of Deep Convolutional Neural Network Architectures. Proceedings of the 21st Twente Student Conference on IT, Enschede, The Netherlands."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"100944","DOI":"10.1016\/j.aei.2019.100944","article-title":"Times-series data augmentation and deep learning for construction equipment activity recognition","volume":"42","author":"Rashid","year":"2019","journal-title":"Adv. Eng. Inform."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"102185","DOI":"10.1016\/j.ipm.2019.102185","article-title":"Exploring temporal representations by leveraging attention-based bidirectional lstm-rnns for multi-modal emotion recognition","volume":"57","author":"Li","year":"2020","journal-title":"Inf. Process. Manag."},{"key":"ref_29","unstructured":"Luo, W., Li, Y., Urtasun, R., and Zemel, R. (2017). Understanding the effective receptive field in deep convolutional neural networks. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Grais, E., Wierstorf, H., Ward, D., and Plumbley, M. (2017). Multi-resolution fully convolutional neural networks for monaural audio source separation. arXiv.","DOI":"10.1007\/978-3-319-93764-9_32"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1145\/1964897.1964918","article-title":"Activity recognition using cell phone accelerometers","volume":"12","author":"Kwapisz","year":"2011","journal-title":"ACM SigKDD Explor. Newsl."},{"key":"ref_32","unstructured":"Anguita, D., Ghio, A., Oneto, L., Parra, X., and Reyes-Ortiz, J.L. (2013, January 24\u201326). A Public Domain Dataset for Human Activity Recognition Using Smartphones. Proceedings of the European Symposium on Artificial Neural Networks, Bruges, Belgium."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Rothermel, K., Fritsch, D., Blochinger, W., and D\u00fcrr, F. (2009). Quality of Context, Proceedings of the First International Workshop, QuaCon 2009, Stuttgart, Germany, 25\u201326 June 2009, Springer Science & Business Media. Revised Papers.","DOI":"10.1007\/978-3-642-04559-2"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wiseman, Y. (2021). Autonomous vehicles. Encyclopedia of Information Science and Technology, IGI Global. [5th ed.].","DOI":"10.4018\/978-1-7998-3479-3.ch001"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/6\/2141\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:37:52Z","timestamp":1760161072000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/6\/2141"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,18]]},"references-count":34,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["s21062141"],"URL":"https:\/\/doi.org\/10.3390\/s21062141","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,18]]}}}