{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:53:21Z","timestamp":1760151201020,"version":"build-2065373602"},"reference-count":37,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,2,10]],"date-time":"2022-02-10T00:00:00Z","timestamp":1644451200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Applied Sciences"],"abstract":"<jats:p>In recent years, with the growth of digital media and modern imaging equipment, the use of video processing algorithms and semantic film and image management has expanded. The usage of different video datasets in training artificial intelligence algorithms is also rapidly expanding in various fields. Due to the high volume of information in a video, its processing is still expensive for most hardware systems, mainly in terms of its required runtime and memory. Hence, the optimal selection of keyframes to minimize redundant information in video processing systems has become noteworthy in facilitating this problem. Eliminating some frames can simultaneously reduce the required computational load, hardware cost, memory and processing time of intelligent video-based systems. Based on the aforementioned reasons, this research proposes a method for selecting keyframes and adaptive cropping input video for human action recognition (HAR) systems. The proposed method combines edge detection, simple difference, adaptive thresholding and 1D and 2D average filter algorithms in a hierarchical method. Some HAR methods are trained with videos processed by the proposed method to assess its efficiency. The results demonstrate that the application of the proposed method increases the accuracy of the HAR system by up to 3% compared to random image selection and cropping methods. Additionally, for most cases, the proposed method reduces the training time of the used machine learning algorithm.<\/jats:p>","DOI":"10.3390\/app12041830","type":"journal-article","created":{"date-parts":[[2022,2,11]],"date-time":"2022-02-11T02:37:46Z","timestamp":1644547066000},"page":"1830","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Best Frame Selection to Enhance Training Step Efficiency in Video-Based Human Action Recognition"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0863-1977","authenticated-orcid":false,"given":"Abdorreza Alavi","family":"Gharahbagh","sequence":"first","affiliation":[{"name":"Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s\/n, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0842-8250","authenticated-orcid":false,"given":"Vahid","family":"Hajihashemi","sequence":"additional","affiliation":[{"name":"Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s\/n, 4200-465 Porto, Portugal"}]},{"given":"Marta Campos","family":"Ferreira","sequence":"additional","affiliation":[{"name":"Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s\/n, 4200-465 Porto, Portugal"}]},{"given":"Jos\u00e9 J. M.","family":"Machado","sequence":"additional","affiliation":[{"name":"Departamento de Engenharia Mec\u00e2nica, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s\/n, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7603-6526","authenticated-orcid":false,"given":"Jo\u00e3o Manuel R. S.","family":"Tavares","sequence":"additional","affiliation":[{"name":"Departamento de Engenharia Mec\u00e2nica, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s\/n, 4200-465 Porto, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2022,2,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Yang, Y., Cai, Z., Yu, Y., Wu, T., and Lin, L. (2019, January 17\u201320). Human action recognition based on skeleton and convolutional neural network. Proceedings of the 2019 Photonics & Electromagnetics Research Symposium-Fall (PIERS-Fall), IEEE, Xiamen, China.","DOI":"10.1109\/PIERS-Fall48861.2019.9021648"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2742","DOI":"10.1109\/TIP.2019.2952088","article-title":"A Context knowledge map guided coarse-to-fine action recognition","volume":"29","author":"Ji","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Sim, J., Kasahara, J.Y.L., Chikushi, S., Nagatani, K., Chiba, T., Chayama, K., Yamashita, A., and Asama, H. (2021, January 11\u201314). Effects of Video Filters for Learning an Action Recognition Model for Construction Machinery from Simulated Training Data. Proceedings of the 2021 IEEE\/SICE International Symposium on System Integration (SII), Iwaki, Japan.","DOI":"10.1109\/IEEECONF49454.2021.9382735"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1728","DOI":"10.1109\/TITS.2018.2829987","article-title":"Crowd counting with limited labeling through submodular frame selection","volume":"20","author":"Zhou","year":"2018","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ren, J., Shen, X., Lin, Z., and Mech, R. (2020, January 2\u20135). Best frame selection in a short video. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093615"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"748","DOI":"10.1109\/TCSVT.2019.2896029","article-title":"Temporal\u2013spatial mapping for action recognition","volume":"30","author":"Song","year":"2019","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_7","first-page":"3152","article-title":"Human activity recognition in videos based on a Two Levels K-means and Hierarchical Codebooks","volume":"6","author":"Hajihashemi","year":"2016","journal-title":"Int. J. Mechatron. Electr. Comput. Technol."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Deshpnande, A., and Warhade, K.K. (2021, January 5\u20137). An Improved Model for Human Activity Recognition by Integrated feature Approach and Optimized SVM. Proceedings of the 2021 International Conference on Emerging Smart Computing and Informatics (ESCI), IEEE, Pune, India.","DOI":"10.1109\/ESCI50559.2021.9396914"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhou, Z., Lui, K.S., Tam, V.W., and Lam, E.Y. (2021, January 10\u201315). Applying (3+ 2+ 1) D Residual Neural Network with Frame Selection for Hong Kong Sign Language Recognition. Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), IEEE, Milan, Italy.","DOI":"10.1109\/ICPR48806.2021.9412075"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1109\/TUFFC.2020.2994028","article-title":"Fast Strain Estimation and Frame Selection in Ultrasound Elastography using Machine Learning","volume":"68","author":"Zayed","year":"2020","journal-title":"IEEE Trans. Ultrason. Ferroelectr. Freq. Control"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"6940","DOI":"10.1109\/LRA.2020.3026964","article-title":"KeySLAM: Robust RGB-D Camera Tracking Using Adaptive VO and Optimal Key-Frame Selection","volume":"5","author":"Han","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"70742","DOI":"10.1109\/ACCESS.2019.2916901","article-title":"An automatic key-frame selection method for monocular visual odometry of ground vehicle","volume":"7","author":"Lin","year":"2019","journal-title":"IEEE Access"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1016\/j.asoc.2018.10.043","article-title":"A Novel fuzzy frame selection based watermarking scheme for MPEG-4 videos using Bi-directional extreme learning machine","volume":"74","author":"Rajpal","year":"2019","journal-title":"Appl. Soft Comput."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"78991","DOI":"10.1109\/ACCESS.2019.2922679","article-title":"Pose-guided spatial alignment and key frame selection for one-shot video-based person re-identification","volume":"7","author":"Chen","year":"2019","journal-title":"IEEE Access"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Xu, Y., Bai, F., Shi, Y., Chen, Q., Gao, L., Tian, K., Zhou, S., and Sun, H. (2021, January 2\u20139). GIF Thumbnails: Attract More Clicks to Your Videos. Proceedings of the AAAI Conference on Artificial Intelligence, USA, Virtual Conference.","DOI":"10.1609\/aaai.v35i4.16416"},{"key":"ref_16","unstructured":"Wu, Z., Li, H., Xiong, C., Jiang, Y.G., and Davis, L.S. (2020). A dynamic frame selection framework for fast video recognition. IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Pretorious, K., and Pillay, N. (2020, January 19\u201324). A Comparative Study of Classifiers for Thumbnail Selection. Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, Glasgow, UK.","DOI":"10.1109\/IJCNN48605.2020.9206951"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhao, K., Lu, Y., Zhang, Z., and Wang, W. (2020, January 12\u201314). Adaptive visual tracking based on key frame selection and reinforcement learning. Proceedings of the 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI), Qingdao, China.","DOI":"10.1109\/IWECAI50956.2020.00039"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Yan, X., Gilani, S.Z., Feng, M., Zhang, L., Qin, H., and Mian, A. (2020). Self-supervised learning to detect key frames in videos. Sensors, 20.","DOI":"10.3390\/s20236941"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wu, Z., Xiong, C., Ma, C.Y., Socher, R., and Davis, L.S. (2019, January 16\u201320). Adaframe: Adaptive frame selection for fast video recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00137"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Fasogbon, P., Heikkil\u00e4, L., and Aksu, E. (2019, January 14\u201317). Frame selection to accelerate Depth from Small Motion on smartphones. Proceedings of the IECON 2019-45th Annual Conference of the IEEE Industrial Electronics Society, Lisbon, Portugal.","DOI":"10.1109\/IECON.2019.8927485"},{"key":"ref_22","unstructured":"Kang, H., Zhang, J., Li, H., Lin, Z., Rhodes, T., and Benes, B. (2019). LeRoP: A learning-based modular robot photography framework. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.dsp.2017.09.011","article-title":"Preserving quality in minimum frame selection within multi-frame super-resolution","volume":"72","author":"Rahimi","year":"2018","journal-title":"Digit. Signal Process."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1016\/j.jvcir.2018.06.024","article-title":"Cut set-based dynamic key frame selection and adaptive layer-based background modeling for background subtraction","volume":"55","author":"Jeyabharathi","year":"2018","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/j.neucom.2018.08.037","article-title":"Action unit detection and key frame selection for human activity prediction","volume":"318","author":"Wang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"8326","DOI":"10.1109\/TIP.2020.3013162","article-title":"Matnet: Motion-attentive transition network for zero-shot video object segmentation","volume":"29","author":"Zhou","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"20200334","DOI":"10.1098\/rspa.2020.0334","article-title":"Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks","volume":"476","author":"Jagtap","year":"2020","journal-title":"Proc. R. Soc. A"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1016\/j.neucom.2021.10.036","article-title":"Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions","volume":"468","author":"Jagtap","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"109136","DOI":"10.1016\/j.jcp.2019.109136","article-title":"Adaptive activation functions accelerate convergence in deep and physics-informed neural networks","volume":"404","author":"Jagtap","year":"2020","journal-title":"J. Comput. Phys."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Carreira, J., and Zisserman, A. (2017, January 21\u201326). Quo vadis, action recognition? A new model and the kinetics dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.502"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zheng, Z., An, G., and Ruan, Q. (2020, January 6\u20139). Motion Guided Feature-Augmented Network for Action Recognition. Proceedings of the 2020 15th IEEE International Conference on Signal Processing (ICSP), Beijing, China.","DOI":"10.1109\/ICSP48669.2020.9321026"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"57267","DOI":"10.1109\/ACCESS.2019.2910604","article-title":"A spatiotemporal heterogeneous two-stream network for action recognition","volume":"7","author":"Chen","year":"2019","journal-title":"IEEE Access"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"115731","DOI":"10.1016\/j.image.2019.115731","article-title":"Correlation net: Spatiotemporal multimodal deep learning for action recognition","volume":"82","author":"Yudistira","year":"2020","journal-title":"Signal Process. Image Commun."},{"key":"ref_34","unstructured":"Soomro, K., Zamir, A.R., and Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., and Serre, T. (2011, January 6\u201313). HMDB: A large video database for human motion recognition. Proceedings of the 2011 International Conference on Computer Vision, IEEE, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126543"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhou, T., Wang, W., Liu, S., Yang, Y., and Van Gool, L. (2021, January 19\u201325). Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00167"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhou, T., Qi, S., Wang, W., Shen, J., and Zhu, S.C. (IEEE Trans. Pattern Anal. Mach. Intell., 2021). Cascaded parsing of human-object interaction recognition, IEEE Trans. Pattern Anal. Mach. Intell., Early Access.","DOI":"10.1109\/TPAMI.2021.3049156"}],"container-title":["Applied Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2076-3417\/12\/4\/1830\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:17:45Z","timestamp":1760134665000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2076-3417\/12\/4\/1830"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,10]]},"references-count":37,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["app12041830"],"URL":"https:\/\/doi.org\/10.3390\/app12041830","relation":{},"ISSN":["2076-3417"],"issn-type":[{"type":"electronic","value":"2076-3417"}],"subject":[],"published":{"date-parts":[[2022,2,10]]}}}