{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T21:25:58Z","timestamp":1761427558636,"version":"build-2065373602"},"reference-count":87,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,2,25]],"date-time":"2023-02-25T00:00:00Z","timestamp":1677283200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"VIT-AP University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Activity recognition in unmanned aerial vehicle (UAV) surveillance is addressed in various computer vision applications such as image retrieval, pose estimation, object detection, object detection in videos, object detection in still images, object detection in video frames, face recognition, and video action recognition. In the UAV-based surveillance technology, video segments captured from aerial vehicles make it challenging to recognize and distinguish human behavior. In this research, to recognize a single and multi-human activity using aerial data, a hybrid model of histogram of oriented gradient (HOG), mask-regional convolutional neural network (Mask-RCNN), and bidirectional long short-term memory (Bi-LSTM) is employed. The HOG algorithm extracts patterns, Mask-RCNN extracts feature maps from the raw aerial image data, and the Bi-LSTM network exploits the temporal relationship between the frames for the underlying action in the scene. This Bi-LSTM network reduces the error rate to the greatest extent due to its bidirectional process. This novel architecture generates enhanced segmentation by utilizing the histogram gradient-based instance segmentation and improves the accuracy of classifying human activities using the Bi-LSTM approach. 
Experimental outcomes demonstrate that the proposed model outperforms the other state-of-the-art models and has achieved 99.25% accuracy on the YouTube-Aerial dataset.<\/jats:p>","DOI":"10.3390\/s23052569","type":"journal-article","created":{"date-parts":[[2023,2,27]],"date-time":"2023-02-27T02:10:46Z","timestamp":1677463846000},"page":"2569","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Vision-Based HAR in UAV Videos Using Histograms and Deep Learning Techniques"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9862-7327","authenticated-orcid":false,"given":"Sireesha","family":"Gundu","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, VIT-AP University, Amaravati 522237, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8860-4196","authenticated-orcid":false,"given":"Hussain","family":"Syed","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, VIT-AP University, Amaravati 522237, India"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,25]]},"reference":[{"key":"ref_1","unstructured":"Choi, B., and Oh, D. (2018, January 23\u201326). Classification of Drone Type Using Deep Convolutional Neural Networks Based on Micro- Doppler Simulation. Proceedings of the ISAP 2018\u20142018 International Symposium on Antennas and Propagation, Busan, Republic of Korea."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Subash, K.V., Srinu, M.V., Siddhartha, M.R., Harsha, N.C., Akkala, P., V Subash, K.V., Siddhartha, M.R., Akkala, P., Venkata Srinu, M., and Sri Harsha, N. (2020, January 5\u20137). Object Detection using Ryze Tello Drone with Help of Mask-RCNN. 
Proceedings of the 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India.","DOI":"10.1109\/ICIMIA48430.2020.9074881"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Perera, A.G., Law, Y.W., and Chahl, J. (2019). Drone-action: An outdoor recorded drone video dataset for action recognition. Drones, 3.","DOI":"10.3390\/drones3040082"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.comcom.2020.03.012","article-title":"Drone-surveillance for search and rescue in natural disaster","volume":"156","author":"Mishra","year":"2020","journal-title":"Comput. Commun."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1016\/j.neucom.2019.11.064","article-title":"Crowd counting with crowd attention convolutional neural network","volume":"382","author":"Chen","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1109\/THMS.2020.2971958","article-title":"A multiviewpoint outdoor dataset for human action recognition","volume":"50","author":"Perera","year":"2020","journal-title":"IEEE Trans.-Hum.-Mach. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Perazzi, F., Khoreva, A., Benenson, R., Schiele, B., and Sorkine-Hornung, A. (2017, January 21\u201326). Learning Video Object Segmentation from Static Images. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.372"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Yang, L., Wang, Y., Xiong, X., Yang, J., and Katsaggelos, A.K. (2018, January 18\u201323). Efficient Video Object Segmentation via Network Modulation. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00680"},{"key":"ref_10","first-page":"5187","article-title":"Video instance segmentation","volume":"Volume 2019","author":"Yang","year":"2019","journal-title":"Proceedings of the IEEE International Conference on Computer Vision"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1109\/TIP.2020.3029901","article-title":"Hier R-CNN: Instance-Level Human Parts Detection and A New Benchmark","volume":"30","author":"Yang","year":"2021","journal-title":"IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc."},{"key":"ref_12","unstructured":"Triphena Delight, D., and Karunakaran, V. (2021, January 8\u201310). Deep Learning based Object Detection using Mask RCNN. Proceedings of the 2021 6th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Dinh, T.T., Vinh, N.D., and Wook, J.J. (2018, January 27\u201329). Robust pedestrian detection via a recursive convolution neural network. 
Proceedings of the 2018 19th IEEE\/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel\/Distributed Computing (SNPD), Busan, Republic of Korea.","DOI":"10.1109\/SNPD.2018.8441055"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"30685","DOI":"10.1007\/s11042-020-09579-x","article-title":"Human detection and tracking with deep convolutional neural networks under the constrained of noise and occluded scenes","volume":"79","author":"Haq","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1109\/JBHI.2014.2312180","article-title":"Fall Detection in Homes of Older Adults Using the Microsoft Kinect","volume":"19","author":"Stone","year":"2015","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhuang, N., Yusufu, T., Ye, J., and Hua, K.A. (June, January 30). Group Activity Recognition with Differential Recurrent Convolutional Neural Networks. Proceedings of the 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017), Washington, DC, USA.","DOI":"10.1109\/FG.2017.70"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1016\/j.neucom.2014.01.019","article-title":"Recognizing human group action by layered model with multiple cues","volume":"136","author":"Cheng","year":"2014","journal-title":"Neurocomputing"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/j.neucom.2011.12.038","article-title":"Human behavior analysis in video surveillance: A Social Signal Processing perspective","volume":"100","author":"Cristani","year":"2013","journal-title":"Neurocomputing"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Yoon, J.H., Yang, M.H., Lim, J., and Yoon, K.J. (2015, January 5\u20139). Bayesian Multi-object Tracking Using Motion Context from Multiple Objects. 
Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2015.12"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"438","DOI":"10.1016\/j.patrec.2011.05.015","article-title":"Human action segmentation and recognition via motion and shape analysis","volume":"33","author":"Shao","year":"2012","journal-title":"Pattern Recognit. Lett."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1318","DOI":"10.1109\/TCYB.2013.2265378","article-title":"Enhanced Computer Vision with Microsoft Kinect Sensor: A Review","volume":"43","author":"Han","year":"2013","journal-title":"IEEE Trans. Cybern."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Caelles, S., Maninis, K.K., Pont-Tuset, J., Leal-Taix\u00e9, L., Cremers, D., and Van Gool, L. (2017, January 21\u201326). One-Shot Video Object Segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.565"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Cheng, J., Tsai, Y.H., Wang, S., and Yang, M.H. (2017, January 22\u201329). SegFlow: Joint Learning for Video Object Segmentation and Optical Flow. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.81"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chen, Y., Pont-Tuset, J., Montes, A., and Gool, L.V. (2018, January 18\u201323). Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00130"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Voigtlaender, P., Chai, Y., Schroff, F., Adam, H., Leibe, B., and Chen, L.C. (2019, January 15\u201320). FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation. 
Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00971"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Tokmakov, P., Alahari, K., and Schmid, C. (2017, January 21\u201326). Learning Motion Patterns in Videos. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.64"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Dutt Jain, S., Xiong, B., and Grauman, K. (2017, January 21\u2013). Fusionseg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.228"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Tokmakov, P., Alahari, K., and Schmid, C. (2017, January 22\u201329). Learning Video Object Segmentation with Visual Memory. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.480"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Pinz, A., and Zisserman, A. (2017, January 22\u201329). Detect to Track and Track to Detect. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.330"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhu, X., Xiong, Y., Dai, J., Yuan, L., and Wei, Y. (2017, January 21\u201326). Deep Feature Flow for Video Recognition. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.441"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zhu, X., Wang, Y., Dai, J., Yuan, L., and Wei, Y. (2017, January 22\u201329). Flow-Guided Feature Aggregation for Video Object Detection. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.52"},{"key":"ref_33","unstructured":"O Pinheiro, P.O., Collobert, R., and Doll\u00e1r, P. (2015). Learning to segment object candidates. Adv. Neural Inf. Process. Syst., 28."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"886","DOI":"10.1109\/CVPR.2005.177","article-title":"Histograms of oriented gradients for human detection","volume":"Volume 1","author":"Dalal","year":"2005","journal-title":"Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905)"},{"key":"ref_36","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (2016). R-fcn: Object detection via region-based fully convolutional networks. Adv. Neural Inf. Process. Syst., 29, Available online: https:\/\/proceedings.neurips.cc\/paper\/2016\/file\/577ef1154f3240ad5b9b413aa7346a1e-Paper.pdf."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Dai, J., He, K., and Sun, J. (2015, January 7\u201312). Convolutional feature masking for joint object and stuff segmentation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299025"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Hariharan, B., Arbel\u00e1ez, P., Girshick, R., and Malik, J. 
(2014, January 6\u201312). Simultaneous detection and segmentation. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10584-0_20"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Li, Y., Qi, H., Dai, J., Ji, X., and Wei, Y. (2017, January 21\u201326). Fully Convolutional Instance-Aware Semantic Segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.472"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Hariharan, B., Arbel\u00e1ez, P., Girshick, R., and Malik, J. (2015, January 7\u201312). Hypercolumns for object segmentation and fine-grained localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298642"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Pinheiro, P.O., Lin, T.Y., Collobert, R., and Doll\u00e1r, P. (2016, January 11\u201314). Learning to refine object segments. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_5"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Dai, J., He, K., Li, Y., Ren, S., and Sun, J. (2016, January 11\u201314). Instance-sensitive fully convolutional networks. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46466-4_32"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Dai, J., He, K., and Sun, J. (2016, January 27\u201330). 
Instance-Aware Semantic Segmentation via Multi-task Network Cascades. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.343"},{"key":"ref_45","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst., 28, Available online: https:\/\/proceedings.neurips.cc\/paper\/2015\/file\/14bfa6bb14875e45bba028a21ed38046-Paper.pdf."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation Applied to Handwritten Zip Code Recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2012","journal-title":"Commun. 
ACM"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"105820","DOI":"10.1016\/j.asoc.2019.105820","article-title":"Human action recognition using two-stream attention based LSTM networks","volume":"86","author":"Dai","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Janardhanan, J., and Umamaheswari, S. (2022). Vision based Human Activity Recognition using Deep Neural Network Framework. Int. J. Adv. Comput. Sci. Appl., 13.","DOI":"10.14569\/IJACSA.2022.0130621"},{"key":"ref_53","unstructured":"Graves, A., and Schmidhuber, J. (August, January 31). Framewise phoneme classification with bidirectional LSTM networks. Proceedings of the International Joint Conference on Neural Networks, Montreal, QC, Canada."},{"key":"ref_54","first-page":"67","article-title":"An improved Sobel edge detection","volume":"Volume 5","author":"Gao","year":"2010","journal-title":"Proceedings of the 2010 3rd International Conference on Computer Science and Information Technology"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"1317","DOI":"10.1016\/j.procs.2018.05.048","article-title":"Human Detection and Tracking using HOG for Action Recognition","volume":"132","author":"Seemanthini","year":"2018","journal-title":"Procedia Comput. Sci."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"23729","DOI":"10.1007\/s11042-020-08976-6","article-title":"A review of object detection based on deep learning","volume":"79","author":"Xiao","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Vedaldi, A., Gulshan, V., Varma, M., and Zisserman, A. 
(October, January 29). Multiple kernels for object detection. Proceedings of the 2009 IEEE 12th international conference on computer vision, Kyoto, Japan.","DOI":"10.1109\/ICCV.2009.5459183"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object detection with discriminatively trained part-based models","volume":"32","author":"Felzenszwalb","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_59","unstructured":"Yu, Y., Zhang, J., Huang, Y., Zheng, S., Ren, W., Wang, C., Huang, K., and Tan, T. (2010, January 11). Object detection by context and boosted HOG-LBP. Proceedings of the ECCV workshop on PASCAL VOC, Crete, Greece."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective search for object recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_61","unstructured":"Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_64","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. 
arXiv."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_66","unstructured":"Fu, C.Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A.C. (2017). Dssd: Deconvolutional single shot detector. arXiv."},{"key":"ref_67","unstructured":"Li, Z., and Zhou, F. (2017). FSSD: Feature fusion single shot multibox detector. arXiv."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Shen, Z., Liu, Z., Li, J., Jiang, Y.G., Chen, Y., and Xue, X. (2017, January 22\u201329). Dsod: Learning deeply supervised object detectors from scratch. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.212"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Zhao, S., Yang, W., and Wang, Y. (2018, January 9\u201311). A new hand segmentation method based on fully convolutional network. Proceedings of the 30th Chinese Control and Decision Conference, CCDC 2018, Shenyang, China.","DOI":"10.1109\/CCDC.2018.8408176"},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1109\/TPAMI.2019.2938758","article-title":"Res2Net: A New Multi-Scale Backbone Architecture","volume":"43","author":"Gao","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Gidaris, S., and Komodakis, N. (2015, January 7\u201313). Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model. 
Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.135"},{"key":"ref_72","first-page":"1687","article-title":"Saliency guided faster-RCNN (SGFr-RCNN) model for object detection and recognition","volume":"34","author":"Sharma","year":"2022","journal-title":"J. King Saud Univ.-Comput. Inf. Sci."},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"105522","DOI":"10.1016\/j.compag.2020.105522","article-title":"AF-RCNN: An anchor-free convolutional neural network for multi-categories agricultural pest detection","volume":"174","author":"Jiao","year":"2020","journal-title":"Comput. Electron. Agric."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., and Li, H. (2020, January 13\u201319). Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01054"},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"121","DOI":"10.36548\/jiip.2020.3.001","article-title":"Posed inverse problem rectification using novel deep convolutional neural network","volume":"2","author":"Vijayakumar","year":"2020","journal-title":"J. Innov. Image Process."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Gundu, S., Syed, H., and Harikiran, J. (2022, January 12\u201314). Human Detection in Aerial Images using Deep Learning Techniques. Proceedings of the 2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP), Vijayawada, India.","DOI":"10.1109\/AISP53593.2022.9760635"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Zhang, L., Lin, L., Liang, X., and He, K. (2016, January 11\u201314). Is faster R-CNN doing well for pedestrian detection?. 
Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46475-6_28"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Liu, J., Gao, X., Bao, N., Tang, J., and Wu, G. (2017, January 14\u201319). Deep convolutional neural networks for pedestrian detection with skip pooling. Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA.","DOI":"10.1109\/IJCNN.2017.7966103"},{"key":"ref_79","first-page":"985","article-title":"Scale-aware fast R-CNN for pedestrian detection","volume":"20","author":"Li","year":"2017","journal-title":"IEEE Trans. Multimed."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"12415","DOI":"10.1109\/ACCESS.2019.2892425","article-title":"Fast pedestrian detection in surveillance video based on soft target training of shallow random forest","volume":"7","author":"Kim","year":"2019","journal-title":"IEEE Access"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Liu, S.A., Lv, S., Zhang, H., and Gong, J. (2019, January 3\u20135). Pedestrian detection algorithm based on the improved ssd. Proceedings of the 2019 Chinese Control And Decision Conference (CCDC), Nanchang, China.","DOI":"10.1109\/CCDC.2019.8832518"},{"key":"ref_82","first-page":"5761414","article-title":"Fast vehicle and pedestrian detection using improved Mask R-CNN","volume":"2020","author":"Xu","year":"2020","journal-title":"Math. Probl. Eng."},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Wang, W., Wang, L., Ge, X., Li, J., and Yin, B. (2020). Pedestrian detection based on two-stream udn. Appl. Sci., 10.","DOI":"10.20944\/preprints202001.0029.v1"},{"key":"ref_84","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1111\/coin.12292","article-title":"L1 norm based pedestrian detection using video analytics technique","volume":"36","author":"Selvaraj","year":"2020","journal-title":"Comput. 
Intell."},{"key":"ref_85","doi-asserted-by":"crossref","first-page":"1808990","DOI":"10.1155\/2022\/1808990","article-title":"HIT HAR: Human Image Threshing Machine for Human Activity Recognition Using Deep Learning Models","volume":"2022","author":"Poulose","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"47051","DOI":"10.1109\/ACCESS.2022.3171263","article-title":"Ensembled transfer learning based multichannel attention networks for human activity recognition in still images","volume":"10","author":"Hirooka","year":"2022","journal-title":"IEEE Access"},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"63532","DOI":"10.1109\/ACCESS.2022.3182315","article-title":"A comparison between various human detectors and CNN-based feature extractors for human activity recognition via aerial captured video sequences","volume":"10","author":"Aldahoul","year":"2022","journal-title":"IEEE Access"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2569\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:42:36Z","timestamp":1760121756000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2569"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,25]]},"references-count":87,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["s23052569"],"URL":"https:\/\/doi.org\/10.3390\/s23052569","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,2,25]]}}}