{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T16:49:11Z","timestamp":1778604551083,"version":"3.51.4"},"reference-count":47,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2024,1,5]],"date-time":"2024-01-05T00:00:00Z","timestamp":1704412800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Technology development Program of MSS","award":["RS-2023-00217732"],"award-info":[{"award-number":["RS-2023-00217732"]}]},{"name":"Technology development Program of MSS","award":["IITP-2024-2020-0-01462"],"award-info":[{"award-number":["IITP-2024-2020-0-01462"]}]},{"name":"MSIT (Ministry of Science and ICT), Korea","award":["RS-2023-00217732"],"award-info":[{"award-number":["RS-2023-00217732"]}]},{"name":"MSIT (Ministry of Science and ICT), Korea","award":["IITP-2024-2020-0-01462"],"award-info":[{"award-number":["IITP-2024-2020-0-01462"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Detecting violent behavior in videos to ensure public safety and security poses a significant challenge. Precisely identifying and categorizing instances of violence in real-life closed-circuit television, which vary across specifications and locations, requires comprehensive understanding and processing of the sequential information embedded in these videos. This study aims to introduce a model that adeptly grasps the spatiotemporal context of videos within diverse settings and specifications of violent scenarios. We propose a method to accurately capture spatiotemporal features linked to violent behaviors using optical flow and RGB data. The approach leverages a Conv3D-based ResNet-3D model as the foundational network, capable of handling high-dimensional video data. 
The efficiency and accuracy of violence detection are enhanced by integrating an attention mechanism, which assigns greater weight to the most crucial frames within the RGB and optical-flow sequences during instances of violence. Our model was evaluated on the UBI-Fight, Hockey, Crowd, and Movie-Fights datasets; the proposed method outperformed existing state-of-the-art techniques, achieving area under the curve scores of 95.4, 98.1, 94.5, and 100.0 on the respective datasets. Moreover, this research not only has the potential to be applied in real-time surveillance systems but also promises to contribute to a broader spectrum of research in video analysis and understanding.<\/jats:p>","DOI":"10.3390\/s24020317","type":"journal-article","created":{"date-parts":[[2024,1,5]],"date-time":"2024-01-05T03:43:00Z","timestamp":1704426180000},"page":"317","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":33,"title":["Conv3D-Based Video Violence Detection Network Using Optical Flow and RGB Data"],"prefix":"10.3390","volume":"24","author":[{"given":"Jae-Hyuk","family":"Park","sequence":"first","affiliation":[{"name":"Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6764-8969","authenticated-orcid":false,"given":"Mohamed","family":"Mahmoud","sequence":"additional","affiliation":[{"name":"Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Republic of Korea"},{"name":"Information Technology Department, Faculty of Computers and Information, Assiut University, Assiut 71515, 
Egypt"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4333-2852","authenticated-orcid":false,"given":"Hyun-Soo","family":"Kang","sequence":"additional","affiliation":[{"name":"Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,1,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1109\/CVPR.1999.784637","article-title":"Adaptive background mixture models for real-time tracking","volume":"Volume 2","author":"Stauffer","year":"1999","journal-title":"Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149)"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1007\/s11263-005-1838-7","article-title":"On space-time interest points","volume":"64","author":"Laptev","year":"2005","journal-title":"Int. J. Comput. Vis."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Hassner, T., Itcher, Y., and Kliper-Gross, O. (2012, January 16\u201321). Violent flows: Real-time detection of violent crowd behavior. Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA.","DOI":"10.1109\/CVPRW.2012.6239348"},{"key":"ref_5","unstructured":"Bank, D., Koenigstein, N., and Giryes, R. (2023). 
Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook, Springer."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A.K., and Davis, L.S. (2016, January 27\u201330). Learning temporal regularity in video sequences. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.86"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Hinami, R., Mei, T., and Satoh, S. (2017, January 22\u201329). Joint detection and recounting of abnormal events by learning deep generic knowledge. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.391"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_9","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016, January 27\u201330). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_11","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Mahmoud, M., Kasem, M., Abdallah, A., and Kang, H.S. 
(2022, January 26\u201328). Ae-lstm: Autoencoder with lstm-based intrusion detection in iot. Proceedings of the 2022 International Telecommunications Conference (ITC-Egypt), Alexandria, Egypt.","DOI":"10.1109\/ITC-Egypt55520.2022.9855688"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Mahmoud, M., and Kang, H.S. (2023). GANMasker: A Two-Stage Generative Adversarial Network for High-Quality Face Mask Removal. Sensors, 23.","DOI":"10.3390\/s23167094"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1007\/s11263-012-0594-8","article-title":"Dense trajectories and motion boundary descriptors for action recognition","volume":"103","author":"Wang","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ranjan, A., and Black, M.J. (2017, January 21\u201326). Optical flow estimation using a spatial pyramid network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.291"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wulff, J., Sevilla-Lara, L., and Black, M.J. (2017, January 21\u201326). Optical flow in mostly rigid scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.731"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Huang, Z., Shi, X., Zhang, C., Wang, Q., Cheung, K.C., Qin, H., Dai, J., and Li, H. (2022, January 23\u201327). Flowformer: A transformer architecture for optical flow. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19790-1_40"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Carreira, J., and Zisserman, A. (2017, January 21\u201326). Quo vadis, action recognition? A new model and the kinetics dataset. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.502"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhang, S., Wang, T., Wang, C., Wang, Y., Shan, G., and Snoussi, H. (2019, January 21\u201322). Video object detection base on rgb and optical flow analysis. Proceedings of the 2019 2nd China Symposium on Cognitive Computing and Hybrid Intelligence (CCHI), Xi\u2019an, China.","DOI":"10.1109\/CCHI.2019.8901921"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"132306","DOI":"10.1016\/j.physd.2019.132306","article-title":"Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network","volume":"404","author":"Sherstinsky","year":"2020","journal-title":"Phys. D Nonlinear Phenom."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Girdhar, R., Ramanan, D., Gupta, A., Sivic, J., and Russell, B. (2017, January 21\u201326). Actionvlad: Learning spatio-temporal aggregation for action classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.337"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Lan, Z., Newsam, S., and Hauptmann, A. (2018, January 2\u20136). Hidden two-stream convolutional networks for action recognition. Proceedings of the Computer Vision\u2013ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia. Revised Selected Papers, Part III 14.","DOI":"10.1007\/978-3-030-20893-6_23"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Sun, S., Kuang, Z., Sheng, L., Ouyang, W., and Zhang, W. (2018, January 18\u201323). Optical flow guided feature: A fast and robust motion representation for video action recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00151"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Van Gool, L. (2016, January 8\u201316). Temporal segment networks: Towards good practices for deep action recognition. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Pinz, A., and Wildes, R.P. (2016). Spatiotemporal residual networks for video action recognition. corr abs\/1611.02155 (2016). arXiv.","DOI":"10.1109\/CVPR.2017.787"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","article-title":"3D convolutional neural networks for human action recognition","volume":"35","author":"Ji","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201313). Learning spatiotemporal features with 3d convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_28","unstructured":"Morais, R., Le, V., Tran, T., Saha, B., Mansour, M., and Venkatesh, S. (2019, January 16\u201317). Learning regularity in skeleton trajectories for anomaly detection in videos. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Xu, H., Zhang, J., Cai, J., Rezatofighi, H., and Tao, D. (2022, January 18\u201324). Gmflow: Learning optical flow via global matching. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00795"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Butler, D.J., Wulff, J., Stanley, G.B., and Black, M.J. (2012, January 7\u201313). A naturalistic open source movie for optical flow evaluation. Proceedings of the Computer Vision\u2013ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy. Proceedings, Part VI 12.","DOI":"10.1007\/978-3-642-33783-3_44"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Fritsch, J., Kuehnl, T., and Geiger, A. (2013, January 6\u20139). A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms. Proceedings of the International Conference on Intelligent Transportation Systems (ITSC), The Hague, The Netherlands.","DOI":"10.1109\/ITSC.2013.6728473"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1016\/j.patrec.2021.01.031","article-title":"Iterative weak\/self-supervised classification framework for abnormal events detection","volume":"145","author":"Degardin","year":"2021","journal-title":"Pattern Recognit. Lett."},{"key":"ref_34","first-page":"225","article-title":"Fight detection in hockey videos using deep network","volume":"4","author":"Mukherjee","year":"2017","journal-title":"J. Multimed. Inf. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Khan, S.U., Haq, I.U., Rho, S., Baik, S.W., and Lee, M.Y. (2019). Cover the violence: A novel Deep-Learning-Based approach towards violence-detection in movies. Appl. 
Sci., 9.","DOI":"10.3390\/app9224963"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_37","unstructured":"Loshchilov, I., and Hutter, F. (2017). Decoupled weight decay regularization. arXiv."},{"key":"ref_38","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., and Le, Q.V. (2018). Autoaugment: Learning augmentation policies from data. arXiv.","DOI":"10.1109\/CVPR.2019.00020"},{"key":"ref_40","first-page":"26","article-title":"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude","volume":"4","author":"Tieleman","year":"2012","journal-title":"COURSERA Neural Netw. Mach. Learn."},{"key":"ref_41","unstructured":"Cortes, C., Mohri, M., and Rostamizadeh, A. (2012). L2 regularization for learning kernels. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1145","DOI":"10.1016\/S0031-3203(96)00142-2","article-title":"The use of the area under the ROC curve in the evaluation of machine learning algorithms","volume":"30","author":"Bradley","year":"1997","journal-title":"Pattern Recognit."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Ravanbakhsh, M., Nabi, M., Sangineto, E., Marcenaro, L., Regazzoni, C., and Sebe, N. (2017, January 17\u201320). Abnormal event detection in videos using generative adversarial nets. 
Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8296547"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1390","DOI":"10.1109\/TIFS.2018.2878538","article-title":"Generative neural networks for anomaly detection in crowded scenes","volume":"14","author":"Wang","year":"2018","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Sultani, W., Chen, C., and Shah, M. (2018, January 18\u201323). Real-world anomaly detection in surveillance videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00678"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Su, Y., Lin, G., Zhu, J., and Wu, Q. (2020, January 23\u201328). Human interaction learning on 3d skeleton point clouds for video violence recognition. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK. Proceedings, Part IV 16.","DOI":"10.1007\/978-3-030-58548-8_5"},{"key":"ref_47","unstructured":"Degardin, B.M. (2020). Weakly and Partially Supervised Learning Frameworks for Anomaly Detection. [Ph.D. 
Thesis, Universidade da Beira Interior (Portugal)]."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/2\/317\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:40:31Z","timestamp":1760103631000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/2\/317"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,5]]},"references-count":47,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,1]]}},"alternative-id":["s24020317"],"URL":"https:\/\/doi.org\/10.3390\/s24020317","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,5]]}}}