{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T04:34:47Z","timestamp":1772426087643,"version":"3.50.1"},"reference-count":31,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T00:00:00Z","timestamp":1700092800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT)","award":["2021-0-00441"],"award-info":[{"award-number":["2021-0-00441"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Recently, security monitoring facilities have mainly adopted artificial intelligence (AI) technology to provide both increased security and improved performance. However, there are technical challenges in the pursuit of elevating system performance, automation, and security efficiency. In this paper, we proposed intelligent anomaly detection and classification based on deep learning (DL) using multi-modal fusion. To verify the method, we combined two DL-based schemes, such as (i) the 3D Convolutional AutoEncoder (3D-AE) for anomaly detection and (ii) the SlowFast neural network for anomaly classification. The 3D-AE can detect occurrence points of abnormal events and generate regions of interest (ROI) by the points. The SlowFast model can classify abnormal events using the ROI. These multi-modal approaches can complement weaknesses and leverage strengths in the existing security system. To enhance anomaly learning effectiveness, we also attempted to create a new dataset using the virtual environment in Grand Theft Auto 5 (GTA5). The dataset consists of 400 abnormal-state data and 78 normal-state data with clip sizes in the 8\u201320 s range. 
Virtual data collection can also supplement the original dataset, as replicating abnormal states in the real world is challenging. Consequently, the proposed method can achieve a classification accuracy of 85%, which is higher compared to the 77.5% accuracy achieved when only employing the single classification model. Furthermore, we validated the trained model with the GTA dataset by using a real-world assault class dataset, consisting of 1300 instances that we reproduced. As a result, 1100 data as the assault were classified and achieved 83.5% accuracy. This also shows that the proposed method can provide high performance in real-world environments.<\/jats:p>","DOI":"10.3390\/s23229214","type":"journal-article","created":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T08:19:43Z","timestamp":1700122783000},"page":"9214","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Intelligent Complementary Multi-Modal Fusion for Anomaly Surveillance and Security System"],"prefix":"10.3390","volume":"23","author":[{"given":"Jae-hyeok","family":"Jeong","sequence":"first","affiliation":[{"name":"Department of Electronic Information System Engineering, Sangmyung University, Cheonan 31066, Republic of Korea"}]},{"given":"Hwan-hee","family":"Jung","sequence":"additional","affiliation":[{"name":"Department of Human Intelligence and Robot Engineering, Sangmyung University, Cheonan 31066, Republic of Korea"}]},{"given":"Yong-hoon","family":"Choi","sequence":"additional","affiliation":[{"name":"Department of Human Intelligence and Robot Engineering, Sangmyung University, Cheonan 31066, Republic of Korea"}]},{"given":"Seong-hee","family":"Park","sequence":"additional","affiliation":[{"name":"Intelligent Convergence Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of 
Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4519-1683","authenticated-orcid":false,"given":"Min-suk","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Human Intelligence and Robot Engineering, Sangmyung University, Cheonan 31066, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2023,11,16]]},"reference":[{"key":"ref_1","unstructured":"Hidayat, F. (2020, January 19\u201320). Intelligent video analytic for suspicious object detection: A systematic review. Proceedings of the International Conference on ICT for Smart Society (ICISS), Bandung, Indonesia."},{"key":"ref_2","unstructured":"Suk, H., and Kim, M. (2022, January 24\u201326). Deep learning based scheme for developing secure systems in CCTV using anomaly detection. Proceedings of the International Conference WISA, Jeju Island, Republic of Korea."},{"key":"ref_3","unstructured":"Feichtenhofer, C., Fan, H., Malik, J., and He, K. (November, January 27). Slowfast networks for video recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_4","unstructured":"Jeong, J., and Kim, M. (2022, January 24\u201326). Study of technology for anomaly detection in secure edge system via video surveillance. Proceedings of the International Conference WISA, Jeju Island, Republic of Korea."},{"key":"ref_5","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2973","DOI":"10.1049\/ipr2.12532","article-title":"Dynamic thresholding for video anomaly detection","volume":"16","author":"Jia","year":"2022","journal-title":"IET Image Process"},{"key":"ref_7","unstructured":"Shukla, V., Singh, K.G., and Shah, P. (2013, January 20\u201324). 
Automatic alert of security threat through video surveillance system. Proceedings of the Institute of Nuclear Material and Management Annual Meeting, Atlanta, GA, USA."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Prakash, M.U., and Thamaraiselvi, G.V. (2014, January 8). Detecting and tracking of multiple moving objects for intelligent video surveillance systems. Proceedings of the International Conference on Current Trends In Engineering and Technology (ICCTET), Coimbatore, India.","DOI":"10.1109\/ICCTET.2014.6966297"},{"key":"ref_9","unstructured":"Wang, H., Zhang, X., Yang, S., and Zhang, W. (2021). Video anomaly detection by the duality of normality-granted optical flow. arXiv."},{"key":"ref_10","unstructured":"Gong, D., Liu, L., Saha, B., Le, V., and Mansour, R.M. (November, January 27). Memorizing normality to detect anomaly: Memory-augmented deep auto-encoder for unsupervised anomaly detection. Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"50312","DOI":"10.1109\/ACCESS.2020.2979869","article-title":"Unsupervised anomaly detection and localization based on deep spatiotemporal translation network","volume":"8","author":"Ganokratanaa","year":"2020","journal-title":"IEEE Access"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ionescu, T.R., Khan, S.F., Georgescu, I.M., and Shao, L. (2019, January 15\u201320). Object-centric auto-encoders and dummy anomalies for abnormal event detection in video. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00803"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Markovitz, A., Sharir, G., Friedman, I., Zelnik-Manor, L., and Avidan, S. (2020, January 14\u201319). Graph embedded pose clustering for anomaly detection. 
Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01055"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hu, J., Zhu, E., Wang, S., Liu, X., and Guo, X. (2019). An efficient and robust unsupervised anomaly detection method using ensemble random projection in surveillance videos. Sensors, 19.","DOI":"10.3390\/s19194145"},{"key":"ref_15","unstructured":"Astrid, M., Zaheer, M., Lee, J., and Lee, S. (2021, January 22\u201325). Learning not to reconstruct anomalies. Proceedings of the British Machine Vision Conference (BMVC), Online."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"34366","DOI":"10.1109\/ACCESS.2021.3059170","article-title":"Weapon detection in real-time CCTV videos using deep learning","volume":"9","author":"Bhatti","year":"2021","journal-title":"IEEE Access"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5495","DOI":"10.1007\/s11042-020-09964-6","article-title":"A deep learning approach to building an intelligent video surveillance system","volume":"80","author":"Xu","year":"2021","journal-title":"Multimed. Tools Appl."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Amrutha, V.C., Jyotsna, C., and Amudha, J. (2020, January 5\u20137). Deep learning approach for suspicious activity detection from surveillance video. Proceedings of the International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India.","DOI":"10.1109\/ICIMIA48430.2020.9074920"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.procs.2020.06.030","article-title":"Real-time anomaly recognition through CCTV using neural networks","volume":"173","author":"Singh","year":"2020","journal-title":"Procedia Comput. Sci."},{"key":"ref_20","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., and Zhai, X. (2021, January 3\u20137). 
An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1007\/978-981-16-8403-6_59","article-title":"Suspicious activity detection in surveillance applications using slowfast convolutional neural network","volume":"106","author":"Agarwal","year":"2022","journal-title":"Adv. Data Comput. Commun. Secur."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lu, C., Shi, J., and Jia, J. (2013, January 1\u20138). Abnormal event detection at 150 FPS in MATLAB. Proceedings of the International Conference on Computer Vision (ICCV), Sydney, NSW, Australia.","DOI":"10.1109\/ICCV.2013.338"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Luo, W., Liu, W., and Gao, S. (2017, January 22\u201329). A revisit of sparse coding based anomaly detection in stacked RNN framework. Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.45"},{"key":"ref_24","unstructured":"Soomro, K., Zamir, A.R., and Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in The Wild. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., and Serre, T. (2011, January 6\u201313). HMDB: A large video database for human motion recognition. Proceedings of the International Conference on Computer Vision (ICCV), Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126543"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Ritcher, S.R., Vitnee, V., Roth, S., and Koltun, V. (2016, January 11\u201314). Playing for data: Ground truth from computer games. 
Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46475-6_7"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Cao, Z., Gao, H., Mangalam, K., Cai, Q., and Vo, M. (2020, January 23\u201328). Long-term human motion prediction with scene context. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_23"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Munawar, A., Vinayavekhin, P., and Magitris, G.D. (2017, January 25\u201328). Limiting the reconstruction capability of generative neural network using negative learning. Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, Japan.","DOI":"10.1109\/MLSP.2017.8168155"},{"key":"ref_29","unstructured":"Zaheer, M.Z., Lee, J., Astrid, M., and Lee, S. (2020, January 13\u201319). Old is gold: Redefining the adversarially learned one-class classifier training paradigm. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA."},{"key":"ref_30","unstructured":"Zong, B., Song, Q., Min, R.M., Cheng, W., and Lumezanu, C. (2020, January 26\u201330). Deep autoencoding gaussian mixture model for unsupervised anomaly detection. Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Park, H., Noh, J., and Ham, B. (2020, January 13\u201319). Learning memory-guided normality for anomaly detection. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01438"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/22\/9214\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:24:01Z","timestamp":1760131441000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/22\/9214"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,16]]},"references-count":31,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["s23229214"],"URL":"https:\/\/doi.org\/10.3390\/s23229214","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,16]]}}}