{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:49:17Z","timestamp":1760147357788,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,1,26]],"date-time":"2023-01-26T00:00:00Z","timestamp":1674691200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61971007","61571013"],"award-info":[{"award-number":["61971007","61571013"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>As the monitor probes are used more and more widely these days, the task of detecting abnormal behaviors in surveillance videos has gained widespread attention. The generalization ability and parameter overhead of the model affect how accurate the detection result is. To deal with the poor generalization ability and high parameter overhead of the model in existing anomaly detection methods, we propose a three-dimensional multi-branch convolutional fusion network, named \u201cBranch-Fusion Net\u201d. The network is designed with a multi-branch structure not only to significantly reduce parameter overhead but also to improve the generalization ability by understanding the input feature map from different perspectives. To ignore useless features during the model training, we propose a simple yet effective Channel Spatial Attention Module (CSAM), which sequentially focuses attention on key channels and spatial feature regions to suppress useless features and enhance important features. 
We combine the Branch-Fusion Net and the CSAM as a local feature extraction network and use the Bi-Directional Gated Recurrent Unit (Bi-GRU) to extract global feature information. The experiments are validated on a self-built Crimes-mini dataset, and the accuracy of anomaly detection in surveillance videos reaches 93.55% on the test set. The result shows that the model proposed in the paper significantly improves the accuracy of anomaly detection in surveillance videos with low parameter overhead.<\/jats:p>","DOI":"10.3390\/s23031385","type":"journal-article","created":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T01:27:58Z","timestamp":1674782878000},"page":"1385","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Research on Anomaly Detection of Surveillance Video Based on Branch-Fusion Net and CSAM"],"prefix":"10.3390","volume":"23","author":[{"given":"Pengjv","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Information Science and Technology, North China University of Technology, Beijing 100144, China"}]},{"given":"Yuanyao","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, North China University of Technology, Beijing 100144, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,26]]},"reference":[{"key":"ref_1","first-page":"104078","article-title":"A comprehensive review on deep learning-based methods for video anomaly detection","volume":"106","author":"Nayak","year":"2020","journal-title":"ScienceDirect"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1436","DOI":"10.1016\/j.cviu.2013.06.007","article-title":"An on-line, real-time learning method for detecting anomalies in videos using spatio-temporal compositions","volume":"117","author":"Roshtkhari","year":"2013","journal-title":"Comput. Vis. 
Image Underst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"172425","DOI":"10.1109\/ACCESS.2019.2954540","article-title":"Spatio-Temporal Unity Networking for Video Anomaly Detection","volume":"7","author":"Li","year":"2019","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/TPAMI.2013.111","article-title":"Anomaly detection and localization in crowded scenes","volume":"36","author":"Li","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1016\/j.cviu.2016.10.010","article-title":"Learning deep representations of appearance and motion for anomalous event detection","volume":"156","author":"Xu","year":"2017","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_6","first-page":"2415","article-title":"Spatio-temporal video autoencoder with differentiable memory","volume":"58","author":"Patraucean","year":"2015","journal-title":"Comput. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhao, B., Li, F.F., and Xing, E.P. (2011, January 20\u201325). Online detection of unusual events in videos via dynamic sparse coding. Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995524"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2301","DOI":"10.1109\/TNNLS.2021.3083152","article-title":"Robust Unsupervised Video Anomaly Detection by Multipath Frame Prediction","volume":"33","author":"Wang","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Liu, W., Luo, W., Lian, D., and Gao, S. (2017). Future Frame Prediction for Anomaly Detection\u2014A New Baseline. arXiv.","DOI":"10.1109\/CVPR.2018.00684"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ionescu, R.T., Khan, F.S., and Georgescu, M.I. 
(2019, January 15\u201320). Object-centric Auto-encoders and Dummy Anomalies for Abnormal Event Detection in Video. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00803"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Kiran, B., Dilip, T., and Ranjith, P. (2018). An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos. J. Imaging, 4.","DOI":"10.3390\/jimaging4020036"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201313). Learning Spatiotemporal Features with 3D Convolutional Networks. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_13","unstructured":"Liang, Z., Zhu, G., and Shen, P. (2017, January 22\u201329). Learning Spatiotemporal Features Using 3DCNN and Convolutional LSTM for Gesture Recognition. Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"40757","DOI":"10.1109\/ACCESS.2019.2906654","article-title":"A 3D-CNN and LSTM Based Multi-Task Learning Architecture for Action Recognition","volume":"7","author":"Ouyang","year":"2017","journal-title":"IEEE Access"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1002\/spe.2701","article-title":"Abnormal visual event detection based on multi instance learning and autoregressive integrated moving average model in edge-based Smart City surveillance","volume":"50","author":"Xu","year":"2020","journal-title":"Softw. Pract. Exp."},{"key":"ref_16","first-page":"2021","article-title":"Squeeze-and-Excitation Networks","volume":"42","author":"Jie","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_17","unstructured":"Li, K., Wang, Y., and Zhang, J. (2022). UniFormer: Unifying Convolution and Self-attention for Visual Recognition. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Hasan, M., Choi, J., and Neumann, J. (2016, January 27\u201330). Learning Temporal Regularity in Video Sequences. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.86"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Gong, D., Liu, L., and Le, V. (November, January 27). Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00179"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Park, H., Noh, J., and Ham, B. (2020). Learning Memory-guided Normality for Anomaly Detection. arXiv.","DOI":"10.1109\/CVPR42600.2020.01438"},{"key":"ref_21","unstructured":"Medel, J.R., and Savakis, A. (2016). Anomaly Detection in Video Using Predictive Convolutional Long Short-Term Memory Networks. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lu, Y., Reddy, M., and Nabavi, S.S. (2019, January 18\u201321). Future Frame Prediction Using Convolutional VRNN for Anomaly Detection. Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan.","DOI":"10.1109\/AVSS.2019.8909850"},{"key":"ref_23","unstructured":"Mathieu, M., Couprie, C., and Lecun, Y. (2015). Deep multi-scale video prediction beyond mean square error. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ye, M., Peng, X., and Gan, W. (2019, January 21\u201325). AnoPCN: Video Anomaly Detection via Deep Predictive Coding Network. 
Proceedings of the 27th ACM International Conference on Multimedia, Nice, France.","DOI":"10.1145\/3343031.3350899"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Sabokrou, M., Khalooei, M., Fathy, M., and Adeli, E. (2018, January 18\u201323). Adversarially Learned One-Class Classifier for Novelty Detection. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00356"},{"key":"ref_26","first-page":"2609","article-title":"A Deep One-Class Neural Network for Anomalous Event Detection in Complex Scenes","volume":"31","author":"Wu","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"394","DOI":"10.1109\/TMM.2019.2929931","article-title":"Video Anomaly Detection and Localization Based on an Adaptive Intra-frame Classification Network","volume":"22","author":"Xu","year":"2020","journal-title":"IEEE Trans. Multimed."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Sultani, W., Chen, C., and Shah, M. (2018, January 18\u201323). Real-World Anomaly Detection in Surveillance Videos. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00678"},{"key":"ref_29","unstructured":"Kamoona, A.M., Gosta, A.K., and Bab-Hadiashar, A. (2020). Multiple Instance-Based Video Anomaly Detection using Deep Temporal Encoding-Decoding. arXiv."},{"key":"ref_30","unstructured":"Zhu, Y., and Newsam, S. (2019). Motion-Aware Feature for Improved Video Anomaly Detection. arXiv."},{"key":"ref_31","unstructured":"Simonyan, K., and Zisserman, A. (2014, January 8). Two-Stream Convolutional Networks for Action Recognition in Videos. 
Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Carreira, J., and Zisserman, A. (2017, January 21\u201326). Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.502"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Van Gool, L. (2016). Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. arXiv.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Lin, J., Gan, C., and Han, S. (November, January 27). TSM: Temporal Shift Module for Efficient Video Understanding. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00718"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Li, Y., Ji, B., Shi, X., Zhang, J., Kang, B., and Wang, L. (2020, January 13\u201319). TEA: Temporal Excitation and Aggregation for Action Recognition. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00099"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wang, L., Tong, Z., Ji, B., and Wu, G. (2021, January 20\u201325). TDN: Temporal Difference Networks for Efficient Action Recognition. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00193"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Fan, H., Malik, J., and He, K. (November, January 27). SlowFast Networks for Video Recognition. 
Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00630"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Hara, K., Kataoka, H., and Satoh, Y. (2017, January 22\u201329). Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition. Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy.","DOI":"10.1109\/ICCVW.2017.373"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., and Paluri, M. (2018, January 18\u201323). A Closer Look at Spatiotemporal Convolutions for Action Recognition. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00675"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Xie, S., Sun, C., Huang, J., Tu, Z., and Murphy, K. (2017). Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification. arXiv.","DOI":"10.1007\/978-3-030-01267-0_19"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C. (2020, January 13\u201319). X3D: Expanding Architectures for Efficient Video Recognition. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00028"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-Local Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Qiu, Z., Yao, T., and Mei, T. (2017, January 22\u201329). Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. 
Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.590"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1385\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:16:17Z","timestamp":1760120177000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1385"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,26]]},"references-count":43,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["s23031385"],"URL":"https:\/\/doi.org\/10.3390\/s23031385","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,1,26]]}}}