{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T13:07:39Z","timestamp":1773493659357,"version":"3.50.1"},"reference-count":20,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2021,2,9]],"date-time":"2021-02-09T00:00:00Z","timestamp":1612828800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41861134010"],"award-info":[{"award-number":["41861134010"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Key Laboratory of Information Transmission and Distribution Technology of Communication Network","award":["HHX20641X002"],"award-info":[{"award-number":["HHX20641X002"]}]},{"name":"Basic scientific research project of Heilongjiang Province","award":["KJCXZD201704"],"award-info":[{"award-number":["KJCXZD201704"]}]},{"name":"Finnish Cultural Foundation, North-Ostrobothnia Regional Fund","award":["2017"],"award-info":[{"award-number":["2017"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Campus violence is a common social phenomenon all over the world, and is the most harmful type of school bullying events. As artificial intelligence and remote sensing techniques develop, there are several possible methods to detect campus violence, e.g., movement sensor-based methods and video sequence-based methods. Sensors and surveillance cameras are used to detect campus violence. In this paper, the authors use image features and acoustic features for campus violence detection. Campus violence data are gathered by role-playing, and 4096-dimension feature vectors are extracted from every 16 frames of video images. The C3D (Convolutional 3D) neural network is used for feature extraction and classification, and an average recognition accuracy of 92.00% is achieved. Mel-frequency cepstral coefficients (MFCCs) are extracted as acoustic features, and three speech emotion databases are involved. The C3D neural network is used for classification, and the average recognition accuracies are 88.33%, 95.00%, and 91.67%, respectively. To solve the problem of evidence conflict, the authors propose an improved Dempster\u2013Shafer (D\u2013S) algorithm. Compared with existing D\u2013S theory, the improved algorithm increases the recognition accuracy by 10.79%, and the recognition accuracy can ultimately reach 97.00%.<\/jats:p>","DOI":"10.3390\/rs13040628","type":"journal-article","created":{"date-parts":[[2021,2,14]],"date-time":"2021-02-14T08:53:56Z","timestamp":1613292836000},"page":"628","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":41,"title":["Campus Violence Detection Based on Artificial Intelligent Interpretation of Surveillance Video Sequences"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6076-0261","authenticated-orcid":false,"given":"Liang","family":"Ye","sequence":"first","affiliation":[{"name":"Department of Information and Communication Engineering, Harbin Institute of Technology, Harbin 150001, China"},{"name":"OPEM Research Unit, University of Oulu, 90014 Oulu, Finland"},{"name":"Science and Technology on Communication Networks Laboratory, Shijiazhuang 050000, China"}]},{"given":"Tong","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Information and Communication Engineering, Harbin Institute of Technology, Harbin 150001, China"},{"name":"ChinaUnicom Software Harbin Branch, Harbin 150001, China"}]},{"given":"Tian","family":"Han","sequence":"additional","affiliation":[{"name":"OPEM Research Unit, University of Oulu, 90014 Oulu, Finland"},{"name":"Jinhua Advanced Research Institute, Jinhua 321000, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0857-2946","authenticated-orcid":false,"given":"Hany","family":"Ferdinando","sequence":"additional","affiliation":[{"name":"OPEM Research Unit, University of Oulu, 90014 Oulu, Finland"},{"name":"Department of Electrical Engineering, Petra Christian University, Surabaya 60236, Indonesia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3963-0750","authenticated-orcid":false,"given":"Tapio","family":"Sepp\u00e4nen","sequence":"additional","affiliation":[{"name":"Physiological Signal Analysis Team, University of Oulu, 90014 Oulu, Finland"}]},{"given":"Esko","family":"Alasaarela","sequence":"additional","affiliation":[{"name":"OPEM Research Unit, University of Oulu, 90014 Oulu, Finland"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"107561","DOI":"10.1016\/j.patcog.2020.107561","article-title":"Sensor-based and vision-based human activity recognition: A comprehensive survey","volume":"108","author":"Dang","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhiqiang, G., Dawei, L., Kaizhu, H., and Yi, H. (2019). Context-aware human activity and smartphone position-mining with motion sensors. Remote Sens., 11.","DOI":"10.3390\/rs11212531"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"524","DOI":"10.1109\/TII.2020.2997032","article-title":"Online detection of action start via soft computing for smart city","volume":"17","author":"Tian","year":"2021","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"5962","DOI":"10.1109\/JIOT.2018.2847731","article-title":"A novel multichannel Internet of things based on dynamic spectrum sharing in 5G communication","volume":"6","author":"Liu","year":"2019","journal-title":"IEEE Internet Things"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"4787","DOI":"10.1109\/TIP.2018.2845742","article-title":"Fight recognition in video using Hough forests and 2D convolutional neural network","volume":"27","author":"Serrano","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1007\/s00138-017-0894-7","article-title":"Spatio-temporal elastic cuboid trajectories for efficient fight recognition using Hough forests","volume":"29","author":"Serrano","year":"2018","journal-title":"Mach. Vis. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Chen, J., Xu, Y., Zhang, C., Xu, Z., Meng, X., and Wang, J. (2019, January 5\u20137). An improved two-stream 3D convolutional neural network for human action recognition. Proceedings of the 2019 25th International Conference on Automation and Computing (ICAC), Lancaster, UK.","DOI":"10.23919\/IConAC.2019.8894962"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1142\/S2196888820500013","article-title":"Violence detection by pretrained modules with different deep learning approaches","volume":"7","author":"Sumon","year":"2020","journal-title":"Vietnam J. Comput. Sci."},{"key":"ref_9","unstructured":"Eknarin, D., Luepol, P., and Suwatchai, K. (2018, January 12\u201314). Video Representation Learning for CCTV-Based Violence Detection. Proceedings of the 2018 3rd Technology Innovation Management and Engineering Science International Conference (TIMES-iCON), Bangkok, Thailand."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1080\/08839514.2020.1723876","article-title":"Violence detection in videos by combining 3D convolutional neural networks and support vector machines","volume":"34","author":"Accattoli","year":"2020","journal-title":"Appl. Artif. Intell."},{"key":"ref_11","first-page":"101","article-title":"Comparison of different feature extraction methods for EEG-based emotion recognition","volume":"1","author":"Nawaz","year":"2020","journal-title":"Biocybern. Biomed. Eng."},{"key":"ref_12","first-page":"608","article-title":"Speech emotion recognition using cepstral features extracted with novel triangular filter banks based on bark and ERB frequency scales","volume":"1","author":"Sugan","year":"2020","journal-title":"Digit. Signal Process."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Han, T., Zhang, J., Zhang, Z., Sun, G., Ye, L., Ferdinando, H., Alasaarela, E., Sepp\u00e4nen, T., Yu, X., and Yang, S. (2018). Emotion recognition and school violence detection from children speech. Eurasip J. Wirel. Commun. Netw., 235.","DOI":"10.1186\/s13638-018-1253-8"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.pmcj.2014.10.009","article-title":"Multi-sensor data fusion methods for indoor activity recognition using temporal evidence theory","volume":"21","author":"Kushwah","year":"2015","journal-title":"Pervasive Mob. Comput."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"102951","DOI":"10.1016\/j.dsp.2020.102951","article-title":"A survey of speech emotion recognition in natural environment\u2013science direct","volume":"110","author":"Fahad","year":"2020","journal-title":"Digit. Signal Process."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Muljono, M.R.P., Agus, H., and Catur, S. (2019). Speech emotion recognition of indonesian movie audio tracks based on MFCC and SVM. IC3I, 22\u201325.","DOI":"10.1109\/IC3I46837.2019.9055509"},{"key":"ref_17","first-page":"101963","article-title":"Fusion recognition of shearer coal-rock cutting state based on improved RBF neural network and D-S evidence theory","volume":"8","author":"Si","year":"2020","journal-title":"IEEE Access"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Lin, Z., Tang, S., Peng, G., Zhang, Y., and Zhong, Z. (2017, January 25\u201326). An artificial neural network model with Yager composition theory for transformer state assessment. Proceedings of the 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China.","DOI":"10.1109\/IAEAC.2017.8054097"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TMM.2019.2960588","article-title":"2d skeleton-based action recognition via two-branch stacked LSTM-RNNS","volume":"22","author":"Avola","year":"2020","journal-title":"IEEE Trans. Multimed."},{"key":"ref_20","unstructured":"Avola, D., Cinque, L., Fagioli, A., Foresti, G.L., and Massaroni, C. (2020). Deep temporal analysis for non-acted body affect recognition. IEEE Trans. Affect. Comput., 1\u201312."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/4\/628\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:22:11Z","timestamp":1760160131000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/4\/628"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,9]]},"references-count":20,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["rs13040628"],"URL":"https:\/\/doi.org\/10.3390\/rs13040628","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,9]]}}}