{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T15:02:49Z","timestamp":1776092569452,"version":"3.50.1"},"reference-count":49,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,10,16]],"date-time":"2020-10-16T00:00:00Z","timestamp":1602806400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,10,16]],"date-time":"2020-10-16T00:00:00Z","timestamp":1602806400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100010447","name":"Kementerian Riset, Teknologi dan Pendidikan Tinggi","doi-asserted-by":"publisher","award":["90\/UN11.2.1\/PT.01.03\/DPRM\/2020"],"award-info":[{"award-number":["90\/UN11.2.1\/PT.01.03\/DPRM\/2020"]}],"id":[{"id":"10.13039\/501100010447","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper describes a method for learning anomaly behavior in the video by finding an attention region from spatiotemporal information, in contrast to the full-frame learning. In our proposed method, a robust background subtraction (BG) for extracting motion, indicating the location of attention regions is employed. The resulting regions are finally fed into a three-dimensional Convolutional Neural Network (3D CNN). Specifically, by taking advantage of C3D (Convolution 3-dimensional), to completely exploit spatiotemporal relation, a deep convolution network is developed to distinguish normal and anomalous events. Our system is trained and tested against a large-scale UCF-Crime anomaly dataset for validating its effectiveness. This dataset contains 1900 long and untrimmed real-world surveillance videos and splits into 950 anomaly events and 950 normal events, respectively. In total, there are approximately\u2009~\u200913 million frames are learned during the training and testing phase. As shown in the experiments section, in terms of accuracy, the proposed visual attention model can obtain 99.25 accuracies. From the industrial application point of view, the extraction of this attention region can assist the security officer on focusing on the corresponding anomaly region, instead of a wider, full-framed inspection.<\/jats:p>","DOI":"10.1186\/s40537-020-00365-y","type":"journal-article","created":{"date-parts":[[2020,10,16]],"date-time":"2020-10-16T10:03:38Z","timestamp":1602842618000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":66,"title":["Deep anomaly detection through visual attention in surveillance videos"],"prefix":"10.1186","volume":"7","author":[{"given":"Nasaruddin","family":"Nasaruddin","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5740-1938","authenticated-orcid":false,"given":"Kahlil","family":"Muchtar","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Afdhal","family":"Afdhal","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alvin Prayuda Juniarta","family":"Dwiyantoro","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,10,16]]},"reference":[{"key":"365_CR1","doi-asserted-by":"crossref","unstructured":"Mohammadi S, Perina A, Kiani H, Murino V. Angry crowds: detecting violent events in videos. In: ECCV, 2016.","DOI":"10.1007\/978-3-319-46478-7_1"},{"key":"365_CR2","unstructured":"Esen E, Arabaci MA, Soysal M. Fight detection in surveillance videos. In: 11th Int. Workshop on Content-Based Multimedia Indexing, 2011."},{"key":"365_CR3","unstructured":"Nievas EB, Suarez OD, Garc\u00b4\u0131a GB, Sukthankar R. Violence detection in video using computer vision techniques. In: CAIP, 2011."},{"key":"365_CR4","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1109\/6979.880968","volume":"1","author":"S Kamijo","year":"2000","unstructured":"Kamijo S, Matsushita Y, Ikeuchi K, Sakauchi M. Traffic monitoring and accident detection at intersections. IEEE Trans Intell Transp Syst. 2000;1:108\u201318.","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"365_CR5","doi-asserted-by":"crossref","unstructured":"Sultani W, Choi JY. Abnormal traffic detection using intelligent driver model. In: ICPR, 2010.","DOI":"10.1109\/ICPR.2010.88"},{"key":"365_CR6","doi-asserted-by":"crossref","unstructured":"Sultani W, Chen C, Shah M, Real-world anomaly detection in surveillance videos. In: CVPR, 2018.","DOI":"10.1109\/CVPR.2018.00678"},{"key":"365_CR7","unstructured":"Andrews S, Tsochantaridis I, Hofmann T, Support Vector Machines for Multiple-Instance Learning. In: Advances in neural information processing systems, 2003."},{"key":"365_CR8","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1016\/S0004-3702(96)00034-3","volume":"89","author":"TG Dietterich","year":"1997","unstructured":"Dietterich TG, Lathrop RH, Lozano-P\u00e9rez T. Solving the multiple instance problem with axis-parallel rectangles. Artif Intell. 1997;89:31\u201371.","journal-title":"Artif Intell"},{"key":"365_CR9","unstructured":"Landi F, Snoek CGM, Cucchiara R. Anomaly Locality in Video Surveillance, arXiv preprint arXiv:1901.10364, 2019."},{"key":"365_CR10","doi-asserted-by":"crossref","unstructured":"Xu Q, See J, Lin W. Localization guided fight action detection in surveillance videos. In: 2019 IEEE International Conference on Multimedia and Expo (ICME), 2019.","DOI":"10.1109\/ICME.2019.00104"},{"key":"365_CR11","doi-asserted-by":"crossref","unstructured":"Jain M, Gemert JV, e. J\u00b4egou H, Bouthemy P, Snoek CG. Action localization with tubelets from motion. In: CVPR, 2014.","DOI":"10.1109\/CVPR.2014.100"},{"key":"365_CR12","doi-asserted-by":"crossref","unstructured":"Lessard FBA, Bilodeau G-A, Saunier N. The countingapp, or how to count vehicles in 500 hours of video. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016.","DOI":"10.1109\/CVPRW.2016.198"},{"key":"365_CR13","doi-asserted-by":"crossref","unstructured":"Xu D, Ricci E, Yan Y, Song J, Sebe N. Learning deep representations of appearance and motion for anomalous event detection. In: BMVC, 2015.","DOI":"10.5244\/C.29.8"},{"key":"365_CR14","doi-asserted-by":"crossref","unstructured":"Wu S, Moore BE, Shah M. Chaotic invariants of lagrangian particle trajectories for anomaly detection in crowded scenes. In: CVPR, 2010.","DOI":"10.1109\/CVPR.2010.5539882"},{"key":"365_CR15","doi-asserted-by":"crossref","unstructured":"Basharat A, Gritai A,Shah M. Learning object motion patterns for anomaly detection and improved object detection. In: CVPR, 2008.","DOI":"10.1109\/CVPR.2008.4587510"},{"key":"365_CR16","doi-asserted-by":"crossref","unstructured":"Cui X, Liu Q, Gao M, Metaxas DN Abnormal detection using interaction energy potentials. In CVPR, 2011.","DOI":"10.1109\/CVPR.2011.5995558"},{"key":"365_CR17","doi-asserted-by":"crossref","unstructured":"Antic B, Ommer B. Video parsing for abnormality detection. In ICCV, 2011.","DOI":"10.1109\/ICCV.2011.6126525"},{"key":"365_CR18","doi-asserted-by":"crossref","unstructured":"Hospedales T, Gong S, Xiang T. A markov clustering topic model for mining behaviour in video. In: ICCV, 2009.","DOI":"10.1109\/ICCV.2009.5459342"},{"issue":"1","key":"365_CR19","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1109\/JSTSP.2012.2234722","volume":"7","author":"Y Zhu","year":"2012","unstructured":"Zhu Y, Nayak IM, Roy-Chowdhury AK. Context-aware activity recognition and anomaly detection in video. IEEE J Select Topics Signal Process. 2012;7(1):91\u2013101.","journal-title":"IEEE J Select Topics Signal Process"},{"issue":"1","key":"365_CR20","first-page":"18","volume":"36","author":"W Li","year":"2013","unstructured":"Li W, Mahadevan V, Vasconcelos N. Anomaly detection and localization in crowded scenes. IEEE Trans Pattern Anal Mach Intell. 2013;36(1):18\u201332.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"365_CR21","doi-asserted-by":"crossref","unstructured":"Kratz L, Nishino K. Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models. In: CVPR, 2009.","DOI":"10.1109\/CVPRW.2009.5206771"},{"key":"365_CR22","doi-asserted-by":"crossref","unstructured":"Lu C, Shi J, Jia J. Abnormal event detection at 150 fps in matlab. In: ICCV, 2013.","DOI":"10.1109\/ICCV.2013.338"},{"key":"365_CR23","doi-asserted-by":"crossref","unstructured":"Zhao B, Fei-Fei L, Xing EP. Online detection of unusual events in videos via dynamic sparse coding. In: CVPR, 2011.","DOI":"10.1109\/CVPR.2011.5995524"},{"key":"365_CR24","doi-asserted-by":"crossref","unstructured":"Hasan M, Choi J, Neumann J, Roy-Chowdhury AK, Davis LS. Learning temporal regularity in video sequences. In: CVPR, 2016.","DOI":"10.1109\/CVPR.2016.86"},{"key":"365_CR25","doi-asserted-by":"crossref","unstructured":"Cheng K-W, Chen Y-T, Fang W-H. Video anomaly detection and localization using hierarchical feature representation and Gaussian process regression. In: CVPR, 2015.","DOI":"10.1109\/CVPR.2015.7298909"},{"key":"365_CR26","doi-asserted-by":"crossref","unstructured":"Cong Y, Yuan J, Liu J. Sparse reconstruction cost for abnormal event detection. In: CVPR, 2011.","DOI":"10.1109\/CVPR.2011.5995434"},{"key":"365_CR27","doi-asserted-by":"crossref","unstructured":"Dutta JK, Banerjee B Online detection of abnormal events using incremental coding length. In AAAI, 2015.","DOI":"10.1609\/aaai.v29i1.9799"},{"key":"365_CR28","doi-asserted-by":"crossref","unstructured":"Ionescu RT, Smeureanu S, Popescu M, Alexe B. Detecting abnormal events in video using narrowed normality clusters. In: WACV, 2019.","DOI":"10.1109\/WACV.2019.00212"},{"key":"365_CR29","doi-asserted-by":"crossref","unstructured":"Kim J, Grauman K. Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates. In: CVPR, 2009.","DOI":"10.1109\/CVPR.2009.5206569"},{"key":"365_CR30","doi-asserted-by":"crossref","unstructured":"Mehran R, Oyama A, Shah M, Abnormal crowd behavior detection using social force model. In: CVPR, 2009.","DOI":"10.1109\/CVPR.2009.5206641"},{"key":"365_CR31","doi-asserted-by":"crossref","unstructured":"Ren H, Liu W, Olsen SI, Escalera S, Moeslund TB. Unsupervised Behavior-Specific Dictionary Learning for Abnormal Event Detection. In: BMVC, 2015.","DOI":"10.5244\/C.29.28"},{"key":"365_CR32","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1016\/j.cviu.2016.10.010","volume":"156","author":"D Xu","year":"2017","unstructured":"Xu D, Yan Y, Ricci E, Sebe N. Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput Vis Image Underst. 2017;156:117\u201327.","journal-title":"Comput Vis Image Underst"},{"key":"365_CR33","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1016\/j.patcog.2015.11.018","volume":"59","author":"Y Zhang","year":"2016","unstructured":"Zhang Y, Lu H, Zhang L, Ruan X, Sakai S. Video anomaly detection based on locality sensitive hashing filters. Pattern Recogn. 2016;59:302\u201311.","journal-title":"Pattern Recogn"},{"key":"365_CR34","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1016\/j.cviu.2015.06.009","volume":"144","author":"J Kooij","year":"2016","unstructured":"Kooij J, Liem M, Krijnders J, Andringa T, Gavrila D. Multi-modal human aggression detection. Comput Vis Image Underst. 2016;144:106\u201320.","journal-title":"Comput Vis Image Underst"},{"issue":"8","key":"365_CR35","doi-asserted-by":"publisher","first-page":"1472","DOI":"10.1109\/TPAMI.2008.175","volume":"31","author":"I Saleemi","year":"2009","unstructured":"Saleemi I, Shafique K, Shah M. Probabilistic modeling of scene dynamics for applications in visual surveillance. IEEE Trans Pattern Anal Mach Intell. 2009;31(8):1472\u201385.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"365_CR36","doi-asserted-by":"publisher","first-page":"358","DOI":"10.1016\/j.image.2016.06.007","volume":"47","author":"S Zhou","year":"2016","unstructured":"Zhou S, Shen W, Zeng D, Fang M, Wei Y, Zhang Z. Spatial\u2013temporal convolutional neural networks for anomaly detection and localization in crowded scenes. Signal Proc Image Commun. 2016;47:358\u201368.","journal-title":"Signal Proc Image Commun."},{"key":"365_CR37","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1016\/j.ins.2014.01.019","volume":"269","author":"M Jian","year":"2014","unstructured":"Jian M, Lam K-M, Dong J. Illumination-insensitive texture discrimination based on illumination compensation and enhancement. Inf Sci. 2014;269:60\u201372.","journal-title":"Inf Sci"},{"key":"365_CR38","first-page":"1","volume":"21","author":"C-Y Lin","year":"2019","unstructured":"Lin C-Y, Muchtar K, Lin W-Y, Jian Z-Y. Moving object detection through image bit-planes representation without thresholding. IEEE Transact Intell Transport Syst. 2019;21:1\u201311.","journal-title":"IEEE Transact Intell Transport Syst"},{"key":"365_CR39","doi-asserted-by":"crossref","unstructured":"Zivkovic Z. Improved adaptive Gaussian mixture model for background subtraction. In Cambridge: ICPR, 2004.","DOI":"10.1109\/ICPR.2004.1333992"},{"issue":"7","key":"365_CR40","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1016\/j.patrec.2005.11.005","volume":"27","author":"Z Zivkovic","year":"2006","unstructured":"Zivkovic Z, d Heijden F. Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognit Lett. 2006;27(7):773\u201380.","journal-title":"Pattern Recognit Lett"},{"key":"365_CR41","unstructured":"Tomasi C, Manduchi R. Bilateral Filtering for Gray and Color Images. In: IEEE International Conference on Computer Vision, 1998."},{"key":"365_CR42","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1016\/j.ins.2013.08.014","volume":"269","author":"C-H Yeh","year":"2014","unstructured":"Yeh C-H, Lin C-Y, Muchtar K, Kang L-W. Real-time background modeling based on a multi-level texture description. Inf Sci. 2014;269:106\u201327.","journal-title":"Inf Sci"},{"key":"365_CR43","doi-asserted-by":"crossref","unstructured":"Tran D, Bourdev L, Fergus R, Torresani L, Paluri M. Learning spatiotemporal features with 3d convolutional networks. In: ICCV, 2015.","DOI":"10.1109\/ICCV.2015.510"},{"key":"365_CR44","doi-asserted-by":"crossref","unstructured":"Fa L, Song Y, Shu X, Global and Local C3D Ensemble System for First Person Interactive Action Recognition. In: International Conference on Multimedia Modeling, 2018.","DOI":"10.1007\/978-3-319-73600-6_14"},{"key":"365_CR45","doi-asserted-by":"crossref","unstructured":"Bendali-Braham M, Weber J, Forestier G, Idoumghar L, Muller P-A, Transfer learning for the classification of video-recorded crowd movements. In: 2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA), 2019.","DOI":"10.1109\/ISPA.2019.8868704"},{"key":"365_CR46","unstructured":"Liu K, Liu W, Ma H, Tan M, Gan C. A Real-time Action Representation with Temporal Encoding and Deep Compression, IEEE Transactions on Circuits and Systems for Video Technology (Early Access), 2020; p. 1."},{"key":"365_CR47","doi-asserted-by":"crossref","unstructured":"Fan Y, Lu X, Li D, liu Y. Video-based emotion recognition using CNN-RNN and C3D hybrid networks. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016.","DOI":"10.1145\/2993148.2997632"},{"key":"365_CR48","doi-asserted-by":"crossref","unstructured":"Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L, Large-scale Video Classification with Convolutional Neural Networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.","DOI":"10.1109\/CVPR.2014.223"},{"key":"365_CR49","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Machine Learn Res. 2014;15:1929\u201358.","journal-title":"J Machine Learn Res."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00365-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-020-00365-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00365-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,23]],"date-time":"2022-11-23T16:32:51Z","timestamp":1669221171000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-020-00365-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,16]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["365"],"URL":"https:\/\/doi.org\/10.1186\/s40537-020-00365-y","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,10,16]]},"assertion":[{"value":"16 June 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 October 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 October 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no conflict of interest.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"87"}}