{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T15:39:18Z","timestamp":1778081958995,"version":"3.51.4"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T00:00:00Z","timestamp":1687564800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T00:00:00Z","timestamp":1687564800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Multimed Tools Appl"],"published-print":{"date-parts":[[2024,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Manually labelling datasets for training violence detection systems is time-consuming, expensive, and labor-intensive. Mind wandering, boredom, and short attention span can also cause labelling errors. Moreover, collecting and distributing sensitive images containing violence has ethical implications. Automation is the future for labelling sensitive image datasets. Deep labeller is a two-stage Deep Learning (DL) method that uses pre-trained DL object detection methods on MS-COCO for automatic labelling. The Deep Labeller method labels violent and nonviolent images in WVD and USI. In stage 1, WVD generates weak labels using synthetic images. In stage 2, the Deep labeller method is retrained on weak labels. USI dataset is used to test our method on real-world violence. Deep labeller generated weak and strong labels with an IoU of 0.80036 in stage 1 and 0.95 in stage 2 on the WVD. Automatically generated labels. To test our method\u2019s generalisation power, violent and nonviolent image labels on USI dataset had a mean IoU of 0.7450.<\/jats:p>","DOI":"10.1007\/s11042-023-15621-5","type":"journal-article","created":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T07:46:34Z","timestamp":1687592794000},"page":"10717-10734","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Deep labeller: automatic bounding box generation for synthetic violence detection datasets"],"prefix":"10.1007","volume":"83","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5835-1602","authenticated-orcid":false,"given":"Muhammad\u00a0Shahroz","family":"Nadeem","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fatih","family":"Kurugollu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sara","family":"Saravi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hany\u00a0F.","family":"Atlam","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Virginia\u00a0N.\u00a0L.","family":"Franqueira","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,6,24]]},"reference":[{"key":"15621_CR1","doi-asserted-by":"crossref","unstructured":"Akt\u0131 \u015e, Tataro\u011flu GA, Ekenel HK (2019) Vision-based fight detection from surveillance cameras. In: 2019 ninth international conference on image processing theory, tools and applications (IPTA), IEEE, pp 1\u20136","DOI":"10.1109\/IPTA.2019.8936070"},{"key":"15621_CR2","doi-asserted-by":"crossref","unstructured":"Aljundi R, Chakravarty P, Tuytelaars T (2016) Who\u2019s that actor? automatic labelling of actors in tv series starting from imdb images. In: Asian conference on computer vision, Springer, pp 467\u2013483","DOI":"10.1007\/978-3-319-54187-7_31"},{"issue":"11","key":"15621_CR3","doi-asserted-by":"publisher","first-page":"1690","DOI":"10.3390\/rs10111690","volume":"10","author":"MD Bah","year":"2018","unstructured":"Bah MD, Hafiane A, Canals R (2018) Deep learning with unsupervised data labeling for weed detection in line crops in uav images. Remote Sens 10(11):1690","journal-title":"Remote Sens"},{"key":"15621_CR4","unstructured":"Dai J, Li Y, He K, et\u00a0al (2016) R-fcn: Object detection via region-based fully convolutional networks. In: Proceedings of the 30th international conference on neural information processing systems. NIPS\u201916, Curran Associates Inc., Red Hook, pp 379\u2013387"},{"key":"15621_CR5","doi-asserted-by":"crossref","unstructured":"Demarty C, Ionescu B, Jiang Y, et\u00a0al (2014) Benchmarking violent scenes detection in movies. In: 2014 12th international workshop on content-based multimedia indexing (CBMI), pp 1\u20136","DOI":"10.1109\/CBMI.2014.6849827"},{"key":"15621_CR6","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1016\/j.neucom.2015.09.115","volume":"187","author":"Y Dong","year":"2016","unstructured":"Dong Y, Liu Y, Lian S (2016) Automatic age estimation based on deep learning algorithm. Neurocomputing 187:4\u201310","journal-title":"Neurocomputing"},{"issue":"5","key":"15621_CR7","doi-asserted-by":"publisher","first-page":"482","DOI":"10.1177\/1745691612456044","volume":"7","author":"JD Eastwood","year":"2012","unstructured":"Eastwood JD, Frischen A, Fenske MJ et al (2012) The unengaged mind: defining boredom in terms of attention. Perspect Psychol Sci 7(5):482\u2013495","journal-title":"Perspect Psychol Sci"},{"issue":"2","key":"15621_CR8","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","volume":"88","author":"M Everingham","year":"2010","unstructured":"Everingham M, Van Gool L, Williams CK et al (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303\u2013338","journal-title":"Int J Comput Vis"},{"issue":"3","key":"15621_CR9","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1109\/TCSVT.2016.2615443","volume":"27","author":"H Fradi","year":"2016","unstructured":"Fradi H, Luvison B, Pham QC (2016) Crowd behavior analysis using local mid-level visual descriptors. IEEE Trans Circuits Syst Video Technol 27(3):589\u2013602","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"key":"15621_CR10","doi-asserted-by":"crossref","unstructured":"Huang J, Rathod V, Sun C, et\u00a0al (2017) Speed\/accuracy trade-offs for modern convolutional object detectors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7310\u20137311","DOI":"10.1109\/CVPR.2017.351"},{"issue":"6006","key":"15621_CR11","doi-asserted-by":"publisher","first-page":"932","DOI":"10.1126\/science.1192439","volume":"330","author":"MA Killingsworth","year":"2010","unstructured":"Killingsworth MA, Gilbert DT (2010) A wandering mind is an unhappy mind. Science 330(6006):932\u2013932","journal-title":"Science"},{"key":"15621_CR12","doi-asserted-by":"crossref","unstructured":"Le QV (2013) Building high-level features using large scale unsupervised learning. In: 2013 IEEE international conference on acoustics, speech and signal processing, IEEE, pp 8595\u20138598","DOI":"10.1109\/ICASSP.2013.6639343"},{"issue":"3","key":"15621_CR13","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1109\/TCSVT.2014.2358029","volume":"25","author":"T Li","year":"2014","unstructured":"Li T, Chang H, Wang M et al (2014) Crowded scene analysis: a survey. IEEE Trans Circuits Syst Video Technol 25(3):367\u2013386","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"key":"15621_CR14","doi-asserted-by":"crossref","unstructured":"Lin TY, Maire M, Belongie S, et\u00a0al (2014) Microsoft coco: common objects in context. In: European conference on computer vision, Springer, pp 740\u2013755","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"15621_CR15","doi-asserted-by":"crossref","unstructured":"Lin TY, Goyal P, Girshick R, et\u00a0al (2017) Focal loss for dense object detection. In: The IEEE international conference on computer vision (ICCV)","DOI":"10.1109\/ICCV.2017.324"},{"issue":"2","key":"15621_CR16","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1016\/j.patrec.2008.02.011","volume":"30","author":"J Liu","year":"2009","unstructured":"Liu J, Tong X, Li W et al (2009) Automatic player detection, labeling and tracking in broadcast soccer video. Pattern Recogn Lett 30(2):103\u2013113","journal-title":"Pattern Recogn Lett"},{"key":"15621_CR17","doi-asserted-by":"publisher","unstructured":"Liu L, \u00d6zsu MT (eds) (2009) Mean average precision, Springer US, Boston, MA, pp 1703\u20131703. https:\/\/doi.org\/10.1007\/978-0-387-39940-9_3032","DOI":"10.1007\/978-0-387-39940-9_3032"},{"key":"15621_CR18","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, et\u00a0al (2016) Ssd: single shot multibox detector. In: European conference on computer vision, Springer, pp 21\u201337","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"15621_CR19","doi-asserted-by":"crossref","unstructured":"Marsza\u0142ek M, Laptev I, Schmid C (2009) Actions in context. In: CVPR 2009-IEEE conference on computer vision & pattern recognition, IEEE computer society, pp 2929\u20132936","DOI":"10.1109\/CVPR.2009.5206557"},{"key":"15621_CR20","unstructured":"Moses, Olafenwa, J (2018) Imageai, an open source python library built to empower developers to build applications and systems with self-contained computer vision capabilities. https:\/\/github.com\/OlafenwaMoses\/ImageAI\u00a0. Accessed 25 June 2019."},{"key":"15621_CR21","doi-asserted-by":"crossref","unstructured":"Nadeem MS, Franqueira VN, Kurugollu F, et\u00a0al (2019) Wvd: A new synthetic dataset for video-based violence detection. In: International conference on innovative techniques and applications of artificial intelligence, Springer, pp 158\u2013164","DOI":"10.1007\/978-3-030-34885-4_13"},{"issue":"3","key":"15621_CR22","doi-asserted-by":"publisher","first-page":"299","DOI":"10.1007\/s11263-007-0122-4","volume":"79","author":"JC Niebles","year":"2008","unstructured":"Niebles JC, Wang H, Fei-Fei L (2008) Unsupervised learning of human action categories using spatial-temporal words. Int J Comput Vis 79(3):299\u2013318","journal-title":"Int J Comput Vis"},{"key":"15621_CR23","doi-asserted-by":"crossref","unstructured":"Niemeyer M, Arandjelovi\u0107 O (2018) Automatic semantic labelling of images by their content using non-parametric bayesian machine learning and image search using synthetically generated image collages. In: 2018 IEEE 5th international conference on data science and advanced analytics (DSAA), IEEE, pp 160\u2013168","DOI":"10.1109\/DSAA.2018.00026"},{"key":"15621_CR24","doi-asserted-by":"crossref","unstructured":"Nievas EB, Suarez OD, Garc\u00eda GB, et\u00a0al (2011) Violence detection in video using computer vision techniques. In: International conference on computer analysis of images and patterns, Springer, pp 332\u2013339","DOI":"10.1007\/978-3-642-23678-5_39"},{"key":"15621_CR25","doi-asserted-by":"crossref","unstructured":"Papadopoulos DP, Uijlings JR, Keller F, et\u00a0al (2016) We don\u2019t need no bounding-boxes: training object class detectors using only human verification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 854\u2013863","DOI":"10.1109\/CVPR.2016.99"},{"key":"15621_CR26","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, et\u00a0al (2016) You only look once: Unified, real-time object detection. In: The IEEE conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2016.91"},{"key":"15621_CR27","unstructured":"Ren S, He K, Girshick R, et al (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Cortes C, Lawrence ND, Lee DD, et al (eds) Advances in neural information processing systems 28. Curran Associates, Inc., pp 91\u201399"},{"key":"15621_CR28","doi-asserted-by":"crossref","unstructured":"Rezatofighi H, Tsoi N, Gwak J, et\u00a0al (2019) Generalized intersection over union: a metric and a loss for bounding box regression. In: The IEEE conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2019.00075"},{"key":"15621_CR29","doi-asserted-by":"crossref","unstructured":"Rota P, Conci N, Sebe N (2012) Real time detection of social interactions in surveillance video. Computer vision-ECCV 2012. Springer, Workshops and demonstrations, pp 111\u2013120","DOI":"10.1007\/978-3-642-33885-4_12"},{"issue":"5","key":"15621_CR30","doi-asserted-by":"publisher","first-page":"1390","DOI":"10.1109\/TIFS.2018.2878538","volume":"14","author":"T Wang","year":"2018","unstructured":"Wang T, Qiao M, Lin Z et al (2018) Generative neural networks for anomaly detection in crowded scenes. IEEE Trans Inf Forensics Secur 14(5):1390\u20131399","journal-title":"IEEE Trans Inf Forensics Secur"},{"key":"15621_CR31","doi-asserted-by":"crossref","unstructured":"Xiang T, Gong S (2005) Video behaviour profiling and abnormality detection without manual labelling. In: Tenth IEEE international conference on computer vision (ICCV\u201905), vol 1. IEEE, pp 1238\u20131245","DOI":"10.1109\/ICCV.2005.248"},{"issue":"3","key":"15621_CR32","doi-asserted-by":"publisher","first-page":"696","DOI":"10.1109\/TCSVT.2016.2589858","volume":"27","author":"T Zhang","year":"2016","unstructured":"Zhang T, Jia W, He X et al (2016) Discriminative dictionary learning with motion weber local descriptor for violence detection. IEEE Trans Circuits Syst Video Technol 27(3):696\u2013709","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"key":"15621_CR33","doi-asserted-by":"publisher","first-page":"799","DOI":"10.1109\/TIP.2021.3132834","volume":"31","author":"T Zhou","year":"2021","unstructured":"Zhou T, Li L, Li X et al (2021) Group-wise learning for weakly supervised semantic segmentation. IEEE Trans Image Process 31:799\u2013811","journal-title":"IEEE Trans Image Process"}],"container-title":["Multimedia Tools and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-023-15621-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11042-023-15621-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-023-15621-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T04:31:17Z","timestamp":1704861077000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11042-023-15621-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,24]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,1]]}},"alternative-id":["15621"],"URL":"https:\/\/doi.org\/10.1007\/s11042-023-15621-5","relation":{"has-preprint":[{"id-type":"doi","id":"10.36227\/techrxiv.15169041.v1","asserted-by":"object"},{"id-type":"doi","id":"10.36227\/techrxiv.15169041","asserted-by":"object"}],"is-supplemented-by":[{"id-type":"doi","id":"10.36227\/techrxiv.15169041","asserted-by":"object"}]},"ISSN":["1380-7501","1573-7721"],"issn-type":[{"value":"1380-7501","type":"print"},{"value":"1573-7721","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,24]]},"assertion":[{"value":"9 February 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 June 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 April 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 June 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The preprint of this paper is available on TechRXiv.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}