{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T17:13:26Z","timestamp":1778346806053,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":25,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,12,15]],"date-time":"2021-12-15T00:00:00Z","timestamp":1639526400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,12,15]]},"DOI":"10.1145\/3508072.3512288","type":"proceedings-article","created":{"date-parts":[[2022,4,13]],"date-time":"2022-04-13T17:01:56Z","timestamp":1649869316000},"page":"834-840","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["A Vision Transformer Model for Violence Detection from Real-Time Videos"],"prefix":"10.1145","author":[{"given":"Arfin","family":"Shagufta","sequence":"first","affiliation":[{"name":"Department of Applied Sc. and Humanities, Jamia Millia Islamia, India"}]},{"given":"Mohammad Tarique","family":"Hesham","sequence":"additional","affiliation":[{"name":"Department of Applied Sc. and Humanities, Jamia Millia Islamia, India"}]},{"given":"Sarfaraz","family":"Masood","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Jamia Millia Islamia, India"}]},{"given":"Ahmed","family":"Abd El-latif","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, Menoufia University, Egypt"}]}],"member":"320","published-online":{"date-parts":[[2022,4,13]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Robust Real-Time Violence Detection in Video Using CNN And LSTM. In 2019 2nd Scientific Conference of Computer Sciences (SCCS). IEEE. 
https:\/\/doi.org\/10","author":"R.","year":"2019","unstructured":"Al-Maamoon\u00a0 R. Abdali and Rana\u00a0F. Al-Tuma. 2019 . Robust Real-Time Violence Detection in Video Using CNN And LSTM. In 2019 2nd Scientific Conference of Computer Sciences (SCCS). IEEE. https:\/\/doi.org\/10 .1109\/sccs. 2019 .8852616 10.1109\/sccs.2019.8852616 Al-Maamoon\u00a0R. Abdali and Rana\u00a0F. Al-Tuma. 2019. Robust Real-Time Violence Detection in Video Using CNN And LSTM. In 2019 2nd Scientific Conference of Computer Sciences (SCCS). IEEE. https:\/\/doi.org\/10.1109\/sccs.2019.8852616"},{"key":"e_1_3_2_1_2_1","unstructured":"Alexei Baevski and Michael Auli. 2018. Adaptive Input Representations for Neural Language Modeling. CoRR abs\/1809.10853(2018). arXiv:1809.10853http:\/\/arxiv.org\/abs\/1809.10853  Alexei Baevski and Michael Auli. 2018. Adaptive Input Representations for Neural Language Modeling. CoRR abs\/1809.10853(2018). arXiv:1809.10853http:\/\/arxiv.org\/abs\/1809.10853"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/AVSS.2016.7738019"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-7908-2604-3_16"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_2_1_6_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805(2018).","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805(2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. 
arXiv preprint arXiv:1810.04805(2018)."},{"key":"e_1_3_2_1_7_1","volume-title":"Advances in Visual Computing","author":"Ding Chunhui","unstructured":"Chunhui Ding , Shouke Fan , Ming Zhu , Weiguo Feng , and Baozhi Jia . 2014. Violence Detection in Video by Using 3D Convolutional Neural Networks . In Advances in Visual Computing . Springer International Publishing , 551\u2013558. https:\/\/doi.org\/10.1007\/978-3-319-14364-4_53 10.1007\/978-3-319-14364-4_53 Chunhui Ding, Shouke Fan, Ming Zhu, Weiguo Feng, and Baozhi Jia. 2014. Violence Detection in Video by Using 3D Convolutional Neural Networks. In Advances in Visual Computing. Springer International Publishing, 551\u2013558. https:\/\/doi.org\/10.1007\/978-3-319-14364-4_53"},{"key":"e_1_3_2_1_8_1","volume-title":"Video Representation Learning for CCTV-Based Violence Detection. In 2018 3rd Technology Innovation Management and Engineering Science International Conference (TIMES-iCON). IEEE. https:\/\/doi.org\/10","author":"Ditsanthia Eknarin","year":"2018","unstructured":"Eknarin Ditsanthia , Luepol Pipanmaekaporn , and Suwatchai Kamonsantiroj . 2018 . Video Representation Learning for CCTV-Based Violence Detection. In 2018 3rd Technology Innovation Management and Engineering Science International Conference (TIMES-iCON). IEEE. https:\/\/doi.org\/10 .1109\/times-icon.2018.8621751 10.1109\/times-icon.2018.8621751 Eknarin Ditsanthia, Luepol Pipanmaekaporn, and Suwatchai Kamonsantiroj. 2018. Video Representation Learning for CCTV-Based Violence Detection. In 2018 3rd Technology Innovation Management and Engineering Science International Conference (TIMES-iCON). IEEE. https:\/\/doi.org\/10.1109\/times-icon.2018.8621751"},{"key":"e_1_3_2_1_9_1","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2020. 
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. CoRR abs\/2010.11929(2020). arXiv:2010.11929https:\/\/arxiv.org\/abs\/2010.11929  Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. CoRR abs\/2010.11929(2020). arXiv:2010.11929https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_2_1_10_1","unstructured":"Timothy Dozat. 2016. Incorporating nesterov momentum into adam. (2016).  Timothy Dozat. 2016. Incorporating nesterov momentum into adam. (2016)."},{"key":"e_1_3_2_1_11_1","volume-title":"Long Short-Term Memory. 9, 8 (November","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber . 1997. Long Short-Term Memory. 9, 8 (November 1997 ), 1735\u20131780. https:\/\/doi.org\/10.1162\/neco.1997.9.8.1735 10.1162\/neco.1997.9.8.1735 Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long Short-Term Memory. 9, 8 (November 1997), 1735\u20131780. https:\/\/doi.org\/10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_12_1","volume-title":"Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE. https:\/\/doi.org\/10","author":"Islam Zahidul","year":"2021","unstructured":"Zahidul Islam , Mohammad Rukonuzzaman , Raiyan Ahmed , Md.\u00a0 Hasanul Kabir , and Moshiur Farazi . 2021 . Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE. https:\/\/doi.org\/10 .1109\/ijcnn52387.2021.9534280 10.1109\/ijcnn52387.2021.9534280 Zahidul Islam, Mohammad Rukonuzzaman, Raiyan Ahmed, Md.\u00a0Hasanul Kabir, and Moshiur Farazi. 2021. 
Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE. https:\/\/doi.org\/10.1109\/ijcnn52387.2021.9534280"},{"key":"e_1_3_2_1_13_1","volume-title":"Deep NeuralNet For Violence Detection Using Motion Features From Dynamic Images. In 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT). IEEE. https:\/\/doi.org\/10","author":"Jain Aayush","year":"2020","unstructured":"Aayush Jain and Dinesh\u00a0Kumar Vishwakarma . 2020 . Deep NeuralNet For Violence Detection Using Motion Features From Dynamic Images. In 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT). IEEE. https:\/\/doi.org\/10 .1109\/icssit48917.2020.9214153 10.1109\/icssit48917.2020.9214153 Aayush Jain and Dinesh\u00a0Kumar Vishwakarma. 2020. Deep NeuralNet For Violence Detection Using Motion Features From Dynamic Images. In 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT). IEEE. https:\/\/doi.org\/10.1109\/icssit48917.2020.9214153"},{"key":"e_1_3_2_1_14_1","volume-title":"Kingma and Jimmy Ba","author":"P.","year":"2017","unstructured":"Diederik\u00a0 P. Kingma and Jimmy Ba . 2017 . Adam : A Method for Stochastic Optimization . arxiv:1412.6980\u00a0[cs.LG] Diederik\u00a0P. Kingma and Jimmy Ba. 2017. Adam: A Method for Stochastic Optimization. arxiv:1412.6980\u00a0[cs.LG]"},{"key":"e_1_3_2_1_15_1","volume-title":"Advances in Neural Information Processing Systems, F.\u00a0Pereira, C.\u00a0J.\u00a0C. Burges, L.\u00a0Bottou, and K.\u00a0Q","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey\u00a0 E Hinton . 2012. ImageNet Classification with Deep Convolutional Neural Networks . In Advances in Neural Information Processing Systems, F.\u00a0Pereira, C.\u00a0J.\u00a0C. Burges, L.\u00a0Bottou, and K.\u00a0Q . Weinberger (Eds.). Vol.\u00a025. 
Curran Associates, Inc .https:\/\/proceedings.neurips.cc\/paper\/ 2012 \/file\/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf Alex Krizhevsky, Ilya Sutskever, and Geoffrey\u00a0E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems, F.\u00a0Pereira, C.\u00a0J.\u00a0C. Burges, L.\u00a0Bottou, and K.\u00a0Q. Weinberger (Eds.). Vol.\u00a025. Curran Associates, Inc.https:\/\/proceedings.neurips.cc\/paper\/2012\/file\/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf"},{"key":"e_1_3_2_1_16_1","volume-title":"Computer Analysis of Images and Patterns","author":"Nievas Enrique\u00a0Bermejo","unstructured":"Enrique\u00a0Bermejo Nievas , Oscar\u00a0Deniz Suarez , Gloria\u00a0Bueno Garcia , and Rahul Sukthankar . 2011. Hockey fight detection dataset . In Computer Analysis of Images and Patterns . Springer , 332\u2013339. Enrique\u00a0Bermejo Nievas, Oscar\u00a0Deniz Suarez, Gloria\u00a0Bueno Garcia, and Rahul Sukthankar. 2011. Hockey fight detection dataset. In Computer Analysis of Images and Patterns. Springer, 332\u2013339."},{"key":"e_1_3_2_1_17_1","unstructured":"World\u00a0Health Organisation. 2022. World Report on Violence and Health. https:\/\/www.who.int\/violence_injury_prevention\/violence\/world_report\/en\/abstract_en.pdf  World\u00a0Health Organisation. 2022. World Report on Violence and Health. https:\/\/www.who.int\/violence_injury_prevention\/violence\/world_report\/en\/abstract_en.pdf"},{"key":"e_1_3_2_1_18_1","unstructured":"Xingjian SHI Zhourong Chen Hao Wang Dit-Yan Yeung Wai-kin Wong and Wang-chun WOO. 2015. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Advances in Neural Information Processing Systems C.\u00a0Cortes N.\u00a0Lawrence D.\u00a0Lee M.\u00a0Sugiyama and R.\u00a0Garnett (Eds.). Vol.\u00a028. 
Curran Associates Inc.https:\/\/proceedings.neurips.cc\/paper\/2015\/file\/07563a3fe3bbe7e3ba84431ad9d055af-Paper.pdf  Xingjian SHI Zhourong Chen Hao Wang Dit-Yan Yeung Wai-kin Wong and Wang-chun WOO. 2015. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Advances in Neural Information Processing Systems C.\u00a0Cortes N.\u00a0Lawrence D.\u00a0Lee M.\u00a0Sugiyama and R.\u00a0Garnett (Eds.). Vol.\u00a028. Curran Associates Inc.https:\/\/proceedings.neurips.cc\/paper\/2015\/file\/07563a3fe3bbe7e3ba84431ad9d055af-Paper.pdf"},{"key":"e_1_3_2_1_19_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arxiv:1409.1556\u00a0[cs.CV]  Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arxiv:1409.1556\u00a0[cs.CV]"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICICIS46948.2019.9014714"},{"key":"e_1_3_2_1_21_1","volume-title":"Violence Detection Using Spatiotemporal Features with 3D Convolutional Neural Network. 19, 11 (May","author":"U\u00a0Min Ullah Fath","year":"2019","unstructured":"Fath U\u00a0Min Ullah , Amin Ullah , Khan Muhammad , Ijaz\u00a0Ul Haq , and Sung\u00a0Wook Baik . 2019. Violence Detection Using Spatiotemporal Features with 3D Convolutional Neural Network. 19, 11 (May 2019 ), 2472. https:\/\/doi.org\/10.3390\/s19112472 10.3390\/s19112472 Fath U\u00a0Min Ullah, Amin Ullah, Khan Muhammad, Ijaz\u00a0Ul Haq, and Sung\u00a0Wook Baik. 2019. Violence Detection Using Spatiotemporal Features with 3D Convolutional Neural Network. 19, 11 (May 2019), 2472. https:\/\/doi.org\/10.3390\/s19112472"},{"key":"e_1_3_2_1_22_1","volume-title":"Advances in Neural Information Processing Systems, I.\u00a0Guyon, U.\u00a0V. Luxburg, S.\u00a0Bengio, H.\u00a0Wallach, R.\u00a0Fergus, S.\u00a0Vishwanathan, and R.\u00a0Garnett (Eds.). Vol.\u00a030. 
Curran Associates","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan\u00a0 N Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017. Attention is All you Need . In Advances in Neural Information Processing Systems, I.\u00a0Guyon, U.\u00a0V. Luxburg, S.\u00a0Bengio, H.\u00a0Wallach, R.\u00a0Fergus, S.\u00a0Vishwanathan, and R.\u00a0Garnett (Eds.). Vol.\u00a030. Curran Associates , Inc .https:\/\/proceedings.neurips.cc\/paper\/ 2017 \/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan\u00a0N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I.\u00a0Guyon, U.\u00a0V. Luxburg, S.\u00a0Bengio, H.\u00a0Wallach, R.\u00a0Fergus, S.\u00a0Vishwanathan, and R.\u00a0Garnett (Eds.). Vol.\u00a030. Curran Associates, Inc.https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00813"},{"key":"e_1_3_2_1_24_1","volume-title":"Violent Interaction Detection in Video Based on Deep Learning. 844 (June","author":"Zhou Peipei","year":"2017","unstructured":"Peipei Zhou , Qinghai Ding , Haibo Luo , and Xinglin Hou . 2017. Violent Interaction Detection in Video Based on Deep Learning. 844 (June 2017 ), 012044. https:\/\/doi.org\/10.1088\/1742-6596\/844\/1\/012044 10.1088\/1742-6596 Peipei Zhou, Qinghai Ding, Haibo Luo, and Xinglin Hou. 2017. Violent Interaction Detection in Video Based on Deep Learning. 844 (June 2017), 012044. https:\/\/doi.org\/10.1088\/1742-6596\/844\/1\/012044"},{"key":"e_1_3_2_1_25_1","volume-title":"Violence detection in surveillance video using low-level features. 13, 10 (August","author":"Zhou Peipei","year":"2018","unstructured":"Peipei Zhou , Qinghai Ding , Haibo Luo , and Xinglin Hou . 2018. 
Violence detection in surveillance video using low-level features. 13, 10 (August 2018 ), e0203668. https:\/\/doi.org\/10.1371\/journal.pone.0203668 10.1371\/journal.pone.0203668 Peipei Zhou, Qinghai Ding, Haibo Luo, and Xinglin Hou. 2018. Violence detection in surveillance video using low-level features. 13, 10 (August 2018), e0203668. https:\/\/doi.org\/10.1371\/journal.pone.0203668"}],"event":{"name":"ICFNDS 2021: The 5th International Conference on Future Networks & Distributed Systems","location":"Dubai United Arab Emirates","acronym":"ICFNDS 2021"},"container-title":["The 5th International Conference on Future Networks & Distributed Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3508072.3512288","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3508072.3512288","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:31Z","timestamp":1750191151000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3508072.3512288"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,15]]},"references-count":25,"alternative-id":["10.1145\/3508072.3512288","10.1145\/3508072"],"URL":"https:\/\/doi.org\/10.1145\/3508072.3512288","relation":{},"subject":[],"published":{"date-parts":[[2021,12,15]]},"assertion":[{"value":"2022-04-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}