{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T10:12:27Z","timestamp":1760609547558,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,12,6]],"date-time":"2023-12-06T00:00:00Z","timestamp":1701820800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/https:\/\/doi.org\/10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62376186, 61932009"],"award-info":[{"award-number":["62376186, 61932009"]}],"id":[{"id":"10.13039\/https:\/\/doi.org\/10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,12,6]]},"DOI":"10.1145\/3595916.3626375","type":"proceedings-article","created":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T16:34:41Z","timestamp":1704126881000},"page":"1-7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["A Cross-modal and Redundancy-reduced Network for Weakly-Supervised Audio-Visual Violence Detection"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-8106-9853","authenticated-orcid":false,"given":"Yidan","family":"Fan","sequence":"first","affiliation":[{"name":"School of Future Technology & College of Intelligence and Computing, Tianjin University, CN"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2515-8061","authenticated-orcid":false,"given":"Yongxin","family":"Yu","sequence":"additional","affiliation":[{"name":"College of Intelligence and Computing, Tianjin University, 
CN"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7951-8907","authenticated-orcid":false,"given":"Wenhuan","family":"Lu","sequence":"additional","affiliation":[{"name":"College of Intelligence and Computing, Tianjin University, CN"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2768-1398","authenticated-orcid":false,"given":"Yahong","family":"Han","sequence":"additional","affiliation":[{"name":"College of Intelligence and Computing, Tianjin University, CN"}]}],"member":"320","published-online":{"date-parts":[[2024,1]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_1_2_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a037","author":"Chen Yingxian","year":"2023","unstructured":"Yingxian Chen , Zhengzhe Liu , Baoheng Zhang , Wilton Fok , Xiaojuan Qi , and Yik-Chung Wu . 2023 . Mgfn: Magnitude-contrastive glance-and-focus network for weakly-supervised video anomaly detection . In Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a037 . 387\u2013395. Yingxian Chen, Zhengzhe Liu, Baoheng Zhang, Wilton Fok, Xiaojuan Qi, and Yik-Chung Wu. 2023. Mgfn: Magnitude-contrastive glance-and-focus network for weakly-supervised video anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a037. 387\u2013395."},{"key":"e_1_3_2_1_3_1","volume-title":"An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929","author":"Dosovitskiy Alexey","year":"2020","unstructured":"Alexey Dosovitskiy , Lucas Beyer , Alexander Kolesnikov , Dirk Weissenborn , Xiaohua Zhai , Thomas Unterthiner , Mostafa Dehghani , Matthias Minderer , Georg Heigold , Sylvain Gelly , 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 ( 2020 ). 
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)."},{"key":"e_1_3_2_1_4_1","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 14009\u201314018","author":"Feng Jia-Chang","year":"2021","unstructured":"Jia-Chang Feng , Fa-Ting Hong , and Wei-Shi Zheng . 2021 . Mist: Multiple instance self-training framework for video anomaly detection . In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 14009\u201314018 . Jia-Chang Feng, Fa-Ting Hong, and Wei-Shi Zheng. 2021. Mist: Multiple instance self-training framework for video anomaly detection. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 14009\u201314018."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1631\/FITEE.2000722","article-title":"Visual commonsense reasoning with directional visual connections","volume":"22","author":"Han Yahong","year":"2021","unstructured":"Yahong Han , Aming Wu , Linchao Zhu , and Yi Yang . 2021 . Visual commonsense reasoning with directional visual connections . Frontiers of Information Technology & Electronic Engineering 22 , 5 (2021), 625 \u2013 637 . Yahong Han, Aming Wu, Linchao Zhu, and Yi Yang. 2021. Visual commonsense reasoning with directional visual connections. Frontiers of Information Technology & Electronic Engineering 22, 5 (2021), 625\u2013637.","journal-title":"Frontiers of Information Technology & Electronic Engineering"},{"volume-title":"CNN architectures for large-scale audio classification. 
In 2017 ieee international conference on acoustics, speech and signal processing (icassp)","author":"Hershey Shawn","key":"e_1_3_2_1_7_1","unstructured":"Shawn Hershey , Sourish Chaudhuri , Daniel\u00a0 PW Ellis , Jort\u00a0 F Gemmeke , Aren Jansen , R\u00a0Channing Moore , Manoj Plakal , Devin Platt , Rif\u00a0 A Saurous , Bryan Seybold , 2017. CNN architectures for large-scale audio classification. In 2017 ieee international conference on acoustics, speech and signal processing (icassp) . IEEE , 131\u2013135. Shawn Hershey, Sourish Chaudhuri, Daniel\u00a0PW Ellis, Jort\u00a0F Gemmeke, Aren Jansen, R\u00a0Channing Moore, Manoj Plakal, Devin Platt, Rif\u00a0A Saurous, Bryan Seybold, 2017. CNN architectures for large-scale audio classification. In 2017 ieee international conference on acoustics, speech and signal processing (icassp). IEEE, 131\u2013135."},{"key":"e_1_3_2_1_8_1","volume-title":"The kinetics human action video dataset. arXiv preprint arXiv:1705.06950","author":"Kay Will","year":"2017","unstructured":"Will Kay , Joao Carreira , Karen Simonyan , Brian Zhang , Chloe Hillier , Sudheendra Vijayanarasimhan , Fabio Viola , Tim Green , Trevor Back , Paul Natsev , 2017. The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 ( 2017 ). Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, 2017. The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","first-page":"166708","DOI":"10.1007\/s11704-021-1248-1","article-title":"Instance-sequence reasoning for video question answering","volume":"16","author":"Liu Rui","year":"2022","unstructured":"Rui Liu and Yahong Han . 2022 . Instance-sequence reasoning for video question answering . Frontiers of Computer Science 16 , 6 (2022), 166708 . Rui Liu and Yahong Han. 2022. Instance-sequence reasoning for video question answering. 
Frontiers of Computer Science 16, 6 (2022), 166708.","journal-title":"Frontiers of Computer Science"},{"key":"e_1_3_2_1_10_1","volume-title":"Localizing anomalies from weakly-labeled videos","author":"Lv Hui","year":"2021","unstructured":"Hui Lv , Chuanwei Zhou , Zhen Cui , Chunyan Xu , Yong Li , and Jian Yang . 2021. Localizing anomalies from weakly-labeled videos . IEEE transactions on image processing 30 ( 2021 ), 4505\u20134515. Hui Lv, Chuanwei Zhou, Zhen Cui, Chunyan Xu, Yong Li, and Jian Yang. 2021. Localizing anomalies from weakly-labeled videos. IEEE transactions on image processing 30 (2021), 4505\u20134515."},{"key":"e_1_3_2_1_11_1","volume-title":"ICASSP 2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2260\u20132264","author":"Pang Wen-Feng","year":"2021","unstructured":"Wen-Feng Pang , Qian-Hua He , Yong-jian Hu, and Yan-Xiong Li . 2021 . Violence detection in videos based on fusing visual and audio information . In ICASSP 2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2260\u20132264 . Wen-Feng Pang, Qian-Hua He, Yong-jian Hu, and Yan-Xiong Li. 2021. Violence detection in videos based on fusing visual and audio information. In ICASSP 2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2260\u20132264."},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 2665\u20132674","author":"Park Seongheon","year":"2023","unstructured":"Seongheon Park , Hanjae Kim , Minsu Kim , Dahye Kim , and Kwanghoon Sohn . 2023 . Normality Guided Multiple Instance Learning for Weakly Supervised Video Anomaly Detection . In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 2665\u20132674 . Seongheon Park, Hanjae Kim, Minsu Kim, Dahye Kim, and Kwanghoon Sohn. 2023. 
Normality Guided Multiple Instance Learning for Weakly Supervised Video Anomaly Detection. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 2665\u20132674."},{"key":"e_1_3_2_1_13_1","volume-title":"ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2957\u20132961","author":"Peixoto Bruno","year":"2020","unstructured":"Bruno Peixoto , Bahram Lavi , Paolo Bestagini , Zanoni Dias , and Anderson Rocha . 2020 . Multimodal violence detection in videos . In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2957\u20132961 . Bruno Peixoto, Bahram Lavi, Paolo Bestagini, Zanoni Dias, and Anderson Rocha. 2020. Multimodal violence detection in videos. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2957\u20132961."},{"key":"e_1_3_2_1_14_1","volume-title":"2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE). IEEE, 219\u2013223","author":"Pu Yujiang","year":"2022","unstructured":"Yujiang Pu and Xiaoyu Wu . 2022 . Audio-guided attention network for weakly supervised violence detection . In 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE). IEEE, 219\u2013223 . Yujiang Pu and Xiaoyu Wu. 2022. Audio-guided attention network for weakly supervised violence detection. In 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE). IEEE, 219\u2013223."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00678"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00493"},{"key":"e_1_3_2_1_17_1","volume-title":"Attention is all you need. 
Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan\u00a0 N Gomez , \u0141ukasz Kaiser , and Illia Polosukhin . 2017. Attention is all you need. Advances in neural information processing systems 30 ( 2017 ). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan\u00a0N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01271"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00813"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","first-page":"1780","DOI":"10.1631\/FITEE.2200284","article-title":"Dual collaboration for decentralized multi-source domain adaptation","volume":"23","author":"Wei Yikang","year":"2022","unstructured":"Yikang Wei and Yahong Han . 2022 . Dual collaboration for decentralized multi-source domain adaptation . Frontiers of Information Technology & Electronic Engineering 23 , 12 (2022), 1780 \u2013 1794 . Yikang Wei and Yahong Han. 2022. Dual collaboration for decentralized multi-source domain adaptation. Frontiers of Information Technology & Electronic Engineering 23, 12 (2022), 1780\u20131794.","journal-title":"Frontiers of Information Technology & Electronic Engineering"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","first-page":"174705","DOI":"10.1007\/s11704-022-2146-x","article-title":"Domain-specific feature elimination: multi-source domain adaptation for image classification","volume":"17","author":"Wu Kunhong","year":"2023","unstructured":"Kunhong Wu , Fan Jia , and Yahong Han . 2023 . Domain-specific feature elimination: multi-source domain adaptation for image classification . Frontiers of Computer Science 17 , 4 (2023), 174705 . 
Kunhong Wu, Fan Jia, and Yahong Han. 2023. Domain-specific feature elimination: multi-source domain adaptation for image classification. Frontiers of Computer Science 17, 4 (2023), 174705.","journal-title":"Frontiers of Computer Science"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3062192"},{"key":"e_1_3_2_1_23_1","volume-title":"Proceedings, Part XXX 16","author":"Wu Peng","year":"2020","unstructured":"Peng Wu , Jing Liu , Yujia Shi , Yujia Sun , Fangtao Shao , Zhaoyang Wu , and Zhiwei Yang . 2020 . Not only look, but also listen: Learning multimodal violence detection under weak supervision. In Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020 , Proceedings, Part XXX 16 . Springer, 322\u2013339. Peng Wu, Jing Liu, Yujia Shi, Yujia Sun, Fangtao Shao, Zhaoyang Wu, and Zhiwei Yang. 2020. Not only look, but also listen: Learning multimodal violence detection under weak supervision. In Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXX 16. Springer, 322\u2013339."},{"key":"e_1_3_2_1_24_1","volume-title":"The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation. arXiv preprint arXiv:2206.06487","author":"Xue Zihui","year":"2022","unstructured":"Zihui Xue , Zhengqi Gao , Sucheng Ren , and Hang Zhao . 2022. The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation. arXiv preprint arXiv:2206.06487 ( 2022 ). Zihui Xue, Zhengqi Gao, Sucheng Ren, and Hang Zhao. 2022. The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation. arXiv preprint arXiv:2206.06487 (2022)."},{"key":"e_1_3_2_1_25_1","volume-title":"MCL: A Contrastive Learning Method for Multimodal Data Fusion in Violence Detection","author":"Yang Liu","year":"2022","unstructured":"Liu Yang , Zhenjie Wu , Junkun Hong , and Jun Long . 2022 . 
MCL: A Contrastive Learning Method for Multimodal Data Fusion in Violence Detection . IEEE Signal Processing Letters ( 2022). Liu Yang, Zhenjie Wu, Junkun Hong, and Jun Long. 2022. MCL: A Contrastive Learning Method for Multimodal Data Fusion in Violence Detection. IEEE Signal Processing Letters (2022)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547868"},{"key":"e_1_3_2_1_27_1","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 1237\u20131246","author":"Zhong Jia-Xing","year":"2019","unstructured":"Jia-Xing Zhong , Nannan Li , Weijie Kong , Shan Liu , Thomas\u00a0 H Li , and Ge Li . 2019 . Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection . In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 1237\u20131246 . Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas\u00a0H Li, and Ge Li. 2019. Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 1237\u20131246."},{"key":"e_1_3_2_1_28_1","volume-title":"Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection. arXiv preprint arXiv:2302.05160","author":"Zhou Hang","year":"2023","unstructured":"Hang Zhou , Junqing Yu , and Wei Yang . 2023. Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection. arXiv preprint arXiv:2302.05160 ( 2023 ). Hang Zhou, Junqing Yu, and Wei Yang. 2023. Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection. 
arXiv preprint arXiv:2302.05160 (2023)."}],"event":{"name":"MMAsia '23: ACM Multimedia Asia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Tainan Taiwan","acronym":"MMAsia '23"},"container-title":["ACM Multimedia Asia 2023"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3595916.3626375","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3595916.3626375","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:48:40Z","timestamp":1750286920000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3595916.3626375"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,6]]},"references-count":28,"alternative-id":["10.1145\/3595916.3626375","10.1145\/3595916"],"URL":"https:\/\/doi.org\/10.1145\/3595916.3626375","relation":{},"subject":[],"published":{"date-parts":[[2023,12,6]]},"assertion":[{"value":"2024-01-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}