{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T16:57:14Z","timestamp":1775667434500,"version":"3.50.1"},"reference-count":76,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2025,11,8]],"date-time":"2025-11-08T00:00:00Z","timestamp":1762560000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>CCTV safety monitoring demands anomaly detectors combine reliable clip-level accuracy with predictable per-clip latency despite weak supervision. This work investigates compact vision\u2013language models (VLMs) as practical detectors for this regime. A unified evaluation protocol standardizes preprocessing, prompting, dataset splits, metrics, and runtime settings to compare parameter-efficiently adapted compact VLMs against training-free VLM pipelines and weakly supervised baselines. Evaluation spans accuracy, precision, recall, F1, ROC-AUC, and average per-clip latency to jointly quantify detection quality and efficiency. With parameter-efficient adaptation, compact VLMs achieve performance on par with, and in several cases exceeding, established approaches while retaining competitive per-clip latency. Adaptation further reduces prompt sensitivity, producing more consistent behavior across prompt regimes under the shared protocol. These results show that parameter-efficient fine-tuning enables compact VLMs to serve as dependable clip-level anomaly detectors, yielding a favorable accuracy\u2013efficiency trade-off within a transparent and consistent experimental setup.<\/jats:p>","DOI":"10.3390\/jimaging11110400","type":"journal-article","created":{"date-parts":[[2025,11,10]],"date-time":"2025-11-10T11:47:27Z","timestamp":1762775247000},"page":"400","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Benchmarking Compact VLMs for Clip-Level Surveillance Anomaly Detection Under Weak Supervision"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-8203-1059","authenticated-orcid":false,"given":"Kirill","family":"Borodin","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, Moscow Technical University of Communication and Informatics, Moscow 111024, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5385-4636","authenticated-orcid":false,"given":"Kirill","family":"Kondrashov","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Moscow Technical University of Communication and Informatics, Moscow 111024, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5559-470X","authenticated-orcid":false,"given":"Nikita","family":"Vasiliev","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Moscow Technical University of Communication and Informatics, Moscow 111024, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-9961-6708","authenticated-orcid":false,"given":"Ksenia","family":"Gladkova","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Moscow Technical University of Communication and Informatics, Moscow 111024, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8477-150X","authenticated-orcid":false,"given":"Inna","family":"Larina","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Moscow Technical University of Communication and Informatics, Moscow 111024, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1739-9831","authenticated-orcid":false,"given":"Mikhail","family":"Gorodnichev","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Moscow Technical University of Communication and Informatics, Moscow 111024, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5802-5513","authenticated-orcid":false,"given":"Grach","family":"Mkrtchian","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Moscow Technical University of Communication and Informatics, Moscow 111024, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,8]]},"reference":[{"key":"ref_1","unstructured":"Hu, M., Luo, Z., Pasdar, A., Lee, Y.C., Zhou, Y., and Wu, D. (2023). Edge-Based Video Analytics: A Survey. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Abdalla, M., Javed, S., Radi, M.A., Ulhaq, A., and Werghi, N. (2024). Video Anomaly Detection in 10 Years: A Survey and Outlook. arXiv.","DOI":"10.1007\/s00521-025-11659-8"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"105793","DOI":"10.1016\/j.scs.2024.105793","article-title":"Elevating urban surveillance: A deep CCTV monitoring system for detection of anomalous events via human action recognition","volume":"114","author":"Kim","year":"2024","journal-title":"Sustain. Cities Soc."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2951","DOI":"10.1109\/COMST.2023.3323091","article-title":"Edge video analytics: A survey on applications, systems and enabling techniques","volume":"25","author":"Xu","year":"2023","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"101716","DOI":"10.1016\/j.iot.2025.101716","article-title":"From lab to field: Real-world evaluation of an AI-driven Smart Video Solution to enhance community safety","volume":"33","author":"Yao","year":"2025","journal-title":"Internet Things"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Sultani, W., Chen, C., and Shah, M. (2018, January 18\u201322). Real-World Anomaly Detection in Surveillance Videos. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00678"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3742794","article-title":"ViEdge: Video Analytics on Distributed Edge","volume":"6","author":"Hou","year":"2025","journal-title":"ACM Trans. Internet Things"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"e6317","DOI":"10.1002\/cpe.6317","article-title":"Performance characterization of video analytics workloads in heterogeneous edge infrastructures","volume":"35","author":"Rivas","year":"2023","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"100204","DOI":"10.1016\/j.hcc.2024.100204","article-title":"Privacy-preserving human activity sensing: A survey","volume":"4","author":"Yang","year":"2024","journal-title":"High-Confid. Comput."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhang, T., Aftab, W., Mihaylova, L., Langran-Wheeler, C., Rigby, S., Fletcher, D., Maddock, S., and Bosworth, G. (2022). Recent Advances in Video Analytics for Rail Network Surveillance for Security, Trespass and Suicide Prevention\u2014A Survey. Sensors, 22.","DOI":"10.3390\/s22124324"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ezzat, M.A., Abd El Ghany, M.A., Almotairi, S., and Salem, M.A.M. (2021). Horizontal Review on Video Surveillance for Smart Cities: Edge Devices, Applications, Datasets, and Future Trends. Sensors, 21.","DOI":"10.3390\/s21093222"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"108146","DOI":"10.1016\/j.patcog.2021.108146","article-title":"Edge computing enabled video segmentation for real-time traffic monitoring in internet of vehicles","volume":"121","author":"Wan","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Borawar, L., and Kaur, R. (2022, January 23\u201325). Anomaly Detection Methods in Surveillance Videos: A Survey. Proceedings of the 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Bangalore, India.","DOI":"10.1109\/SMARTGENCON56628.2022.10084028"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wu, P., Liu, J., Shi, Y., Sun, Y., Shao, F., Wu, Z., and Yang, Z. (2020, January 23\u201328). Not only Look, But Also Listen: Learning Multimodal Violence Detection Under Weak Supervision. Proceedings of the 16th European Conference of Computer Vision (ECCV 2020), Glasgow, UK. Proceedings, Part XXX.","DOI":"10.1007\/978-3-030-58577-8_20"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ramachandra, B., and Jones, M.J. (2020, January 1\u20135). Street Scene: A new dataset and evaluation protocol for video anomaly detection. Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093457"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Liu, W., Luo, W., Lian, D., and Gao, S. (2018, January 18\u201322). Future Frame Prediction for Anomaly Detection\u2014A New Baseline. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00684"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"12635","DOI":"10.1007\/s11042-022-13954-1","article-title":"Analysis of anomaly detection in surveillance video: Recent trends and future vision","volume":"82","author":"Raja","year":"2023","journal-title":"Multimed. Tools Appl."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Dalvi, J., Dabouei, A., Dhanuka, G., and Xu, M. (March, January 28). Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection. Proceedings of the 2025 IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA.","DOI":"10.1109\/WACV61041.2025.00531"},{"key":"ref_19","first-page":"200236","article-title":"Unveiling the performance of video anomaly detection models\u2014A benchmark-based review","volume":"18","author":"Caetano","year":"2023","journal-title":"Intell. Syst. Appl."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yao, S., Noghre, G.A., Pazho, A.D., and Tabkhi, H. (2024, January 17\u201318). Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world Deployment. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00486"},{"key":"ref_21","unstructured":"Zhu, L., Wang, L., Raj, A., Gedeon, T., and Chen, C. (2025, January 9\u201315). Advancing video anomaly detection: A concise review and a new dataset. Proceedings of the 38th International Conference on Neural Information Processing Systems (NIPS\u201924), Red Hook, NY, USA."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1016\/j.comcom.2024.05.021","article-title":"IoT video analytics for surveillance-based systems in smart cities","volume":"224","author":"Aminiyeganeh","year":"2024","journal-title":"Comput. Commun."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Karim, H., Doshi, K., and Yilmaz, Y. (2024, January 3\u20138). Real-Time Weakly Supervised Video Anomaly Detection. Proceedings of the 2024 IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV57701.2024.00670"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"127726","DOI":"10.1016\/j.neucom.2024.127726","article-title":"Video anomaly detection: A systematic review of issues and prospects","volume":"591","author":"Samaila","year":"2024","journal-title":"Neurocomputing"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Song, J., Jiang, Y., and Li, H. (2023). Online Video Anomaly Detection. Sensors, 23.","DOI":"10.3390\/s23177442"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yang, Z., and Radke, R.J. (2024, January 16\u201322). Context-aware Video Anomaly Detection in Long-Term Datasets. Proceedings of the 2024 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00404"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5480","DOI":"10.1109\/TCSVT.2024.3350084","article-title":"Weakly-Supervised Video Anomaly Detection With Snippet Anomalous Attention","volume":"34","author":"Fan","year":"2024","journal-title":"IEEE Trans. Cir. Syst. Video Technol."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Duong, H.T., Le, V.T., and Hoang, V.T. (2023). Deep Learning-Based Anomaly Detection in Video Surveillance: A Survey. Sensors, 23.","DOI":"10.3390\/s23115024"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"4505","DOI":"10.1109\/TIP.2021.3072863","article-title":"Localizing Anomalies From Weakly-Labeled Videos","volume":"30","author":"Lv","year":"2021","journal-title":"Trans. Img. Proc."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Sun, C., Jia, Y., Hu, Y., and Wu, Y. (2020, January 12\u201316). Scene-Aware Context Reasoning for Unsupervised Abnormal Event Detection in Videos. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413887"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"104163","DOI":"10.1016\/j.cviu.2024.104163","article-title":"Delving into CLIP latent space for Video Anomaly Recognition","volume":"249","author":"Zanella","year":"2024","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Majhi, S., Dai, R., Kong, Q., Garattoni, L., Francesca, G., and Br\u00e9mond, F. (2024, January 3\u20138). OE-CTST: Outlier-Embedded Cross Temporal Scale Transformer for Weakly-supervised Video Anomaly Detection. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV57701.2024.00838"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"106106","DOI":"10.1016\/j.neunet.2024.106106","article-title":"Self-supervised anomaly detection in computer vision and beyond: A survey and outlook","volume":"172","author":"Hojjati","year":"2024","journal-title":"Neural Netw."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wang, Y., Qin, C., Bai, Y., Xu, Y., Ma, X., and Fu, Y. (December, January 28). Making Reconstruction-based Method Great Again for Video Anomaly Detection. Proceedings of the 2022 IEEE International Conference on Data Mining (ICDM), Orlando, FL, USA.","DOI":"10.1109\/ICDM54844.2022.00157"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lv, H., Yue, Z., Sun, Q., Luo, B., Cui, Z., and Zhang, H. (2023, January 18\u201322). Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00775"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Tian, Y., Pang, G., Chen, Y., Singh, R., Verjans, J.W., and Carneiro, G. (2021, January 10\u201317). Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning. Proceedings of the International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00493"},{"key":"ref_37","unstructured":"Li, S., Liu, F., and Jiao, L. (March, January 22). Self-Training Multi-Sequence Learning with Transformer for Weakly Supervised Video Anomaly Detection. Proceedings of the AAAI Conference on Artificial Intelligence, Online."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, D., Chen, S., Gao, S., and Ma, Y. (2016, January 27\u201330). Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.70"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Chen, J., Li, L., Su, L., Zha, Z.J., and Huang, Q. (2024, January 16\u201322). Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01734"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"4923","DOI":"10.1109\/TIP.2024.3451935","article-title":"Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection","volume":"33","author":"Pu","year":"2024","journal-title":"IEEE Trans. Image Process."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Yang, Z., Liu, J., and Wu, P. (2024, January 16\u201322). Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01788"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Feng, J.C., Hong, F.T., and Zheng, W.S. (2021, January 19\u201325). MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01379"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zanella, L., Menapace, W., Mancini, M., Wang, Y., and Ricci, E. (2024, January 16\u201322). Harnessing Large Language Models for Training-Free Video Anomaly Detection. Proceedings of the 2024 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01753"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Ye, M., Liu, W., and He, P. (2025, January 11\u201315). VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR52734.2025.00811"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Yang, Y., Lee, K., Dariush, B., Cao, Y., and Lo, S. Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models. Proceedings of the Computer Vision\u2014ECCV 2024, 2025 Lecture Notes in Computer Science, Springer.","DOI":"10.1007\/978-3-031-73004-7_18"},{"key":"ref_46","unstructured":"Li, C., and Jiang, Y. (2024, January 25\u201328). VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection. Proceedings of the British Machine Vision Conference (BMVC), Glasgow, UK."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"16706","DOI":"10.1109\/TNNLS.2025.3553556","article-title":"Dynamic Erasing Network With Adaptive Temporal Modeling for Weakly Supervised Video Anomaly Detection","volume":"36","author":"Zhang","year":"2025","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Biradar, K., Tyagi, D.K., Battula, R.B., and Subbarao, P. (2024, January 8\u201312). Robust Anomaly Detection Through Transformer-Encoded Feature Diversity Learning. Proceedings of the 17th Asian Conference on Computer Vision (ACCV 2024 Workshops), Hanoi, Vietnam. Revised Selected Papers, Part I.","DOI":"10.1007\/978-981-96-2641-0_8"},{"key":"ref_49","unstructured":"Keles, F.D., Wijewardena, P.M., and Hegde, C. (2023, January 20\u201323). On The Computational Complexity of Self-Attention. Proceedings of the 34th International Conference on Algorithmic Learning Theory (ALT), Singapore."},{"key":"ref_50","unstructured":"Acharya, S., Jia, F., and Ginsburg, B. (2025, January 13\u201319). Star Attention: Efficient LLM Inference over Long Sequences. Proceedings of the 42nd International Conference on Machine Learning (ICML), Vancouver, BC, Canada."},{"key":"ref_51","unstructured":"Lou, C., Jia, Z., Zheng, Z., and Tu, K. (2024). Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers. arXiv."},{"key":"ref_52","unstructured":"Rao, Y., Zhao, W., Liu, B., Lu, J., Zhou, J., and Hsieh, C.J. (2021, January 6\u201314). DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification. Proceedings of the Advances in Neural Information Processing Systems, Online."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"2124","DOI":"10.1109\/TMM.2023.3292596","article-title":"DSS-Net: Dynamic Self-Supervised Network for Video Anomaly Detection","volume":"26","author":"Wu","year":"2024","journal-title":"Trans. Multi."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Wang, Z., Zou, Y., and Zhang, Z. (2020, January 12\u201316). Cluster Attention Contrast for Video Anomaly Detection. Proceedings of the 28th ACM International Conference on Multimedia (MM \u201920), New York, NY, USA.","DOI":"10.1145\/3394171.3413529"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Sun, S., and Gong, X. (2023, January 17\u201324). Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.02188"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Rai, A.K., Krishna, T., Hu, F., Drimbarean, A., McGuinness, K., Smeaton, A.F., and O\u2019Connor, N.E. (2024, January 17\u201318). Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation: A Unified Approach. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), VAND Workshop, Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00393"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"104074","DOI":"10.1016\/j.cviu.2024.104074","article-title":"Lightning fast video anomaly detection via multi-scale adversarial distillation","volume":"247","author":"Croitoru","year":"2024","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_58","unstructured":"Sakai, S., He, X., Gu, C., Sigal, L., and Hasegawa, T. (2025). Reconstruction-Free Anomaly Detection with Diffusion Models. arXiv."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Sun, W., Cao, L., Guo, Y., and Du, K. (2024). Multimodal and multiscale feature fusion for weakly supervised video anomaly detection. Sci. Rep., 14.","DOI":"10.1038\/s41598-024-73462-0"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1109\/OJCS.2024.3517154","article-title":"Multimodal Attention-Enhanced Feature Fusion-Based Weakly Supervised Anomaly Violence Detection","volume":"6","author":"Shin","year":"2025","journal-title":"IEEE Open J. Comput. Soc."},{"key":"ref_61","unstructured":"Wu, P., Zhou, X., Pang, G., Yang, Z., Yan, Q., Wang, P., and Zhang, Y. (November, January 28). Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts. Proceedings of the 32nd ACM International Conference on Multimedia (MM \u201924), New York, NY, USA."},{"key":"ref_62","unstructured":"Wang, Y., Guo, D., Li, S., Camps, O., and Fu, Y. (2025). Unveiling the Unseen: A Comprehensive Survey on Explainable Anomaly Detection in Images and Videos. arXiv."},{"key":"ref_63","unstructured":"Wu, P., Pan, C., Yan, Y., Pang, G., Wang, P., and Zhang, Y. (2024). Deep Learning for Video Anomaly Detection: A Review. arXiv."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Jebur, S.A., Hussein, K.A., Hoomod, H.K., Alzubaidi, L., Saihood, A.A., and Gu, Y. (2024). A Scalable and Generalized Deep Learning Framework for Anomaly Detection in Surveillance Videos. arXiv.","DOI":"10.1155\/int\/1947582"},{"key":"ref_65","unstructured":"Ding, X., and Wang, L. (2024). Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight. arXiv."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Chen, L., Li, J., Dong, X., Zhang, P., Zang, Y., Chen, Z., Duan, H., Wang, J., Qiao, Y., and Lin, D. (2024, January 9\u201315). Are We on the Right Way for Evaluating Large Vision-Language Models?. Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada.","DOI":"10.52202\/079017-0850"},{"key":"ref_67","unstructured":"Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2022, January 25\u201329). LoRA: Low-Rank Adaptation of Large Language Models. Proceedings of the 10th International Conference on Learning Representations (ICLR 2022), Online."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Zanella, M., and Ben Ayed, I. (2024, January 17\u201318). Low-Rank Few-Shot Adaptation of Vision-Language Models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00166"},{"key":"ref_69","unstructured":"Zhu, J., Wang, W., Chen, Z., Liu, Z., Ye, S., Gu, L., Tian, H., Duan, Y., Su, W., and Shao, J. (2025). InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models. arXiv."},{"key":"ref_70","unstructured":"Bai, S., Chen, K., Liu, X., Wang, J., Ge, W., Song, S., Dang, K., Wang, P., Wang, S., and Tang, J. (2025). Qwen2.5-VL Technical Report. arXiv."},{"key":"ref_71","unstructured":"Gemma Team (2025). Gemma 3 Technical Report. arXiv."},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Lin, B., Xia, J., Zhang, S., Liu, Z., Wang, Y., Liu, Z., and Li, C. (2024, January 12\u201316). Video-LLaVA: Learning United Visual Representation by Alignment Before Projection. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), Miami, FL, USA.","DOI":"10.18653\/v1\/2024.emnlp-main.342"},{"key":"ref_73","unstructured":"Liu, H., Li, C., Wu, Q., and Lee, Y.J. (2023, January 10\u201316). Visual Instruction Tuning. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Zheng, Y., Zhang, R., Zhang, J., Ye, Y., Luo, Z., Feng, Z., and Ma, Y. (2024, January 11\u201316). LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.acl-demos.38"},{"key":"ref_75","unstructured":"Shuttleworth, R., Andreas, J., Torralba, A., and Sharma, P. (2025). LoRA vs Full Fine-tuning: An Illusion of Equivalence. arXiv."},{"key":"ref_76","unstructured":"Schulman, J., and Thinking Machines Lab (2025, October 12). LoRA Without Regret; Thinking Machines Lab: Connectionism. Available online: https:\/\/thinkingmachines.ai\/blog\/lora\/."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/11\/400\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T05:12:48Z","timestamp":1762837968000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/11\/400"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,8]]},"references-count":76,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["jimaging11110400"],"URL":"https:\/\/doi.org\/10.3390\/jimaging11110400","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,8]]}}}