{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T16:28:58Z","timestamp":1781368138872,"version":"3.54.1"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"DOI":"10.13039\/501100001459","name":"Ministry of Education, Singapore","doi-asserted-by":"crossref","award":["MOE-MOET32022-0001"],"award-info":[{"award-number":["MOE-MOET32022-0001"]}],"id":[{"id":"10.13039\/501100001459","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:p>The existing deepfake detection methods have reached a bottleneck in generalizing to unseen forgeries and manipulation approaches. Based on the observation that the deepfake detectors exhibit a preference for overfitting specific primary regions in input, this article enhances the generalization capability from a novel regularization perspective. This can be simply achieved by augmenting the images through primary region removal, thereby preventing the detector from over-relying on data bias. Our method consists of two stages, namely the static localization for primary region maps, as well as the dynamic exploitation of primary region masks. The proposed method can be seamlessly integrated into different backbones without affecting their inference efficiency. We conduct extensive experiments over five widely used deepfake datasets\u2014DFDC, DF-1.0, Celeb-DF, WildDF, and FFIW with seven backbones. 
Our method demonstrates an average performance improvement of 6% across different backbones and performs competitively with several state-of-the-art baselines.<\/jats:p>","DOI":"10.1145\/3777474","type":"journal-article","created":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T16:05:03Z","timestamp":1763568303000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Towards Generalizable Deepfake Detection by Primary Region Regularization"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7436-0162","authenticated-orcid":false,"given":"Harry","family":"Cheng","sequence":"first","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8691-5372","authenticated-orcid":false,"given":"Yangyang","family":"Guo","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2920-6099","authenticated-orcid":false,"given":"Tianyi","family":"Wang","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1476-0273","authenticated-orcid":false,"given":"Liqiang","family":"Nie","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4846-2015","authenticated-orcid":false,"given":"Mohan","family":"Kankanhalli","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, 
Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,2,9]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/WIFS.2018.8630761"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3612928"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00408"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58574-7_7"},{"key":"e_1_3_2_6_2","first-page":"416","volume-title":"Advances in Neural Information Processing Systems","author":"Chapelle Olivier","year":"2000","unstructured":"Olivier Chapelle, Jason Weston, L\u00e9on Bottou, and Vladimir Vapnik. 2000. Vicinal risk minimization. In Advances in Neural Information Processing Systems, 416\u2013422."},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00097"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01815"},{"key":"e_1_3_2_9_2","first-page":"1","volume-title":"Advances in Neural Information Processing Systems","author":"Chen Liang","year":"2022","unstructured":"Liang Chen, Yong Zhang, Yibing Song, Jue Wang, and Lingqiao Liu. 2022. OST: Improving generalization of DeepFake detection via one-shot test-time training. In Advances in Neural Information Processing Systems, 1\u201314."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3225476"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00890"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3625231"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00582"},{"key":"e_1_3_2_14_2","unstructured":"Brian Dolhansky, Joanna Bitton, Ben Pflaum, Jikuo Lu, Russ Howes, Menglin Wang, and Cristian Canton-Ferrer. 2020. The DeepFake detection challenge dataset. arXiv:2006.07397. 
Retrieved from https:\/\/arxiv.org\/abs\/2006.07397"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00389"},{"key":"e_1_3_2_16_2","first-page":"1","volume-title":"International Conference on Learning Representations","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16\u2009\u00d7\u200916 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 1\u201312."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i1.19938"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01963"},{"key":"e_1_3_2_19_2","first-page":"10750","volume-title":"Advances in Neural Information Processing Systems","author":"Ghiasi Golnaz","year":"2018","unstructured":"Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V. Le. 2018. DropBlock: A regularization method for convolutional networks. In Advances in Neural Information Processing Systems, 10750\u201310760."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01453"},{"key":"e_1_3_2_21_2","first-page":"5039","volume-title":"Conference on Computer Vision and Pattern Recognition","author":"Haliassos Alexandros","year":"2021","unstructured":"Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. 2021. Lips don\u2019t lie: A generalisable and robust approach to face forgery detection. In Conference on Computer Vision and Pattern Recognition, 5039\u20135049."},{"key":"e_1_3_2_22_2","first-page":"22995","volume-title":"Computer Vision and Pattern Recognition Conference","author":"Han Yue-Hua","year":"2025","unstructured":"Yue-Hua Han, Tai-Ming Huang, Shu-Tzu Lo, Po-Han Huang, Kai-Lung Hua, and Jun-Cheng Chen. 2025. 
Towards more general video-based deepfake detection through facial feature guided adaptation for foundation model. In Computer Vision and Pattern Recognition Conference, 22995\u201323005."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00072"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2022.3141262"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00407"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00296"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3643030"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00512"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00505"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00327"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3623639"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01605"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58571-6_39"},{"key":"e_1_3_2_35_2","first-page":"2823","volume-title":"ACM International Conference on Multimedia","author":"Mittal Trisha","year":"2020","unstructured":"Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, and Dinesh Manocha. 2020. Emotions don\u2019t lie: An audio-visual deepfake detection method using affective cues. In ACM International Conference on Multimedia, 2823\u20132832."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-019-09784-7"},{"key":"e_1_3_2_37_2","first-page":"2307","volume-title":"International Conference on Acoustics, Speech, and Signal Processing","author":"Nguyen Huy H.","year":"2019","unstructured":"Huy H. Nguyen, Junichi Yamagishi, and Isao Echizen. 2019. 
Capsule-forensics: Using capsule networks to detect forged images and videos. In International Conference on Acoustics, Speech, and Signal Processing, 2307\u20132311."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00728"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41576-022-00532-2"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58610-2_6"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00009"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3686162"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01816"},{"key":"e_1_3_2_46_2","unstructured":"Ravid Shwartz-Ziv and Naftali Tishby. 2017. Opening the black box of deep neural networks via information. arXiv:1703.00810. Retrieved from https:\/\/arxiv.org\/abs\/1703.00810"},{"key":"e_1_3_2_47_2","first-page":"1","volume-title":"International Conference on Learning Representations","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 1\u201314."},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","unstructured":"Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, and Rongrong Ji. 2024. DiffusionFake: Enhancing generalization in deepfake detection via guided stable diffusion. 
In Advances in Neural Information Processing Systems, 101474\u2013101497.","DOI":"10.52202\/079017-3218"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-024-02160-1"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01823"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i2.20130"},{"key":"e_1_3_2_52_2","first-page":"6105","volume-title":"International Conference on Machine Learning","volume":"97","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, Vol. 97, 6105\u20136114."},{"key":"e_1_3_2_53_2","first-page":"1427","volume-title":"European Conference on Computer Vision","author":"Tilgner Stephan","year":"2020","unstructured":"Stephan Tilgner, Daniel Wagner, Kathrin Kalischewski, Jan-Christoph Schmitz, and Anton Kummert. 2020. Study on the influence of multiple image inputs of a multi-view fusion neural network based on Grad-CAM and masked image inputs. In European Conference on Computer Vision, 1427\u20131431."},{"key":"e_1_3_2_54_2","unstructured":"Naftali Tishby, Fernando C. Pereira, and William Bialek. 2000. The information bottleneck method. arXiv:physics\/0004057. Retrieved from https:\/\/arxiv.org\/abs\/physics\/0004057"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01468"},{"key":"e_1_3_2_56_2","unstructured":"Deressa Wodajo and Solomon Atnafu. 2021. Deepfake video detection using convolutional vision transformer. arXiv:2102.11126. 
Retrieved from https:\/\/arxiv.org\/abs\/2102.11126"},{"key":"e_1_3_2_57_2","first-page":"1","volume-title":"Advances in Neural Information Processing Systems","author":"Yan Zhiyuan","year":"2024","unstructured":"Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Chengjie Wang, Shouhong Ding, Yunsheng Wu, and Li Yuan. 2024. DF40: Toward next-generation deepfake detection. In Advances in Neural Information Processing Systems, 1\u201316."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02048"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01177"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00612"},{"key":"e_1_3_2_61_2","first-page":"1","volume-title":"International Conference on Learning Representations","author":"Zhang Hongyi","year":"2018","unstructured":"Hongyi Zhang, Moustapha Ciss\u00e9, Yann N. Dauphin, and David Lopez-Paz. 2018. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations, 1\u201313."},{"key":"e_1_3_2_62_2","first-page":"1","volume-title":"International Conference on Learning Representations","author":"Zhang Linjun","year":"2021","unstructured":"Linjun Zhang, Zhun Deng, Kenji Kawaguchi, Amirata Ghorbani, and James Zou. 2021. How does mixup help with robustness and generalization? In International Conference on Learning Representations, 1\u201312."},{"key":"e_1_3_2_63_2","first-page":"26135","volume-title":"International Conference on Machine Learning","volume":"162","author":"Zhang Linjun","year":"2022","unstructured":"Linjun Zhang, Zhun Deng, Kenji Kawaguchi, and James Zou. 2022. When and how mixup improves calibration. In International Conference on Machine Learning, Vol. 
162, 26135\u201326160."},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00222"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00572"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01453"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413769"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3777474","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,9]],"date-time":"2026-02-09T14:57:26Z","timestamp":1770649046000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3777474"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,9]]},"references-count":66,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,2,28]]}},"alternative-id":["10.1145\/3777474"],"URL":"https:\/\/doi.org\/10.1145\/3777474","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,9]]},"assertion":[{"value":"2025-04-30","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}