{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T09:50:42Z","timestamp":1756461042356,"version":"3.41.0"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,12,24]],"date-time":"2024-12-24T00:00:00Z","timestamp":1734998400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U23B2022, U22A2030"],"award-info":[{"award-number":["U23B2022, U22A2030"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Guangdong Major Project of Basic and Applied Basic Research","award":["2023B0303000010"],"award-info":[{"award-number":["2023B0303000010"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,2,28]]},"abstract":"<jats:p>Hyper-realistic avatars in the metaverse have already raised security concerns about deepfake techniques; deepfakes involving generated video \u201crecording\u201d may be mistaken for a real recording of the people it depicts. As a result, deepfake detection has drawn considerable attention in the multimedia forensic community. Though existing methods for deepfake detection achieve fairly good performance under the intra-dataset scenario, many of them gain unsatisfying results in the case of cross-dataset testing with more practical value, where the forged faces in training and testing datasets are from different domains. To tackle this issue, in this article, we propose a novel Domain-Invariant and Patch-Discriminative feature learning framework\u2014DI&amp;PD. For image-level feature learning, a single-side adversarial domain generalization is introduced to eliminate domain variances and learn domain-invariant features in training samples from different manipulation methods, along with the global and local random crop augmentation strategy to generate more data views of forged images at various scales. A graph structure is then built by splitting the learned image-level feature maps, with each spatial location corresponding to a local patch, which facilitates patch representation learning by message-passing among similar nodes. Two types of center losses are utilized to learn more discriminative features in both image-level and patch-level embedding spaces. Extensive experimental results on several datasets demonstrate the effectiveness and generalization of the proposed method compared with other state-of-the-art methods.<\/jats:p>","DOI":"10.1145\/3657297","type":"journal-article","created":{"date-parts":[[2024,4,27]],"date-time":"2024-04-27T09:28:01Z","timestamp":1714210081000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Domain-invariant and Patch-discriminative Feature Learning for General Deepfake Detection"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-9632-3785","authenticated-orcid":false,"given":"Jian","family":"Zhang","sequence":"first","affiliation":[{"name":"Sun Yat-sen University, School of Computer Science and Engineering, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7520-9031","authenticated-orcid":false,"given":"Jiangqun","family":"Ni","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen, China and Department of New Networks, Peng Cheng Laboratory, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-6867-498X","authenticated-orcid":false,"given":"Fan","family":"Nie","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, School of Computer Science and Engineering, Guangzhou, China and Department of New Networks, Peng Cheng Laboratory, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7625-5689","authenticated-orcid":false,"given":"Jiwu","family":"Huang","sequence":"additional","affiliation":[{"name":"Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen, China and Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,12,24]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/WIFS.2018.8630761"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00676"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/2909827.2930786"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","unstructured":"Federico Becattini Carmen Bisogni Vincenzo Loia Chiara Pero and Fei Hao. 2023. Head pose estimation patterns as deepfake detectors. ACM Trans. Multimedia Comput. Commun. Appl. (Aug. 2023). 10.1145\/3612928","DOI":"10.1145\/3612928"},{"key":"e_1_3_1_6_2","volume-title":"International Conference on Learning Representations (ICLR\u201914)","author":"Bruna Joan","year":"2014","unstructured":"Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann Lecun. 2014. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR\u201914)."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i2.16193"},{"key":"e_1_3_1_8_2","unstructured":"Zehao Chen and Hua Yang. 2020. Manipulated face detector: Joint spatial and frequency domain attention network. ArXiv abs\/2005.02958 (2020). https:\/\/api.semanticscholar.org\/CorpusID:218516585"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3082031.3083247"},{"key":"e_1_3_1_10_2","article-title":"ForensicTransfer: Weakly-supervised domain adaptation for forgery detection","author":"Cozzolino Davide","year":"2018","unstructured":"Davide Cozzolino, Justus Thies, Andreas R\u00f6ssler, Christian Riess, Matthias Nie\u00dfner, and Luisa Verdoliva. 2018. ForensicTransfer: Weakly-supervised domain adaptation for forgery detection. arXiv preprint arXiv:1812.02510 (2018).","journal-title":"arXiv preprint arXiv:1812.02510"},{"volume-title":"Retrieved from https:\/\/www.github.com\/deepfakes\/faceswap","year":"2019","key":"e_1_3_1_11_2","unstructured":"Deepfakes. 2019. Retrieved from https:\/\/www.github.com\/deepfakes\/faceswap"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00520"},{"key":"e_1_3_1_14_2","article-title":"The Deepfakes Detection Challenge (DFDC) preview dataset","author":"Dolhansky Brian","year":"2019","unstructured":"Brian Dolhansky, Russ Howes, Ben Pflaum, Nicole Baram, and Cristian Canton Ferrer. 2019. The Deepfakes Detection Challenge (DFDC) preview dataset. arXiv preprint arXiv:1910.08854 (2019).","journal-title":"arXiv preprint arXiv:1910.08854"},{"key":"e_1_3_1_15_2","article-title":"Unmasking deepfakes with simple features","author":"Durall Ricard","year":"2019","unstructured":"Ricard Durall, Margret Keuper, Franz-Josef Pfreundt, and Janis Keuper. 2019. Unmasking deepfakes with simple features. arXiv preprint arXiv:1911.00686 (2019).","journal-title":"arXiv preprint arXiv:1911.00686"},{"volume-title":"Retrieved from https:\/\/www.github.com\/MarekKowalski\/FaceSwap","year":"2019","key":"e_1_3_1_16_2","unstructured":"Faceswap. 2019. Retrieved from https:\/\/www.github.com\/MarekKowalski\/FaceSwap"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01963"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2012.2190402"},{"key":"e_1_3_1_19_2","first-page":"1180","volume-title":"International Conference on Machine Learning","author":"Ganin Yaroslav","year":"2015","unstructured":"Yaroslav Ganin and Victor Lempitsky. 2015. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning. PMLR, 1180\u20131189."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219947"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3536426"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2005.1555942"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01453"},{"key":"e_1_3_1_24_2","first-page":"5039","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Haliassos Alexandros","year":"2021","unstructured":"Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. 2021. Lips don\u2019t lie: A generalisable and robust approach to face forgery detection. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 5039\u20135049."},{"key":"e_1_3_1_25_2","volume-title":"Annual Conference on Neural Information Processing Systems (NIPS\u201917)","author":"Hamilton William L.","year":"2017","unstructured":"William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Annual Conference on Neural Information Processing Systems (NIPS\u201917)."},{"key":"e_1_3_1_26_2","unstructured":"Kai Han Yunhe Wang Jianyuan Guo Yehui Tang and Enhua Wu. 2024. Vision GNN: an image is worth graph of nodes. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS \u201922). Curran Associates Inc. Red Hook NY USA 8291\u20138303."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","unstructured":"Farkhund Iqbal Ahmed Abbasi Abdul Rehman Javed Ahmad Almadhor Zunera Jalil Sajid Anwar and Imad Rida. 2023. Data augmentation-based novel deep learning method for deepfaked images detection. ACM Trans. Multimedia Comput. Commun. Appl. (Apr. 2023). 10.1145\/3592615. Just Accepted.","DOI":"10.1145\/3592615"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00296"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00453"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.241"},{"key":"e_1_3_1_31_2","first-page":"9","volume-title":"Neural Networks: Tricks of the Trade","author":"LeCun Yann","year":"2002","unstructured":"Yann LeCun, L\u00e9on Bottou, Genevieve B. Orr, and Klaus-Robert M\u00fcller. 2002. Efficient backprop. In Neural Networks: Tricks of the Trade. Springer, 9\u201350."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11596"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00936"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sigpro.2020.107616"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00566"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00639"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00505"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/WIFS.2018.8630787"},{"key":"e_1_3_1_39_2","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW\u201919)","author":"Li Yuezun","year":"2019","unstructured":"Yuezun Li and Siwei Lyu. 2019. Exposing DeepFake videos by detecting face warping artifacts. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW\u201919)."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00327"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00083"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sigpro.2022.108790"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","unstructured":"Xiaolong Liu Yang Yu Xiaolong Li Yao Zhao and Guodong Guo. 2023. TCSD: Triple complementary streams detector for comprehensive deepfake detection. ACM Trans. Multimedia Comput. Commun. Appl. 19 6 Article 213 (jul 2023) 22 pages. 10.1145\/3558004","DOI":"10.1145\/3558004"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01605"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2008.2010350"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.576"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.609"},{"key":"e_1_3_1_48_2","article-title":"Multi-task learning for detecting and segmenting manipulated facial images and videos","author":"Nguyen Huy H.","year":"2019","unstructured":"Huy H. Nguyen, Fuming Fang, Junichi Yamagishi, and Isao Echizen. 2019. Multi-task learning for detecting and segmenting manipulated facial images and videos. arXiv preprint arXiv:1906.06876 (2019).","journal-title":"arXiv preprint arXiv:1906.06876"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8682602"},{"key":"e_1_3_1_50_2","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic differentiation in pytorch. (2017). https:\/\/openreview.net\/forum?id=BJJsrmfCZ"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58610-2_6"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/WIFS.2017.8267647"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00009"},{"key":"e_1_3_1_54_2","volume-title":"International Conference on Learning Representations","author":"Satorras Victor Garcia","year":"2018","unstructured":"Victor Garcia Satorras and Joan Bruna Estrach. 2018. Few-shot learning with graph neural networks. In International Conference on Learning Representations."},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2008.2005605"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01026"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i3.16367"},{"key":"e_1_3_1_59_2","first-page":"6105","volume-title":"International Conference on Machine Learning","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning. PMLR, 6105\u20136114."},{"issue":"11","key":"e_1_3_1_60_2","article-title":"Visualizing data using t-SNE.","volume":"9","author":"Maaten Laurens Van der","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 11 (2008).","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","unstructured":"Tianyi Wang Harry Cheng Kam Pui Chow and Liqiang Nie. 2023. Deep convolutional pooling transformer for deepfake detection. ACM Trans. Multimedia Comput. Commun. Appl. 19 6 Article 179 (may 2023) 20 pages. 10.1145\/3588574","DOI":"10.1145\/3588574"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","unstructured":"Yuntao Wang Zhou Su Ning Zhang Rui Xing Dongxiao Liu Tom H. Luan and Xuemin Shen. 2023. A survey on metaverse: Fundamentals security and privacy. IEEE Communications Surveys & Tutorials 25 1 (2023) 319\u2013352. 10.1109\/COMST.2022.3202047","DOI":"10.1109\/COMST.2022.3202047"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sigpro.2022.108558"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683164"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i3.20233"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME55011.2023.00396"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00378"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2023.3239223"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00222"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01477"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.229"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3657297","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3657297","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:39Z","timestamp":1750295859000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3657297"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,24]]},"references-count":70,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2,28]]}},"alternative-id":["10.1145\/3657297"],"URL":"https:\/\/doi.org\/10.1145\/3657297","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2024,12,24]]},"assertion":[{"value":"2023-12-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}