{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T06:56:30Z","timestamp":1781333790428,"version":"3.54.1"},"reference-count":79,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2025,3,12]],"date-time":"2025-03-12T00:00:00Z","timestamp":1741737600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62276112"],"award-info":[{"award-number":["62276112"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Jilin Province Science and Technology Development Plan Key R & D Project","award":["20230201088GX"],"award-info":[{"award-number":["20230201088GX"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,4,30]]},"abstract":"<jats:p>\n            With the continuous development of deep counterfeiting technology, the information security in our daily life is under serious threat. While existing face forgery detection methods exhibit impressive accuracy when applied to datasets such as FaceForensics++ and Celeb-DF, they falter significantly when confronted with out-of-domain scenarios. This causes specialization of learned representations to known forgery patterns presented in the training set, rendering it difficult to detect forgeries with unknown patterns. 
To address this challenge, we propose a novel end-to-end\n            <jats:bold>Face Reconstruction-Based Generalized Deepfake Detection (FRG2D) model<\/jats:bold>\n            with\n            <jats:bold>Residual Outlook Attention (ROA)<\/jats:bold>\n            , which emphasizes the robust visual representations of genuine faces and discerns the subtle differences between authentic and manipulated facial images. Our methodology entails reconstructing authentic face images using an encoder\u2013decoder architecture based on U-net, facilitating a deeper understanding of disparities between genuine and manipulated facial images. Furthermore, we integrate the\n            <jats:bold>convolutional block attention module (CBAM)<\/jats:bold>\n            and\n            <jats:bold>channel attention block (CAB)<\/jats:bold>\n            to selectively focus the network\u2019s attention on salient features within real face images. We also employ ROA to guide the network\u2019s focus towards precise features within manipulated facial images. Simultaneously, the computed reconstruction differences obtained through ROA serve as the ultimate representation fed into the classifier for face forgery detection. Both the reconstruction and classification learning processes are optimized end-to-end. 
Through extensive experimentation, our model demonstrated a substantial improvement in deepfake detection across unknown domains, while maintaining a high accuracy within the known domain.\n          <\/jats:p>","DOI":"10.1145\/3686162","type":"journal-article","created":{"date-parts":[[2024,8,2]],"date-time":"2024-08-02T16:04:57Z","timestamp":1722614697000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Face Reconstruction-Based Generalized Deepfake Detection Model with Residual Outlook Attention"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8554-4127","authenticated-orcid":false,"given":"Zenan","family":"Shi","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Jilin University, Changchun, China and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6371-2833","authenticated-orcid":false,"given":"Wenyu","family":"Liu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Jilin University, Changchun, China and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9410-4120","authenticated-orcid":false,"given":"Haipeng","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Jilin University, Changchun, China and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,3,12]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"2672","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow I.","year":"2014","unstructured":"I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 27, 2672\u20132680.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1145\/3082031.3083247","volume-title":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","author":"Cozzolino D.","year":"2017","unstructured":"D. Cozzolino, G. Poggi, and L. Verdoliva. 2017. Recasting residual-based local descriptors as convolutional neural networks: An application to image forgery detection. In Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, 159\u2013164."},{"key":"e_1_3_1_4_2","first-page":"373","volume-title":"Proceedings of the 2018 24th International Conference on Pattern Recognition","author":"Zhang X.","year":"2018","unstructured":"X. Zhang, Y. Zou, and W. Wang. 2018. LD-CNN: A lightweight dilated convolutional neural network for environmental sound classification. In Proceedings of the 2018 24th International Conference on Pattern Recognition, 373\u2013378."},{"key":"e_1_3_1_5_2","first-page":"5781","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Dang H.","year":"2020","unstructured":"H. Dang, F. Liu, J. Stehouwer, X. Liu, and A. K. Jain. 2020. On the detection of digital face manipulation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 5781\u20135790."},{"key":"e_1_3_1_6_2","unstructured":"S. Tariq S. Lee and S. S. Woo. 2020. 
A convolutional LSTM based residual network for deepfake video detection. arXiv:2009.07480. Retrieved from https:\/\/arxiv.org\/abs\/2009.07480"},{"key":"e_1_3_1_7_2","first-page":"2278","volume-title":"Proceedings of the IEEE","volume":"86","author":"Lecun Y.","year":"1998","unstructured":"Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. In Proceedings of the IEEE 86, 11 (1998), 2278\u20132324."},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","first-page":"142","DOI":"10.5220\/0006922101420153","volume-title":"Proceedings of the 7th International Conference on Data Science, Technology and Applications","author":"Fabbri M.","year":"2018","unstructured":"M. Fabbri and G. Moro. 2018. Dow Jones trading with deep learning: The unreasonable effectiveness of recurrent neural networks. In Proceedings of the 7th International Conference on Data Science, Technology and Applications, 142\u2013153."},{"key":"e_1_3_1_9_2","first-page":"1","volume-title":"Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security","author":"Afchar D.","year":"2018","unstructured":"D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen. 2018. Mesonet: A compact facial video forgery detection network. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security, 1\u20137."},{"key":"e_1_3_1_10_2","first-page":"1831","volume-title":"Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops","author":"Zhou P.","year":"2017","unstructured":"P. Zhou, X. Han, V. I. Morariu, and L. S. Davis. 2017. Two-stream neural networks for tampered face detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 1831\u20131839."},{"key":"e_1_3_1_11_2","first-page":"2307","volume-title":"Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Nguyen H. 
H.","year":"2019","unstructured":"H. H. Nguyen, J. Yamagishi, and I. Echizen. 2019. Capsule-forensics: Using capsule networks to detect forged images and videos. In Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, 2307\u20132311."},{"key":"e_1_3_1_12_2","unstructured":"Y. Li and S. Lyu. 2018. Exposing deepfake videos by detecting face warping artifacts. arXiv:1811.00656. Retrieved from https:\/\/arxiv.org\/abs\/1811.00656"},{"key":"e_1_3_1_13_2","first-page":"6105","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Tan M.","year":"2019","unstructured":"M. Tan and Q. Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, 6105\u20136114."},{"key":"e_1_3_1_14_2","unstructured":"T. Yasuno J. Fujii R. Ogata and M. Okano. 2022. VAE-iForest: Auto-encoding reconstruction and isolation-based anomalies detecting fallen objects on road surface. arXiv:2203.01193. Retrieved from https:\/\/arxiv.org\/abs\/2203.01193"},{"issue":"9","key":"e_1_3_1_15_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11432-021-3444-9","article-title":"Reverse erasure guided spatio-temporal autoencoder with compact feature representation for video anomaly detection","volume":"65","author":"Zhong Y.","year":"2022","unstructured":"Y. Zhong, X. Chen, J. Jiang, and F. Ren. 2022. Reverse erasure guided spatio-temporal autoencoder with compact feature representation for video anomaly detection. Science China Information Sciences 65, 9 (2022), 1\u20133.","journal-title":"Science China Information Sciences"},{"key":"e_1_3_1_16_2","first-page":"3914","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhang X.","year":"2023","unstructured":"X. Zhang, S. Li, X. Li, P. Huang, J. Shan, and T. Chen. 2023. 
Destseg: Segmentation guided denoising student-teacher for anomaly detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 3914\u20133923."},{"key":"e_1_3_1_17_2","first-page":"8472","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"38","author":"He H.","year":"2024","unstructured":"H. He, J. Zhang, H. Chen, X. Chen, Z. Li, X. Chen, Y. Wang, C. Wang, and L. Xie. 2024. A diffusion-based framework for multi-class anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence. 38 (8), 8472\u20138480."},{"key":"e_1_3_1_18_2","first-page":"8445","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"38","author":"Gu Z.","year":"2024","unstructured":"Z. Gu, J. Zhang, L. Liu, X. Chen, J. Peng, Z. Gan, G. Jiang, A. Shu, Y. Wang, and L. Ma. 2024. Rethinking reverse distillation for multi-modal anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence. 38 (8), 8445\u20138453."},{"key":"e_1_3_1_19_2","first-page":"3","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Woo S.","year":"2018","unstructured":"S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon. 2018. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision, 3\u201319."},{"key":"e_1_3_1_20_2","first-page":"14821","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zamir S. W.","year":"2021","unstructured":"S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M.-H. Yang, and L. Shao. 2021. Multi-stage progressive image restoration. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 14821\u201314831."},{"issue":"5","key":"e_1_3_1_21_2","first-page":"6575","article-title":"VOLO: Vision outlooker for visual recognition","volume":"45","author":"Yuan L.","year":"2022","unstructured":"L. Yuan, Q. 
Hou, Z. Jiang, J. Feng, and S. Yan. 2022. VOLO: Vision outlooker for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 5 (2022), 6575\u20136586.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"10","key":"e_1_3_1_22_2","doi-asserted-by":"crossref","first-page":"3549","DOI":"10.1007\/s10994-021-06111-6","article-title":"A network-based positive and unlabeled learning approach for fake news detection","volume":"111","author":"de Souza M. C.","year":"2022","unstructured":"M. C. de Souza, B. M. Nogueira, R. G. Rossi, R. M. Marcacini, B. N. dos Santos, and S. O. Rezende. 2022. A network-based positive and unlabeled learning approach for fake news detection. Machine Learning 111 (10) 3549\u20133592.","journal-title":"Machine Learning"},{"key":"e_1_3_1_23_2","first-page":"656","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops","author":"Khalid H.","year":"2020","unstructured":"H. Khalid and S. S. Woo. 2020. Oc-fakedect: Classifying deepfakes using one-class variational autoencoder. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, 656\u2013657."},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1109\/LSP.2021.3076358","article-title":"One-class learning towards synthetic voice spoofing detection","volume":"28","author":"Zhang Y.","year":"2021","unstructured":"Y. Zhang, F. Jiang, and Z. Duan. 2021. One-class learning towards synthetic voice spoofing detection. IEEE Signal Processing Letters 28, 937\u2013941.","journal-title":"IEEE Signal Processing Letters"},{"key":"e_1_3_1_25_2","first-page":"1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"R\u00f6ssler A.","year":"2019","unstructured":"A. R\u00f6ssler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nie\u00dfner. 2019. 
Faceforensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 1\u201311."},{"key":"e_1_3_1_26_2","first-page":"3207","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Y.","year":"2020","unstructured":"Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu. 2020. Celeb-df: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 3207\u20133216."},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","first-page":"2382","DOI":"10.1145\/3394171.3413769","volume-title":"Proceedings of the 28th ACM International Conference on Multimedia","author":"Zi B.","year":"2020","unstructured":"B. Zi, M. Chang, J. Chen, X. Ma, and Y.-G. Jiang. 2020. Wilddeepfake: A challenging real-world dataset for deepfake detection. In Proceedings of the 28th ACM International Conference on Multimedia, 2382\u20132390."},{"issue":"10","key":"e_1_3_1_28_2","doi-asserted-by":"crossref","first-page":"6111","DOI":"10.1109\/TPAMI.2021.3093446","article-title":"Deepfake detection based on discrepancies between faces and their context","volume":"44","author":"Nirkin Y.","year":"2021","unstructured":"Y. Nirkin, L. Wolf, Y. Keller, and T. Hassner. 2021. Deepfake detection based on discrepancies between faces and their context. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 10 (2021), 6111\u20136121.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_29_2","first-page":"2929","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhu X.","year":"2021","unstructured":"X. Zhu, H. Wang, H. Fei, Z. Lei, and S. Z. Li. 2021. Face forgery detection by 3d decomposition. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2929\u20132939."},{"key":"e_1_3_1_30_2","first-page":"667","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Masi I.","year":"2020","unstructured":"I. Masi, A. Killekar, R. M. Mascarenhas, S. P. Gurudatt, W. AbdAlmageed. 2020. Two-branch recurrent network for isolating deepfakes in videos. In Proceedings of the European Conference on Computer Vision, 667\u2013684."},{"key":"e_1_3_1_31_2","first-page":"13466","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Doloriel C. T.","year":"2024","unstructured":"C. T. Doloriel and N. M. Cheung. 2024. Frequency masking for universal deepfake detection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 13466\u201313470."},{"key":"e_1_3_1_32_2","unstructured":"W. Lu L. Liu J. Luo X. Zhao Y. Zhou and J. Huang. 2023. Detection of deepfake videos using long-distance attention. arXiv:2106.12832. Retrieved from https:\/\/arxiv.org\/abs\/2106.12832"},{"key":"e_1_3_1_33_2","first-page":"1060","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"36","author":"Jeong Y.","year":"2022","unstructured":"Y. Jeong, D. Kim, Y. Ro, and J. Choi. 2022. Frepgan: Robust deepfake detection using frequency-level perturbations. In Proceedings of the AAAI Conference on Artificial Intelligence. 36 (1), 1060\u20131068."},{"key":"e_1_3_1_34_2","first-page":"2904","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Guo H.","year":"2022","unstructured":"H. Guo, S. Hu, X. Wang, M.-C. Chang, and S. Lyu. 2022. Eyes tell all: Irregular pupil shapes reveal gan-generated faces. 
In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2904\u20132908."},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","first-page":"1387","DOI":"10.24963\/ijcai.2023\/154","volume-title":"Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence","author":"Shi Z.","year":"2023","unstructured":"Z. Shi, H. Chen, L. Chen, and D. Zhang. 2023. Discrepancy-guided reconstruction learning for image forgery detection. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 1387\u20131395."},{"key":"e_1_3_1_36_2","first-page":"5039","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Haliassos A.","year":"2021","unstructured":"A. Haliassos, K. Vougioukas, S. Petridis, and M. Pantic. 2021. Lips don\u2019t lie: A generalisable and robust approach to face forgery detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 5039\u20135049."},{"key":"e_1_3_1_37_2","unstructured":"D. Guo K. Li B. Hu Y. Zhang and M. Wang. 2024. Benchmarking micro-action recognition: dataset method and application. arXiv:2403.05243. Retrieved from https:\/\/arxiv.org\/abs\/2403.05243"},{"key":"e_1_3_1_38_2","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1109\/TIP.2019.2941267","article-title":"Hierarchical recurrent deep fusion using adaptive clip summarization for sign language translation","volume":"29","author":"Guo D.","year":"2019","unstructured":"D. Guo, W. Zhou, A. Li, H. Li, and M. Wang. 2019. Hierarchical recurrent deep fusion using adaptive clip summarization for sign language translation. 
IEEE Transactions on Image Processing 29, 1575\u20131590.","journal-title":"IEEE Transactions on Image Processing"},{"key":"e_1_3_1_39_2","doi-asserted-by":"crossref","first-page":"4433","DOI":"10.1109\/TMM.2021.3117124","article-title":"Graph-based multimodal sequential embedding for sign language translation","volume":"24","author":"Tang S.","year":"2021","unstructured":"S. Tang, D. Guo, R. Hong, and M. Wang. 2021. Graph-based multimodal sequential embedding for sign language translation. IEEE Transactions on Multimedia 24, 4433\u20134445.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_1_40_2","first-page":"6845","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"32","author":"Guo D.","year":"2018","unstructured":"D. Guo, W. Zhou, H. Li, and M. Wang. 2018. Hierarchical LSTM for sign language translation. In Proceedings of the AAAI Conference on Artificial Intelligence. 32, 1 (2018), 6845\u20136852."},{"issue":"1","key":"e_1_3_1_41_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3152121","article-title":"Online early-late fusion based on adaptive HMM for sign language recognition","volume":"14","author":"Guo D.","year":"2017","unstructured":"D. Guo, W. Zhou, H. Li, and M. Wang. 2017. Online early-late fusion based on adaptive HMM for sign language recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 14, 1 (2017), 1\u201318.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_1_42_2","first-page":"14923","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang C.","year":"2021","unstructured":"C. Wang and W. Deng. 2021. Representative forgery mining for fake face detection. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 14923\u201314932."},{"key":"e_1_3_1_43_2","first-page":"18720","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Shiohara K.","year":"2022","unstructured":"K. Shiohara and T. Yamasaki. 2022. Detecting deepfakes with self-blended images. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 18720\u201318729."},{"key":"e_1_3_1_44_2","doi-asserted-by":"crossref","first-page":"1039","DOI":"10.1109\/TIFS.2022.3233774","article-title":"F2Trans: High-frequency fine-grained transformer for face forgery detection","volume":"18","author":"Miao C.","year":"2023","unstructured":"C. Miao, Z. Tan, Q. Chu, and H. Liu. 2023. F2Trans: High-frequency fine-grained transformer for face forgery detection. IEEE Transactions on Information Forensics and Security 18, 1039\u20131051.","journal-title":"IEEE Transactions on Information Forensics and Security"},{"key":"e_1_3_1_45_2","first-page":"12105","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Tan C.","year":"2023","unstructured":"C. Tan, Y. Zhao, S. Wei, G. Gu, and Y. Wei. 2023. Learning on gradients: Generalized artifacts representation for GAN-generated images detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 12105\u201312114."},{"key":"e_1_3_1_46_2","doi-asserted-by":"crossref","first-page":"110077","DOI":"10.1016\/j.patcog.2023.110077","article-title":"Deepfake detection via inter-frame inconsistency recomposition and enhancement","volume":"147","author":"Zhu C.","year":"2024","unstructured":"C. Zhu, B. Zhang, Q. Yin, C. Yin, and W. Lu. 2024. Deepfake detection via inter-frame inconsistency recomposition and enhancement. 
Pattern Recognition 147, 110077.","journal-title":"Pattern Recognition"},{"key":"e_1_3_1_47_2","first-page":"2821","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"37","author":"Wu J.","year":"2023","unstructured":"J. Wu, D. Chang, A. Sain, X. Li, Z. Ma, J. Cao, J. Guo, and Y.-Z. Song. 2023. Bi-directional feature reconstruction network for fine-grained few-shot image classification. In Proceedings of the AAAI Conference on Artificial Intelligence. 37, 3 (2023), 2821\u20132829."},{"key":"e_1_3_1_48_2","doi-asserted-by":"crossref","unstructured":"X. Wu X. Liao and B. Ou. 2023. SepMark: Deep separable watermarking for unified source tracing and deepfake detection. arXiv:2305.06321. Retrieved from https:\/\/arxiv.org\/abs\/2305.06321","DOI":"10.1145\/3581783.3612471"},{"key":"e_1_3_1_49_2","first-page":"13091","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Diniz M. M.","year":"2024","unstructured":"M. M. Diniz and A. Rocha. 2024. Open-set deepfake detection to fight the unknown. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 13091\u201313095."},{"key":"e_1_3_1_50_2","first-page":"4113","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Cao J.","year":"2022","unstructured":"J. Cao, C. Ma, T. Yao, S. Chen, S. Ding, and X. Yang. 2022. End-to-end reconstruction-classification learning for face forgery detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 4113\u20134122."},{"key":"e_1_3_1_51_2","first-page":"17006","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Luo Y.","year":"2024","unstructured":"Y. Luo, J. Du, K. Yan, and S. Ding. 2024. LaRE 2: Latent Reconstruction error based method for diffusion-generated image detection. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 17006\u201317015."},{"key":"e_1_3_1_52_2","first-page":"234","volume-title":"Proceedings of the Medical Image Computing and Computer-Assisted Intervention","author":"Ronneberger O.","year":"2015","unstructured":"O. Ronneberger, P. Fischer, and T. Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention, 234\u2013241."},{"key":"e_1_3_1_53_2","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.neucom.2023.01.017","article-title":"Forgery face detection via adaptive learning from multiple experts","volume":"527","author":"Fu X.","year":"2023","unstructured":"X. Fu, S. Li, Y. Yuan, B. Li, and X. Li. 2023. Forgery face detection via adaptive learning from multiple experts. Neurocomputing 527, 110\u2013118.","journal-title":"Neurocomputing"},{"key":"e_1_3_1_54_2","first-page":"633","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Sun Y.Y.","year":"2023","unstructured":"Y. Y. Sun, Z. Y. Zhang, I. Echizen, H. H. Nguyen, C. Z. Qiu, and L. Sun. 2023. Face forgery detection based on facial region displacement trajectory series. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, 633\u2013642."},{"key":"e_1_3_1_55_2","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1109\/TIFS.2023.3235579","article-title":"Forensic symmetry for DeepFakes","volume":"18","author":"Li G.","year":"2023","unstructured":"G. Li, X. Zhao, and Y. Cao. 2023. Forensic symmetry for DeepFakes. 
IEEE Transactions on Information Forensics and Security 18, 1095\u20131110.","journal-title":"IEEE Transactions on Information Forensics and Security"},{"key":"e_1_3_1_56_2","first-page":"2185","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhao H.","year":"2021","unstructured":"H. Zhao, W. Zhou, D. Chen, T. Wei, W. Zhang, and N. Yu. 2021. Multi-attentional deepfake detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2185\u20132194."},{"key":"e_1_3_1_57_2","first-page":"86","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Qian Y.","year":"2020","unstructured":"Y. Qian, G. Yin, L. Sheng, Z. Chen, and J. Shao. 2020. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In Proceedings of the European Conference on Computer Vision. Cham: Springer International Publishing, 86\u2013103."},{"issue":"3","key":"e_1_3_1_58_2","doi-asserted-by":"crossref","first-page":"868","DOI":"10.1109\/TIFS.2012.2190402","article-title":"Rich models for steganalysis of digital images","volume":"7","author":"Fridrich J.","year":"2012","unstructured":"J. Fridrich and J. Kodovsky. 2012. Rich models for steganalysis of digital images. IEEE Transactions on Information Forensics and Security 7 (3), 868\u2013882.","journal-title":"IEEE Transactions on Information Forensics and Security"},{"key":"e_1_3_1_59_2","first-page":"770","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"He K.","year":"2016","unstructured":"K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770\u2013778."},{"key":"e_1_3_1_60_2","first-page":"1","volume-title":"Proceedings of the 2019 IEEE 10th International Conference on Biometrics Theory, Applications and Systems","author":"Nguyen H. 
H.","year":"2019","unstructured":"H. H. Nguyen, F. Fang, J. Yamagishi, and I. Echizen. 2019. Multi-task learning for detecting and segmenting manipulated facial images and videos. In Proceedings of the 2019 IEEE 10th International Conference on Biometrics Theory, Applications and Systems, 1\u20138."},{"key":"e_1_3_1_61_2","first-page":"5001","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li L.","year":"2020","unstructured":"L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, B. Guo. 2020. Face x-ray for more general face forgery detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 5001\u20135010."},{"key":"e_1_3_1_62_2","first-page":"16317","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Luo Y.","year":"2021","unstructured":"Y. Luo, Y. Zhang, J. Yan, and W. Liu. 2021. Generalizing face forgery detection with high-frequency features. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 16317\u201316326."},{"key":"e_1_3_1_63_2","first-page":"18","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Dong S.","year":"2022","unstructured":"S. Dong, J. Wang, J. Liang, H. Fan, and R. Ji. 2022. Explaining deepfake detection by analysing image matching. In Proceedings of the European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 18\u201335."},{"key":"e_1_3_1_64_2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1145\/2909827.2930786","volume-title":"Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security","author":"Bayar B.","year":"2016","unstructured":"B. Bayar and M. C. Stamm. 2016. A deep learning approach to universal image manipulation detection using a new convolutional layer. 
In Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, 5\u201310."},{"key":"e_1_3_1_65_2","first-page":"367","article-title":"Audio-visual contrastive pre-train for face forgery detection","author":"Hanqing Z.","unstructured":"Z. Hanqing, W. Zhou, D. Chen, W. Zhang, Y. Guo, Z. Cheng, P. Yan, and N. Yu. Audio-visual contrastive pre-train for face forgery detection. ACM Transactions on Multimedia Computing, Communications, and Applications, 367\u2013374.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_1_66_2","unstructured":"R. Durall M. Keuper F.-J. Pfreundt and J. Keuper. 2019. Unmasking deepfakes with simple features. arXiv:1911.00686. Retrieved from https:\/\/arxiv.org\/abs\/1911.00686"},{"key":"e_1_3_1_67_2","unstructured":"Y. Li and S. Lyu. 2019. Dsp-fwa: Dual spatial pyramid for exposing face warp artifacts in deepfake videos. Retrieved from https:\/\/github.com\/yuezunli\/DSP-FWA"},{"key":"e_1_3_1_68_2","first-page":"1864","volume-title":"Proceedings of the 28th ACM International Conference on Multimedia","author":"Li X.","unstructured":"X. Li, Y. Lang, Y. Chen, X. Mao, Y. He, S. Wang, H. Xue, and Q. Lu. Sharp multiple instance learning for deepfake video detection. In Proceedings of the 28th ACM International Conference on Multimedia, 1864\u20131872."},{"key":"e_1_3_1_69_2","doi-asserted-by":"crossref","first-page":"108832","DOI":"10.1016\/j.patcog.2022.108832","article-title":"Learning a deep dual-level network for robust DeepFake detection","volume":"130","author":"Pu W.","year":"2022","unstructured":"W. Pu, J. Hu, X. Wang, Y. Li, S. Hu, B. B. Zhu, Q. Song, X. Wu, and S. Lyu. 2022. Learning a deep dual-level network for robust DeepFake detection. 
Pattern Recognition 130, 108832.","journal-title":"Pattern Recognition"},{"key":"e_1_3_1_70_2","first-page":"615","volume-title":"Proceedings of the 2022 International Conference on Multimedia Retrieval","author":"Wang J.","unstructured":"J. Wang, Z. Wu, W. Ouyang, X. Han, J. Chen, S.-N. Lim, and Y.-G. Jiang. M2tr: Multi-modal multi-scale transformers for deepfake detection. In Proceedings of the 2022 International Conference on Multimedia Retrieval, 615\u2013623."},{"key":"e_1_3_1_71_2","first-page":"2638","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"35","author":"Sun K.","year":"2021","unstructured":"K. Sun, H. Liu, Q. Ye, Y. Gao, J. Liu, L. Shao, and R. Ji. 2021. Domain general face forgery detection by learning to weight. In Proceedings of the AAAI Conference on Artificial Intelligence, 35, 3 (2021), 2638\u20132646."},{"key":"e_1_3_1_72_2","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1016\/j.neunet.2022.11.031","article-title":"Depth map guided triplet network for deepfake face detection","volume":"159","author":"Liang B.","year":"2023","unstructured":"B. Liang, Z. Wang, B. Huang, Q. Zou, Q. Wang, and J. Liang. 2023. Depth map guided triplet network for deepfake face detection. Neural Networks 159, 34\u201342.","journal-title":"Neural Networks"},{"key":"e_1_3_1_73_2","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1016\/j.neucom.2022.06.013","article-title":"Patch-DFD: Patch-based end-to-end DeepFake discriminator","volume":"501","author":"Yu M.","year":"2022","unstructured":"M. Yu, S. Ju, J. Yang, S. Li, J. Lei, and X. Li. 2022. Patch-DFD: Patch-based end-to-end DeepFake discriminator. 
Neurocomputing 501, 583\u2013595.","journal-title":"Neurocomputing"},{"key":"e_1_3_1_74_2","doi-asserted-by":"crossref","first-page":"109114","DOI":"10.1016\/j.knosys.2022.109114","article-title":"MC-LCR: Multimodal contrastive classification by locally correlated representations for effective face forgery detection","volume":"250","author":"Wang G.","year":"2022","unstructured":"G. Wang, Q. Jiang, X. Jin, W. Li, and X. Cui. 2022. MC-LCR: Multimodal contrastive classification by locally correlated representations for effective face forgery detection. Knowledge-Based Systems 250, 109114.","journal-title":"Knowledge-Based Systems"},{"key":"e_1_3_1_75_2","doi-asserted-by":"crossref","first-page":"119352","DOI":"10.1016\/j.ins.2023.119352","article-title":"Holisticdfd: Infusing spatiotemporal transformer embeddings for deepfake detection","volume":"645","author":"Raza M. A.","year":"2023","unstructured":"M. A. Raza, K. M. Malik, and I. U. Haq. 2023. Holisticdfd: Infusing spatiotemporal transformer embeddings for deepfake detection. Information Sciences 645, 119352.","journal-title":"Information Sciences"},{"key":"e_1_3_1_76_2","first-page":"5345","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"38","author":"Wang F.","year":"2024","unstructured":"F. Wang, D. Guo, K. Li, and M. Wang. 2024. Eulermormer: Robust eulerian motion magnification via dynamic filtering within transformer. In Proceedings of the AAAI Conference on Artificial Intelligence 38, 6 (2024), 5345\u20135353."},{"key":"e_1_3_1_77_2","first-page":"751","volume-title":"Proceedings of the 28th International Joint Conference on Artificial Intelligence","author":"Guo D.","year":"2019","unstructured":"D. Guo, S. Tang, and M. Wang. 2019. Connectionist temporal modeling of video and language: A joint model for translation and sign labeling. 
In Proceedings of the 28th International Joint Conference on Artificial Intelligence, 751\u2013757."},{"key":"e_1_3_1_78_2","doi-asserted-by":"crossref","first-page":"1122","DOI":"10.1109\/TIP.2024.3359045","article-title":"Emotional video captioning with vision-based emotion interpretation network","author":"Song P.","year":"2024","unstructured":"P. Song, D. Guo, X. Yang, S. Tang, and M. Wang. 2024. Emotional video captioning with vision-based emotion interpretation network. IEEE Transactions on Image Processing, 1122\u20131135.","journal-title":"IEEE Transactions on Image Processing"},{"issue":"6","key":"e_1_3_1_79_2","doi-asserted-by":"crossref","first-page":"7239","DOI":"10.1109\/TPAMI.2022.3223688","article-title":"Contrastive positive sample propagation along the audio-visual event line","volume":"45","author":"Zhou J.","year":"2022","unstructured":"J. Zhou, D. Guo, and M. Wang. 2022. Contrastive positive sample propagation along the audio-visual event line. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (6), 7239\u20137257.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_80_2","first-page":"1","article-title":"Advancing weakly-supervised audio-visual video parsing via segmentwise pseudo labeling","author":"Zhou J.","year":"2024","unstructured":"J. Zhou, D. Guo, Y. Zhong, and M. Wang. 2024. Advancing weakly-supervised audio-visual video parsing via segmentwise pseudo labeling. 
International Journal of Computer Vision, 1\u201322.","journal-title":"International Journal of Computer Vision"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3686162","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3686162","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:50Z","timestamp":1750295870000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3686162"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,12]]},"references-count":79,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,4,30]]}},"alternative-id":["10.1145\/3686162"],"URL":"https:\/\/doi.org\/10.1145\/3686162","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,12]]},"assertion":[{"value":"2024-03-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-25","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}