{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T17:33:00Z","timestamp":1778693580814,"version":"3.51.4"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T00:00:00Z","timestamp":1685404800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2023,11,30]]},"abstract":"<jats:p>Recently, Deepfake has drawn considerable public attention due to security and privacy concerns in social media digital forensics. As the wildly spreading Deepfake videos on the Internet become more realistic, traditional detection techniques have failed in distinguishing between real and fake. Most existing deep learning methods mainly focus on local features and relations within the face image using convolutional neural networks as a backbone. However, local features and relations are insufficient for model training to learn enough general information for Deepfake detection. Therefore, the existing Deepfake detection methods have reached a bottleneck to further improve the detection performance. To address this issue, we propose a deep convolutional Transformer to incorporate the decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy. Moreover, we employ the barely discussed image keyframes in model training for performance improvement and visualize the feature quantity gap between the key and normal image frames caused by video compression. 
We finally illustrate the transferability with extensive experiments on several Deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.<\/jats:p>\n          <jats:p\/>","DOI":"10.1145\/3588574","type":"journal-article","created":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T12:12:28Z","timestamp":1679919148000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":82,"title":["Deep Convolutional Pooling Transformer for Deepfake Detection"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2920-6099","authenticated-orcid":false,"given":"Tianyi","family":"Wang","sequence":"first","affiliation":[{"name":"The University of Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7436-0162","authenticated-orcid":false,"given":"Harry","family":"Cheng","sequence":"additional","affiliation":[{"name":"Shandong University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4552-9744","authenticated-orcid":false,"given":"Kam Pui","family":"Chow","sequence":"additional","affiliation":[{"name":"The University of Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1476-0273","authenticated-orcid":false,"given":"Liqiang","family":"Nie","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology (Shenzhen)"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,5,30]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/wifs.2018.8630761"},{"key":"e_1_3_2_3_2","first-page":"1205","volume-title":"Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop","author":"Amerini 
Irene","year":"2019","unstructured":"Irene Amerini, Leonardo Galteri, Roberto Caldelli, and Alberto Del Bimbo. 2019. Deepfake video detection through optical flow based CNN. In Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop. IEEE, 1205\u20131207."},{"key":"e_1_3_2_4_2","unstructured":"Apple. 2021. MPEG-2 Reference Information. Retrieved August 29 2021 from https:\/\/tinyurl.com\/m7cef9mc."},{"key":"e_1_3_2_5_2","first-page":"5012","volume-title":"Proceedings of the 2020 25th International Conference on Pattern Recognition","author":"Bonettini Nicol\u00f2","year":"2021","unstructured":"Nicol\u00f2 Bonettini, Edoardo Daniele Cannas, Sara Mandelli, Luca Bondi, Paolo Bestagini, and Stefano Tubaro. 2021. Video face manipulation detection through ensemble of CNNs. In Proceedings of the 2020 25th International Conference on Pattern Recognition. IEEE, 5012\u20135019."},{"key":"e_1_3_2_6_2","unstructured":"BuzzFeedVideo. 2018. You Won\u2019t Believe What Obama Says In This Video! Retrieved August 29 2021 from https:\/\/www.youtube.com\/watch?v=cQ54GDm1eL0."},{"key":"e_1_3_2_7_2","volume-title":"MM\u201920: The 28th ACM International Conference on Multimedia","author":"Chen Renwang","year":"2020","unstructured":"Renwang Chen, Xuanhong Chen, Bingbing Ni, and Yanhao Ge. 2020. SimSwap: An efficient framework for high fidelity face swapping. In Proceedings of theMM\u201920: The 28th ACM International Conference on Multimedia."},{"key":"e_1_3_2_8_2","first-page":"1251","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Patten Recognition","author":"Chollet Francois","year":"2017","unstructured":"Francois Chollet. 2017. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Patten Recognition. 
1251\u20131258."},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1007\/978-3-031-06433-3_19","volume-title":"Proceedings of the Image Analysis and Processing \u2013 ICIAP 2022: 21st International Conference, Lecce, Italy, May 23\u201327, 2022, Proceedings, Part III","author":"Coccomini Davide Alessandro","year":"2022","unstructured":"Davide Alessandro Coccomini, Nicola Messina, Claudio Gennaro, and Fabrizio Falchi. 2022. Combining efficientnet and vision transformers for video deepfake detection. In Proceedings of the Image Analysis and Processing \u2013 ICIAP 2022: 21st International Conference, Lecce, Italy, May 23\u201327, 2022, Proceedings, Part III. Springer-Verlag, Berlin, 219\u2013229. DOI:10.1007\/978-3-031-06433-3_19"},{"key":"e_1_3_2_10_2","unstructured":"Davide Alessandro Coccomini Giorgos Kordopatis Zilos Giuseppe Amato Roberto Caldelli Fabrizio Falchi Symeon Papadopoulos and Claudio Gennaro. 2022. MINTIME: Multi-Identity Size-Invariant Video Deepfake Detection. (2022). Retrieved from DOI:10.48550\/ARXIV.2211.10996. Accessed February 20 2022."},{"key":"e_1_3_2_11_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Dang Hao","year":"2020","unstructured":"Hao Dang, Feng Liu, Joel Stehouwer, Xiaoming Liu, and Anil K. Jain. 2020. On the detection of digital face manipulation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_12_2","unstructured":"deepfakes. 2019. FACESWAP. Retrieved August 29 2021 from https:\/\/faceswap.dev\/."},{"key":"e_1_3_2_13_2","unstructured":"Brian Dolhansky Joanna Bitton Ben Pflaum Jikuo Lu Russ Howes Menglin Wang and Cristian Canton Ferrer. 2020. The DeepFake Detection Challenge (DFDC) Dataset. arxiv:cs.CV\/2006.07397. Retrieved from https:\/\/arxiv.org\/abs\/2006.07397. 
Accessed August 29 2021."},{"key":"e_1_3_2_14_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3002178"},{"key":"e_1_3_2_16_2","unstructured":"FFmpeg. 2021. FFmpeg. Retrieved August 29 2021 from https:\/\/www.ffmpeg.org\/."},{"key":"e_1_3_2_17_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"27","author":"Goodfellow Ian","year":"2014","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger (Eds.), Vol. 27. Curran Associates, Inc."},{"key":"e_1_3_2_18_2","unstructured":"Mark Greaves. 2021. Deepfakes\u2019 ranked as most serious AI crime threat. Retrieved November 16 2021 from https:\/\/tinyurl.com\/hfmtrat4."},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","first-page":"41596","DOI":"10.1109\/ACCESS.2019.2905689","article-title":"Combating deepfake videos using blockchain and smart contracts","volume":"7","author":"Hasan Haya R.","year":"2019","unstructured":"Haya R. Hasan and Khaled Salah. 2019. Combating deepfake videos using blockchain and smart contracts. IEEE Access 7 (2019), 41596\u201341606. 
https:\/\/ieeexplore.ieee.org\/document\/8668407.","journal-title":"IEEE Access"},{"key":"e_1_3_2_20_2","first-page":"770","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Patten Recognition","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Patten Recognition. IEEE, 770\u2013778."},{"key":"e_1_3_2_21_2","first-page":"4358","volume-title":"Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"He Yinan","year":"2021","unstructured":"Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, and Ziwei Liu. 2021. ForgeryNet: A versatile benchmark for comprehensive forgery analysis. In Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4358\u20134367. DOI:10.1109\/CVPR46437.2021.00434"},{"key":"e_1_3_2_22_2","volume-title":"Proceedings of the International Conference on Computer Vision","author":"Heo Byeongho","year":"2021","unstructured":"Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, and Seong Joon Oh. 2021. Rethinking spatial dimensions of vision transformers. In Proceedings of the International Conference on Computer Vision."},{"key":"e_1_3_2_23_2","first-page":"053","volume-title":"Proceedings of the 2020 11th International Conference on Information and Communication Systems","author":"Jafar Mousa Tayseer","year":"2020","unstructured":"Mousa Tayseer Jafar, Mohammad Ababneh, Mohammad Al-Zoube, and Ammar Elhassan. 2020. Forensics and analysis of deepfake videos. In Proceedings of the 2020 11th International Conference on Information and Communication Systems. 
IEEE, 053\u2013058."},{"key":"e_1_3_2_24_2","first-page":"2889","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Patten Recognition","author":"Jiang Liming","year":"2020","unstructured":"Liming Jiang, Ren Li, Wayne Wu, Chen Qian, and Chen Change Loy. 2020. DeeperForensics-1.0: A large-scale dataset for real-world face forgery detection. In Proceedings of the IEEE Conference on Computer Vision and Patten Recognition. 2889\u20132898."},{"key":"e_1_3_2_25_2","unstructured":"Leo Kelion. 2018. Deepfake porn videos deleted from internet by Gfycat. Retrieved August 29 2021 from https:\/\/www.bbc.com\/news\/technology-42905185."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925871"},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1145\/3549555.3549588","volume-title":"Proceedings of the 19th International Conference on Content-Based Multimedia Indexing","author":"Khan Sohail Ahmed","year":"2022","unstructured":"Sohail Ahmed Khan and Duc-Tien Dang-Nguyen. 2022. Hybrid transformer network for deepfake detection. In Proceedings of the 19th International Conference on Content-Based Multimedia Indexing. Association for Computing Machinery, New York, 8\u201314. DOI:10.1145\/3549555.3549588"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2422956.2422963"},{"issue":"2","key":"e_1_3_2_29_2","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.bushor.2019.11.006","article-title":"Deepfakes: Trick or treat?","volume":"63","author":"Kietzmann Jan","year":"2020","unstructured":"Jan Kietzmann, Linda W. Lee, Ian P. McCarthy, and Tim C. Kietzmann. 2020. Deepfakes: Trick or treat? Business Horizons 63, 2 (2020), 135\u2013146.","journal-title":"Business Horizons"},{"key":"e_1_3_2_30_2","unstructured":"Davis King. 2021. dlib 19.22.1. 
Retrieved August 29 2021 from https:\/\/pypi.org\/project\/dlib\/."},{"key":"e_1_3_2_31_2","volume-title":"Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational bayes. In Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_2_32_2","first-page":"10744","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Kwon Patrick","year":"2021","unstructured":"Patrick Kwon, Jaeseong You, Gyuhyeon Nam, Sungwoo Park, and Gyeongsu Chae. 2021. KoDF: A large-scale Korean deepfake detection dataset. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 10744\u201310753."},{"key":"e_1_3_2_33_2","article-title":"FaceShifter: Towards high fidelity and occlusion aware face swapping","author":"Li Lingzhi","year":"2019","unstructured":"Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, and Fang Wen. 2019. FaceShifter: Towards high fidelity and occlusion aware face swapping. arXiv:1912.13457 (2019). Retrieved from https:\/\/arxiv.org\/abs\/1912.13457. Accessed August 29, 2021.","journal-title":"arXiv:1912.13457"},{"key":"e_1_3_2_34_2","first-page":"3207","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Patten Recognition","author":"Li Yuezun","year":"2020","unstructured":"Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. 2020. Celeb-DF: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE Conference on Computer Vision and Patten Recognition. 
3207\u20133216."},{"issue":"2","key":"e_1_3_2_35_2","first-page":"1489","article-title":"Contextual transformer networks for visual recognition","volume":"45","author":"Li Yehao","year":"2022","unstructured":"Yehao Li, Ting Yao, Yingwei Pan, and Tao Mei. 2022. Contextual transformer networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 2 (2022), 1489\u20131500.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_36_2","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Liu Ze","year":"2021","unstructured":"Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE\/CVF International Conference on Computer Vision."},{"key":"e_1_3_2_37_2","first-page":"16317","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Patten Recognition (CVPR)","author":"Luo Yuchen","year":"2021","unstructured":"Yuchen Luo, Yong Zhang, Junchi Yan, and Wei Liu. 2021. Generalizing face forgery detection with high-frequency features. In Proceedings of the IEEE Conference on Computer Vision and Patten Recognition (CVPR). 16317\u201316326."},{"key":"e_1_3_2_38_2","unstructured":"Kirsti Melville. 2019. The insidious rise of deepfake porn videos \u2013 and one woman who won\u2019t be silenced. Retrieved November 16 2021 from https:\/\/www.abc.net.au\/news\/2019-08-30\/11437774."},{"key":"e_1_3_2_39_2","volume-title":"Proceedings of the ACM SIGGRAPH 2018 Posters (SIGGRAPH\u201918)","author":"Natsume Ryota","year":"2018","unstructured":"Ryota Natsume, Tatsuya Yatagawa, and Shigeo Morishima. 2018. RSGAN: Face swapping and editing using face and hair representation in latent spaces. In Proceedings of the ACM SIGGRAPH 2018 Posters (SIGGRAPH\u201918). 
Association for Computing Machinery, New York, Article 69, 2 pages. DOI:10.1145\/3230744.3230818"},{"key":"e_1_3_2_40_2","unstructured":"Huy H. Nguyen Junichi Yamagishi and Isao Echizen. 2019. Use of a Capsule Network to Detect Fake Images and Videos. (2019). arxiv:cs.CV\/1910.12467. Retrieved from https:\/\/arxiv.org\/abs\/1910.12467. Accessed August 29 2021."},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/B978-0-12-800022-9.00011-5","volume-title":"Proceedings of the Integrated Security Systems Design (Second Edition) (2nd ed.)","author":"Norman Thomas","year":"2014","unstructured":"Thomas Norman. 2014. 11 - information technology systems infrastructure. In Proceedings of the Integrated Security Systems Design (Second Edition) (2nd ed.). Butterworth-Heinemann, Boston, 203\u2013249."},{"key":"e_1_3_2_42_2","unstructured":"Ivan Perov Daiheng Gao Nikolay Chervoniy Kunlin Liu Sugasa Marangonda Chris Um\u00e9 Mr. Dpfks Carl Shift Facenheim Luis R. P. Jian Jiang Sheng Zhang Pingyu Wu Bo Zhou and Weiming Zhang. 2021. DeepFaceLab: Integrated flexible and extensible face-swapping framework. arxiv:cs.CV\/2005.05535. Retrieved from https:\/\/arxiv.org\/abs\/2005.05535. Accessed August 29 2021."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/2857069"},{"key":"e_1_3_2_44_2","first-page":"1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Rossler Andreas","year":"2019","unstructured":"Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Niessner. 2019. FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 1\u201311."},{"key":"e_1_3_2_45_2","first-page":"618","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Selvaraju Ramprasaath R.","year":"2017","unstructured":"Ramprasaath R. 
Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision. 618\u2013626."},{"key":"e_1_3_2_46_2","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR)","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR). IEEE."},{"key":"e_1_3_2_47_2","article-title":"Dual contrastive learning for general face forgery detection","author":"Sun Ke","year":"2022","unstructured":"Ke Sun, Taiping Yao, Shen Chen, Shouhong Ding, Rongrong Ji, et\u00a0al. 2022. Dual contrastive learning for general face forgery detection. In Proceedings of the AAAI Conference on Artificial Intelligence (2022).","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"e_1_3_2_48_2","first-page":"6105","volume-title":"Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"97","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol. 97. PMLR, 6105\u20136114."},{"key":"e_1_3_2_49_2","unstructured":"TechBytes. 2021. Retrieved August 29 2021 from https:\/\/tinyurl.com\/8tsj2th9."},{"issue":"4","key":"e_1_3_2_50_2","first-page":"66","article-title":"Deferred neural rendering: Image synthesis using neural textures","volume":"38","author":"Thies Justus","year":"2019","unstructured":"Justus Thies, Michael Zollh\u00f6fer, and Matthias Nie\u00dfner. 2019. 
Deferred neural rendering: Image synthesis using neural textures. ACM Trans. Graph. 38, 4, Article 66 (July 2019), 12 pages.","journal-title":"ACM Trans. Graph."},{"key":"e_1_3_2_51_2","first-page":"2387","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Thies Justus","year":"2016","unstructured":"Justus Thies, Michael Zollhofer, Marc Stamminger, Christian Theobalt, and Matthias Niessner. 2016. Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2387\u20132395."},{"key":"e_1_3_2_52_2","unstructured":"Rob Toews. 2020. Deepfakes Are Going To Wreak Havoc On Society. We Are Not Prepared. Retrieved August 29 2021 from https:\/\/tinyurl.com\/58mpuac7."},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/j.inffus.2020.06.014","article-title":"Deepfakes and beyond: A survey of face manipulation and fake detection","volume":"64","author":"Tolosana Ruben","year":"2020","unstructured":"Ruben Tolosana, Ruben Vera-Rodriguez, Julian Fierrez, Aythami Morales, and Javier Ortega-Garcia. 2020. Deepfakes and beyond: A survey of face manipulation and fake detection. Information Fusion 64 (2020), 131\u2013148. https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S1566253520303110.","journal-title":"Information Fusion"},{"key":"e_1_3_2_54_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. 
Curran Associates, Inc."},{"key":"e_1_3_2_55_2","unstructured":"Krishna Rao Vijayanagar. 2020. I P and B-frames \u2013 Differences and Use Cases Made Easy. Retrieved August 29 2021 from https:\/\/tinyurl.com\/syfv27hk."},{"key":"e_1_3_2_56_2","article-title":"Face. evoLVe: A high-performance face recognition library","author":"Wang Qingzhong","year":"2021","unstructured":"Qingzhong Wang, Pengfei Zhang, Haoyi Xiong, and Jian Zhao. 2021. Face. evoLVe: A high-performance face recognition library. arXiv:2107.08621. Retrieved from https:\/\/arxiv.org\/abs\/2107.08621. Accessed February 20, 2022.","journal-title":"arXiv:2107.08621"},{"key":"e_1_3_2_57_2","first-page":"1136","volume-title":"Proceedings of the 13th International Joint Conference on Artificial Intelligence, IJCAI-21","author":"Wang Yuhan","year":"2021","unstructured":"Yuhan Wang, Xu Chen, Junwei Zhu, Wenqing Chu, Ying Tai, Chengjie Wang, Jilin Li, Yongjian Wu, Feiyue Huang, and Rongrong Ji. 2021. HifiFace: 3D shape and semantic prior guided high fidelity face swapping. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, IJCAI-21, Zhi-Hua Zhou (Ed.). International Joint Conferences on Artificial Intelligence Organization, 1136\u20131142. DOI:10.24963\/ijcai.2021\/157"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3183518"},{"issue":"11","key":"e_1_3_2_59_2","first-page":"40","article-title":"The emergence of deepfake technology: A review","volume":"9","author":"Westerlund Mika","year":"2019","unstructured":"Mika Westerlund. 2019. The emergence of deepfake technology: A review. Technology Innovation Management Review 9, 11 (11\/2019), 40\u201353.","journal-title":"Technology Innovation Management Review"},{"key":"e_1_3_2_60_2","unstructured":"Deressa Wodajo and Solomon Atnafu. 2021. Deepfake Video Detection Using Convolutional Vision Transformer. (2021). arXiv:2102.11126. Retrieved from https:\/\/arxiv.org\/abs\/2102.11126. 
Accessed February 20 2022."},{"key":"e_1_3_2_61_2","article-title":"Dual vision transformer","author":"Yao Ting","year":"2022","unstructured":"Ting Yao, Yehao Li, Yingwei Pan, Yu Wang, and Tao Mei. 2022. Dual vision transformer. arXiv:2207.04976 (2022). Retrieved from https:\/\/arxiv.org\/abs\/2207.04976. Accessed February 20, 2022.","journal-title":"arXiv:2207.04976"},{"key":"e_1_3_2_62_2","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Yao Ting","year":"2022","unstructured":"Ting Yao, Yingwei Pan, Yehao Li, Chong-Wah Ngo, and Tao Mei. 2022. Wave-ViT: Unifying wavelet and transformers for visual representation learning. In Proceedings of the European Conference on Computer Vision."},{"key":"e_1_3_2_63_2","first-page":"2185","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Patten Recognition","author":"Zhao Hanqing","year":"2021","unstructured":"Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Tianyi Wei, Weiming Zhang, and Nenghai Yu. 2021. Multi-attentional deepfake detection. In Proceedings of the IEEE Conference on Computer Vision and Patten Recognition. 2185\u20132194."},{"key":"e_1_3_2_64_2","first-page":"2207","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Patten Recognition","author":"Zhao Jian","year":"2018","unstructured":"Jian Zhao, Yu Cheng, Yan Xu, Lin Xiong, Jianshu Li, Fang Zhao, Karlekar Jayashree, Sugiri Pranata, Shengmei Shen, Junliang Xing, et\u00a0al. 2018. Towards pose invariant face recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Patten Recognition. 2207\u20132216."},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","first-page":"792","DOI":"10.1145\/3240508.3240509","volume-title":"Proceedings of the 26th ACM International Conference on Multimedia","author":"Zhao Jian","year":"2018","unstructured":"Jian Zhao, Jianshu Li, Yu Cheng, Terence Sim, Shuicheng Yan, and Jiashi Feng. 2018. 
Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In Proceedings of the 26th ACM International Conference on Multimedia. Association for Computing Machinery, New York, 792\u2013800. DOI:10.1145\/3240508.3240509"},{"key":"e_1_3_2_66_2","first-page":"1184","volume-title":"Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI-18","author":"Zhao Jian","year":"2018","unstructured":"Jian Zhao, Lin Xiong, Yu Cheng, Yi Cheng, Jianshu Li, Li Zhou, Yan Xu, Jayashree Karlekar, Sugiri Pranata, Shengmei Shen, Junliang Xing, Shuicheng Yan, and Jiashi Feng. 2018. 3D-aided deep pose-invariant face recognition. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI-18. International Joint Conferences on Artificial Intelligence Organization, 1184\u20131190. DOI:10.24963\/ijcai.2018\/165"},{"key":"e_1_3_2_67_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"30","author":"Zhao Jian","year":"2017","unstructured":"Jian Zhao, Lin Xiong, Panasonic Karlekar Jayashree, Jianshu Li, Fang Zhao, Zhecan Wang, Panasonic Sugiri Pranata, Panasonic Shengmei Shen, Shuicheng Yan, and Jiashi Feng. 2017. Dual-agent GANs for photorealistic and identity preserving profile face synthesis. In Proceedings of the Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc."},{"key":"e_1_3_2_68_2","unstructured":"Daquan Zhou Bingyi Kang Xiaojie Jin Linjie Yang Xiaochen Lian Qibin Hou and Jiashi Feng. 2021. DeepViT: Towards Deeper Vision Transformer. arXiv:2103.11886. Retrieved from https:\/\/arxiv.org\/abs\/2103.11886. 
Accessed February 20 2022."},{"key":"e_1_3_2_69_2","first-page":"4834","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Zhu Yuhao","year":"2021","unstructured":"Yuhao Zhu, Qi Li, Jian Wang, Chengzhong Xu, and Zhenan Sun. 2021. One shot face swapping on megapixels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4834\u20134844."},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413769"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588574","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588574","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:13Z","timestamp":1750178833000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588574"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,30]]},"references-count":69,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,11,30]]}},"alternative-id":["10.1145\/3588574"],"URL":"https:\/\/doi.org\/10.1145\/3588574","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,30]]},"assertion":[{"value":"2022-09-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-15","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-05-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}