{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,17]],"date-time":"2026-05-17T08:01:16Z","timestamp":1779004876065,"version":"3.51.4"},"reference-count":74,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,11,10]],"date-time":"2023-11-10T00:00:00Z","timestamp":1699574400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>Detecting forgery videos is highly desirable due to the abuse of deepfake. Existing detection approaches contribute to exploring the specific artifacts in deepfake videos and fit well on certain data. However, the growing technique on these artifacts keeps challenging the robustness of traditional deepfake detectors. As a result, the development of these approaches has reached a blockage. In this article, we propose to perform deepfake detection from an unexplored voice-face matching view. Our approach is founded on two supporting points: first, there is a high degree of homogeneity between the voice and face of an individual (i.e., they are highly correlated), and second, deepfake videos often involve mismatched identities between the voice and face due to face-swapping techniques. To this end, we develop a voice-face matching method that measures the matching degree between these two modalities to identify deepfake videos. Nevertheless, training on specific deepfake datasets makes the model overfit certain traits of deepfake algorithms. We instead advocate a method that quickly adapts to untapped forgery, with a pre-training then fine-tuning paradigm. 
Specifically, we first pre-train the model on a generic audio-visual dataset, followed by the fine-tuning on downstream deepfake data. We conduct extensive experiments over three widely exploited deepfake datasets: DFDC, FakeAVCeleb, and DeepfakeTIMIT. Our method obtains significant performance gains as compared to other state-of-the-art competitors. For instance, our method outperforms the baselines by nearly 2%, achieving an AUC of 86.11% on FakeAVCeleb. It is also worth noting that our method already achieves competitive results when fine-tuned on limited deepfake data.<\/jats:p>","DOI":"10.1145\/3625231","type":"journal-article","created":{"date-parts":[[2023,9,21]],"date-time":"2023-09-21T11:27:13Z","timestamp":1695295633000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":77,"title":["Voice-Face Homogeneity Tells Deepfake"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7436-0162","authenticated-orcid":false,"given":"Harry","family":"Cheng","sequence":"first","affiliation":[{"name":"Shandong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8691-5372","authenticated-orcid":false,"given":"Yangyang","family":"Guo","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2920-6099","authenticated-orcid":false,"given":"Tianyi","family":"Wang","sequence":"additional","affiliation":[{"name":"The University of Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8095-5573","authenticated-orcid":false,"given":"Qi","family":"Li","sequence":"additional","affiliation":[{"name":"Shandong University, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7778-8807","authenticated-orcid":false,"given":"Xiaojun","family":"Chang","sequence":"additional","affiliation":[{"name":"University of Technology Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1476-0273","authenticated-orcid":false,"given":"Liqiang","family":"Nie","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology (Shenzhen), China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,11,10]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"1","volume-title":"Proceedings of the International Workshop on Information Forensics and Security","author":"Afchar Darius","year":"2018","unstructured":"Darius Afchar, Vincent Nozick, Junichi Yamagishi, and Isao Echizen. 2018. MesoNet: A compact facial video forgery detection network. In Proceedings of the International Workshop on Information Forensics and Security. IEEE, Los Alamitos, CA, 1\u20137."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/1360612.1360638"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8659.2004.00799.x"},{"key":"e_1_3_2_5_2","doi-asserted-by":"crossref","unstructured":"Zhixi Cai Kalin Stefanov Abhinav Dhall and Munawar Hayat. 2022. Do you really mean that? Content driven audio-visual deepfake dataset and multimodal method for temporal forgery localization. arXiv:2204.06228 (2022).","DOI":"10.1109\/DICTA56598.2022.10034605"},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","unstructured":"Harry Cheng Yangyang Guo Liqiang Nie Zhiyong Cheng and Mohan S. Kankanhalli. 2023. Sample less learn more: Efficient action recognition via frame feature restoration. 
arXiv:2307.14866 (2023).","DOI":"10.1145\/3581783.3611696"},{"key":"e_1_3_2_7_2","article-title":"Audio-driven talking video frame restoration","author":"Cheng Harry","year":"2021","unstructured":"Harry Cheng, Yangyang Guo, Jianhua Yin, Haonan Chen, Jiafang Wang, and Liqiang Nie. 2021. Audio-driven talking video frame restoration. IEEE Transactions on Multimedia. Early access, October 7, 2021.","journal-title":"IEEE Transactions on Multimedia."},{"key":"e_1_3_2_8_2","first-page":"448","volume-title":"Proceedings of the ACM International Conference on Multimedia","author":"Cheng Kai","year":"2020","unstructured":"Kai Cheng, Xin Liu, Yiu-ming Cheung, Rui Wang, Xing Xu, and Bineng Zhong. 2020. Hearing like seeing: Improving voice-face interactions and associations via adversarial deep semantic matching network. In Proceedings of the ACM International Conference on Multimedia. ACM, New York, NY, 448\u2013455."},{"key":"e_1_3_2_9_2","first-page":"1086","volume-title":"Proceedings of the Conference of the International Speech Communication Association","author":"Chung Joon Son","year":"2018","unstructured":"Joon Son Chung, Arsha Nagrani, and Andrew Zisserman. 2018. VoxCeleb2: Deep speaker recognition. In Proceedings of the Conference of the International Speech Communication Association. 1086\u20131090."},{"key":"e_1_3_2_10_2","unstructured":"Brian Dolhansky Joanna Bitton Ben Pflaum Jikuo Lu Russ Howes Menglin Wang and Cristian Canton-Ferrer. 2020. The DeepFake Detection Challenge (DFDC) dataset. arXiv:2006.07397 (2020)."},{"key":"e_1_3_2_11_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. 
An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations. 1\u201312."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3536426"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3422622"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2110013119"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2021\/98"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3128322"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2021.103170"},{"key":"e_1_3_2_18_2","article-title":"Exposing deepfake face forgeries with guided residuals","author":"Guo Zhiqing","year":"2023","unstructured":"Zhiqing Guo, Gaobo Yang, Jiyou Chen, and Xingming Sun. 2023. Exposing deepfake face forgeries with guided residuals. IEEE Transactions on Multimedia. Early access, January 18, 2023.","journal-title":"IEEE Transactions on Multimedia."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2022.103587"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2022.119361"},{"key":"e_1_3_2_21_2","first-page":"5039","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Haliassos Alexandros","year":"2021","unstructured":"Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. 2021. Lips don\u2019t lie: A generalisable and robust approach to face forgery detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 
IEEE, Los Alamitos, CA, 5039\u20135049."},{"key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"1011","DOI":"10.1145\/3240508.3240601","volume-title":"Proceedings of the ACM International Conference on Multimedia","author":"Horiguchi Shota","year":"2018","unstructured":"Shota Horiguchi, Naoyuki Kanda, and Kenji Nagamatsu. 2018. Face-voice matching using cross-modal embeddings. In Proceedings of the ACM International Conference on Multimedia. ACM, New York, NY, 1011\u20131019."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cub.2003.09.005"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2970919"},{"issue":"4","key":"e_1_3_2_25_2","first-page":"Article 94, 8 p","article-title":"Transfiguring portraits","volume":"35","author":"Kemelmacher-Shlizerman Ira","year":"2016","unstructured":"Ira Kemelmacher-Shlizerman. 2016. Transfiguring portraits. ACM Transactions on Graphics 35, 4 (2016), Article 94, 8 pages.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_2_26_2","first-page":"7","volume-title":"Proceedings of the ACM Workshop on Synthetic Multimedia","author":"Khalid Hasam","year":"2021","unstructured":"Hasam Khalid, Minha Kim, Shahroz Tariq, and Simon S. Woo. 2021. Evaluation of an audio-video multimodal deepfake dataset using unimodal and multimodal detectors. In Proceedings of the ACM Workshop on Synthetic Multimedia. ACM, New York, NY, 7\u201315."},{"key":"e_1_3_2_27_2","first-page":"1","volume-title":"Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS \u201921): Track on Datasets and Benchmarks","author":"Khalid Hasam","year":"2021","unstructured":"Hasam Khalid, Shahroz Tariq, Minha Kim, and Simon S. Woo. 2021. FakeAVCeleb: A novel audio-video multimodal deepfake dataset. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS \u201921): Track on Datasets and Benchmarks. 
1\u201315."},{"issue":"4","key":"e_1_3_2_28_2","first-page":"Article 163, 14","article-title":"Deep video portraits","volume":"37","author":"Kim Hyeongwoo","year":"2018","unstructured":"Hyeongwoo Kim, Pablo Garrido, Ayush Tewari, Weipeng Xu, Justus Thies, Matthias Nie\u00dfner, Patrick P\u00e9rez, Christian Richardt, Michael Zollh\u00f6fer, and Christian Theobalt. 2018. Deep video portraits. ACM Transactions on Graphics 37, 4 (2018), Article 163, 14 pages.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_2_29_2","article-title":"DeepFakes: A new threat to face recognition? Assessment and detection","volume":"1812","author":"Korshunov Pavel","year":"2018","unstructured":"Pavel Korshunov and S\u00e9bastien Marcel. 2018. DeepFakes: A new threat to face recognition? Assessment and detection. CoRR abs\/1812.08685 (2018).","journal-title":"CoRR"},{"key":"e_1_3_2_30_2","first-page":"2375","volume-title":"Proceedings of the European Signal Processing Conference","author":"Korshunov Pavel","year":"2018","unstructured":"Pavel Korshunov and S\u00e9bastien Marcel. 2018. Speaker inconsistency detection in tampered video. In Proceedings of the European Signal Processing Conference. IEEE, Los Alamitos, CA, 2375\u20132379."},{"key":"e_1_3_2_31_2","first-page":"16","volume-title":"Proceedings of the International Conference on Automatic Face and Gesture Recognition","author":"Koujan Mohammad Rami","year":"2020","unstructured":"Mohammad Rami Koujan, Michail Christos Doukas, Anastasios Roussos, and Stefanos Zafeiriou. 2020. Head2Head: Video-based neural head synthesis. In Proceedings of the International Conference on Automatic Face and Gesture Recognition. IEEE, Los Alamitos, CA, 16\u201323."},{"key":"e_1_3_2_32_2","first-page":"1","volume-title":"Proceedings of the Applied Imagery Pattern Recognition Workshop","author":"Lewis John K.","year":"2020","unstructured":"John K. 
Lewis, Imad Eddine Toubal, Helen Chen, Vishal Sandesera, Michael Lomnitz, Zigfried Hampel-Arias, Prasad Calyam, and Kannappan Palaniappan. 2020. Deepfake video detection based on spatial, spectral, and temporal inconsistencies using multimodal deep learning. In Proceedings of the Applied Imagery Pattern Recognition Workshop. IEEE, Los Alamitos, CA, 1\u20139."},{"key":"e_1_3_2_33_2","first-page":"5073","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Li Lingzhi","year":"2020","unstructured":"Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, and Fang Wen. 2020. Advancing high fidelity identity swapping for forgery detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 5073\u20135082."},{"key":"e_1_3_2_34_2","first-page":"5000","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Li Lingzhi","year":"2020","unstructured":"Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. 2020. Face x-ray for more general face forgery detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 5000\u20135009."},{"issue":"1","key":"e_1_3_2_35_2","first-page":"1","article-title":"SPGAN: Face forgery using spoofing generative adversarial networks","volume":"17","author":"Li Yidong","year":"2021","unstructured":"Yidong Li, Wenhua Liu, Yi Jin, and Yuanzhouhan Cao. 2021. SPGAN: Face forgery using spoofing generative adversarial networks. ACM Transactions on Multimedia Computing, Communications, and Applications 17, 1s (2021), 1\u201320.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_2_36_2","first-page":"46","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops","author":"Li Yuezun","year":"2019","unstructured":"Yuezun Li and Siwei Lyu. 2019. 
Exposing DeepFake videos by detecting face warping artifacts. In Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops. IEEE, Los Alamitos, CA, 46\u201352."},{"key":"e_1_3_2_37_2","first-page":"3204","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Li Yuezun","year":"2020","unstructured":"Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. 2020. Celeb-DF: A large-scale challenging dataset for DeepFake forensics. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 3204\u20133213."},{"key":"e_1_3_2_38_2","first-page":"772","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Liu Honggu","year":"2021","unstructured":"Honggu Liu, Xiaodan Li, Wenbo Zhou, Yuefeng Chen, Yuan He, Hui Xue, Weiming Zhang, and Nenghai Yu. 2021. Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 772\u2013781."},{"key":"e_1_3_2_39_2","first-page":"Article 213, 22","article-title":"TCSD: Triple complementary streams detector for comprehensive deepfake detection","author":"Liu Xiaolong","year":"2023","unstructured":"Xiaolong Liu, Yang Yu, Xiaolong Li, Yao Zhao, and Guodong Guo. 2023. TCSD: Triple complementary streams detector for comprehensive deepfake detection. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 6 (2023), Article 213, 22 pages.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_2_40_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Loshchilov Ilya","year":"2019","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. 
In Proceedings of the International Conference on Learning Representations. 1\u20138."},{"key":"e_1_3_2_41_2","first-page":"Article 47, 21","article-title":"A study of human-AI symbiosis for creative work: Recent developments and future directions in deep learning","author":"Mahmud Bahar Uddin","year":"2023","unstructured":"Bahar Uddin Mahmud, Guan Yue Hong, and Bernard Fong. 2023. A study of human-AI symbiosis for creative work: Recent developments and future directions in deep learning. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 2 (2023), Article 47, 21 pages.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58571-6_39"},{"key":"e_1_3_2_43_2","first-page":"83","volume-title":"Proceedings of the Winter Applications of Computer Vision Workshops","author":"Matern Falko","year":"2019","unstructured":"Falko Matern, Christian Riess, and Marc Stamminger. 2019. Exploiting visual artifacts to expose deepfakes and face manipulations. In Proceedings of the Winter Applications of Computer Vision Workshops. IEEE, Los Alamitos, CA, 83\u201392."},{"key":"e_1_3_2_44_2","first-page":"2823","volume-title":"Proceedings of the ACM International Conference on Multimedia","author":"Mittal Trisha","year":"2020","unstructured":"Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, and Dinesh Manocha. 2020. Emotions don\u2019t lie: An audio-visual deepfake detection method using affective cues. In Proceedings of the ACM International Conference on Multimedia. ACM, New York, NY, 2823\u20132832."},{"key":"e_1_3_2_45_2","first-page":"8427","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Nagrani Arsha","year":"2018","unstructured":"Arsha Nagrani, Samuel Albanie, and Andrew Zisserman. 2018. Seeing voices and hearing faces: Cross-modal biometric matching. 
In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 8427\u20138436."},{"key":"e_1_3_2_46_2","first-page":"2616","volume-title":"Proceedings of the Conference of the International Speech Communication Association","author":"Nagrani Arsha","year":"2017","unstructured":"Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. 2017. VoxCeleb: A large-scale speaker identification dataset. In Proceedings of the Conference of the International Speech Communication Association. 2616\u20132620."},{"key":"e_1_3_2_47_2","first-page":"2307","volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing","author":"Nguyen Huy H.","year":"2019","unstructured":"Huy H. Nguyen, Junichi Yamagishi, and Isao Echizen. 2019. Capsule-forensics: Using capsule networks to detect forged images and videos. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. IEEE, Los Alamitos, CA, 2307\u20132311."},{"key":"e_1_3_2_48_2","first-page":"7183","volume-title":"Proceedings of the International Conference on Computer Vision","author":"Nirkin Yuval","year":"2019","unstructured":"Yuval Nirkin, Yosi Keller, and Tal Hassner. 2019. FSGAN: Subject agnostic face swapping and reenactment. In Proceedings of the International Conference on Computer Vision. IEEE, Los Alamitos, CA, 7183\u20137192."},{"key":"e_1_3_2_49_2","first-page":"7183","volume-title":"Proceedings of the International Conference on Computer Vision","author":"Nirkin Yuval","year":"2019","unstructured":"Yuval Nirkin, Yosi Keller, and Tal Hassner. 2019. FSGAN: Subject agnostic face swapping and reenactment. In Proceedings of the International Conference on Computer Vision. 
IEEE, Los Alamitos, CA, 7183\u20137192."},{"key":"e_1_3_2_50_2","first-page":"7539","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Oh Tae-Hyun","year":"2019","unstructured":"Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, Michael Rubinstein, and Wojciech Matusik. 2019. Speech2Face: Learning the face behind a voice. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 7539\u20137548."},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3287309"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413532"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58610-2_6"},{"key":"e_1_3_2_54_2","first-page":"1","volume-title":"Proceedings of the International Conference on Computer Vision","author":"R\u00f6ssler Andreas","year":"2019","unstructured":"Andreas R\u00f6ssler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nie\u00dfner. 2019. FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the International Conference on Computer Vision. IEEE, Los Alamitos, CA, 1\u201311."},{"key":"e_1_3_2_55_2","first-page":"199","volume-title":"Proceedings of the International Conference on Biometrics","author":"Sanderson Conrad","year":"2009","unstructured":"Conrad Sanderson and Brian C. Lovell. 2009. Multi-region probabilistic histograms for robust and scalable identity inference. In Proceedings of the International Conference on Biometrics. 199\u2013208."},{"issue":"4","key":"e_1_3_2_56_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3399659","article-title":"Eye-based recognition for user identification on mobile devices","volume":"16","author":"Shao Huiru","year":"2020","unstructured":"Huiru Shao, Jing Li, Jia Zhang, Hui Yu, and Jiande Sun. 2020. Eye-based recognition for user identification on mobile devices. 
ACM Transactions on Multimedia Computing, Communications, and Applications 16, 4 (2020), 1\u201319.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-020-09974-4"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00530-021-00837-y"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-021-10989-8"},{"key":"e_1_3_2_60_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations. 1\u201310."},{"key":"e_1_3_2_61_2","first-page":"6105","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning. 6105\u20136114."},{"key":"e_1_3_2_62_2","article-title":"Representation learning with contrastive predictive coding","volume":"1807","author":"Oord A\u00e4ron van den","year":"2018","unstructured":"A\u00e4ron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. CoRR abs\/1807.03748 (2018).","journal-title":"CoRR"},{"issue":"86","key":"e_1_3_2_63_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten Laurens van der","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. 
Journal of Machine Learning Research 9, 86 (2008), 2579\u20132605.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_64_2","first-page":"5998","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Conference on Neural Information Processing Systems. 5998\u20136008."},{"key":"e_1_3_2_65_2","first-page":"1136","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence","author":"Wang Yuhan","year":"2021","unstructured":"Yuhan Wang, Xu Chen, Junwei Zhu, Wenqing Chu, Ying Tai, Chengjie Wang, Jilin Li, Yongjian Wu, Feiyue Huang, and Rongrong Ji. 2021. HifiFace: 3D shape and semantic prior guided high fidelity face swapping. In Proceedings of the International Joint Conference on Artificial Intelligence. 1136\u20131142."},{"key":"e_1_3_2_66_2","first-page":"16347","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Wen Peisong","year":"2021","unstructured":"Peisong Wen, Qianqian Xu, Yangbangyan Jiang, Zhiyong Yang, Yuan He, and Qingming Huang. 2021. Seeking the shape of sound: An adaptive framework for learning voice-face association. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 16347\u201316356."},{"key":"e_1_3_2_67_2","first-page":"2952","volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing","author":"Wu Xi","year":"2020","unstructured":"Xi Wu, Zhen Xie, YuTao Gao, and Yu Xiao. 2020. SSTNet: Detecting manipulated faces through spatial, steganalysis and temporal features. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. 
IEEE, Los Alamitos, CA, 2952\u20132956."},{"key":"e_1_3_2_68_2","first-page":"Article 122, 23","article-title":"High-fidelity face reenactment via identity-matched correspondence learning","author":"Xue Han","year":"2023","unstructured":"Han Xue, Jun Ling, Anni Tang, Li Song, Rong Xie, and Wenjun Zhang. 2023. High-fidelity face reenactment via identity-matched correspondence learning. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 3 (2023), Article 122, 23 pages.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472810"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.108223"},{"key":"e_1_3_2_71_2","first-page":"8261","volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing","author":"Yang Xin","year":"2019","unstructured":"Xin Yang, Yuezun Li, and Siwei Lyu. 2019. Exposing deep fakes using inconsistent head poses. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. IEEE, Los Alamitos, CA, 8261\u20138265."},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3157450"},{"issue":"4","key":"e_1_3_2_73_2","first-page":"Article 94, 23","article-title":"Detection of AI-manipulated fake faces via mining generalized features","volume":"18","author":"Yu Yang","year":"2022","unstructured":"Yang Yu, Rongrong Ni, Wenjie Li, and Yao Zhao. 2022. Detection of AI-manipulated fake faces via mining generalized features. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18, 4 (2022), Article 94, 23 pages.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_2_74_2","first-page":"2185","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Zhao Hanqing","year":"2021","unstructured":"Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Tianyi Wei, Weiming Zhang, and Nenghai Yu. 2021. Multi-attentional deepfake detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 2185\u20132194."},{"key":"e_1_3_2_75_2","first-page":"14800","volume-title":"Proceedings of the International Conference on Computer Vision","author":"Zhou Yipin","year":"2021","unstructured":"Yipin Zhou and Ser-Nam Lim. 2021. Joint audio-visual deepfake detection. In Proceedings of the International Conference on Computer Vision. IEEE, Los Alamitos, CA, 14800\u201314809."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3625231","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3625231","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:50:04Z","timestamp":1750287004000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3625231"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,10]]},"references-count":74,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3625231"],"URL":"https:\/\/doi.org\/10.1145\/3625231","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,10]]},"assertion":[{"value":"2023-01-27","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-18","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}