{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T05:57:03Z","timestamp":1761631023283,"version":"build-2065373602"},"reference-count":91,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,1,13]],"date-time":"2023-01-13T00:00:00Z","timestamp":1673568000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Department of Homeland Security (DHS), United States Secret Service","award":["70US0920D70090004"],"award-info":[{"award-number":["70US0920D70090004"]}]},{"DOI":"10.13039\/100000180","name":"National Computer Forensics Institute (NCFI)","doi-asserted-by":"publisher","award":["70US0920D70090004"],"award-info":[{"award-number":["70US0920D70090004"]}],"id":[{"id":"10.13039\/100000180","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Human faces are a core part of our identity and expression, and thus, understanding facial geometry is key to capturing this information. Automated systems that seek to make use of this information must have a way of modeling facial features in a way that makes them accessible. Hierarchical, multi-level architectures have the capability of capturing the different resolutions of representation involved. In this work, we propose using a hierarchical transformer architecture as a means of capturing a robust representation of facial geometry. We further demonstrate the versatility of our approach by using this transformer as a backbone to support three facial representation problems: face anti-spoofing, facial expression representation, and deepfake detection. The combination of effective fine-grained details alongside global attention representations makes this architecture an excellent candidate for these facial representation problems. We conduct numerous experiments first showcasing the ability of our approach to address common issues in facial modeling (pose, occlusions, and background variation) and capture facial symmetry, then demonstrating its effectiveness on three supplemental tasks.<\/jats:p>","DOI":"10.3390\/s23020929","type":"journal-article","created":{"date-parts":[[2023,1,13]],"date-time":"2023-01-13T02:57:33Z","timestamp":1673578653000},"page":"929","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Can Hierarchical Transformers Learn Facial Geometry?"],"prefix":"10.3390","volume":"23","author":[{"given":"Paul","family":"Young","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nima","family":"Ebadi","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX 78249, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7512-0523","authenticated-orcid":false,"given":"Arun","family":"Das","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX 78249, USA"},{"name":"Department of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mazal","family":"Bethany","sequence":"additional","affiliation":[{"name":"Department of Information Systems, University of Texas at San Antonio, San Antonio, TX 78249, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2964-8981","authenticated-orcid":false,"given":"Kevin","family":"Desai","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peyman","family":"Najafirad","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249, USA"},{"name":"Department of Information Systems, University of Texas at San Antonio, San Antonio, TX 78249, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1145\/954339.954342","article-title":"Face recognition: A literature survey","volume":"35","author":"Zhao","year":"2003","journal-title":"ACM Comput. Surv. CSUR"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kortli, Y., Jridi, M., Al Falou, A., and Atri, M. (2020). Face recognition systems: A survey. Sensors, 20.","DOI":"10.3390\/s20020342"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1530","DOI":"10.1109\/ACCESS.2014.2381273","article-title":"Biometric antispoofing methods: A survey in face recognition","volume":"2","author":"Galbally","year":"2014","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/j.inffus.2020.06.014","article-title":"Deepfakes and beyond: A survey of face manipulation and fake detection","volume":"64","author":"Tolosana","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1497","DOI":"10.1109\/JBHI.2017.2754861","article-title":"A survey on computer vision for assistive medical diagnosis from faces","volume":"22","author":"Thevenot","year":"2017","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_6","unstructured":"Meng, Q., Zhou, F., Ren, H., Feng, T., Liu, G., and Lin, Y. (2021, January 3\u20137). Improving Federated Learning Face Recognition via Privacy-Agnostic Clusters. Proceedings of the International Conference on Learning Representations, Virtual Event."},{"key":"ref_7","unstructured":"Liu, C.T., Wang, C.Y., Chien, S.Y., and Lai, S.H. (March, January 22). FedFR: Joint optimization federated framework for generic and personalized face recognition. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Shome, D., and Kar, T. (2021, January 11\u201317). FedAffect: Few-shot federated learning for facial expression recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00463"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., and Song, L. (2017, January 21\u201326). Sphereface: Deep hypersphere embedding for face recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.713"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Hsu, H.K., Yao, C.H., Tsai, Y.H., Hung, W.C., Tseng, H.Y., Singh, M., and Yang, M.H. (2020, January 2\u20135). Progressive domain adaptation for object detection. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093358"},{"key":"ref_11","unstructured":"Tian, J., Hsu, Y.C., Shen, Y., Jin, H., and Kira, Z. (2021, January 13). Exploring Covariate and Concept Shift for Out-of-Distribution Detection. Proceedings of the NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications, Virtual Event."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, Z., Wang, Z., Yu, Z., Deng, W., Li, J., Gao, T., and Wang, Z. (2022, January 19\u201320). Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00409"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Jia, Y., Zhang, J., Shan, S., and Chen, X. (2020, January 14\u201319). Single-side domain generalization for face anti-spoofing. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00851"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yang, C., and Lim, S.N. (2020, January 14\u201319). One-shot domain adaptation for face generation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00596"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Cao, D., Zhu, X., Huang, X., Guo, J., and Lei, Z. (2020, January 14\u201319). Domain balancing: Face recognition on long-tailed domains. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00571"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wang, G., Han, H., Shan, S., and Chen, X. (2020, January 14\u201319). Cross-domain face presentation attack detection via multi-domain disentangled representation learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00671"},{"key":"ref_17","unstructured":"Zhu, X., Lei, Z., Yan, J., Yi, D., and Li, S.Z. (2015, January 27\u201330). High-fidelity pose and expression normalization for face recognition in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Quebec City, QC, Canada."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Dou, P., Wu, Y., Shah, S.K., and Kakadiaris, I.A. (2014, January 22\u201324). Robust 3D face shape reconstruction from single images via two-fold coupled structure learning. Proceedings of the British Machine Vision Conference, Vancouver, BC, Canada.","DOI":"10.5244\/C.28.131"},{"key":"ref_19","first-page":"394","article-title":"3D face reconstruction from a single image using a single reference face shape","volume":"33","author":"Basri","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Murugappan, M., and Mutawa, A. (2021). Facial geometric feature extraction based emotional expression classification using machine learning algorithms. PLoS ONE, 16.","DOI":"10.1371\/journal.pone.0247131"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Blanz, V., and Vetter, T. (1999, January 8\u201313). A morphable model for the synthesis of 3D faces. Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA.","DOI":"10.1145\/311535.311556"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Breuer, P., Kim, K.I., Kienzle, W., Scholkopf, B., and Blanz, V. (2008, January 12\u201315). Automatic 3D face reconstruction from single images or video. Proceedings of the 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, San Diego, CA, USA.","DOI":"10.1109\/AFGR.2008.4813339"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Saito, S., Wei, L., Hu, L., Nagano, K., and Li, H. (2017, January 21\u201326). Photorealistic facial texture inference using deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.250"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Hassner, T., Harel, S., Paz, E., and Enbar, R. (2015, January 7\u201312). Effective face frontalization in unconstrained images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299058"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1938","DOI":"10.1109\/TPAMI.2011.49","article-title":"Using facial symmetry to handle pose variations in real-world 3D face recognition","volume":"33","author":"Passalis","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Singh, A.K., and Nandi, G.C. (2012, January 26\u201328). Face recognition using facial symmetry. Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology, Coimbatore, India.","DOI":"10.1145\/2393216.2393308"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Galterio, M.G., Shavit, S.A., and Hayajneh, T. (2018). A review of facial biometrics security for smart devices. Computers, 7.","DOI":"10.3390\/computers7030037"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Meng, Q., Zhao, S., Huang, Z., and Zhou, F. (2021, January 20\u201325). Magface: A universal representation for face recognition and quality assessment. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01400"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Lu, Y.D., Yang, S.T., and Lai, S.H. (2022, January 19\u201324). PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01964"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1242","DOI":"10.1016\/j.dss.2006.02.004","article-title":"Machine assessment of neonatal facial expressions of acute pain","volume":"43","author":"Brahnam","year":"2007","journal-title":"Decis. Support Syst."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Das, A., Mock, J., Huang, Y., Golob, E., and Najafirad, P. (2021, January 2\u20139). Interpretable self-supervised facial micro-expression learning to predict cognitive state and neurological disorders. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event.","DOI":"10.1609\/aaai.v35i1.16164"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Xue, F., Wang, Q., and Guo, G. (2021, January 19\u201325). Transfer: Learning relation-aware facial expression representations with transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.00358"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 19\u201325). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Haar, A. (1909). Zur Theorie der Orthogonalen Funktionensysteme, Georg-August-Universitat.","DOI":"10.1007\/BF01456326"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1007\/s001700050062","article-title":"Application of the discrete wavelet transform to the monitoring of tool failure in end milling using the spindle motor current","volume":"15","author":"Lee","year":"1999","journal-title":"Int. J. Adv. Manuf. Technol."},{"key":"ref_36","unstructured":"Papageorgiou, C.P., Oren, M., and Poggio, T. (1998, January 7). A general framework for object detection. Proceedings of the Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271), Bombay, India."},{"key":"ref_37","unstructured":"Lienhart, R., and Maydt, J. (2002, January 16\u201319). An extended set of haar-like features for rapid object detection. Proceedings of the International Conference on Image Processing, Bordeaux, France."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cviu.2007.09.014","article-title":"Speeded-up robust features (SURF)","volume":"110","author":"Bay","year":"2008","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_39","unstructured":"Viola, P., and Jones, M. (2001, January 8\u201314). Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Mita, T., Kaneko, T., and Hori, O. (2005, January 13\u201316). Joint haar-like features for face detection. Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV\u201905) Volume 1, Nice, France.","DOI":"10.1109\/ICCV.2005.129"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Pham, M.T., and Cham, T.J. (2007, January 4\u201321). Fast training and selection of haar features using statistics in boosting-based face detection. Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil.","DOI":"10.1109\/ICCV.2007.4409038"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hu, G., Yang, Y., Yi, D., Kittler, J., Christmas, W., Li, S.Z., and Hospedales, T. (2015, January 7\u201313). When face recognition meets with deep learning: An evaluation of convolutional neural networks for face recognition. Proceedings of the IEEE International Conference on Computer Vision Workshops, Santiago, Chile.","DOI":"10.1109\/ICCVW.2015.58"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1761","DOI":"10.1109\/TPAMI.2018.2842770","article-title":"Wasserstein CNN: Learning invariant features for NIR-VIS face recognition","volume":"41","author":"He","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Sharma, S., Shanmugasundaram, K., and Ramasamy, S.K. (2016, January 25\u201327). FAREC\u2014CNN based efficient face recognition technique using Dlib. Proceedings of the 2016 International Conference on Advanced Communication Control and Computing Technologies (ICACCCT), Ramanathapuram, India.","DOI":"10.1109\/ICACCCT.2016.7831628"},{"key":"ref_45","unstructured":"Parkhi, O.M., Vedaldi, A., and Zisserman, A. (2015, January 7\u201310). Deep face recognition. Proceedings of the British Machine Vision Conference, Swansea, UK."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_47","unstructured":"Kolesnikov, A., Dosovitskiy, A., Weissenborn, D., Heigold, G., Uszkoreit, J., Beyer, L., Minderer, M., Dehghani, M., Houlsby, N., and Gelly, S. (2021, January 3\u20137). An Image is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event."},{"key":"ref_48","unstructured":"Wang, W., Cao, Y., Zhang, J., and Tao, D. (2021, January 3\u20137). Fp-detr: Detection transformer advanced by fully pre-training. Proceedings of the International Conference on Learning Representations, Virtual Event."},{"key":"ref_49","unstructured":"Song, H., Sun, D., Chun, S., Jampani, V., Han, D., Heo, B., Kim, W., and Yang, M.H. (2022, January 25\u201329). ViDT: An Efficient and Effective Fully Transformer-based Object Detector. Proceedings of the International Conference on Learning Representations, Virtual Event."},{"key":"ref_50","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2021, January 3\u20137). Deformable DETR: Deformable Transformers for End-to-End Object Detection. Proceedings of the International Conference on Learning Representations, Virtual Event."},{"key":"ref_51","unstructured":"Tang, S., Zhang, J., Zhu, S., and Tan, P. (2022, January 25\u201329). Quadtree Attention for Vision Transformers. Proceedings of the International Conference on Learning Representations, Virtual Event."},{"key":"ref_52","unstructured":"Chen, R., Panda, R., and Fan, Q. (2022, January 25\u201329). RegionViT: Regional-to-Local Attention for Vision Transformers. Proceedings of the International Conference on Learning Representations, Virtual Event."},{"key":"ref_53","first-page":"15908","article-title":"Transformer in transformer","volume":"34","author":"Han","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 10\u201317). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Yuan, L., Chen, Y., Wang, T., Yu, W., Shi, Y., Jiang, Z.H., Tay, F.E., Feng, J., and Yan, S. (2021, January 10\u201317). Tokens-to-token vit: Training vision transformers from scratch on imagenet. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00060"},{"key":"ref_56","unstructured":"Chu, X., Tian, Z., Zhang, B., Wang, X., Wei, X., Xia, H., and Shen, C. (2021). Conditional positional encodings for vision transformers. arXiv."},{"key":"ref_57","unstructured":"Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J., Mohiuddin, A., and Kaiser, L. (2020). Rethinking attention with performers. arXiv."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Wang, Y.C., Wang, C.Y., and Lai, S.H. (2022, January 4\u20138). Disentangled Representation with Dual-stage Feature Learning for Face Anti-spoofing. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV51458.2022.00130"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"George, A., and Marcel, S. (2021, January 4\u20137). On the effectiveness of vision transformers for zero-shot face anti-spoofing. Proceedings of the 2021 IEEE International Joint Conference on Biometrics (IJCB), Shenzhen, China.","DOI":"10.1109\/IJCB52358.2021.9484333"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"1254","DOI":"10.1109\/TIFS.2022.3158062","article-title":"Learning multi-granularity temporal characteristics for face anti-spoofing","volume":"17","author":"Wang","year":"2022","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Zhang, W., Ji, X., Chen, K., Ding, Y., and Fan, C. (2021, January 10\u201317). Learning a facial expression embedding disentangled from identity. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Montreal, QC, Canada.","DOI":"10.1109\/CVPR46437.2021.00669"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Ruan, D., Yan, Y., Lai, S., Chai, Z., Shen, C., and Wang, H. (2021, January 10\u201317). Feature decomposition and reconstruction learning for effective facial expression recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Montreal, QC, Canada.","DOI":"10.1109\/CVPR46437.2021.00757"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Hong, J., Lee, C., and Jung, H. (2022). Late Fusion-Based Video Transformer for Facial Micro-Expression Recognition. Appl. Sci., 12.","DOI":"10.3390\/app12031169"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zhou, W., Chen, D., Wei, T., Zhang, W., and Yu, N. (2021, January 10\u201317). Multi-attentional deepfake detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Montreal, QC, Canada.","DOI":"10.1109\/CVPR46437.2021.00222"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Matern, F., Riess, C., and Stamminger, M. (2019, January 7\u201311). Exploiting visual artifacts to expose deepfakes and face manipulations. Proceedings of the 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), Waikoloa Village, HI, USA.","DOI":"10.1109\/WACVW.2019.00020"},{"key":"ref_66","unstructured":"Li, Y., and Lyu, S. (2018). Exposing deepfake videos by detecting face warping artifacts. arXiv."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"3286","DOI":"10.1109\/TIP.2019.2895466","article-title":"Hybrid lstm and encoder\u2013decoder architecture for detection of image forgeries","volume":"28","author":"Bappy","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Dong, X., Bao, J., Chen, D., Zhang, T., Zhang, W., Yu, N., Chen, D., Wen, F., and Guo, B. (2022, January 19\u201320). Protecting Celebrities from DeepFake with Identity Consistency Transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00925"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Mazaheri, G., and Roy-Chowdhury, A.K. (2022, January 4\u20138). Detection and Localization of Facial Expression Manipulations. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV51458.2022.00283"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Hosler, B., Salvi, D., Murray, A., Antonacci, F., Bestagini, P., Tubaro, S., and Stamm, M.C. (2021, January 10\u201317). Do deepfakes feel emotions? A semantic approach to detecting deepfakes via emotional inconsistencies. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Montreal, QC, Canada.","DOI":"10.1109\/CVPRW53098.2021.00112"},{"key":"ref_71","unstructured":"Ilyas, H., Javed, A., and Malik, K.M. (2022, December 01). Avfakenet: A Unified End-to-End Dense Swin Transformer Deep Learning Model for Audio-Visual Deepfakes Detection. Available online: https:\/\/www.scopus.com\/record\/display.uri?eid=2-s2.0-85138317182&origin=inward&txGid=925378ef2e24c5aebd9db8ca01390b3c."},{"key":"ref_72","first-page":"1","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_73","unstructured":"Loshchilov, I., and Hutter, F. (May, January 30). Decoupled Weight Decay Regularization. Proceedings of the International Conference on Learning Representations, Vancouver Convention Center, Vancouver, BC, Canada."},{"key":"ref_74","unstructured":"Wang, J., Liu, Y., Hu, Y., Shi, H., and Mei, T. (November, January 28). Facex-zoo: A pytorch toolbox for face recognition. Proceedings of the 29th ACM International Conference on Multimedia, Ottawa, ON, Canada."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., Shi, H., Wang, Z., and Li, S.Z. (2019, January 15\u201320). A Dataset and Benchmark for Large-Scale Multi-Modal Face Anti-Spoofing. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00101"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., and Hadid, A. (June, January 30). OULU-NPU: A Mobile Face Presentation Attack Database with Real-World Variations. Proceedings of the 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017), Washington, DC, USA.","DOI":"10.1109\/FG.2017.77"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., and Nie\u00dfner, M. (2019, January 15\u201320). Faceforensics++: Learning to detect manipulated facial images. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00009"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., and Matthews, I. (2010, January 13\u201318). The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA.","DOI":"10.1109\/CVPRW.2010.5543262"},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"6111","DOI":"10.1109\/TPAMI.2021.3093446","article-title":"DeepFake detection based on discrepancies between faces and their context","volume":"44","author":"Nirkin","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Zhao, X., Liang, X., Liu, L., Li, T., Han, Y., Vasconcelos, N., and Yan, S. (2016, January 11\u201314). Peak-piloted deep network for facial expression recognition. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46475-6_27"},{"key":"ref_81","unstructured":"Li, L., Bao, J., Zhang, T., Yang, H., Chen, D., Wen, F., and Guo, B. Face x-ray for more general face forgery detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA."},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Zeng, J., Shan, S., and Chen, X. (2018, January 8\u201314). Facial expression recognition with inconsistently annotated datasets. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_14"},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Chen, L., Zhang, Y., Song, Y., Liu, L., and Wang, J. (2022, January 19\u201320). Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01815"},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Yang, H., Ciftci, U., and Yin, L. (2018, January 18\u201323). Facial expression recognition by de-expression residue learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00231"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Liu, H., Li, X., Zhou, W., Chen, Y., He, Y., Xue, H., Zhang, W., and Yu, N. (2021, January 10\u201317). Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Montreal, QC, Canada.","DOI":"10.1109\/CVPR46437.2021.00083"},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Ding, H., Zhou, S.K., and Chellappa, R. (June, January 30). Facenet2expnet: Regularizing a deep face recognition net for expression recognition. Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA.","DOI":"10.1109\/FG.2017.23"},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Li, X., Lang, Y., Chen, Y., Mao, X., He, Y., Wang, S., Xue, H., and Lu, Q. (2020, January 12\u201316). Sharp multiple instance learning for deepfake video detection. Proceedings of the 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA.","DOI":"10.1145\/3394171.3414034"},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Ruan, D., Yan, Y., Chen, S., Xue, J.H., and Wang, H. (2020, January 12\u201316). Deep disturbance-disentangled learning for facial expression recognition. Proceedings of the 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA.","DOI":"10.1145\/3394171.3413907"},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Liu, A., Tan, Z., Wan, J., Escalera, S., Guo, G., and Li, S.Z. (2021, January 5\u20139). Casia-surf cefa: A benchmark for multi-modal cross-ethnicity face anti-spoofing. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00122"},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Yang, X., Luo, W., Bao, L., Gao, Y., Gong, D., Zheng, S., Li, Z., and Liu, W. (2019, January 15\u201320). Face anti-spoofing: Model matters, so does data. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00362"},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Wang, Z., Yu, Z., Zhao, C., Zhu, X., Qin, Y., Zhou, Q., Zhou, F., and Lei, Z. (2020, January 14\u201319). Deep spatial gradient and temporal depth learning for face anti-spoofing. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00509"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/2\/929\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:05:03Z","timestamp":1760119503000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/2\/929"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,13]]},"references-count":91,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["s23020929"],"URL":"https:\/\/doi.org\/10.3390\/s23020929","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,1,13]]}}}