{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T18:49:37Z","timestamp":1780339777490,"version":"3.54.1"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"11","license":[{"start":{"date-parts":[[2024,9,12]],"date-time":"2024-09-12T00:00:00Z","timestamp":1726099200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"European Union\u2014NextGenerationEU","award":["PE00000014"],"award-info":[{"award-number":["PE00000014"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,11,30]]},"abstract":"<jats:p>The capacity to create \u201cfake\u201d videos has recently raised concerns about the reliability of multimedia content. Distinguishing between true and false information is a critical step toward resolving this problem. On this issue, several algorithms utilizing deep learning and facial landmarks have yielded intriguing results. Facial landmarks are traits that are solely tied to the subject\u2019s head posture. Based on this observation, in this work we study how Head Pose Estimation (HPE) patterns may be utilized to detect deepfakes. The HPE patterns studied are based on FSA-Net, SynergyNet, and WSM, which are among the best-performing state-of-the-art approaches. Finally, using a machine learning technique based on K-Nearest Neighbor and Dynamic Time Warping, their temporal patterns are categorized as authentic or false. We also offer a set of experiments for examining the feasibility of using deep learning techniques on such patterns. The findings reveal that the ability to recognize a deepfake video utilizing an HPE pattern is dependent on the HPE methodology. 
By contrast, detection performance is less dependent on the performance of the HPE technique itself. Experiments are carried out on the FaceForensics++ dataset, which presents both identity swap and expression swap examples. The findings show that FSA-Net is an effective feature extraction method for determining whether a pattern belongs to a deepfake or not. The approach is also robust to deepfake videos created using various methods or for different goals. On average, the method obtains 86% accuracy on the identity swap task and 86.5% accuracy on the expression swap task. These findings open up various possibilities and future directions for solving the deepfake detection problem using specialized HPE approaches, which are also known to be fast and reliable.<\/jats:p>","DOI":"10.1145\/3612928","type":"journal-article","created":{"date-parts":[[2023,8,3]],"date-time":"2023-08-03T12:27:17Z","timestamp":1691065637000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":29,"title":["Head Pose Estimation Patterns as Deepfake Detectors"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2537-2700","authenticated-orcid":false,"given":"Federico","family":"Becattini","sequence":"first","affiliation":[{"name":"Universit\u00e0 degli Studi di Siena, Siena, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1358-006X","authenticated-orcid":false,"given":"Carmen","family":"Bisogni","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Salerno, Fisciano, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4807-8942","authenticated-orcid":false,"given":"Vincenzo","family":"Loia","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Salerno, Fisciano, 
Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5517-2198","authenticated-orcid":false,"given":"Chiara","family":"Pero","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Salerno, Fisciano, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5288-5523","authenticated-orcid":false,"given":"Fei","family":"Hao","sequence":"additional","affiliation":[{"name":"School of Computer Science, Shaanxi Normal University, Xi\u2019An, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,9,12]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"2021. Face Swap algorithm. (2021). Retrieved from https:\/\/faceswap.dev\/. Accessed 10 September 2022."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2020.10.003"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.108591"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Darius Afchar Vincent Nozick Junichi Yamagishi and Isao Echizen. 2018. MesoNet: A compact facial video forgery detection network. IEEE International Workshop on Information Forensics and Security (WIFS\u201918) 1\u20137.","DOI":"10.1109\/WIFS.2018.8630761"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","unstructured":"Darius Afchar Vincent Nozick Junichi Yamagishi and I. Echizen. 2018. MesoNet: A compact facial video forgery detection network. IEEE International Workshop on Information Forensics and Security (WIFS\u201918) Hong Kong 1\u20137. 
DOI:10.1109\/WIFS.2018.8630761","DOI":"10.1109\/WIFS.2018.8630761"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/HORA55278.2022.9799858"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2984373"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3059409"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR48806.2021.9413227"},{"key":"e_1_3_1_11_2","unstructured":"Kyunghyun Cho Bart Van Merri\u00ebnboer Caglar Gulcehre Dzmitry Bahdanau Fethi Bougares Holger Schwenk and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) Association for Computational Linguistics Doha 1724\u20131734."},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.195"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413700"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-06433-3_19"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3448017.3457387"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.3390\/jimaging8100263"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58529-7_10"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683503"},{"key":"e_1_3_1_19_2","unstructured":"Young Jin Heo Young Ju Choi Young-Woon Lee and Byung-Gyu Kim. 2021. Deepfake detection scheme based on vision transformer and distillation. arXiv:2104.01353. 
Retrieved from https:\/\/arxiv.org\/abs\/2104.01353"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2866770"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICOSST57195.2022.10016871"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-019-00619-1"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.241"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.119843"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00336"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2977346"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00327"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3558004"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00156"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2021.3051497"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3093446"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3154404"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00009"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00009"},{"key":"e_1_3_1_36_2","volume-title":"Nearest-neighbor Methods in Learning and Vision","author":"Shakhnarovich Gregory","year":"2005","unstructured":"Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk. 2005. Nearest-neighbor Methods in Learning and Vision. 
MIT Press."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-66218-9_27"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3046323"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/3DV53792.2021.00055"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2909327"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/SSCI47803.2020.9308428"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACVW58289.2023.00074"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00118"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683164"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1049\/bme2.12031"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3612928","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3612928","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:29:18Z","timestamp":1750285758000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3612928"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,12]]},"references-count":44,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,11,30]]}},"alternative-id":["10.1145\/3612928"],"URL":"https:\/\/doi.org\/10.1145\/3612928","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,12]]},"assertion":[{"value":"2023-03-31","order":0,"name":"received","label":"Received","group":{"name":"publica
tion_history","label":"Publication History"}},{"value":"2023-07-26","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}