{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T23:05:07Z","timestamp":1773356707947,"version":"3.50.1"},"publisher-location":"Cham","reference-count":55,"publisher":"Springer International Publishing","isbn-type":[{"value":"9783030876630","type":"print"},{"value":"9783030876647","type":"electronic"}],"license":[{"start":{"date-parts":[[2022,1,1]],"date-time":"2022-01-01T00:00:00Z","timestamp":1640995200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T00:00:00Z","timestamp":1643587200000},"content-version":"vor","delay-in-days":30,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Several\u00a0sophisticated convolutional neural network (CNN)\u00a0architectures have been devised that have achieved impressive results in various domains. One downside of this success is the advent of attacks using deepfakes, a family of tools that enable anyone to use a personal computer to easily create fake videos of someone from a short video found online. Several detectors have been introduced to deal with such attacks. To achieve state-of-the-art performance, CNN-based\u00a0detectors have usually been upgraded by increasing their depth and\/or their width, adding more internal connections, or fusing several features or predicted probabilities from multiple CNNs. As a result, CNN-based detectors have become bigger, consume more memory and computation power, and require more training data. Moreover, there is concern about their generalizability to deal with unseen manipulation methods. 
In this chapter, we argue that our forensic-oriented capsule network\u00a0overcomes these limitations and is more suitable than conventional CNNs\u00a0to detect deepfakes. The superiority of our \u201cCapsule-Forensics\u201d  network is due to the use of a pretrained feature extractor, statistical pooling layers, and a dynamic routing algorithm. This design enables the Capsule-Forensics network to outperform a CNN\u00a0with a similar design and to be from 5 to 11 times smaller than a CNN\u00a0with similar performance.<\/jats:p>","DOI":"10.1007\/978-3-030-87664-7_13","type":"book-chapter","created":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T09:03:06Z","timestamp":1643619786000},"page":"275-301","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Capsule-Forensics Networks for\u00a0Deepfake Detection"],"prefix":"10.1007","author":[{"given":"Huy H.","family":"Nguyen","sequence":"first","affiliation":[]},{"given":"Junichi","family":"Yamagishi","sequence":"additional","affiliation":[]},{"given":"Isao","family":"Echizen","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,1,31]]},"reference":[{"key":"13_CR1","unstructured":"Contributing data to deepfake detection research. https:\/\/ai.googleblog.com\/2019\/09\/contributing-data-to-deepfake-detection.html. Accessed 24 Sept 2019"},{"key":"13_CR2","unstructured":"Dexter studio. http:\/\/dexterstudios.com\/en\/. Accessed 01 Sept 2019"},{"key":"13_CR3","unstructured":"Terrifying high-tech porn: Creepy \u2019deepfake\u2019 videos are on the rise. https:\/\/www.foxnews.com\/tech\/terrifying-high-tech-porn-creepy-deepfake-videos-are-on-the-rise. Accessed 17 Feb 2018"},{"key":"13_CR4","doi-asserted-by":"crossref","unstructured":"Afchar D, Nozick V, Yamagishi J, Echizen I (2018) MesoNet: a compact facial video forgery detection network. 
In: International workshop on information forensics and security (WIFS). IEEE","DOI":"10.1109\/WIFS.2018.8630761"},{"key":"13_CR5","unstructured":"Agarwal S, Farid H, Gu Y, He M, Nagano K, Li H (2019) Protecting world leaders against deep fakes. In: Conference on computer vision and pattern recognition workshops (CVPRW), pp 38\u201345"},{"issue":"4","key":"13_CR6","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1109\/MCG.2010.65","volume":"30","author":"O Alexander","year":"2010","unstructured":"Alexander O, Rogers M, Lambeth W, Chiang JY, Ma WC, Wang CC, Debevec P (2010) The digital emily project: Achieving a photorealistic digital actor. IEEE Comput Graph Appl 30(4):20\u201331","journal-title":"IEEE Comput Graph Appl"},{"key":"13_CR7","doi-asserted-by":"crossref","unstructured":"Averbuch-Elor H, Cohen-Or D, Kopf J, Cohen MF (2017) Bringing portraits to life. ACM Trans Graph","DOI":"10.1145\/3130800.3130818"},{"key":"13_CR8","unstructured":"Bahadori MT (2018) Spectral capsule networks. In: International conference on learning representations (ICLR)"},{"key":"13_CR9","doi-asserted-by":"crossref","unstructured":"Bappy JH, Simons C, Nataraj L, Manjunath B, Roy-Chowdhury AK (2019) Hybrid lstm and encoder-decoder architecture for detection of image forgeries. IEEE Trans Image Process","DOI":"10.1109\/TIP.2019.2895466"},{"key":"13_CR10","doi-asserted-by":"crossref","unstructured":"Bayar B, Stamm MC (2016) A deep learning approach to universal image manipulation detection using a new convolutional layer. In: Workshop on information hiding and multimedia security (IH&MMSEC). ACM","DOI":"10.1145\/2909827.2930786"},{"key":"13_CR11","doi-asserted-by":"crossref","unstructured":"Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Conference on computer vision and pattern recognition (CVPR). 
IEEE","DOI":"10.1109\/CVPR.2017.195"},{"key":"13_CR12","doi-asserted-by":"crossref","unstructured":"Cozzolino D, Poggi G, Verdoliva L (2017) Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection. In: Workshop on information hiding and multimedia security (IH&MMSEC). ACM","DOI":"10.1145\/3082031.3083247"},{"key":"13_CR13","unstructured":"Dolhansky B, Bitton J, Pflaum B, Lu J, Howes R, Wang M, Ferrer CC (2020) The deepfake detection challenge dataset. arXiv preprint arXiv:2006.07397"},{"key":"13_CR14","doi-asserted-by":"crossref","unstructured":"Fridrich J, Kodovsky J (2012) Rich models for steganalysis of digital images. IEEE Trans Inf Foren Sec","DOI":"10.1109\/TIFS.2012.2190402"},{"key":"13_CR15","doi-asserted-by":"crossref","unstructured":"Fried O, Tewari A, Zollh\u00f6fer M, Finkelstein A, Shechtman E, Goldman DB, Genova K, Jin Z, Theobalt C, Agrawala M (2019) Text-based editing of talking-head video. In: International conference and exhibition on computer graphics and interactive techniques (SIGGRAPH). ACM","DOI":"10.1145\/3306346.3323028"},{"key":"13_CR16","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Conference on computer vision and pattern recognition (CVPR), pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"13_CR17","doi-asserted-by":"crossref","unstructured":"Hinton GE, Krizhevsky A, Wang SD (2011) Transforming auto-encoders. In: International conference on artificial neural networks (ICANN). Springer","DOI":"10.1007\/978-3-642-21735-7_6"},{"key":"13_CR18","unstructured":"Hinton GE, Sabour S, Frosst N (2018) Matrix capsules with EM routing. In: International conference on learning representations workshop (ICLRW)"},{"key":"13_CR19","doi-asserted-by":"crossref","unstructured":"Huang G, Liu Z, Van Der\u00a0Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. 
In: Conference on computer vision and pattern recognition (CVPR), pp 4700\u20134708","DOI":"10.1109\/CVPR.2017.243"},{"key":"13_CR20","doi-asserted-by":"crossref","unstructured":"Jiang L, Li R, Wu W, Qian C, Loy CC (2020) Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection. In: Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR42600.2020.00296"},{"key":"13_CR21","doi-asserted-by":"crossref","unstructured":"Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Conference on computer vision and pattern recognition (CVPR), pp 4401\u20134410","DOI":"10.1109\/CVPR.2019.00453"},{"key":"13_CR22","doi-asserted-by":"crossref","unstructured":"Kim H, Garrido P, Tewari A, Xu W, Thies J, Nie\u00dfner M, P\u00e9rez P, Richardt C, Zollh\u00f6fer M, Theobalt C (2018) Deep video portraits. In: International conference and exhibition on computer graphics and interactive techniques (SIGGRAPH). ACM","DOI":"10.1145\/3197517.3201283"},{"key":"13_CR23","unstructured":"Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International conference on learning representations (ICLR)"},{"key":"13_CR24","doi-asserted-by":"crossref","unstructured":"Korshunov P, Marcel S (2018) Speaker inconsistency detection in tampered video. In: European signal processing conference (EUSIPCO). IEEE, pp 2375\u20132379","DOI":"10.23919\/EUSIPCO.2018.8553270"},{"key":"13_CR25","doi-asserted-by":"crossref","unstructured":"Korshunov P, Marcel S (2019) Vulnerability assessment and detection of deepfake videos. In: International conference on biometrics (ICB)","DOI":"10.1109\/ICB45273.2019.8987375"},{"key":"13_CR26","doi-asserted-by":"crossref","unstructured":"Li L, Bao J, Zhang T, Yang H, Chen D, Wen F, Guo B (2020) Face x-ray for more general face forgery detection. 
In: Conference on computer vision and pattern recognition (CVPR), pp 5001\u20135010","DOI":"10.1109\/CVPR42600.2020.00505"},{"key":"13_CR27","doi-asserted-by":"crossref","unstructured":"Li Y, Chang MC, Farid H, Lyu S (2018) In ictu oculi: Exposing AI generated fake face videos by detecting eye blinking. arXiv preprint arXiv:1806.02877","DOI":"10.1109\/WIFS.2018.8630787"},{"key":"13_CR28","doi-asserted-by":"crossref","unstructured":"Li Y, Yang X, Sun P, Qi H, Lyu S (2020) Celeb-df: A large-scale challenging dataset for deepfake forensics. In: Conference on computer vision and pattern recognition (CVPR), pp 3207\u20133216","DOI":"10.1109\/CVPR42600.2020.00327"},{"key":"13_CR29","doi-asserted-by":"crossref","unstructured":"Liu A, Wan J, Escalera S, Jair\u00a0Escalante H, Tan Z, Yuan Q, Wang K, Lin C, Guo G, Guyon I et\u00a0al (2019) Multi-modal face anti-spoofing attack detection challenge at cvpr2019. In: Conference on computer vision and pattern recognition workshops (CVPRW), pp 0\u20130","DOI":"10.1109\/CVPRW.2019.00202"},{"key":"13_CR30","doi-asserted-by":"crossref","unstructured":"Nguyen HH, Fang F, Yamagishi J, Echizen I (2019) Multi-task learning for detecting and segmenting manipulated facial images and videos. In: International conference on biometrics: theory, applications and systems (BTAS). IEEE","DOI":"10.1109\/BTAS46853.2019.9185974"},{"key":"13_CR31","doi-asserted-by":"crossref","unstructured":"Nguyen HH, Tieu NDT, Nguyen-Son HQ, Nozick V, Yamagishi J, Echizen I (2018) Modular convolutional neural network for discriminating between computer-generated images and photographic images. In: International conference on availability, reliability and security (ARES). ACM","DOI":"10.1145\/3230833.3230863"},{"key":"13_CR32","doi-asserted-by":"crossref","unstructured":"Nguyen HH, Yamagishi J, Echizen I (2019) Capsule-forensics: Using capsule networks to detect forged images and videos. 
In: International conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2307\u20132311","DOI":"10.1109\/ICASSP.2019.8682602"},{"key":"13_CR33","doi-asserted-by":"crossref","unstructured":"Nirkin Y, Keller Y, Hassner T (2019) Fsgan: Subject agnostic face swapping and reenactment. In: International conference on computer vision (ICCV). IEEE","DOI":"10.1109\/ICCV.2019.00728"},{"key":"13_CR34","unstructured":"Ozbulak U (2019) Pytorch cnn visualizations. https:\/\/github.com\/utkuozbulak\/pytorch-cnn-visualizations"},{"key":"13_CR35","doi-asserted-by":"crossref","unstructured":"Rahmouni N, Nozick V, Yamagishi J, Echizen I (2017) Distinguishing computer graphics from natural images using convolution neural networks. In: International workshop on information forensics and security (WIFS). IEEE","DOI":"10.1109\/WIFS.2017.8267647"},{"key":"13_CR36","doi-asserted-by":"crossref","unstructured":"R\u00f6ssler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nie\u00dfner M (2019) Faceforensics++: learning to detect manipulated facial images. In: International conference on computer vision (ICCV)","DOI":"10.1109\/ICCV.2019.00009"},{"key":"13_CR37","doi-asserted-by":"crossref","unstructured":"Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet Large scale visual recognition challenge. Int J Comput Vis","DOI":"10.1007\/s11263-015-0816-y"},{"key":"13_CR38","unstructured":"Sabir E, Cheng J, Jaiswal A, AbdAlmageed W, Masi I, Natarajan P (2019) Recurrent convolutional strategies for face manipulation detection in videos. In: Conference on computer vision and pattern recognition workshops (CVPRW), pp 80\u201387"},{"key":"13_CR39","unstructured":"Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. 
In: Conference on Neural Information Processing Systems (NIPS)"},{"key":"13_CR40","doi-asserted-by":"crossref","unstructured":"Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In: International conference on computer vision (ICCV). IEEE, pp 618\u2013626","DOI":"10.1109\/ICCV.2017.74"},{"key":"13_CR41","unstructured":"Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR)"},{"key":"13_CR42","doi-asserted-by":"crossref","unstructured":"Suwajanakorn S, Seitz SM, Kemelmacher-Shlizerman I (2017) Synthesizing obama: learning lip sync from audio. ACM Trans Graph","DOI":"10.1145\/3072959.3073640"},{"key":"13_CR43","doi-asserted-by":"crossref","unstructured":"Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Conference on computer vision and pattern recognition (CVPR), pp 1\u20139","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"13_CR44","unstructured":"Tan M, Le Q (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning (ICML), pp 6105\u20136114"},{"key":"13_CR45","doi-asserted-by":"crossref","unstructured":"Thies J, Zollh\u00f6fer M, Nie\u00dfner M (2019) Deferred neural rendering: image synthesis using neural textures. In: Computer graphics and interactive techniques (SIGGRAPH). ACM","DOI":"10.1145\/3306346.3323035"},{"key":"13_CR46","doi-asserted-by":"crossref","unstructured":"Thies, J, Zollhofer, M, Stamminger, M, Theobalt, C, Nie\u00dfner, M (2016) Face2Face: real-time face capture and reenactment of RGB videos. In: Conference on computer vision and pattern recognition (CVPR). 
IEEE","DOI":"10.1109\/CVPR.2016.262"},{"key":"13_CR47","doi-asserted-by":"crossref","unstructured":"Tripathy S, Kannala J, Rahtu E (2019) Icface: interpretable and controllable face reenactment using gans. arXiv preprint arXiv:1904.01909","DOI":"10.1109\/WACV45572.2020.9093474"},{"key":"13_CR48","doi-asserted-by":"crossref","unstructured":"Vougioukas K, Center SA, Petridis S, Pantic M (2019) End-to-end speech-driven realistic facial animation with temporal gans. In: Conference on computer vision and pattern recognition workshops (CVPRW), pp 37\u201340","DOI":"10.1007\/s11263-019-01251-8"},{"key":"13_CR49","doi-asserted-by":"crossref","unstructured":"Wang SY, Wang O, Owens A, Zhang R, Efros AA (2019) Detecting photoshopped faces by scripting photoshop. In: International conference on computer vision (ICCV). IEEE","DOI":"10.1109\/ICCV.2019.01017"},{"key":"13_CR50","unstructured":"Xi E, Bing S, Jin Y (2017) Capsule network performance on complex data. arXiv preprint arXiv:1712.03480"},{"issue":"12","key":"13_CR51","doi-asserted-by":"publisher","first-page":"1850","DOI":"10.1109\/LSP.2018.2873892","volume":"25","author":"C Xiang","year":"2018","unstructured":"Xiang C, Zhang L, Tang Y, Zou W, Xu C (2018) Ms-capsnet: a novel multi-scale capsule network. IEEE Signal Process Lett 25(12):1850\u20131854","journal-title":"IEEE Signal Process Lett"},{"key":"13_CR52","doi-asserted-by":"crossref","unstructured":"Zagoruyko S, Komodakis N (2016) Wide residual networks. In: British machine vision conference (BMVC). BMVA","DOI":"10.5244\/C.30.87"},{"key":"13_CR53","doi-asserted-by":"crossref","unstructured":"Zakharov E, Shysheya A, Burkov E, Lempitsky V (2019) Few-shot adversarial learning of realistic neural talking head models. arXiv preprint arXiv:1905.08233","DOI":"10.1109\/ICCV.2019.00955"},{"key":"13_CR54","doi-asserted-by":"crossref","unstructured":"Zhou P, Han X, Morariu VI, Davis LS (2017) Two-stream neural networks for tampered face detection. 
In: Conference on computer vision and pattern recognition workshop (CVPRW). IEEE","DOI":"10.1109\/CVPRW.2017.229"},{"key":"13_CR55","doi-asserted-by":"crossref","unstructured":"Zhou P, Han X, Morariu VI, Davis LS (2018) Learning rich features for image manipulation detection. In: Conference on computer vision and pattern recognition (CVPR), pp 1053\u20131061","DOI":"10.1109\/CVPR.2018.00116"}],"container-title":["Advances in Computer Vision and Pattern Recognition","Handbook of Digital Face Manipulation and Detection"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-87664-7_13","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,27]],"date-time":"2024-01-27T16:07:08Z","timestamp":1706371628000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-030-87664-7_13"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"ISBN":["9783030876630","9783030876647"],"references-count":55,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-87664-7_13","relation":{},"ISSN":["2191-6586","2191-6594"],"issn-type":[{"value":"2191-6586","type":"print"},{"value":"2191-6594","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022]]},"assertion":[{"value":"31 January 2022","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}}]}}