{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T11:44:54Z","timestamp":1769946294472,"version":"3.49.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2018,12,4]],"date-time":"2018-12-04T00:00:00Z","timestamp":1543881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2018,12,31]]},"abstract":"<jats:p>With the rising interest in personalized VR and gaming experiences comes the need to create high quality 3D avatars that are both low-cost and variegated. Due to this, building dynamic avatars from a single unconstrained input image is becoming a popular application. While previous techniques that attempt this require multiple input images or rely on transferring dynamic facial appearance from a source actor, we are able to do so using only one 2D input image without any form of transfer from a source image. We achieve this using a new conditional Generative Adversarial Network design that allows fine-scale manipulation of any facial input image into a new expression while preserving its identity. Our photoreal avatar GAN (paGAN) can also synthesize the unseen mouth interior and control the eye-gaze direction of the output, as well as produce the final image from a novel viewpoint. The method is even capable of generating fully-controllable temporally stable video sequences, despite not using temporal information during training. After training, we can use our network to produce dynamic image-based avatars that are controllable on mobile devices in real time. 
To do this, we compute a fixed set of output images that correspond to key blendshapes, from which we extract textures in UV space. Using a subject's expression blendshapes at run-time, we can linearly blend these key textures together to achieve the desired appearance. Furthermore, we can use the mouth interior and eye textures produced by our network to synthesize on-the-fly avatar animations for those regions. Our work produces state-of-the-art quality image and video synthesis, and is the first to our knowledge that is able to generate a dynamically textured avatar with a mouth interior, all from a single image.<\/jats:p>","DOI":"10.1145\/3272127.3275075","type":"journal-article","created":{"date-parts":[[2018,11,28]],"date-time":"2018-11-28T19:16:10Z","timestamp":1543432570000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":125,"title":["paGAN"],"prefix":"10.1145","volume":"37","author":[{"given":"Koki","family":"Nagano","sequence":"first","affiliation":[{"name":"USC Institute for Creative Technologies"}]},{"given":"Jaewoo","family":"Seo","sequence":"additional","affiliation":[{"name":"Pinscreen"}]},{"given":"Jun","family":"Xing","sequence":"additional","affiliation":[{"name":"USC Institute for Creative Technologies"}]},{"given":"Lingyu","family":"Wei","sequence":"additional","affiliation":[{"name":"Pinscreen"}]},{"given":"Zimo","family":"Li","sequence":"additional","affiliation":[{"name":"University of Southern California"}]},{"given":"Shunsuke","family":"Saito","sequence":"additional","affiliation":[{"name":"University of Southern California"}]},{"given":"Aviral","family":"Agarwal","sequence":"additional","affiliation":[{"name":"Pinscreen"}]},{"given":"Jens","family":"Fursund","sequence":"additional","affiliation":[{"name":"Pinscreen"}]},{"given":"Hao","family":"Li","sequence":"additional","affiliation":[{"name":"University of Southern California, USC Institute for Creative 
Technologies"}]}],"member":"320","published-online":{"date-parts":[[2018,12,4]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"crossref","unstructured":"P. Ekman and W. Friesen. 1978. Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press Palo Alto.","DOI":"10.1037\/t27734-000"},
{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2503385.2503387"},
{"key":"e_1_2_2_3_1","volume-title":"International Conference on Automatic Face Gesture Recognition. 1--6.","author":"Amberg B.","unstructured":"B. Amberg, R. Knothe, and T. Vetter. 2008. Expression Invariant 3D Face Recognition with a Morphable Model. In International Conference on Automatic Face Gesture Recognition. 1--6."},
{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130818"},
{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/311535.311556"},
{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-017-1009-7"},
{"key":"e_1_2_2_7_1","volume-title":"Conference on Computer Vision and Pattern Recognition. 5543--5552","author":"Booth J.","unstructured":"J. Booth, A. Roussos, S. Zafeiriou, A. Ponniahy, and D. Dunaway. 2016. A 3D Morphable Model Learnt from 10,000 Faces. In Conference on Computer Vision and Pattern Recognition. 5543--5552."},
{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2461912.2461976"},
{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766943"},
{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2013.249"},
{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925873"},
{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2915926.2915936"},
{"key":"e_1_2_2_13_1","doi-asserted-by":"crossref","unstructured":"Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo. 2017. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. arXiv preprint arXiv:1711.09020 (2017).","DOI":"10.1109\/CVPR.2018.00916"},
{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2070781.2024164"},
{"key":"e_1_2_2_15_1","volume-title":"Exprgan: Facial expression editing with controllable expression intensity. arXiv preprint arXiv:1709.03842","author":"Ding H.","year":"2017","unstructured":"H. Ding, K. Sricharan, and R. Chellappa. 2017. Exprgan: Facial expression editing with controllable expression intensity. arXiv preprint arXiv:1709.03842 (2017)."},
{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1322355111"},
{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2890493"},
{"key":"e_1_2_2_18_1","unstructured":"L. A. Gatys, A. S. Ecker, and M. Bethge. 2015. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576 (2015)."},
{"key":"e_1_2_2_19_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1675--1683","author":"Hsieh P.-L.","unstructured":"P.-L. Hsieh, C. Ma, J. Yu, and H. Li. 2015. Unconstrained realtime facial performance capture. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1675--1683."},
{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.31310887"},
{"key":"e_1_2_2_21_1","volume-title":"Mesoscopic Facial Geometry Inference Using Deep Neural Networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Huynh L.","unstructured":"L. Huynh, W. Chen, S. Saito, J. Xing, K. Nagano, A. Jones, P. Debevec, and H. Li. 2018. Mesoscopic Facial Geometry Inference Using Deep Neural Networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},
{"key":"e_1_2_2_22_1","unstructured":"P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. 2016. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004 (2016)."},
{"key":"e_1_2_2_23_1","unstructured":"J. Jimenez, T. Scully, N. Barbosa, C. Donner, X. Alvarez, T. Viera, P. Matts, V. Orvalho, D. Gutierrez, and T. Weyrich. 2010. A Practical Appearance Model for Dynamic Facial Color. 29, 5 (2010), 141:1--141:9."},
{"key":"e_1_2_2_24_1","unstructured":"T. Karras, T. Aila, S. Laine, and J. Lehtinen. 2017. Progressive growing of gans for improved quality, stability and variation. arXiv preprint arXiv:1710.10196 (2017)."},
{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201283"},
{"key":"e_1_2_2_26_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma D.","year":"2014","unstructured":"D. Kingma and J. Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},
{"key":"e_1_2_2_27_1","volume-title":"Presentation and validation of the Radboud Faces Database. Cognition and emotion 24, 8","author":"Langner O.","year":"2010","unstructured":"O. Langner, R. Dotsch, G. Bijlstra, D. H. Wigboldus, S. T. Hawk, and A. Van Knippenberg. 2010. Presentation and validation of the Radboud Faces Database. Cognition and emotion 24, 8 (2010), 1377--1388."},
{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618521"},
{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778769"},
{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2461912.2462019"},
{"key":"e_1_2_2_31_1","doi-asserted-by":"crossref","unstructured":"D. S. Ma, J. Correll, and B. Wittenbrink. 2015. The Chicago face database: A free stimulus set of faces and norming data. Behavior research methods 47, 4 (2015), 1122--1135.","DOI":"10.3758\/s13428-014-0532-5"},
{"key":"e_1_2_2_32_1","unstructured":"K. Olszewski, Z. Li, C. Yang, Y. Zhou, R. Yu, Z. Huang, S. Xiang, S. Saito, P. Kohli, and H. Li. Realistic dynamic facial textures from a single image using gans."},
{"key":"e_1_2_2_33_1","doi-asserted-by":"crossref","unstructured":"S. Saito, T. Li, and H. Li. 2016. Real-Time Facial Segmentation and Performance Capture from RGB Input. In ECCV.","DOI":"10.1007\/978-3-319-46484-8_15"},
{"key":"e_1_2_2_34_1","volume-title":"Proc. Computer Vision and Pattern Recognition (CVPR), IEEE.","author":"Saito S.","unstructured":"S. Saito, L. Wei, L. Hu, K. Nagano, and H. Li. 2017. Photorealistic Facial Texture Inference Using Deep Neural Networks. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE."},
{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2070781.2024196"},
{"key":"e_1_2_2_36_1","unstructured":"L. Song, Z. Lu, R. He, Z. Sun, and T. Tan. 2017. Geometry Guided Adversarial Facial Expression Synthesis. arXiv preprint arXiv:1712.03474 (2017)."},
{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2011.5771467"},
{"key":"e_1_2_2_38_1","doi-asserted-by":"crossref","unstructured":"S. Suwajanakorn, I. Kemelmacher-Shlizerman, and S. M. Seitz. 2014. Total moving face reconstruction. In ECCV. Springer, 796--812.","DOI":"10.1007\/978-3-319-10593-2_52"},
{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073640"},
{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818056"},
{"key":"e_1_2_2_41_1","doi-asserted-by":"crossref","unstructured":"J. Thies, M. Zollh\u00f6fer, M. Stamminger, C. Theobalt, and M. Nie\u00dfner. 2016a. Face2Face: Real-time Face Capture and Reenactment of RGB Videos. In IEEE CVPR.","DOI":"10.1109\/CVPR.2016.262"},
{"key":"e_1_2_2_42_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2387--2395","author":"Thies J.","unstructured":"J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, and M. Nie\u00dfner. 2016b. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2387--2395."},
{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1073204.1073209"},
{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964972"},
{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1599470.1599472"},
{"key":"e_1_2_2_46_1","unstructured":"X. Wu, R. He, Z. Sun, and T. Tan. 2015. A light CNN for deep face representation with noisy labels. arXiv preprint arXiv:1511.02683 (2015)."},
{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201364"},
{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964955"},
{"key":"e_1_2_2_49_1","doi-asserted-by":"crossref","unstructured":"J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593 (2017).","DOI":"10.1109\/ICCV.2017.244"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3272127.3275075","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3272127.3275075","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:44:05Z","timestamp":1750207445000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3272127.3275075"}},"subtitle":["real-time avatars using dynamic textures"],"short-title":[],"issued":{"date-parts":[[2018,12,4]]},"references-count":49,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2018,12,31]]}},"alternative-id":["10.1145\/3272127.3275075"],"URL":"https:\/\/doi.org\/10.1145\/3272127.3275075","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,12,4]]},"assertion":[{"value":"2018-12-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}