{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T14:39:09Z","timestamp":1767969549140,"version":"3.49.0"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2005,10,1]],"date-time":"2005-10-01T00:00:00Z","timestamp":1128124800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2005,10]]},"abstract":"<jats:p>Speech-driven facial motion synthesis is a well explored research topic. However, little has been done to model expressive visual behavior during speech. We address this issue using a machine learning approach that relies on a database of speech-related high-fidelity facial motions. From this training set, we derive a generative model of expressive facial motion that incorporates emotion control, while maintaining accurate lip-synching. 
The emotional content of the input speech can be manually specified by the user or automatically extracted from the audio signal using a Support Vector Machine classifier.<\/jats:p>","DOI":"10.1145\/1095878.1095881","type":"journal-article","created":{"date-parts":[[2005,11,7]],"date-time":"2005-11-07T16:00:45Z","timestamp":1131379245000},"page":"1283-1302","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":120,"title":["Expressive speech-driven facial animation"],"prefix":"10.1145","volume":"24","author":[{"given":"Yong","family":"Cao","sequence":"first","affiliation":[{"name":"University of California, Los Angeles, Los Angeles, CA; University of Southern California, ICT"}]},{"given":"Wen C.","family":"Tien","sequence":"additional","affiliation":[{"name":"University of Southern California, ICT"}]},{"given":"Petros","family":"Faloutsos","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles, Los Angeles, CA"}]},{"given":"Fr\u00e9d\u00e9ric","family":"Pighin","sequence":"additional","affiliation":[{"name":"University of Southern California, ICT"}]}],"member":"320","published-online":{"date-parts":[[2005,10]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the International Conference on Computer Graphics, Visualization, and Computer Vision (WSCG'02)","author":"Albrecht I.","year":"2002","unstructured":"Albrecht , I. , Haber , J. , and peter Seidel , H. 2002 . Speech synchronization for physics-based facial animation . In Proceedings of the International Conference on Computer Graphics, Visualization, and Computer Vision (WSCG'02) . 9--16. Albrecht, I., Haber, J., and peter Seidel, H. 2002. Speech synchronization for physics-based facial animation. In Proceedings of the International Conference on Computer Graphics, Visualization, and Computer Vision (WSCG'02). 
9--16."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of ACM SIGGRAPH","author":"Brand M.","year":"1999","unstructured":"Brand , M. 1999 . Voice puppetry . In Proceedings of ACM SIGGRAPH 1999. ACM Press\/Addison-Wesley Publishing Co. 21--28. 10.1145\/311535.311537 Brand, M. 1999. Voice puppetry. In Proceedings of ACM SIGGRAPH 1999. ACM Press\/Addison-Wesley Publishing Co. 21--28. 10.1145\/311535.311537"},{"key":"e_1_2_1_3_1","volume-title":"SIGGRAPH 97 Conference Proceedings. 353--360","author":"Bregler C.","unstructured":"Bregler , C. , Covell , M. , and Slaney , M . 1997. Video rewrite: Driving visual speech with audio . In SIGGRAPH 97 Conference Proceedings. 353--360 . 10.1145\/258734.258880 Bregler, C., Covell, M., and Slaney, M. 1997. Video rewrite: Driving visual speech with audio. In SIGGRAPH 97 Conference Proceedings. 353--360. 10.1145\/258734.258880"},{"key":"e_1_2_1_4_1","volume-title":"the International Symposium on Speech, Image Processing, and Neural Networkds.","author":"Brook N.","unstructured":"Brook , N. and Scott , S . 1994. Computer graphics animations of talking faces based on stochastic models . In the International Symposium on Speech, Image Processing, and Neural Networkds. Brook, N. and Scott, S. 1994. Computer graphics animations of talking faces based on stochastic models. In the International Symposium on Speech, Image Processing, and Neural Networkds."},{"key":"e_1_2_1_5_1","volume-title":"Radial Basis Functions : Theory and Implementations","author":"Buhmann M. D.","unstructured":"Buhmann , M. D. 2003. Radial Basis Functions : Theory and Implementations . Cambridge University Press , Cambridge, UK . Buhmann, M. D. 2003. Radial Basis Functions : Theory and Implementations. Cambridge University Press, Cambridge, UK."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009715923555"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of Eurographics\/ACM SIGGRAPH Symposium on Computer Animation. 
225--231","author":"Cao Y.","unstructured":"Cao , Y. , Faloutsos , P. , and Pighin , F . 2003. Unsupervised learning for speech motion editing . In Proceedings of Eurographics\/ACM SIGGRAPH Symposium on Computer Animation. 225--231 . Cao, Y., Faloutsos, P., and Pighin, F. 2003. Unsupervised learning for speech motion editing. In Proceedings of Eurographics\/ACM SIGGRAPH Symposium on Computer Animation. 225--231."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of ACM SIGGRAPH 1994","author":"Cassell J.","year":"1994","unstructured":"Cassell , J. , Pelachaud , C. , Badler , N. , Steedman , M. , Achorn , B. , Becket , W. , Douville , B. , Prevost , S. , and Stone , M . 1994. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents . In Proceedings of ACM SIGGRAPH 1994 . 10.1145\/192161.192272 Cassell, J., Pelachaud, C., Badler, N., Steedman, M., Achorn, B., Becket, W., Douville, B., Prevost, S., and Stone, M. 1994. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In Proceedings of ACM SIGGRAPH 1994. 10.1145\/192161.192272"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1961189.1961199"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976601750399335"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of Pacific Graphics.","author":"Chuang E.","unstructured":"Chuang , E. , Deshpande , H. , and Bregler , C . 2002. Facial expression space learning . In Proceedings of Pacific Graphics. Chuang, E., Deshpande, H., and Bregler, C. 2002. Facial expression space learning. In Proceedings of Pacific Graphics."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Cohen N. and Massaro D. W. 1993. Modeling coarticulation in synthetic visual speech. In Models and Techniques in Computer Animation N. M. Thalmann and D. Thalmann Eds. Springer--Verlag 139--156.  
Cohen N. and Massaro D. W. 1993. Modeling coarticulation in synthetic visual speech. In Models and Techniques in Computer Animation N. M. Thalmann and D. Thalmann Eds. Springer--Verlag 139--156.","DOI":"10.1007\/978-4-431-66911-1_13"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Ekman P. and Friesen W. 1978. Manual for Facial Action Coding System. Consulting Psychologists Press Inc. Palo Alto CA.  Ekman P. and Friesen W. 1978. Manual for Facial Action Coding System. Consulting Psychologists Press Inc. Palo Alto CA.","DOI":"10.1037\/t27734-000"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of ACM SIGGRAPH","author":"Ezzat T.","year":"2002","unstructured":"Ezzat , T. , Geiger , G. , and Poggio , T . 2002. Trainable videorealistic speech animation . In Proceedings of ACM SIGGRAPH 2002 . ACM Press, 388--398. 10.1145\/566570.566594 Ezzat, T., Geiger, G., and Poggio, T. 2002. Trainable videorealistic speech animation. In Proceedings of ACM SIGGRAPH 2002. ACM Press, 388--398. 10.1145\/566570.566594"},{"key":"e_1_2_1_15_1","volume-title":"Laboratory of Computer Information Science","author":"Fast ICA","unstructured":"Fast ICA . Helsinki University of Technology , Laboratory of Computer Information Science , Neural Networks Research Centre . Available at www.cis.hut.fi\/projects\/ica\/fastica\/. FastICA. Helsinki University of Technology, Laboratory of Computer Information Science, Neural Networks Research Centre. Available at www.cis.hut.fi\/projects\/ica\/fastica\/."},{"key":"e_1_2_1_16_1","unstructured":"House of Moves Inc. Diva software. Available at www.moves.com\/moveshack\/diva.htm.  House of Moves Inc. Diva software. Available at www.moves.com\/moveshack\/diva.htm."},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Hyvarinen A. Karhunen J. and Oja E. 2001. Independent Component Analysis. John Wiley & Sons.  Hyvarinen A. Karhunen J. and Oja E. 2001. Independent Component Analysis. 
John Wiley & Sons.","DOI":"10.1002\/0471221317"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0893-6080(00)00026-5"},{"key":"e_1_2_1_19_1","unstructured":"International Computer Science Institute Berkeley CA. Rasta software. Available at www.icsi.berkeley.edu\/Speech\/rasta.html.  International Computer Science Institute Berkeley CA. Rasta software. Available at www.icsi.berkeley.edu\/Speech\/rasta.html."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 2003 ACM SIGGRAPH\/Eurographics Symposium on Computer Animation. Eurographics Association, 187--192","author":"Joshi P.","unstructured":"Joshi , P. , Tien , W. C. , Desbrun , M. , and Pighin , F . 2003. Learning controls for blend shape based realistic facial animation . In Proceedings of the 2003 ACM SIGGRAPH\/Eurographics Symposium on Computer Animation. Eurographics Association, 187--192 . Joshi, P., Tien, W. C., Desbrun, M., and Pighin, F. 2003. Learning controls for blend shape based realistic facial animation. In Proceedings of the 2003 ACM SIGGRAPH\/Eurographics Symposium on Computer Animation. Eurographics Association, 187--192."},{"key":"e_1_2_1_21_1","unstructured":"Kalberer G. A. Mueller P. and Gool L. V. 2002. Speech animation using viseme space. In Vision Modeling and Visualization VMV 2002. Akademische Verlagsgesellschaft Aka GmbH Berlin Germany. 463--470.  Kalberer G. A. Mueller P. and Gool L. V. 2002. Speech animation using viseme space. In Vision Modeling and Visualization VMV 2002. Akademische Verlagsgesellschaft Aka GmbH Berlin Germany. 463--470."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of ACM SIGGRAPH","author":"Kovar L.","year":"2002","unstructured":"Kovar , L. , Gleicher , M. , and Pighin , F . 2002. Motion graphs . In Proceedings of ACM SIGGRAPH 2002 . ACM Press, 473--482. 10.1145\/566570.566605 Kovar, L., Gleicher, M., and Pighin, F. 2002. Motion graphs. In Proceedings of ACM SIGGRAPH 2002. ACM Press, 473--482. 
10.1145\/566570.566605"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of Eurographics","author":"Kshirsagar S.","year":"2003","unstructured":"Kshirsagar , S. and Magnenat-Thalmann , N . 2003. Visyllable based speech animation . In Proceedings of Eurographics 2003 . Kshirsagar, S. and Magnenat-Thalmann, N. 2003. Visyllable based speech animation. In Proceedings of Eurographics 2003."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press, 491--500","author":"Lee J.","unstructured":"Lee , J. , Chai , J. , Reitsma , P. S. A. , Hodgins , J. K. , and Pollard , N. S . 2002. Interactive control of avatars animated with human motion data . In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press, 491--500 . 10.1145\/566570.566607 Lee, J., Chai, J., Reitsma, P. S. A., Hodgins, J. K., and Pollard, N. S. 2002. Interactive control of avatars animated with human motion data. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press, 491--500. 10.1145\/566570.566607"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of ACM SIGGRAPH","author":"Lee S. P.","year":"2003","unstructured":"Lee , S. P. , Badler , J. B. , and Badler , N. I . 2003. Eyes alive . In Proceedings of ACM SIGGRAPH 2003 . ACM Press, 637--644. 10.1145\/566570.566629 Lee, S. P., Badler, J. B., and Badler, N. I. 2003. Eyes alive. In Proceedings of ACM SIGGRAPH 2003. ACM Press, 637--644. 10.1145\/566570.566629"},{"key":"e_1_2_1_26_1","volume-title":"SIGGRAPH 95 Conference Proceedings. ACM SIGGRAPH, 55--62","author":"Lee Y.","year":"1995","unstructured":"Lee , Y. , Terzopoulos , D. , and Waters , K . 1995. Realistic modeling for facial animation . In SIGGRAPH 95 Conference Proceedings. ACM SIGGRAPH, 55--62 . 10.1145\/218380.218407 Lee, Y., Terzopoulos, D., and Waters, K. 1995. Realistic modeling for facial animation. 
In SIGGRAPH 95 Conference Proceedings. ACM SIGGRAPH, 55--62. 10.1145\/218380.218407"},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1002\/vis.4340020404","article-title":"Automated lip-sync: Background and techniques","volume":"2","author":"Lewis J.","year":"1991","unstructured":"Lewis , J. 1991 . Automated lip-sync: Background and techniques . J. Visualiz. Comput. Animat. 2 , 118 -- 122 . Lewis, J. 1991. Automated lip-sync: Background and techniques. J. Visualiz. Comput. Animat. 2, 118--122.","journal-title":"J. Visualiz. Comput. Animat."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/566570.566604"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition. 390--395","author":"Lien J.","unstructured":"Lien , J. , Cohn , J. , Kanade , T. , and Li , C . 1998. Automatic facial expression recognition based on FACS action units . In Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition. 390--395 . Lien, J., Cohn, J., Kanade, T., and Li, C. 1998. Automatic facial expression recognition based on FACS action units. In Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition. 390--395."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98)","author":"Masuko T.","year":"1998","unstructured":"Masuko , T. , Kobayashi , T. , Tamura , M. , Masubuchi , J. , and Tokuda , K. 1998 . Text-to-visual speech synthesis based on parameter generation from HMM . In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98) . Masuko, T., Kobayashi, T., Tamura, M., Masubuchi, J., and Tokuda, K. 1998. Text-to-visual speech synthesis based on parameter generation from HMM. 
In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98)."},{"key":"e_1_2_1_31_1","volume-title":"ACM Symposium on Virtual Realisty Software and Technology. 166--174","author":"Noh J.","unstructured":"Noh , J. , Fidaleo , D. , and Neumann , U . 2000. Animated deformations with radial basis functions . In ACM Symposium on Virtual Realisty Software and Technology. 166--174 . 10.1145\/502390.502422 Noh, J., Fidaleo, D., and Neumann, U. 2000. Animated deformations with radial basis functions. In ACM Symposium on Virtual Realisty Software and Technology. 166--174. 10.1145\/502390.502422"},{"key":"e_1_2_1_32_1","first-page":"1","article-title":"A model for human faces that allows speech synchronized animation","volume":"1","author":"Parke F.","year":"1975","unstructured":"Parke , F. 1975 . A model for human faces that allows speech synchronized animation . J. Comput. Graph. 1 , 1, 1 -- 4 . Parke, F. 1975. A model for human faces that allows speech synchronized animation. J. Comput. Graph. 1, 1, 1--4.","journal-title":"J. Comput. Graph."},{"key":"e_1_2_1_34_1","volume-title":"SIGGRAPH 98 Conference Proceedings. ACM SIGGRAPH, 75--84","author":"Pighin F.","unstructured":"Pighin , F. , Hecker , J. , Lischinski , D. , Szeliski , R. , and Salesin , D . 1998. Synthesizing realistic facial expressions from photographs . In SIGGRAPH 98 Conference Proceedings. ACM SIGGRAPH, 75--84 . 10.1145\/280814.280825 Pighin, F., Hecker, J., Lischinski, D., Szeliski, R., and Salesin, D. 1998. Synthesizing realistic facial expressions from photographs. In SIGGRAPH 98 Conference Proceedings. ACM SIGGRAPH, 75--84. 10.1145\/280814.280825"},{"key":"e_1_2_1_35_1","volume-title":"Numerical Recipes in C: The Art of Scientific Computing","author":"Press W. H.","unstructured":"Press , W. H. , Flannery , B. P. , Teukolsky , S. A. , and Vetterling , W. T . Numerical Recipes in C: The Art of Scientific Computing , 2 nd ed. 
Cambridge University Press , Cambridge, UK . Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge University Press, Cambridge, UK.","edition":"2"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the ACM SIGGRAPH\/Eurographics Symposium on Computer Animation. Eurographics Association, 167--176","author":"Pyun H.","unstructured":"Pyun , H. , Kim , Y. , Chae , W. , Kang , H. W. , and Shin , S. Y . 2003. An example-based approach for facial expression cloning . In Proceedings of the ACM SIGGRAPH\/Eurographics Symposium on Computer Animation. Eurographics Association, 167--176 . Pyun, H., Kim, Y., Chae, W., Kang, H. W., and Shin, S. Y. 2003. An example-based approach for facial expression cloning. In Proceedings of the ACM SIGGRAPH\/Eurographics Symposium on Computer Animation. Eurographics Association, 167--176."},{"key":"e_1_2_1_37_1","volume-title":"Data Structures, Algorithms, and Applications in C&plus;&plus;","author":"Sahni S.","unstructured":"Sahni , S. 1999. Data Structures, Algorithms, and Applications in C&plus;&plus; . McGraw-Hill Publishing Co. , New York, NY . Sahni, S. 1999. Data Structures, Algorithms, and Applications in C&plus;&plus;. McGraw-Hill Publishing Co., New York, NY."},{"key":"e_1_2_1_38_1","volume-title":"European Conference on Computer Vision. 456--467","author":"Saisan P.","unstructured":"Saisan , P. , Bissacco , A. , Chiuso , A. , and Soatto , S . 2004. Modeling and synthesis of facial motion driven by speech . In European Conference on Computer Vision. 456--467 . Saisan, P., Bissacco, A., Chiuso, A., and Soatto, S. 2004. Modeling and synthesis of facial motion driven by speech. In European Conference on Computer Vision. 456--467."},{"key":"e_1_2_1_39_1","unstructured":"Sankoff D. and Kruskal J. B. 1983. Time Warps String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. 
CSLI Publications Stanford University Stanford CA.  Sankoff D. and Kruskal J. B. 1983. Time Warps String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. CSLI Publications Stanford University Stanford CA."},{"key":"e_1_2_1_40_1","unstructured":"Speech Group Carnegie Mellon University. Festival software. Available at www.speech.cs.cmu.edu\/festival.  Speech Group Carnegie Mellon University. Festival software. Available at www.speech.cs.cmu.edu\/festival."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976600300015349"},{"key":"e_1_2_1_42_1","volume-title":"Conference on Computer Vision and Pattern Recognition.","author":"Vasilescu M. A. O.","unstructured":"Vasilescu , M. A. O. and Terzopoulos , D . 2003. Multilinear subspace analysis of image ensembles . In Conference on Computer Vision and Pattern Recognition. Vasilescu, M. A. O. and Terzopoulos, D. 2003. Multilinear subspace analysis of image ensembles. In Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_43_1","volume-title":"SIGGRAPH 87 Conference Proceedings. ACM SIGGRAPH","volume":"21","author":"Waters K.","year":"1987","unstructured":"Waters , K. 1987 . A muscle model for animating three-dimensional facial expression . In SIGGRAPH 87 Conference Proceedings. ACM SIGGRAPH , Vol. 21 . 17--24. 10.1145\/37401.37405 Waters, K. 1987. A muscle model for animating three-dimensional facial expression. In SIGGRAPH 87 Conference Proceedings. ACM SIGGRAPH, Vol. 21. 17--24. 10.1145\/37401.37405"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of Neurol Information Processing System (NIPS'03)","author":"Wu T.","unstructured":"Wu , T. , Lin , C.-J. , and Weng , R. C . 2003. Probability estimates for multi-class classification by pairwise coupling . In Proceedings of Neurol Information Processing System (NIPS'03) . Wu, T., Lin, C.-J., and Weng, R. C. 2003. Probability estimates for multi-class classification by pairwise coupling. 
In Proceedings of Neurol Information Processing System (NIPS'03)."}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1095878.1095881","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1095878.1095881","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:08:28Z","timestamp":1750262908000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1095878.1095881"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,10]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2005,10]]}},"alternative-id":["10.1145\/1095878.1095881"],"URL":"https:\/\/doi.org\/10.1145\/1095878.1095881","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,10]]},"assertion":[{"value":"2005-10-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}