{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T22:45:03Z","timestamp":1776465903613,"version":"3.51.2"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2004,8,1]],"date-time":"2004-08-01T00:00:00Z","timestamp":1091318400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2004,8]]},"abstract":"<jats:p>We describe a method for using a database of recorded speech and captured motion to create an animated conversational character. People's utterances are composed of short, clearly-delimited phrases; in each phrase, gesture and speech go together meaningfully and synchronize at a common point of maximum emphasis. We develop tools for collecting and managing performance data that exploit this structure. The tools help create scripts for performers, help annotate and segment performance data, and structure specific messages for characters to use within application contexts. Our animations then reproduce this structure. They recombine motion samples with new speech samples to recreate coherent phrases, and blend segments of speech and motion together phrase-by-phrase into extended utterances. By framing problems for utterance generation and synthesis so that they can draw closely on a talented performance, our techniques support the rapid construction of animated characters with rich and appropriate expression.<\/jats:p>","DOI":"10.1145\/1015706.1015753","type":"journal-article","created":{"date-parts":[[2004,10,7]],"date-time":"2004-10-07T17:38:56Z","timestamp":1097170736000},"page":"506-513","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":66,"title":["Speaking with hands"],"prefix":"10.1145","volume":"23","author":[{"given":"Matthew","family":"Stone","sequence":"first","affiliation":[{"name":"Rutgers"}]},{"given":"Doug","family":"DeCarlo","sequence":"additional","affiliation":[{"name":"Rutgers"}]},{"given":"Insuk","family":"Oh","sequence":"additional","affiliation":[{"name":"Rutgers"}]},{"given":"Christian","family":"Rodriguez","sequence":"additional","affiliation":[{"name":"Rutgers"}]},{"given":"Adrian","family":"Stere","sequence":"additional","affiliation":[{"name":"Rutgers"}]},{"given":"Alyssa","family":"Lees","sequence":"additional","affiliation":[{"name":"NYU"}]},{"given":"Chris","family":"Bregler","sequence":"additional","affiliation":[{"name":"NYU"}]}],"member":"320","published-online":{"date-parts":[[2004,8]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/566654.566606"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/882262.882284"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/990820.990827"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1177\/0261927X00019002001"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.21437\/ICSLP.2002-115"},{"key":"e_1_2_2_6_1","unstructured":"BESKOW J. 2003. Talking Heads: Models and Applications for Multimodal Speech Synthesis. PhD thesis KTH Stockholm."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.424924"},{"key":"e_1_2_2_8_1","unstructured":"BICKMORE T. W. 2003. Relational Agents: Effecting Change through Human-Computer Relationships. PhD thesis MIT."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.21236\/ADA461150"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","unstructured":"BRAND M. 1999. Voice puppetry. In SIGGRAPH 21--28. 10.1145\/311535.311537","DOI":"10.1145\/311535.311537"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0885-2308(02)00023-2"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","unstructured":"CASSELL J. PELACHAUD C. BADLER N. STEEDMAN M. ACHORN B. BECKET T. DOUVILLE B. PREVOST S. AND STONE M. 1994. Animated conversation: Rule-based generation of facial expression gesture and spoken intonation for multiple conversational agents. In SIGGRAPH 413--420. 10.1145\/192161.192272","DOI":"10.1145\/192161.192272"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","unstructured":"CASSELL J. SULLIVAN J. PREVOST S. AND CHURCHILL E. Eds. 2000. Embodied Conversational Agents. MIT. 10.1145\/332051.332075","DOI":"10.1145\/332051.332075"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","unstructured":"CASSELL J. VILHJ\u00c1LMSSON H. AND BICKMORE T. 2001. BEAT: the behavioral expression animation toolkit. In SIGGRAPH 477--486. 10.1145\/383259.383315","DOI":"10.1145\/383259.383315"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/332051.332075"},{"key":"e_1_2_2_16_1","volume-title":"Proceedings of AVSP, 251--256","author":"CERRATO L.","year":"2003","unstructured":"CERRATO, L., AND SKHIRI, M. 2003. A method for the analysis and measurement of communicative head movements in human dialogues. In Proceedings of AVSP, 251--256."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","unstructured":"CHI D. COSTA M. ZHAO L. AND BADLER N. 2000. The EMOTE model for effort and shape. In SIGGRAPH 173--182. 10.1145\/344779.352172","DOI":"10.1145\/344779.352172"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/791218.791604"},{"key":"e_1_2_2_19_1","volume-title":"Human Ethology: Claims and Limits of a New Discipline: Contributions to the Colloquium","author":"EKMAN P.","unstructured":"EKMAN, P. 1979. About brows: Emotional and conversational signals. In Human Ethology: Claims and Limits of a New Discipline: Contributions to the Colloquium, M. von Cranach, K. Foppa, W. Lepenies, and D. Ploog, Eds. Cambridge University Press, Cambridge, 169--202."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/566654.566594"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","unstructured":"GLEICHER M. 1998. Retargeting motion to new characters. In SIGGRAPH 33--42. 10.1145\/280814.280820","DOI":"10.1145\/280814.280820"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/557657"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1996.541110"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/555733"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/882262.882283"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/1630659.1630837"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1002\/cav.v15:1"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/846276.846307"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/566654.566605"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/545261.545277"},{"key":"e_1_2_2_32_1","volume-title":"Proceedings of ICSLP.","author":"KRAHMER E.","year":"2002","unstructured":"KRAHMER, E., RUTTKAY, Z., SWERTS, M., AND WESSELINK, W. 2002. Audiovisual cues to prominence. In Proceedings of ICSLP."},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/974305.974328"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/566654.566607"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/566654.566629"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/566654.566604"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1075\/gest.1.1.03mcn"},{"key":"e_1_2_2_38_1","volume-title":"Hand and Mind: What Gestures Reveal about Thought","author":"MCNEILL D.","unstructured":"MCNEILL, D. 1992. Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, Chicago."},{"key":"e_1_2_2_39_1","volume-title":"Proceedings of Int. Conf. on Natural Language Generation, 49--56","author":"PAN S.","year":"2002","unstructured":"PAN, S., AND WANG, W. 2002. Designing a speech corpus for instance-based spoken language generation. In Proceedings of Int. Conf. on Natural Language Generation, 49--56."},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/545261.545279"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog2001_1"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","unstructured":"PERLIN K. AND GOLDBERG A. 1996. Improv: a system for interactive actors in virtual worlds. In SIGGRAPH 205--216. 10.1145\/237170.237258","DOI":"10.1145\/237170.237258"},{"key":"e_1_2_2_43_1","volume-title":"Intentions in Communication","author":"PIERREHUMBERT J.","unstructured":"PIERREHUMBERT, J., AND HIRSCHBERG, J. 1990. The meaning of intonational contours in the interpretation of discourse. In Intentions in Communication, P. Cohen, J. Morgan, and M. Pollack, Eds. MIT Press, Cambridge MA, 271--311."},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","unstructured":"POPOVI\u0106 Z. AND WITKIN A. 1999. Physically based motion transformation. In SIGGRAPH 11--20. 10.1145\/311535.311536","DOI":"10.1145\/311535.311536"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/566654.566608"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.5555\/331955"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/882262.882304"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/38.708559"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2000.869666"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0885-2308(02)00011-6"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.21437\/ICSLP.1992-260"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1162\/002438900554505"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.5555\/791221.791866"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.5555\/973927.973930"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","unstructured":"WILLIAMS L. 1990. Performance-driven facial animation. In SIGGRAPH 235--242. 10.1145\/97879.97906","DOI":"10.1145\/97879.97906"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","unstructured":"WITKIN A. AND POPOVI\u0106 Z. 1995. Motion warping. In SIGGRAPH 105--108. 10.1145\/218380.218422","DOI":"10.1145\/218380.218422"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1015706.1015753","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1015706.1015753","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:26:23Z","timestamp":1750281983000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1015706.1015753"}},"subtitle":["creating animated conversational characters from recordings of human performance"],"short-title":[],"issued":{"date-parts":[[2004,8]]},"references-count":55,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2004,8]]}},"alternative-id":["10.1145\/1015706.1015753"],"URL":"https:\/\/doi.org\/10.1145\/1015706.1015753","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2004,8]]},"assertion":[{"value":"2004-08-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}