{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T08:56:57Z","timestamp":1765357017823,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":49,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,12,6]],"date-time":"2021-12-06T00:00:00Z","timestamp":1638748800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Engineering and Physical Research Council","award":["EP\/S001816\/1"],"award-info":[{"award-number":["EP\/S001816\/1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,12,6]]},"DOI":"10.1145\/3485441.3485647","type":"proceedings-article","created":{"date-parts":[[2021,10,23]],"date-time":"2021-10-23T22:04:00Z","timestamp":1635026640000},"page":"1-9","source":"Crossref","is-referenced-by-count":12,"title":["Speech-Driven Conversational Agents using Conditional Flow-VAEs"],"prefix":"10.1145","author":[{"given":"Sarah","family":"Taylor","sequence":"first","affiliation":[{"name":"University of East Anglia, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan","family":"Windle","sequence":"additional","affiliation":[{"name":"University of East Anglia, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Greenwood","sequence":"additional","affiliation":[{"name":"University of East Anglia, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Iain","family":"Matthews","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, US"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,12,6]]},"reference":[{"volume-title":"Style Transfer for Co-speech Gesture Animation: A Multi-speaker Conditional-Mixture Approach. 
In European Conference on Computer Vision (ECCV). Cham, 248\u2013265","year":"2020","author":"Ahuja Chaitanya","key":"e_1_3_2_2_1_1"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340555.3353725"},{"volume-title":"The StyleGestures entry to the GENEA Challenge","year":"2020","author":"Alexanderson Simon","key":"e_1_3_2_2_3_1"},{"volume-title":"Style Controllable Speech-Driven Gesture Synthesis Using Normalising Flows. Computer Graphics Forum 39, 2 (05","year":"2020","author":"Alexanderson Simon","key":"e_1_3_2_2_4_1"},{"volume-title":"Conditional Flow Variational Autoencoders for Structured Sequence Prediction. In Bayesian Deep Learning NeurIPS 2019 Workshop.","year":"2019","author":"Bhattacharyya Apratim","key":"e_1_3_2_2_5_1"},{"volume-title":"Speech, and Computational Stages: A Reply to McNeill. Psychological review 96 (02","year":"1989","author":"Butterworth Brian","key":"e_1_3_2_2_6_1"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2929257"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Justine Cassell Catherine Pelachaud Norman Badler Mark Steedman Brett Achorn Tripp Becket Brett Douville Scott Prevost and Matthew Stone. 1994. Animated conversation: rule-based generation of facial expression gesture & spoken intonation for multiple conversational agents. In Computer Graphics and Interactive Techniques. 413\u2013420.","DOI":"10.1145\/192161.192272"},{"volume-title":"Life-Like Characters","author":"Cassell Justine","key":"e_1_3_2_2_9_1"},{"volume-title":"International Conference on Autonomous Agents and Multi-agent Systems. 
781\u2013788","year":"2014","author":"Chiu Chung-Cheng","key":"e_1_3_2_2_10_1"},{"volume-title":"Inference Suboptimality in Variational Autoencoders. In International Conference on Machine Learning (ICML). 1078\u20131086","year":"2018","author":"Cremer Chris","key":"e_1_3_2_2_11_1"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1756-8765.2012.01183.x"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778828"},{"volume-title":"IVA: Investigating the use of recurrent motion modelling for speech gesture generation. In Intelligent Virtual Agents (IVA). https:\/\/trinityspeechgesture.scss.tcd.ie","year":"2018","author":"Ferstl Ylva","key":"e_1_3_2_2_14_1"},{"key":"e_1_3_2_2_15_1","first-page":"1","article-title":"Multi-objective adversarial gesture generation","author":"Ferstl Ylva","year":"2019","journal-title":"Motion, Interaction and Games."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00361"},{"volume-title":"Intelligent Virtual Agents (IVA)","author":"Greenwood David","key":"e_1_3_2_2_17_1"},{"volume-title":"ICLR Workshop on Deep Generative Models for Highly Structured Data","year":"2019","author":"Gritsenko A","key":"e_1_3_2_2_18_1"},{"volume-title":"International Gesture Workshop. Springer, 188\u2013199","year":"2005","author":"Hartmann Bj\u00f6rn","key":"e_1_3_2_2_19_1"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3414685.3417836"},{"volume-title":"International Conference on Machine Learning (ICML). 1510\u20131519","year":"2017","author":"Hoffman D","key":"e_1_3_2_2_21_1"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"crossref","unstructured":"Patrik Jonell Taras Kucherenko Gustav\u00a0Eje Henter and Jonas Beskow. 2020. Let\u2019s face it: Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings. In Intelligent Virtual Agents (IVA). 1\u20138.  
","DOI":"10.1145\/3383652.3423911"},{"volume-title":"Some relationships between body motion and speech. Studies in dyadic communication 7, 177","year":"1972","author":"Kendon Adam","key":"e_1_3_2_2_23_1"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1207\/s15327973rlsi2703_2"},{"volume-title":"Glow: Generative flow with invertible 1 \u00d7 1 convolutions.","year":"2018","author":"Kingma P","key":"e_1_3_2_2_25_1"},{"volume-title":"The FineMotion entry to the GENEA Challenge","year":"2020","author":"Korzun Vladislav","key":"e_1_3_2_2_26_1"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Taras Kucherenko Dai Hasegawa Gustav\u00a0Eje Henter Naoshi Kaneko and Hedvig Kjellstr\u00f6m. 2019. Analyzing input and output representations for speech-driven gesture generation. In Intelligent Virtual Agents (IVA). 97\u2013104.","DOI":"10.1145\/3308532.3329472"},{"key":"e_1_3_2_2_28_1","unstructured":"Taras Kucherenko Patrik Jonell Youngwoo Yoon Pieter Wolfert and Gustav\u00a0Eje Henter. 2020. The GENEA Challenge 2020: Benchmarking gesture-generation systems on common data. 
https:\/\/doi.org\/10.5281\/zenodo.4094697"},{"volume-title":"ACM Transactions on Graphics (TOG) 29 (07","year":"2010","author":"Levine Sergey","key":"e_1_3_2_2_29_1"},{"volume-title":"ACM SIGGRAPH (Yokohama, Japan) (SIGGRAPH Asia \u201909)","year":"2009","author":"Levine Sergey","key":"e_1_3_2_2_30_1"},{"key":"e_1_3_2_2_31_1","unstructured":"Xugang Lu Yu Tsao Shigeki Matsuda and Chiori Hori. 2013. Speech enhancement based on deep denoising autoencoder. In Interspeech. 436\u2013440."},{"volume-title":"Intelligent Virtual Agents (IVA)","author":"Maatman R\u00a0M","key":"e_1_3_2_2_32_1"},{"volume-title":"Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings. In International Conference on Learning Representations (ICLR).","year":"2020","author":"Mahajan Shweta","key":"e_1_3_2_2_33_1"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485895.2485900"},{"volume-title":"So You Think Gestures are Nonverbal? Psychological Review 92 (07","year":"1985","author":"Mcneill David","key":"e_1_3_2_2_35_1"},{"volume-title":"Hand and mind: What gestures reveal about thought","author":"McNeill David","key":"e_1_3_2_2_36_1","doi-asserted-by":"crossref","DOI":"10.1515\/9783110874259.351"},{"volume-title":"Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks. In International Conference on Machine Learning (ICML). PMLR, 2391\u20132400","year":"2017","author":"Mescheder Lars","key":"e_1_3_2_2_37_1"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1330511.1330516"},{"volume-title":"CGVU: Semantics-guided 3D Body Gesture Synthesis. 
https:\/\/doi.org\/10.5281\/zenodo.4090879","year":"2020","author":"Pang Kunkun","key":"e_1_3_2_2_39_1"},{"volume-title":"International Conference on Machine Learning (ICML), Vol.\u00a037","year":"2015","author":"Rezende Danilo\u00a0Jimenez","key":"e_1_3_2_2_40_1"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02289451"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00790"},{"volume-title":"Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems 33","year":"2020","author":"Sitzmann Vincent","key":"e_1_3_2_2_43_1"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2013.09.008"},{"volume-title":"Computer Graphics Forum, Vol.\u00a039","author":"Yang Yanzhe","key":"e_1_3_2_2_45_1"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3414685.3417838"},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793720"},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00589"},{"volume-title":"International Conference on Machine Learning (ICML). 
PMLR, 7673\u20137682","year":"2019","author":"Ziegler Zachary","key":"e_1_3_2_2_49_1"}],"event":{"name":"CVMP '21: European Conference on Visual Media Production","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"],"location":"London United Kingdom","acronym":"CVMP '21"},"container-title":["European Conference on Visual Media Production"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485441.3485647","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3485441.3485647","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:08Z","timestamp":1750191128000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485441.3485647"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,6]]},"references-count":49,"alternative-id":["10.1145\/3485441.3485647","10.1145\/3485441"],"URL":"https:\/\/doi.org\/10.1145\/3485441.3485647","relation":{},"subject":[],"published":{"date-parts":[[2021,12,6]]}}}