{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:48:16Z","timestamp":1773802096570,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"14","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>This paper focuses on the task of speech-driven 3D facial animation, which aims to generate realistic and synchronized facial motions driven by speech inputs. Recent methods have employed audio-conditioned diffusion models for 3D facial animation, achieving impressive results in generating expressive and natural animations. However, these methods process the whole audio sequences in a single pass, which poses two major challenges: they tend to perform poorly when handling audio sequences that exceed the training horizon and will suffer from significant latency when processing long audio inputs. To address these limitations, we propose a novel autoregressive diffusion model that outputs facial motions in a streaming manner. This design ensures flexibility with varying audio lengths and achieves low latency independent of audio duration. Specifically, we select a limited number of past frames as historical motion context and combine them with the audio input to create a dynamic condition. This condition guides a lightweight diffusion head to iteratively generate facial motion frames, enabling real-time synthesis with high-quality results. Experiments conducted on public datasets demonstrate that our approach outperforms recent baseline methods.<\/jats:p>","DOI":"10.1609\/aaai.v40i14.38162","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:12:29Z","timestamp":1773792749000},"page":"11766-11774","source":"Crossref","is-referenced-by-count":0,"title":["StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model"],"prefix":"10.1609","volume":"40","author":[{"given":"Yifan","family":"Yang","sequence":"first","affiliation":[]},{"given":"Zhi","family":"Cen","sequence":"additional","affiliation":[]},{"given":"Sida","family":"Peng","sequence":"additional","affiliation":[]},{"given":"Xiangwei","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Yifu","family":"Deng","sequence":"additional","affiliation":[]},{"given":"Xinyu","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Fan","family":"Jia","sequence":"additional","affiliation":[]},{"given":"Xiaowei","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Hujun","family":"Bao","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38162\/42124","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38162\/42124","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:12:29Z","timestamp":1773792749000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/38162"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i14.38162","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}