{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:42:37Z","timestamp":1773801757813,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>The introduction of diffusion models has brought significant advances to the field of audio-driven talking head generation. However, the extremely slow inference speed severely limits the practical implementation of diffusion-based talking head generation models. In this study, we propose READ, a real-time diffusion-transformer-based talking head generation framework. Our approach first learns a spatiotemporal highly compressed video latent space via a temporal VAE, significantly reducing the token count to accelerate generation. To achieve better audio-visual alignment within this compressed latent space, a pre-trained Speech Autoencoder (SpeechAE) is proposed to generate temporally compressed speech latent codes corresponding to the video latent space. These latent representations are then modeled by a carefully designed Audio-to-Video Diffusion Transformer (A2V-DiT) backbone for efficient talking head synthesis. Furthermore, to ensure temporal consistency and accelerated inference in extended generation, we propose a novel asynchronous noise scheduler (ANS) for both the training and inference processes of our framework. The ANS leverages asynchronous add-noise and asynchronous motion-guided generation in the latent space, ensuring consistency in generated video clips. Experimental results demonstrate that READ outperforms state-of-the-art methods by generating competitive talking head videos with significantly reduced runtime, achieving an optimal balance between quality and speed while maintaining robust metric stability in long-time generation.<\/jats:p>","DOI":"10.1609\/aaai.v40i12.37940","type":"journal-article","created":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:56:38Z","timestamp":1773791798000},"page":"9766-9774","source":"Crossref","is-referenced-by-count":0,"title":["READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation"],"prefix":"10.1609","volume":"40","author":[{"given":"Haotian","family":"Wang","sequence":"first","affiliation":[]},{"given":"Yuzhe","family":"Weng","sequence":"additional","affiliation":[]},{"given":"Jun","family":"Du","sequence":"additional","affiliation":[]},{"given":"Haoran","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Xiaoyan","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Shan","family":"He","sequence":"additional","affiliation":[]},{"given":"Bing","family":"Yin","sequence":"additional","affiliation":[]},{"given":"Cong","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Jianqing","family":"Gao","sequence":"additional","affiliation":[]},{"given":"Qingfeng","family":"Liu","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/37940\/41902","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/37940\/41902","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:56:38Z","timestamp":1773791798000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/37940"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i12.37940","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}