{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T23:37:17Z","timestamp":1761176237201,"version":"build-2065373602"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686318","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,21]]},"abstract":"<jats:p>Audio-driven portrait animation has advanced significantly with the development of diffusion models. Despite remarkable improvements in driving capability and temporal consistency, diffusion model-based methods still suffer from audio-lip misalignment and facial detail loss. To address these issues, we present APasco, a novel stable diffusion-based approach that conditions on aligned audio-lip features and 3D dense sequential geometry features. Specifically, we enhance phoneme-lip synchronization by coupling fine-grained local lip features with the corresponding audio details through the designed Audio-Lip multi-head Cross-Attention module. To improve local facial details, we derive 3D dense sequential geometry features from a 3D dense geometric prior via the developed Mesh Spatio-Temporal Encoder. Extensive experiments on public benchmarks demonstrate that APasco achieves superior performance in both visual quality and lip-sync accuracy compared to existing approaches. Supplementary material and project details can be found at: https:\/\/github.com\/xiejinhan0428\/APasco.<\/jats:p>","DOI":"10.3233\/faia251193","type":"book-chapter","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:54:11Z","timestamp":1761126851000},"source":"Crossref","is-referenced-by-count":0,"title":["APasco: High Fidelity Audio-Driven Portrait Animation Based on Audio-Lip Multi-Head Cross-Attention and 3D Dense Geometric Prior"],"prefix":"10.3233","author":[{"given":"Jinhan","family":"Xie","sequence":"first","affiliation":[{"name":"School of Electronic and Computer Engineering, Peking University"},{"name":"Pengcheng Laboratory"}]},{"given":"Kanglin","family":"Liu","sequence":"additional","affiliation":[{"name":"Pengcheng Laboratory"}]},{"given":"Zhenyu","family":"Bao","sequence":"additional","affiliation":[{"name":"School of Electronic and Computer Engineering, Peking University"},{"name":"Pengcheng Laboratory"}]},{"given":"Qing","family":"Li","sequence":"additional","affiliation":[{"name":"Pengcheng Laboratory"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2025"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA251193","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:54:12Z","timestamp":1761126852000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA251193"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"ISBN":["9781643686318"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia251193","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}