{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:49:23Z","timestamp":1773802163214,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>DiT models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale. High content and motion fidelity aligned with text prompts, however, often require large model parameters and a substantial number of function evaluations (NFEs). Realistic and visually appealing details are typically reflected in high-resolution outputs, further amplifying computational demands\u2014especially for single-stage DiT models. To address these challenges, we propose a novel two-stage framework, FlashVideo, which strategically allocates model capacity and NFEs across stages to balance generation fidelity and quality. In the first stage, prompt fidelity is prioritized through a low-resolution generation process utilizing large parameters and sufficient NFEs to enhance computational efficiency. \nThe second stage achieves a nearly straight ODE trajectory between low and high resolutions via flow matching, effectively generating fine details and fixing artifacts with minimal NFEs. To ensure a seamless connection between the two independently trained stages during inference, we carefully design degradation strategies during the second-stage training. Quantitative and visual results demonstrate that FlashVideo achieves state-of-the-art high-resolution video generation with superior computational efficiency. Additionally, the two-stage design enables users to preview the initial output and accordingly adjust the prompt before committing to full-resolution generation, thereby significantly reducing computational costs and wait times as well as enhancing commercial viability.<\/jats:p>","DOI":"10.1609\/aaai.v40i15.38270","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:21:05Z","timestamp":1773793265000},"page":"12735-12743","source":"Crossref","is-referenced-by-count":0,"title":["FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation"],"prefix":"10.1609","volume":"40","author":[{"given":"Shilong","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Wenbo","family":"Li","sequence":"additional","affiliation":[]},{"given":"Shoufa","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Chongjian","family":"GE","sequence":"additional","affiliation":[]},{"given":"Peize","family":"Sun","sequence":"additional","affiliation":[]},{"given":"Yifu","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yi","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Zehuan","family":"Yuan","sequence":"additional","affiliation":[]},{"given":"Bingyue","family":"Peng","sequence":"additional","affiliation":[]},{"given":"Ping","family":"Luo","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38270\/42232","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38270\/42232","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:21:05Z","timestamp":1773793265000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/38270"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i15.38270","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}