{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:42:25Z","timestamp":1773801745446,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Bokeh is used in photography to emphasize the selected subject by smoothly blurring the out-of-focus region with appealing highlights. While recent advances have achieved impressive results in rendering realistic blur, existing frameworks typically rely on disparity maps and bokeh-relevant inputs (e.g., focal distance and blur size), and face significant challenges in video bokeh rendering due to limited temporal consistency. In this paper, we propose BokehCrafter, the first video diffusion framework that generates temporally coherent and visually pleasing bokeh effects from all-in-focus video inputs under user-friendly input conditions. Specifically, we leverage a dual-stream attention mechanism, integrating a reference image branch and a rendering instruction branch. We propose a Bokeh Image Extraction (BIE) module and a CLIP-based text encoder to extract image and text features, respectively, whose outputs are fused via a Text-Image Fusion (TIF) module to enable fine-grained and controllable bokeh rendering. To support the novel capabilities of our model, we construct Video Bokeh Scenes (VBS), a large-scale dataset containing a wide variety of bokeh videos with corresponding rendering instructions, across various scenes and rendering settings. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art methods in both bokeh rendering quality and temporal consistency.<\/jats:p>","DOI":"10.1609\/aaai.v40i12.37969","type":"journal-article","created":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:55:12Z","timestamp":1773791712000},"page":"10029-10037","source":"Crossref","is-referenced-by-count":0,"title":["BokehCrafter: Taming Video Diffusion Models for Controllable Bokeh Rendering"],"prefix":"10.1609","volume":"40","author":[{"given":"Qiwen","family":"Wang","sequence":"first","affiliation":[]},{"given":"Liao","family":"Shen","sequence":"additional","affiliation":[]},{"given":"Jiaqi","family":"Li","sequence":"additional","affiliation":[]},{"given":"Tianqi","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Huiqiang","family":"Sun","sequence":"additional","affiliation":[]},{"given":"Zihao","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Yachuan","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Xianrui","family":"Luo","sequence":"additional","affiliation":[]},{"given":"Zhiguo","family":"Cao","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/37969\/41931","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/37969\/41931","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:55:12Z","timestamp":1773791712000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/37969"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i12.37969","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}