{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T03:00:26Z","timestamp":1773802826467,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"22","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Recent vision-language-action (VLA) models built on pretrained vision-language models (VLMs) have demonstrated strong performance in robotic manipulation. However, these models remain constrained by the single-frame image paradigm and fail to fully leverage the temporal information offered by multi-frame histories, as directly feeding multiple frames into VLM backbones incurs substantial computational overhead and inference latency. We propose CronusVLA, a unified framework that extends single-frame VLA models to the multi-frame paradigm. CronusVLA follows a two-stage process: (1) Single-frame pretraining on large-scale embodied datasets with autoregressive prediction of action tokens, establishing an effective embodied vision-language foundation; (2) Multi-frame post-training, which adapts the prediction of the vision-language backbone from discrete tokens to learnable features, and aggregates historical information via feature chunking. CronusVLA effectively addresses the existing challenges of multi-frame modeling while enhancing performance. To evaluate the robustness under temporal and spatial disturbances, we introduce SimplerEnv-OR, a novel benchmark featuring 24 types of observational disturbances and 120 severity levels. Experiments across three embodiments in simulated and real-world environments demonstrate that CronusVLA achieves leading performance and superior robustness, with a 70.9% success rate on SimplerEnv, a 26.8% improvement over OpenVLA on LIBERO, and the highest robustness score on SimplerEnv-OR, showing the promise of efficient multi-frame adaptation for real-world VLA deployment.<\/jats:p>","DOI":"10.1609\/aaai.v40i22.38903","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T01:03:47Z","timestamp":1773795827000},"page":"18388-18396","source":"Crossref","is-referenced-by-count":0,"title":["Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling"],"prefix":"10.1609","volume":"40","author":[{"given":"Hao","family":"Li","sequence":"first","affiliation":[]},{"given":"Shuai","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Yilun","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Xinyi","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Xiaoda","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Yang","family":"Tian","sequence":"additional","affiliation":[]},{"given":"Hanqing","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Tai","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Dahua","family":"Lin","sequence":"additional","affiliation":[]},{"given":"Feng","family":"Zhao","sequence":"additional","affiliation":[]},{"given":"Jiangmiao","family":"Pang","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38903\/42865","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38903\/42865","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T01:03:48Z","timestamp":1773795828000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/38903"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i22.38903","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}