{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T16:21:59Z","timestamp":1773246119233,"version":"3.50.1"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:p>Long video understanding with Large Language Models (LLMs) enables the description of objects that are not explicitly present in the training data. However, continuous changes in known objects and the emergence of new ones require up-to-date knowledge of objects and their dynamics for effective understanding of the open world. To alleviate this, we propose an efficient Retrieval-Enhanced Video Understanding method, dubbed REVU, which leverages external knowledge to enhance the performance of open-world learning. First, REVU introduces an extensible external text-object memory with minimal text-visual mapping, involving static and dynamic multimodal information to help LLMs-based models align text and vision features. Second, REVU retrieves object information from external databases and dynamically integrates frame-specific data from videos, enabling effective knowledge aggregation to comprehend the open world. We conducted experiments on multiple benchmark datasets, and our model demonstrates strong adaptability to out-of-domain data without requiring additional fine-tuning or re-training. Experiments on benchmark video understanding datasets reveal that our model achieves state-of-the-art performance and robust generalization.<\/jats:p>","DOI":"10.24963\/ijcai.2025\/97","type":"proceedings-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T08:10:40Z","timestamp":1758269440000},"page":"864-872","source":"Crossref","is-referenced-by-count":3,"title":["External Memory Matters: Generalizable Object-Action Memory for Retrieval-Augmented Long-Term Video Understanding"],"prefix":"10.24963","author":[{"given":"Jisheng","family":"Dang","sequence":"first","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University, China"},{"name":"School of Computer Science and Engineering, Sun Yat-sen University, China"},{"name":"School of Computing, National University of Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huicheng","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Sun Yat-sen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xudong","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Technology, Sun Yat-sen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingmei","family":"Jiao","sequence":"additional","affiliation":[{"name":"School of Electronic Information Engineering, Lanzhou Jiaotong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bimei","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Cyber Security, Jinan University, China"},{"name":"School of Computing, National University of Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Electronic Information Engineering, Lanzhou Jiaotong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bin","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianhuang","family":"Lai","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Sun Yat-sen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tat Seng","family":"Chua","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"name":"Thirty-Fourth International Joint Conference on Artificial Intelligence {IJCAI-25}","theme":"Artificial Intelligence","location":"Montreal, Canada","acronym":"IJCAI-2025","number":"34","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2025,8,16]]},"end":{"date-parts":[[2025,8,22]]}},"container-title":["Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T11:32:58Z","timestamp":1758627178000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2025\/97"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2025,9]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2025\/97","relation":{},"subject":[],"published":{"date-parts":[[2025,9]]}}}