{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:36:31Z","timestamp":1773801391564,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Video question answering (VideoQA), whose goal is to produce answers through the integration of linguistic and visual understanding, has emerged as a significant research focus. Although Large Multimodal Models (LMMs) and autonomous agent methods have achieved notable advances in VideoQA, excessive computational overhead and restricted multimodal interaction capabilities limit their ability to facilitate the continuous evolution of the VideoQA system. To address the challenge, we introduce DigimonGPT, an evolvable VideoQA agent inspired by cognitive psychology. Specifically, DigimonGPT integrates a multimodal memory mechanism to achieve the continuous evolution of VideoQA systems.  An intra-video declarative memory contains fundamental features of the video and semantic contexts extracted from historical QA pairs. Another inter-task procedural memory encodes task-solving experience for further question answering. Additionally, we introduce a hierarchical memory replay mechanism for VideoQA that selects appropriate memories by their relevance and question complexity. 
Extensive experiments demonstrate that DigimonGPT outperforms LMM and autonomous-agent baselines in accuracy by an average of 13.71% on the NExT-QA dataset and 9.89% on the Intent-QA dataset.<\/jats:p>","DOI":"10.1609\/aaai.v40i8.37523","type":"journal-article","created":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:28:35Z","timestamp":1773790115000},"page":"6001-6009","source":"Crossref","is-referenced-by-count":0,"title":["DigimonGPT: An Evolvable Agent with Hierarchical Human-like Memory for Video Question Answering"],"prefix":"10.1609","volume":"40","author":[{"given":"Borui","family":"Li","sequence":"first","affiliation":[]},{"given":"Xingcai","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Tianen","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Shuai","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Yun","family":"Cheng","sequence":"additional","affiliation":[]},{"given":"Shuai","family":"Wang","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/37523\/41485","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/37523\/41485","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:28:35Z","timestamp":1773790115000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/37523"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i8.37523","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}