{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:48:31Z","timestamp":1773802111468,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Embodied navigation is a fundamental capability that enables embodied agents to effectively interact with the physical world in various complex environments. However, a significant gap remains between current embodied navigation tasks and real-world requirements, as existing methods often struggle to integrate high-level human instructions with spatial understanding. To address this gap, we propose a new embodied navigation task called spatial navigation, which encompasses two key components: spatial object navigation (SpON) for object-specific guidance and spatial area navigation (SpAN) for navigating to designated areas. Specifically, SpON guides agents to specific objects by leveraging spatial relationships and contextual understanding, while SpAN focuses on navigating to defined areas within complex environments. Together, these components significantly enhance agents\u2019 navigation capabilities, enabling more effective interactions in real-world scenarios. To support this task, we have generated a spatial navigation dataset consisting of 10K trajectories within the simulator. This dataset includes high-level human instructions, detailed observations, and corresponding navigation actions, providing a comprehensive resource to enhance agent training and performance. Building on the spatial navigation dataset, we introduce SpNav, a hierarchical navigation framework. Specifically, SpNav employs a vision-language model (VLM) to interpret high-level human instructions and accurately identify goal objects or areas within the observation range, then achieves precise point-to-point navigation using a map, bridging the gap between perception and action and enhancing the agent\u2019s ability to operate effectively in complex environments. Extensive experiments show that SpNav achieves state-of-the-art (SOTA) performance on spatial navigation tasks in both simulated and real-world environments, validating the effectiveness of our method.<\/jats:p>","DOI":"10.1609\/aaai.v40i15.38258","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:15:21Z","timestamp":1773792921000},"page":"12627-12635","source":"Crossref","is-referenced-by-count":0,"title":["What You See Is What You Reach: Towards Spatial Navigation with High-Level Human Instructions"],"prefix":"10.1609","volume":"40","author":[{"given":"Lingfeng","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Haoxiang","family":"Fu","sequence":"additional","affiliation":[]},{"given":"Xiaoshuai","family":"Hao","sequence":"additional","affiliation":[]},{"given":"Shuyi","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Qiang","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Rui","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Long","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Wenbo","family":"Ding","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38258\/42220","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38258\/42220","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:15:21Z","timestamp":1773792921000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/38258"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i15.38258","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}