{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,18]],"date-time":"2026-05-18T22:44:22Z","timestamp":1779144262621,"version":"3.51.4"},"reference-count":66,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T00:00:00Z","timestamp":1722470400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Robot. AI"],"abstract":"<jats:p>Assisting individuals in their daily activities through autonomous mobile robots is a significant concern, especially for users without specialized knowledge. Specifically, the capability of a robot to navigate to destinations based on human speech instructions is crucial. Although robots can take different paths toward the same objective, the shortest path is not always the most suitable. A preferred approach would be to accommodate waypoint specifications flexibly for planning an improved alternative path even with detours. Furthermore, robots require real-time inference capabilities. In this sense, spatial representations include semantic, topological, and metric-level representations, each capturing different aspects of the environment. This study aimed to realize a hierarchical spatial representation using a topometric semantic map and path planning with speech instructions by including waypoints. Thus, we present a hierarchical path planning method called spatial concept-based topometric semantic mapping for hierarchical path planning (SpCoTMHP), which integrates place connectivity. This approach provides a novel integrated probabilistic generative model and fast approximate inferences with interactions among the hierarchy levels. A formulation based on \u201ccontrol as probabilistic inference\u201d theoretically supports the proposed path planning algorithm. 
We conducted experiments in a home environment using the Toyota human support robot on the SIGVerse simulator and in a lab\u2013office environment with the real robot Albert. Here, the user issues speech commands that specify the waypoint and goal, such as \u201cGo to the bedroom via the corridor.\u201d Navigation experiments were performed using speech instructions with a waypoint to demonstrate the performance improvement of the SpCoTMHP over the baseline hierarchical path planning method with heuristic path costs (HPP-I) in terms of the weighted success rate at which the robot reaches the closest target (0.590) and passes the correct waypoints. In advanced tasks, the computation time was significantly shorter, by 7.14 s, with the SpCoTMHP than with the baseline HPP-I. Thus, hierarchical spatial representations provide mutually understandable instruction forms for both humans and robots, enabling language-based navigation.<\/jats:p>","DOI":"10.3389\/frobt.2024.1291426","type":"journal-article","created":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T04:10:54Z","timestamp":1722485454000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Hierarchical path planning from speech instructions with spatial concept-based topometric semantic mapping"],"prefix":"10.3389","volume":"11","author":[{"given":"Akira","family":"Taniguchi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuya","family":"Ito","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tadahiro","family":"Taniguchi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2024,8,1]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.48550\/arxiv.2204.01691","article-title":"Do as I can, not as I say: 
grounding language in robotic affordances","author":"Ahn","year":"2022","journal-title":"arXiv Prepr."},{"key":"B2","volume-title":"On evaluation of embodied navigation agents","author":"Anderson","year":""},{"key":"B3","first-page":"3674","article-title":"Vision-and-language navigation: interpreting visually-grounded navigation instructions in real environments","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)","author":"Anderson","year":""},{"key":"B4","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1038\/s41586-018-0102-6","article-title":"Vector-based navigation using grid-like representations in artificial agents","volume":"557","author":"Banino","year":"2018","journal-title":"Nature"},{"key":"B5","doi-asserted-by":"publisher","first-page":"1877","DOI":"10.48550\/arxiv.2005.14165","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. neural Inf. Process. Syst."},{"key":"B6","doi-asserted-by":"publisher","first-page":"11509","DOI":"10.1109\/ICRA48891.2023.10161534","article-title":"Open-vocabulary queryable scene representations for real world planning","author":"Chen","year":"2023","journal-title":"Proc. - IEEE Int. Conf. Robotics Automation"},{"key":"B7","first-page":"11271","article-title":"Topological planning with transformers for Vision-and-language navigation","volume-title":"Proceedings of the IEEE computer society conference on computer vision and pattern recognition","author":"Chen","year":"2021"},{"key":"B8","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1016\/S0921-8890(03)00021-6","article-title":"An introduction to the anchoring problem","volume":"43","author":"Coradeschi","year":"2003","journal-title":"Robotics Aut. 
Syst."},{"key":"B9","volume-title":"Latent space policies for hierarchical reinforcement learning","author":"[Dataset] Haarnoja","year":"2018"},{"key":"B10","first-page":"176","article-title":"Rao-Blackwellised particle filtering for dynamic Bayesian networks","volume-title":"Proceedings of the 16th conference on uncertainty in artificial intelligence","author":"Doucet","year":"2000"},{"key":"B11","volume-title":"Foundation models in robotics: applications, challenges, and the future","author":"Firoozi","year":"2023"},{"key":"B12","doi-asserted-by":"publisher","first-page":"680","DOI":"10.1038\/nature04587","article-title":"Reverse replay of behavioural sequences in hippocampal place cells during the awake state","volume":"440","author":"Foster","year":"2006","journal-title":"Nature"},{"key":"B13","first-page":"2278","article-title":"Multi-hierarchical semantic maps for mobile robotics","volume-title":"2005 IEEE\/RSJ international conference on intelligent robots and systems, IROS","author":"Galindo","year":"2005"},{"key":"B14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/2300000059","article-title":"Semantics for robotic mapping, perception and interaction: a survey","volume":"8","author":"Garg","year":"2020","journal-title":"Found. 
Trends\u00ae Robotics"},{"key":"B15","doi-asserted-by":"crossref","DOI":"10.21437\/Eurospeech.1999-479","article-title":"Topic-based language models using EM","volume-title":"Proceedings of the European conference on speech communication and technology (EUROSPEECH)","author":"Gildea","year":"1999"},{"key":"B16","first-page":"9673","article-title":"Hybrid topological and 3D dense mapping through autonomous exploration for large indoor environments","volume-title":"Proceedings of the IEEE international conference on robotics and automation (ICRA)","author":"Gomez","year":"2020"},{"key":"B17","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1109\/tro.2006.889486","article-title":"Improved techniques for grid mapping with rao-blackwellized particle filters","volume":"23","author":"Grisetti","year":"2007","journal-title":"IEEE Trans. Robotics"},{"key":"B18","doi-asserted-by":"publisher","first-page":"7606","DOI":"10.18653\/V1\/2022.ACL-LONG.524","article-title":"Vision-and-language navigation: a survey of tasks, methods, and future directions","volume":"1","author":"Gu","year":"2022","journal-title":"Proc. Annu. Meet. Assoc. Comput. Linguistics"},{"key":"B19","doi-asserted-by":"crossref","DOI":"10.1109\/SII55687.2023.10039318","article-title":"Inferring place-object relationships by integrating probabilistic logic and multimodal spatial concepts","volume-title":"2023 IEEE\/SICE international symposium on system integration","author":"Hasegawa","year":"2023"},{"key":"B20","first-page":"4190","article-title":"Learning topometric semantic maps from occupancy grids","volume-title":"Proceedings of the IEEE\/RSJ international conference on intelligent robots and systems (IROS)","author":"Hiller","year":"2019"},{"key":"B21","first-page":"530","article-title":"Hierarchical A*: searching abstraction hierarchies efficiently","volume":"1","author":"Holte","year":"1996","journal-title":"Proc. Natl. Conf. Artif. 
Intell."},{"key":"B22","doi-asserted-by":"crossref","DOI":"10.1109\/ICRA48891.2023.10160969","article-title":"Visual Language maps for robot navigation","volume-title":"Proceedings of the IEEE international conference on robotics and automation (ICRA)","author":"Huang","year":"2023"},{"key":"B23","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/bf01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. Classif."},{"key":"B24","doi-asserted-by":"publisher","first-page":"549360","DOI":"10.3389\/frobt.2021.549360","article-title":"SIGVerse: a cloud-based vr platform for research on multimodal human-robot interaction","volume":"8","author":"Inamura","year":"2021","journal-title":"Front. Robotics AI"},{"key":"B25","doi-asserted-by":"crossref","DOI":"10.1109\/SMC53992.2023.10394143","article-title":"Active semantic mapping for household robots: rapid indoor adaptation and reduced user burden","volume-title":"2023 IEEE international conference on systems, man, and cybernetics (SMC)","author":"Ishikawa","year":"2023"},{"key":"B26","first-page":"673","article-title":"Bayesian nonparametric hidden semi-markov models","volume":"14","author":"Johnson","year":"2013","journal-title":"J. Mach. Learn. Res."},{"key":"B27","doi-asserted-by":"publisher","first-page":"1379","DOI":"10.1007\/s10514-015-9514-4","article-title":"An integrated model of autonomous topological spatial cognition","volume":"40","author":"Karao\u011fuz","year":"2016","journal-title":"Aut. 
Robots"},{"key":"B28","first-page":"7927","article-title":"SpCoMapGAN: spatial concept formation-based semantic mapping with generative adversarial networks","volume-title":"Proceedings of the IEEE\/RSJ international conference on intelligent robots and systems (IROS)","author":"Katsumata","year":"2020"},{"key":"B29","doi-asserted-by":"publisher","first-page":"1055","DOI":"10.1080\/01691864.2020.1778521","article-title":"Integration of imitation learning using GAIL and reinforcement learning using task-achievement rewards via probabilistic graphical model","volume":"34","author":"Kinose","year":"2020","journal-title":"Adv. Robot."},{"key":"B30","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1016\/j.engappai.2015.11.004","article-title":"Robot navigation via spatial and temporal coherent semantic maps","volume":"48","author":"Kostavelis","year":"2016","journal-title":"Eng. Appl. Artif. Intell."},{"key":"B31","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1016\/j.robot.2014.12.006","article-title":"Semantic mapping for mobile robotics tasks: a survey","volume":"66","author":"Kostavelis","year":"2015","journal-title":"Robotics Aut. Syst."},{"key":"B32","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1007\/978-3-030-58604-1_7","article-title":"Beyond the nav-graph: vision-and-language navigation in continuous environments","author":"Krantz","year":"2020","journal-title":"Tech. Rep."},{"key":"B33","first-page":"3682","article-title":"Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation","volume-title":"Proceedings of the advances in neural information processing systems (NeurIPS)","author":"Kulkarni","year":"2016"},{"key":"B34","doi-asserted-by":"publisher","first-page":"517","DOI":"10.1109\/tsmc.1987.4309069","article-title":"Entropy and correlation: some comments","volume":"17","author":"Kvalseth","year":"1987","journal-title":"IEEE Trans. Syst. 
Man, Cybern."},{"key":"B35","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1805.00909","article-title":"Reinforcement learning and control as probabilistic inference: tutorial and review","author":"Levine","year":"2018","journal-title":"Tech. Rep"},{"key":"B36","doi-asserted-by":"publisher","DOI":"10.1115\/1.4043115","article-title":"Stochastic predictive control for partially observable Markov decision processes with TimeJoint chance constraints and application to autonomous vehicle control","volume":"141","author":"Li","year":"2019","journal-title":"J. Dyn. Syst. Meas. Control, Trans. ASME"},{"key":"B37","doi-asserted-by":"publisher","first-page":"813","DOI":"10.1007\/s10514-018-9732-7","article-title":"Predicting the global structure of indoor environments: a constructive machine learning approach","volume":"43","author":"Luperto","year":"2018","journal-title":"Aut. Robots"},{"key":"B38","volume-title":"ClipCap: CLIP prefix for image captioning","author":"Mokady","year":"2021"},{"key":"B39","first-page":"1151","article-title":"FastSLAM 2.0: an improved particle filtering algorithm for simultaneous localization and mapping that provably converges","volume-title":"Proceedings of the international joint conference on artificial intelligence (IJCAI)","author":"Montemerlo","year":"2003"},{"key":"B40","volume-title":"Machine learning: a probabilistic perspective","author":"Murphy","year":"2012"},{"key":"B41","doi-asserted-by":"publisher","first-page":"614","DOI":"10.1587\/transinf.e95.d.614","article-title":"Bayesian learning of a language model from continuous speech","volume":"95","author":"Neubig","year":"2012","journal-title":"IEICE Trans. Inf. 
Syst."},{"key":"B42","first-page":"2065","article-title":"City-scale grid-topological hybrid maps for autonomous mobile robot navigation in urban area","volume-title":"IEEE international conference on intelligent robots and systems","author":"Niijima","year":"2020"},{"key":"B43","first-page":"8748","article-title":"Learning transferable visual models from natural language supervision","volume":"139","author":"Radford","year":"2021","journal-title":"Proc. Mach. Learn. Res."},{"key":"B44","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1080\/01691864.2016.1261045","article-title":"LexToMap: lexical-based topological mapping","volume":"31","author":"Rangel","year":"2017","journal-title":"Adv. Robot."},{"key":"B45","doi-asserted-by":"publisher","first-page":"1510","DOI":"10.1177\/02783649211056674","article-title":"Kimera: from SLAM to spatial perception with 3D dynamic scene graphs","volume":"40","author":"Rosinol","year":"2021","journal-title":"Int. J. Robotics Res."},{"key":"B46","doi-asserted-by":"publisher","DOI":"10.15607\/rss.2023.xix.074","article-title":"CLIP-fields: weakly supervised semantic fields for robotic memory","author":"Shafiullah","year":"2023","journal-title":"Robotics Sci. Syst."},{"key":"B47","article-title":"LM-nav: robotic navigation with large pre-trained models of language, vision, and action","volume-title":"Conference on robot learning (CoRL)","author":"Shah","year":"2022"},{"key":"B48","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1613\/jair.874","article-title":"Learning geometrically-constrained Hidden Markov models for robot navigation: bridging the topological-geometrical gap","volume":"16","author":"Shatkay","year":"2002","journal-title":"J. Artif. Intell. 
Res."},{"key":"B49","doi-asserted-by":"publisher","first-page":"4110","DOI":"10.1109\/LRA.2022.3149572","article-title":"Topological semantic mapping by consolidation of deep visual features","volume":"7","author":"Sousa, Y","year":"2022","journal-title":"IEEE Robotics Automation Lett."},{"key":"B50","volume-title":"The robotics data set repository (radish)","author":"Stachniss","year":"2003"},{"key":"B51","doi-asserted-by":"publisher","first-page":"632","DOI":"10.1016\/j.sysconle.2011.05.001","article-title":"PF-MPC: particle filter-model predictive control","volume":"60","author":"Stahl","year":"2011","journal-title":"Syst. Control Lett."},{"key":"B52","first-page":"1667","article-title":"Enabling topological planning with monocular vision","volume-title":"Proceedings of the IEEE international conference on robotics and automation (ICRA)","author":"Stein","year":"2020"},{"key":"B53","doi-asserted-by":"publisher","first-page":"700","DOI":"10.1080\/01691864.2019.1632223","article-title":"Survey on frontiers of language and robotics","volume":"33","author":"Taniguchi","year":"2019","journal-title":"Adv. Robot."},{"key":"B54","first-page":"811","article-title":"Online spatial concept and lexical acquisition with simultaneous localization and mapping","volume-title":"Proceedings of the IEEE\/RSJ international conference on intelligent robots and systems (IROS)","author":"Taniguchi","year":"2017"},{"key":"B55","doi-asserted-by":"publisher","first-page":"927","DOI":"10.1007\/s10514-020-09905-0","article-title":"Improved and scalable online learning of spatial concepts and language models with mapping","volume":"44","author":"Taniguchi","year":"","journal-title":"Aut. 
Robots"},{"key":"B56","doi-asserted-by":"publisher","first-page":"1213","DOI":"10.1080\/01691864.2020.1817777","article-title":"Spatial concept-based navigation with human speech instructions via probabilistic inference on bayesian generative model","volume":"34","author":"Taniguchi","year":"","journal-title":"Adv. Robot."},{"key":"B57","doi-asserted-by":"publisher","first-page":"840","DOI":"10.1080\/01691864.2023.2225175","article-title":"Active exploration based on information gain by particle filter for efficient spatial concept formation","volume":"37","author":"Taniguchi","year":"2023","journal-title":"Adv. Robot."},{"key":"B58","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1109\/TCDS.2016.2565542","article-title":"Spatial concept acquisition for a mobile robot that integrates self-localization and unsupervised word discovery from spoken sentences","volume":"8","author":"Taniguchi","year":"","journal-title":"IEEE Trans. Cognitive Dev. Syst."},{"key":"B59","doi-asserted-by":"publisher","first-page":"706","DOI":"10.1080\/01691864.2016.1164622","article-title":"Symbol emergence in robotics: a survey","volume":"30","author":"Taniguchi","year":"","journal-title":"Adv. Robot."},{"key":"B60","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1007\/s00354-019-00084-w","article-title":"Neuro-SERKET: development of integrative cognitive system through the composition of deep probabilistic generative models","volume":"38","author":"Taniguchi","year":"","journal-title":"New Gener. Comput."},{"key":"B61","doi-asserted-by":"publisher","first-page":"494","DOI":"10.1109\/TCDS.2018.2867772","article-title":"Symbol emergence in cognitive developmental systems: a survey","volume":"11","author":"Taniguchi","year":"2019","journal-title":"IEEE Trans. Cognitive Dev. 
Syst."},{"key":"B62","volume-title":"Probabilistic robotics","author":"Thrun","year":"2005"},{"key":"B63","doi-asserted-by":"publisher","first-page":"20","DOI":"10.48550\/arXiv.2306.17582","article-title":"ChatGPT for robotics: design principles and model abilities","volume":"2","author":"Vemprala","year":"2023","journal-title":"Microsoft Auton. Syst. Robot. Res."},{"key":"B64","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1109\/tit.1967.1054010","article-title":"Error bounds for convolutional codes and an asymptotically optimum decoding algorithm","volume":"13","author":"Viterbi","year":"1967","journal-title":"IEEE Trans. Inf. Theory"},{"key":"B65","volume-title":"Large Language models for robotics: a survey","author":"Zeng","year":"2023"},{"key":"B66","first-page":"4547","article-title":"Learning graph-structured sum-product networks for probabilistic semantic maps","volume-title":"32nd AAAI conference on artificial intelligence","author":"Zheng","year":"2018"}],"container-title":["Frontiers in Robotics and AI"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2024.1291426\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T04:11:22Z","timestamp":1722485482000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2024.1291426\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,1]]},"references-count":66,"alternative-id":["10.3389\/frobt.2024.1291426"],"URL":"https:\/\/doi.org\/10.3389\/frobt.2024.1291426","relation":{},"ISSN":["2296-9144"],"issn-type":[{"value":"2296-9144","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,1]]},"article-number":"1291426"}}