{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T05:31:24Z","timestamp":1770355884546,"version":"3.49.0"},"reference-count":44,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2023,7,16]],"date-time":"2023-07-16T00:00:00Z","timestamp":1689465600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004569","name":"Ministry of Science and Higher Education of the Russian Federation","doi-asserted-by":"publisher","award":["075-15-2020-799"],"award-info":[{"award-number":["075-15-2020-799"]}],"id":[{"id":"10.13039\/501100004569","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>In recent years, Embodied AI has become one of the main topics in robotics. For the agent to operate in human-centric environments, it needs the ability to explore previously unseen areas and to navigate to objects that humans want the agent to interact with. This task, which can be formulated as ObjectGoal Navigation (ObjectNav), is the main focus of this work. To solve this challenging problem, we suggest a hybrid framework consisting of both not-learnable and learnable modules and a switcher between them\u2014SkillFusion. The former are more accurate, while the latter are more robust to sensors\u2019 noise. To mitigate the sim-to-real gap, which often arises with learnable methods, we suggest training them in such a way that they are less environment-dependent. As a result, our method showed top results in both the Habitat simulator and during the evaluations on a real robot.<\/jats:p>","DOI":"10.3390\/robotics12040104","type":"journal-article","created":{"date-parts":[[2023,7,17]],"date-time":"2023-07-17T00:35:04Z","timestamp":1689554104000},"page":"104","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Skill Fusion in Hybrid Robotic Framework for Visual Object Goal Navigation"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4730-1543","authenticated-orcid":false,"given":"Aleksei","family":"Staroverov","sequence":"first","affiliation":[{"name":"AIRI, 105064 Moscow, Russia"},{"name":"Federal Research Center for Computer Science and Control of Russian Academy of Sciences, 119333 Moscow, Russia"},{"name":"Moscow Institute of Physics and Technology, 141707 Dolgoprudny, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5897-0702","authenticated-orcid":false,"given":"Kirill","family":"Muravyev","sequence":"additional","affiliation":[{"name":"Federal Research Center for Computer Science and Control of Russian Academy of Sciences, 119333 Moscow, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4377-321X","authenticated-orcid":false,"given":"Konstantin","family":"Yakovlev","sequence":"additional","affiliation":[{"name":"Federal Research Center for Computer Science and Control of Russian Academy of Sciences, 119333 Moscow, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9747-3837","authenticated-orcid":false,"given":"Aleksandr I.","family":"Panov","sequence":"additional","affiliation":[{"name":"AIRI, 105064 Moscow, Russia"},{"name":"Federal Research Center for Computer Science and Control of Russian Academy of Sciences, 119333 Moscow, 
Russia"}]}],"member":"1968","published-online":{"date-parts":[[2023,7,16]]},"reference":[{"key":"ref_1","unstructured":"Wijmans, E., Kadian, A., Morcos, A., Lee, S., Essa, I., Parikh, D., Savva, M., and Batra, D. (2019). DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames. arXiv."},{"key":"ref_2","unstructured":"Chaplot, D.S., Gandhi, D., Gupta, S., Gupta, A., and Salakhutdinov, R. (2020). Learning to Explore using Active Neural SLAM. arXiv."},{"key":"ref_3","unstructured":"Shacklett, B., Wijmans, E., Petrenko, A., Savva, M., Batra, D., Koltun, V., and Fatahalian, K. (2021, January 3\u20137). Large Batch Simulation for Deep Reinforcement Learning. Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event."},{"key":"ref_4","unstructured":"Batra, D., Gokaslan, A., Kembhavi, A., Maksymets, O., Mottaghi, R., Savva, M., Toshev, A., and Wijmans, E. (2020). ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1007\/s10846-008-9235-4","article-title":"Visual Navigation for Mobile Robots: A Survey","volume":"53","author":"Ortiz","year":"2008","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_6","unstructured":"Kadian, A., Truong, J., Gokaslan, A., Clegg, A., Wijmans, E., Lee, S., Savva, M., Chernova, S., and Batra, D. (2019). Are we making real progress in simulated environments? measuring the sim2real gap in embodied visual navigation. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1109\/TRO.2016.2624754","article-title":"Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age","volume":"32","author":"Cadena","year":"2016","journal-title":"IEEE Trans. Robot."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ye, J., Batra, D., Das, A., and Wijmans, E. (2021, January 10\u201317). Auxiliary tasks and exploration enable objectgoal navigation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01581"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Xue, H., Hein, B., Bakr, M., Schildbach, G., Abel, B., and Rueckert, E. (2022). Using Deep Reinforcement Learning with Automatic Curriculum Learning for Mapless Navigation in Intralogistics. Appl. Sci., 12.","DOI":"10.3390\/app12063153"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Fugal, J., Bae, J., and Poonawala, H.A. (2021). On the Impact of Gravity Compensation on Reinforcement Learning in Goal-Reaching Tasks for Robotic Manipulators. Robotics, 10.","DOI":"10.3390\/robotics10010046"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","article-title":"Mastering Atari, Go, chess and shogi by planning with a learned model","volume":"588","author":"Schrittwieser","year":"2020","journal-title":"Nature"},{"key":"ref_12","unstructured":"Hafner, D., Lillicrap, T., Norouzi, M., and Ba, J. (2020). Mastering Atari with Discrete World Models. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Padmakumar, A., Thomason, J., Shrivastava, A., Lange, P., Narayan-Chen, A., Gella, S., Piramuthu, R., Tur, G., and Hakkani-Tur, D. (2021). TEACh: Task-driven Embodied Agents that Chat. arXiv.","DOI":"10.1609\/aaai.v36i2.20097"},{"key":"ref_14","unstructured":"Chaplot, D.S., Gandhi, D., Gupta, A., and Salakhutdinov, R. (2020). 
Object Goal Navigation using Goal-Oriented Semantic Exploration. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Gadzicki, K., Khamsehashari, R., and Zetzsche, C. (2020, January 6\u20139). Early vs late fusion in multimodal convolutional neural networks. Proceedings of the 2020 IEEE 23rd International Conference on Information Fusion (FUSION), Rustenburg, South Africa.","DOI":"10.23919\/FUSION45008.2020.9190246"},{"key":"ref_16","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Gordon, D., Kadian, A., Parikh, D., Hoffman, J., and Batra, D. (2019). SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation. arXiv.","DOI":"10.1109\/ICCV.2019.00111"},{"key":"ref_18","unstructured":"Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S.G., Novikov, A., Barth-Maron, G., Gimenez, M., Sulsky, Y., Kay, J., and Springenberg, J.T. (2022). A Generalist Agent. arXiv."},{"key":"ref_19","unstructured":"Yadav, K., Ramrakhya, R., Majumdar, A., Berges, V.P., Kuhar, S., Batra, D., Baevski, A., and Maksymets, O. (2022). Offline Visual Representation Learning for Embodied Navigation. arXiv."},{"key":"ref_20","unstructured":"Baker, B., Akkaya, I., Zhokhov, P., Huizinga, J., Tang, J., Ecoffet, A., Houghton, B., Sampedro, R., and Clune, J. (2022). Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos. arXiv."},{"key":"ref_21","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021). Learning Transferable Visual Models From Natural Language Supervision. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Khandelwal, A., Weihs, L., Mottaghi, R., and Kembhavi, A. (2021). Simple but Effective: CLIP Embeddings for Embodied AI. arXiv.","DOI":"10.1109\/CVPR52688.2022.01441"},{"key":"ref_23","unstructured":"Deitke, M., VanderBilt, E., Herrasti, A., Weihs, L., Salvador, J., Ehsani, K., Han, W., Kolve, E., Farhadi, A., and Kembhavi, A. (2022). ProcTHOR: Large-Scale Embodied AI Using Procedural Generation. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1874","DOI":"10.1109\/TRO.2021.3075644","article-title":"Orb-slam3: An accurate open-source library for visual, visual\u2013inertial, and multimap slam","volume":"37","author":"Campos","year":"2021","journal-title":"IEEE Trans. Robot."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Hess, W., Kohler, D., Rapp, H., and Andor, D. (2016, January 16\u201321). Real-time loop closure in 2D LIDAR SLAM. Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden.","DOI":"10.1109\/ICRA.2016.7487258"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Sumikura, S., Shibuya, M., and Sakurada, K. (2019, January 21\u201325). OpenVSLAM: A versatile visual SLAM framework. Proceedings of the 27th ACM International Conference on Multimedia, Nice, France.","DOI":"10.1145\/3343031.3350539"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"416","DOI":"10.1002\/rob.21831","article-title":"RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation","volume":"36","author":"Michaud","year":"2019","journal-title":"J. 
Field Robot."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1109\/TSSC.1968.300136","article-title":"A formal basis for the heuristic determination of minimum cost paths","volume":"4","author":"Hart","year":"1968","journal-title":"IEEE Trans. Syst. Sci. Cybern."},{"key":"ref_29","unstructured":"Nash, A., Daniel, K., Koenig, S., and Felner, A. (2007, January 22\u201326). Theta*: Any-angle path planning on grids. Proceedings of the AAAI, Vancouver, BC, Canada."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1505","DOI":"10.1109\/TCST.2016.2601624","article-title":"Implementation of nonlinear model predictive path-following control for an industrial robot","volume":"25","author":"Faulwasser","year":"2016","journal-title":"IEEE Trans. Control Syst. Technol."},{"key":"ref_31","unstructured":"Soetanto, D., Lapierre, L., and Pascoal, A. (2003, January 9\u201312). Adaptive, non-singular path-following control of dynamic wheeled robots. Proceedings of the 42nd IEEE International Conference on Decision and Control (IEEE Cat. No. 03CH37475), Maui, HI, USA."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1016\/j.ymssp.2018.08.028","article-title":"Model predictive path following control for autonomous cars considering a measurable disturbance: Implementation, testing, and verification","volume":"118","author":"Guo","year":"2019","journal-title":"Mech. Syst. Signal Process."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Santosh, D., Achar, S., and Jawahar, C. (2008, January 19\u201323). Autonomous image-based exploration for mobile robot navigation. Proceedings of the 2008 IEEE International Conference on Robotics and Automation, Pasadena, CA, USA.","DOI":"10.1109\/ROBOT.2008.4543622"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Gao, W., Booker, M., Adiwahono, A., Yuan, M., Wang, J., and Yun, Y.W. (2018, January 18\u201321). An improved frontier-based approach for autonomous exploration. Proceedings of the 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore.","DOI":"10.1109\/ICARCV.2018.8581245"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Muravyev, K., Bokovoy, A., and Yakovlev, K. (2021, January 11\u201316). Enhancing exploration algorithms for navigation with visual SLAM. Proceedings of the Russian Conference on Artificial Intelligence, Taganrog, Russia.","DOI":"10.1007\/978-3-030-86855-0_14"},{"key":"ref_36","unstructured":"Kojima, N., and Deng, J. (2019). To Learn or Not to Learn: Analyzing the Role of Learning for Navigation in Virtual Environments. arXiv."},{"key":"ref_37","unstructured":"Mishkin, D., Dosovitskiy, A., and Koltun, V. (2019). Benchmarking Classic and Learned Navigation in Complex 3D Environments. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Gupta, S., Tolani, V., Davidson, J., Levine, S., Sukthankar, R., and Malik, J. (2017). Cognitive Mapping and Planning for Visual Navigation. 
arXiv.","DOI":"10.1109\/CVPR.2017.769"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"195608","DOI":"10.1109\/ACCESS.2020.3034524","article-title":"Real-Time Object Navigation with Deep Neural Networks and Hierarchical Reinforcement Learning","volume":"8","author":"Staroverov","year":"2020","journal-title":"IEEE Access"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"70447","DOI":"10.1109\/ACCESS.2022.3182803","article-title":"Hierarchical Landmark Policy Optimization for Visual Indoor Navigation","volume":"10","author":"Staroverov","year":"2022","journal-title":"IEEE Access"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Rana, K., Dasagi, V., Haviland, J., Talbot, B., Milford, M., and S\u00fcnderhauf, N. (2023). Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics. arXiv.","DOI":"10.1177\/02783649231167210"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Rozenberszki, D., and Majdik, A.L. (August, January 31). LOL: Lidar-only odometry and localization in 3D point cloud maps. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9197450"},{"key":"ref_43","unstructured":"Ramakrishnan, S.K., Gokaslan, A., Wijmans, E., Maksymets, O., Clegg, A., Turner, J., Undersander, E., Galuba, W., Westbury, A., and Chang, A.X. (2021). Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI. arXiv."},{"key":"ref_44","unstructured":"Savva, M., Kadian, A., Maksymets, O., Zhao, Y., Wijmans, E., Jain, B., Straub, J., Liu, J., Koltun, V., and Malik, J. (November, January 27). Habitat: A Platform for Embodied AI Research. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/12\/4\/104\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:12:50Z","timestamp":1760127170000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/12\/4\/104"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,16]]},"references-count":44,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2023,8]]}},"alternative-id":["robotics12040104"],"URL":"https:\/\/doi.org\/10.3390\/robotics12040104","relation":{},"ISSN":["2218-6581"],"issn-type":[{"value":"2218-6581","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,16]]}}}