{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T22:34:32Z","timestamp":1762900472847,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:00:00Z","timestamp":1688083200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Natural Science Foundation of Xinjiang Uygur Autonomous Region","award":["2022D01C392","62063033","2022B01050-2"],"award-info":[{"award-number":["2022D01C392","62063033","2022B01050-2"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["2022D01C392","62063033","2022B01050-2"],"award-info":[{"award-number":["2022D01C392","62063033","2022B01050-2"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Key R&D Program of Xinjiang Uygur Autonomous Region","award":["2022D01C392","62063033","2022B01050-2"],"award-info":[{"award-number":["2022D01C392","62063033","2022B01050-2"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Visual navigation based on deep reinforcement learning requires a large amount of interaction with the environment, and due to the reward sparsity, it requires a large amount of training time and computational resources. In this paper, we focus on sample efficiency and navigation performance and propose a framework for visual navigation based on multiple self-supervised auxiliary tasks. Specifically, we present an LSTM-based dynamics model and an attention-based image-reconstruction model as auxiliary tasks. 
These self-supervised auxiliary tasks enable agents to learn navigation strategies directly from the original high-dimensional images without relying on ResNet features by constructing latent representation learning. Experimental results show that without manually designed features and prior demonstrations, our method significantly improves the training efficiency and outperforms the baseline algorithms on the simulator and real-world image datasets.<\/jats:p>","DOI":"10.3390\/e25071007","type":"journal-article","created":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T16:06:37Z","timestamp":1688141197000},"page":"1007","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Multiple Self-Supervised Auxiliary Tasks for Target-Driven Visual Navigation Using Deep Reinforcement Learning"],"prefix":"10.3390","volume":"25","author":[{"given":"Wenzhi","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China"}]},{"given":"Li","family":"He","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8598-2343","authenticated-orcid":false,"given":"Hongwei","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China"}]},{"given":"Liang","family":"Yuan","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China"},{"name":"School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China"}]},{"given":"Wendong","family":"Xiao","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Xinjiang University, Urumqi 830046, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ristic, B., and Palmer, J.L. (2018). Autonomous Exploration and Mapping with RFS Occupancy-Grid SLAM. Entropy, 20.","DOI":"10.3390\/e20060456"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2738","DOI":"10.1109\/JSEN.2019.2952722","article-title":"Indirect Visual Simultaneous Localization and Mapping Based on Linear Models","volume":"20","author":"Chien","year":"2020","journal-title":"IEEE Sens. J."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"582","DOI":"10.1016\/j.dt.2019.04.011","article-title":"A Review: On Path Planning Strategies for Navigation of Mobile Robot","volume":"15","author":"Patle","year":"2019","journal-title":"Def. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Debeunne, C., and Vivet, D. (2020). A Review of Visual-LiDAR Fusion Based Simultaneous Localization and Mapping. Sensors, 20.","DOI":"10.3390\/s20072068"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the Game of Go with Deep Neural Networks and Tree Search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the Game of Go without Human Knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nature"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-Level Control through Deep Reinforcement Learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Mottaghi, R., Kolve, E., Lim, J.J., Gupta, A., Fei-Fei, L., and Farhadi, A. (June, January 29). 
Target-Driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989381"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Xia, F., Zamir, A.R., He, Z., Sax, A., Malik, J., and Savarese, S. (2018, January 18\u201323). Gibson Env: Real-World Perception for Embodied Agents. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00945"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Chang, A., Dai, A., Funkhouser, T., Halber, M., Nie\u00dfner, M., Savva, M., Song, S., Zeng, A., and Zhang, Y. (2018, January 25). Matterport3D: Learning from RGB-D Data in Indoor Environments. Proceedings of the 2017 International Conference on 3D Vision, Qingdao, China.","DOI":"10.1109\/3DV.2017.00081"},{"key":"ref_11","unstructured":"Straub, J., Whelan, T., Ma, L., Chen, Y., Wijmans, E., Green, S., Engel, J.J., Mur-Artal, R., Ren, C., and Verma, S. (2019). The Replica Dataset: A Digital Replica of Indoor Spaces. arXiv."},{"key":"ref_12","unstructured":"Mo, K., Li, H., Lin, Z., and Lee, J.-Y. (2018). The AdobeIndoorNav Dataset: Towards Deep Reinforcement Learning Based Real-World Indoor Robot Visual Navigation. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"674","DOI":"10.26599\/TST.2021.9010012","article-title":"Deep Reinforcement Learning Based Mobile Robot Navigation: A Review","volume":"26","author":"Zhu","year":"2021","journal-title":"Tsinghua Sci. Technol."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Savva, M., Kadian, A., Maksymets, O., Zhao, Y., Wijmans, E., Jain, B., Straub, J., Liu, J., Koltun, V., and Malik, J. (November, January 27). Habitat: A Platform for Embodied AI Research. 
Proceedings of the 2019 International Conference on Computer Vision, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00943"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Vedaldi, A., Bischof, H., Brox, T., and Frahm, J.-M. (2020, January 23\u201328). Learning Object Relation Graph and Tentative Policy for Visual Navigation. Proceedings of the Computer Vision\u2014ECCV 2020, Glasgow, UK.","DOI":"10.1007\/978-3-030-58548-8"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Mayo, B., Hazan, T., and Tal, A. (2021, January 20\u201325). Visual Navigation with Spatial Attention. Proceedings of the 2021 Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01662"},{"key":"ref_17","first-page":"1","article-title":"Multigoal Visual Navigation With Collision Avoidance via Deep Reinforcement Learning","volume":"71","author":"Xiao","year":"2022","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Kulh\u00e1nek, J., Derner, E., de Bruin, T., and Babu\u0161ka, R. (2019, January 4\u20136). Vision-Based Navigation Using Deep Reinforcement Learning. Proceedings of the 2019 European Conference on Mobile Robots (ECMR), Prague, Czech Republic.","DOI":"10.1109\/ECMR.2019.8870964"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"4345","DOI":"10.1109\/LRA.2021.3068106","article-title":"Visual Navigation in Real-World Indoor Environments Using End-to-End Deep Reinforcement Learning","volume":"6","author":"Derner","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Oh, J., Chockalingam, V., Singh, S., and Lee, H. (2016, January 19\u201324). Control of Memory, Active Perception, and Action in Minecraft. 
Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA.","DOI":"10.13052\/ijts2246-8765.2016.003"},{"key":"ref_21","first-page":"e8702962","article-title":"Visual Navigation with Asynchronous Proximal Policy Optimization in Artificial Agents","volume":"2020","author":"Zeng","year":"2020","journal-title":"J. Robot."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1546","DOI":"10.1109\/TRO.2020.2994002","article-title":"Towards Generalization in Target-Driven Visual Navigation by Using Deep Reinforcement Learning","volume":"36","author":"Devo","year":"2020","journal-title":"IEEE Trans. Robot."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wu, Y., Wu, Y., Tamar, A., Russell, S., Gkioxari, G., and Tian, Y. (November, January 27). Bayesian Relational Memory for Semantic Visual Navigation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00286"},{"key":"ref_24","first-page":"4283","article-title":"Semantic Visual Navigation by Watching YouTube Videos","volume":"Volume 33","author":"Chang","year":"2020","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"An, D., Qi, Y., Huang, Y., Wu, Q., Wang, L., and Tan, T. (2021, January 17). Neighbor-View Enhanced Model for Vision and Language Navigation. Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China.","DOI":"10.1145\/3474085.3475282"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Qi, Y., Pan, Z., Hong, Y., Yang, M.-H., van den Hengel, A., and Wu, Q. (2021, January 11\u201317). The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation. 
Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00168"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Vedaldi, A., Bischof, H., Brox, T., and Frahm, J.-M. (2020, January 23\u201328). Object-and-Action Aware Model for Visual Language Navigation. Proceedings of the Computer Vision\u2014ECCV 2020, Glasgow, UK.","DOI":"10.1007\/978-3-030-58548-8"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Anderson, P., Wu, Q., Teney, D., Bruce, J., Johnson, M., S\u00fcnderhauf, N., Reid, I., Gould, S., and van den Hengel, A. (2018, January 18\u201323). Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00387"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Qi, Y., Wu, Q., Anderson, P., Wang, X., Wang, W.Y., Shen, C., and van den Hengel, A. (2020, January 13\u201319). REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01000"},{"key":"ref_30","unstructured":"Jaderberg, M., Mnih, V., Czarnecki, W.M., Schaul, T., Leibo, J.Z., Silver, D., and Kavukcuoglu, K. (2016). Reinforcement Learning with Unsupervised Auxiliary Tasks. arXiv."},{"key":"ref_31","unstructured":"Mirowski, P., Pascanu, R., Viola, F., Soyer, H., Ballard, A.J., Banino, A., Denil, M., Goroshin, R., Sifre, L., and Kavukcuoglu, K. (2017, January 24\u201326). Learning to Navigate in Complex Environments. Proceedings of the 5th International Conference on Learning Representations, Toulon, France."},{"key":"ref_32","unstructured":"Goel, V., Weng, J., and Poupart, P. (2018). 
Proceedings of the Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Agrawal, P., Carreira, J., and Malik, J. (2015, January 7\u201313). Learning to See by Moving. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.13"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Tongloy, T., Chuwongin, S., Jaksukam, K., Chousangsuntorn, C., and Boonsang, S. (2017, January 29\u201331). Asynchronous Deep Reinforcement Learning for the Mobile Robot Navigation with Supervised Auxiliary Tasks. Proceedings of the 2017 2nd International Conference on Robotics and Automation Engineering (ICRAE), Shanghai, China.","DOI":"10.1109\/ICRAE.2017.8291355"},{"key":"ref_35","unstructured":"Ye, J., Batra, D., Wijmans, E., and Das, A. (2021, January 16\u201318). Auxiliary Tasks Speed Up Learning Point Goal Navigation. Proceedings of the 2020 Conference on Robot Learning, Virtual."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ye, J., Batra, D., Das, A., and Wijmans, E. (2021, January 11\u201317). Auxiliary Tasks and Exploration Enable ObjectGoal Navigation. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01581"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1291","DOI":"10.1109\/TSMCC.2012.2218595","article-title":"A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients","volume":"42","author":"Grondman","year":"2012","journal-title":"IEEE Trans. Syst. Man Cybern. Part C Appl. Rev."},{"key":"ref_38","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 19\u201324). Asynchronous Methods for Deep Reinforcement Learning. 
Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_39","unstructured":"Clemente, A.V., Castej\u00f3n, H.N., and Chandra, A. (2017). Efficient Parallel Methods for Deep Reinforcement Learning. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Ye, X., Lin, Z., Li, H., Zheng, S., and Yang, Y. (2018, January 1\u20135). Active Object Perceiver: Recognition-Guided Policy Learning for Object Searching on Mobile Robots. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593720"},{"key":"ref_41","unstructured":"Mirowski, P., Grimes, M., Malinowski, M., Hermann, K.M., Anderson, K., Teplyashin, D., Simonyan, K., Kavukcuoglu, K., Zisserman, A., and Hadsell, R. (2018). Proceedings of the Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_42","unstructured":"Yang, W., Wang, X., Farhadi, A., Gupta, A., and Mottaghi, R. (2018). Visual Semantic Navigation Using Scene Priors. arXiv."},{"key":"ref_43","first-page":"10311","article-title":"Self-Supervised Attention-Aware Reinforcement Learning","volume":"35","author":"Wu","year":"2021","journal-title":"Proc. AAAI Conf. Artif. 
Intell."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/7\/1007\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:03:40Z","timestamp":1760126620000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/7\/1007"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,30]]},"references-count":43,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2023,7]]}},"alternative-id":["e25071007"],"URL":"https:\/\/doi.org\/10.3390\/e25071007","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2023,6,30]]}}}