{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:15:14Z","timestamp":1760145314439,"version":"build-2065373602"},"reference-count":36,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T00:00:00Z","timestamp":1720742400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Henan University of Technology","award":["31401529","24A520014"],"award-info":[{"award-number":["31401529","24A520014"]}]},{"name":"Key Scientific Research Projects of Higher Education Institutions in Henan Province","award":["31401529","24A520014"],"award-info":[{"award-number":["31401529","24A520014"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Visual reinforcement learning is important in various practical applications, such as video games, robotic manipulation, and autonomous navigation. However, a major challenge in visual reinforcement learning is the generalization to unseen environments, that is, how agents manage environments with previously unseen backgrounds. This issue is triggered mainly by the high unpredictability inherent in high-dimensional observation space. To deal with this problem, techniques including domain randomization and data augmentation have been explored; nevertheless, these methods still cannot attain a satisfactory result. This paper proposes a new method named Internal States Simulation Auxiliary (ISSA), which uses internal states to improve generalization in visual reinforcement learning tasks. 
Our method contains two agents, a teacher agent and a student agent: the teacher agent has the ability to directly access the environment\u2019s internal states and is used to facilitate the student agent\u2019s training; the student agent receives initial guidance from the teacher agent and subsequently continues to learn independently. From another perspective, our method can be divided into two phases, the transfer learning phase and the traditional visual reinforcement learning phase. In the first phase, the teacher agent interacts with environments and imparts knowledge to the vision-based student agent. With the guidance of the teacher agent, the student agent is able to discover more effective visual representations that address the high unpredictability of the high-dimensional observation space. In the next phase, the student agent autonomously learns from the visual information in the environment, and ultimately, it becomes a vision-based reinforcement learning agent with enhanced generalization. The effectiveness of our method is evaluated using the DMControl Generalization Benchmark and the DrawerWorld with texture distortions. 
Preliminary results indicate that our method significantly improves generalization ability and performance in complex continuous control tasks.<\/jats:p>","DOI":"10.3390\/s24144513","type":"journal-article","created":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T11:28:03Z","timestamp":1720783683000},"page":"4513","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Generalization Enhancement of Visual Reinforcement Learning through Internal States"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-3875-7971","authenticated-orcid":false,"given":"Hanlin","family":"Yang","sequence":"first","affiliation":[{"name":"Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"given":"William","family":"Zhu","sequence":"additional","affiliation":[{"name":"Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7148-7923","authenticated-orcid":false,"given":"Xianchao","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,7,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_2","unstructured":"Vinyals, O., Ewalds, T., Bartunov, S., Georgiev, P., Vezhnevets, A.S., Yeo, M., Makhzani, A., K\u00fcttler, H., Agapiou, J., and Schrittwieser, J. (2017). Starcraft II: A new challenge for reinforcement learning. 
arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P. (2017, January 24\u201328). Domain randomization for transferring deep neural networks from simulation to the real world. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8202133"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ren, X., Luo, J., Solowjow, E., Ojea, J.A., Gupta, A., Tamar, A., and Abbeel, P. (2019, January 20\u201324). Domain randomization for active pose estimation. Proceedings of the 2019 International Conference on Robotics and Automation, Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794126"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Chaysri, P., Spatharis, C., Vlachos, K., and Blekas, K. (2024). Design and implementation of a low-cost intelligent unmanned surface vehicle. Sensors, 24.","DOI":"10.3390\/s24103254"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wen, Y., Chen, Y., and Guo, X. (2024). USV trajectory tracking control based on receding horizon reinforcement learning. Sensors, 24.","DOI":"10.3390\/s24092771"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Al-Hamadani, M.N., Fadhel, M.A., Alzubaidi, L., and Harangi, B. (2024). Reinforcement learning algorithms and applications in healthcare and robotics: A comprehensive and systematic review. Sensors, 24.","DOI":"10.3390\/s24082461"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Mottaghi, R., Kolve, E., Lim, J.J., Gupta, A., Fei-Fei, L., and Farhadi, A. (June, January 29). Target-driven visual navigation in indoor scenes using deep reinforcement learning. 
Proceedings of the 2017 IEEE International Conference on Robotics and Automation, Singapore.","DOI":"10.1109\/ICRA.2017.7989381"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2230","DOI":"10.1109\/JSEN.2020.3016299","article-title":"Vision-based autonomous navigation approach for a tracked robot using deep reinforcement learning","volume":"21","author":"Ejaz","year":"2020","journal-title":"IEEE Sensors J."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wang, C., and Wang, Y. (2024). Safe autonomous driving with latent dynamics and state-wise constraints. Sensors, 24.","DOI":"10.3390\/s24103139"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhao, R., Wang, K., Che, W., Li, Y., Fan, Y., and Gao, F. (2024). Adaptive cruise control based on safe deep reinforcement learning. Sensors, 24.","DOI":"10.3390\/s24082657"},{"key":"ref_12","unstructured":"Cobbe, K., Klimov, O., Hesse, C., Kim, T., and Schulman, J. (2019, January 10\u201315). Quantifying generalization in reinforcement learning. Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA."},{"key":"ref_13","unstructured":"Gamrian, S., and Goldberg, Y. (2019, January 10\u201315). Transfer learning for related reinforcement learning tasks via image-to-image translation. Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA."},{"key":"ref_14","unstructured":"Zhang, C., Vinyals, O., Munos, R., and Bengio, S. (2018). A study on overfitting in deep reinforcement learning. arXiv."},{"key":"ref_15","unstructured":"Farebrother, J., Machado, M.C., and Bowling, M. (2018). Generalization and regularization in dqn. arXiv."},{"key":"ref_16","unstructured":"Mehta, B., Diaz, M., Golemo, F., Pal, C.J., and Paull, L. (November, January 30). Active domain randomization. Proceedings of the Conference on Robot Learning, Virtual."},{"key":"ref_17","unstructured":"Hansen, N., and Wang, X. (June, January 30). 
Generalization in reinforcement learning by soft data augmentation. Proceedings of the International Conference on Robotics and Automation, Xi\u2019an, China."},{"key":"ref_18","unstructured":"Tassa, Y., Doron, Y., Muldal, A., Erez, T., Li, Y., Casas, D.d.L., Budden, D., Abdolmaleki, A., Merel, J., and Lefrancq, A. (2018). Deepmind control suite. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, X., Lian, L., and Yu, S.X. (2021, January 20\u201325). Unsupervised visual attention and invariance for reinforcement learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00661"},{"key":"ref_20","unstructured":"Yu, T., Quillen, D., He, Z., Julian, R., Hausman, K., Finn, C., and Levine, S. (November, January 30). Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning. Proceedings of the Conference on Robot Learning (PMLR), Virtual."},{"key":"ref_21","unstructured":"Yarats, D., Fergus, R., Lazaric, A., and Pinto, L. (2021). Mastering visual continuous control: Improved data-augmented reinforcement learning. arXiv."},{"key":"ref_22","first-page":"679","article-title":"A Markovian decision process","volume":"6","author":"Bellman","year":"1957","journal-title":"J. Math. Mech."},{"key":"ref_23","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv."},{"key":"ref_24","first-page":"1","article-title":"Provably efficient Q-learning with function approximation via distribution shift error checking oracle","volume":"32","author":"Du","year":"2019","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., and Le, Q.V. (2019, January 15\u201319). Autoaugment: Learning augmentation strategies from data. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00020"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Cubuk, E.D., Zoph, B., Shlens, J., and Le, Q.V. (2020, January 14\u201319). Randaugment: Practical automated data augmentation with a reduced search space. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00359"},{"key":"ref_29","first-page":"19884","article-title":"Reinforcement learning with augmented data","volume":"33","author":"Laskin","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_30","unstructured":"Kostrikov, I., Yarats, D., and Fergus, R. (2020). Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. arXiv."},{"key":"ref_31","unstructured":"Hansen, N., Su, H., and Wang, X. (2021). Stabilizing deep Q-learning with convNets and vision transformers under data augmentation. arXiv."},{"key":"ref_32","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018, January 10\u201315). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. 
Proceedings of the International Conference on Machine Learning. PMLR, Stockholm, Sweden."},{"key":"ref_33","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_35","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_36","unstructured":"Hansen, N., Jangir, R., Sun, Y., Aleny\u00e0, G., Abbeel, P., Efros, A.A., Pinto, L., and Wang, X. (2020). Self-supervised policy adaptation during deployment. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/14\/4513\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:15:48Z","timestamp":1760109348000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/14\/4513"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,12]]},"references-count":36,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2024,7]]}},"alternative-id":["s24144513"],"URL":"https:\/\/doi.org\/10.3390\/s24144513","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2024,7,12]]}}}