{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T22:16:01Z","timestamp":1768688161606,"version":"3.49.0"},"reference-count":34,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2023,5,21]],"date-time":"2023-05-21T00:00:00Z","timestamp":1684627200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"Korean government (MSIT)","doi-asserted-by":"publisher","award":["NRF-2021R1A5A1032937"],"award-info":[{"award-number":["NRF-2021R1A5A1032937"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Reinforcement learning agents must be robust in test environments that have not been seen during training. However, the generalization problem is challenging to solve in reinforcement learning that uses high-dimensional images as the input. Adding a self-supervised learning framework with data augmentation to the reinforcement learning architecture can promote generalization to a certain extent. However, excessively large changes in the input images may disturb reinforcement learning. Therefore, we propose a contrastive learning method that helps manage the trade-off between reinforcement learning performance and auxiliary-task performance as the data augmentation strength increases. In this framework, strong augmentation does not disturb reinforcement learning and instead maximizes the auxiliary effect for generalization.
Results of experiments on the DeepMind Control suite demonstrate that the proposed method effectively uses strong data augmentation and achieves higher generalization than existing methods.<\/jats:p>","DOI":"10.3390\/s23104946","type":"journal-article","created":{"date-parts":[[2023,5,22]],"date-time":"2023-05-22T02:28:42Z","timestamp":1684722522000},"page":"4946","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["C2RL: Convolutional-Contrastive Learning for Reinforcement Learning Based on Self-Pretraining for Strong Augmentation"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-8201-9246","authenticated-orcid":false,"given":"Sanghoon","family":"Park","sequence":"first","affiliation":[{"name":"Graduate School of Automotive Engineering, Kookmin University, Seoul 02707, Republic of Korea"}]},{"given":"Jihun","family":"Kim","sequence":"additional","affiliation":[{"name":"Graduate School of Automotive Engineering, Kookmin University, Seoul 02707, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1074-8464","authenticated-orcid":false,"given":"Han-You","family":"Jeong","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Pusan National University, Busan 46241, Republic of Korea"}]},{"given":"Tae-Kyoung","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, Gachon University, Seongnam 13120, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1025-3784","authenticated-orcid":false,"given":"Jinwoo","family":"Yoo","sequence":"additional","affiliation":[{"name":"Department of Automobile and IT Convergence, Kookmin University, Seoul 02707, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2023,5,21]]},"reference":[{"key":"ref_1","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and 
Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1126\/science.aar6404","article-title":"A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play","volume":"362","author":"Silver","year":"2018","journal-title":"Science"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_4","unstructured":"Vinyals, O., Ewalds, T., Bartunov, S., Georgiev, P., Vezhnevets, A.S., Yeo, M., Makhzani, A., K\u00fcttler, H., Agapiou, J., and Schrittwieser, J. (2017). Starcraft ii: A new challenge for reinforcement learning. arXiv."},{"key":"ref_5","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_6","unstructured":"Jaderberg, M., Mnih, V., Czarnecki, W.M., Schaul, T., Leibo, J.Z., Silver, D., and Kavukcuoglu, K. (2016). Reinforcement learning with unsupervised auxiliary tasks. arXiv."},{"key":"ref_7","unstructured":"Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., and Dunning, I. (2018, January 10\u201315). Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. 
Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1126\/science.aau6249","article-title":"Human-level performance in 3D multiplayer games with population-based reinforcement learning","volume":"364","author":"Jaderberg","year":"2019","journal-title":"Science"},{"key":"ref_9","unstructured":"Kalashnikov, D., Irpan, A., Pastor, P., Ibarz, J., Herzog, A., Jang, E., Quillen, D., Holly, E., Kalakrishnan, M., and Vanhoucke, V. (2018, January 29\u201331). Scalable deep reinforcement learning for vision-based robotic manipulation. Proceedings of the Conference on Robot Learning, PMLR, Z\u00fcrich, Switzerland."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"e253","DOI":"10.1017\/S0140525X16001837","article-title":"Building machines that learn and think like people","volume":"40","author":"Lake","year":"2017","journal-title":"Behav. Brain Sci."},{"key":"ref_11","unstructured":"Kaiser, L., Babaeizadeh, M., Milos, P., Osinski, B., Campbell, R.H., Czechowski, K., Erhan, D., Finn, C., Kozakowski, P., and Levine, S. (2019). Model-based reinforcement learning for atari. arXiv."},{"key":"ref_12","unstructured":"Laskin, M., Srinivas, A., and Abbeel, P. (2020, January 13\u201318). Curl: Contrastive unsupervised representations for reinforcement learning. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_13","unstructured":"Zhang, C., Vinyals, O., Munos, R., and Bengio, S. (2018). A study on overfitting in deep reinforcement learning. arXiv."},{"key":"ref_14","unstructured":"Cobbe, K., Klimov, O., Hesse, C., Kim, T., and Schulman, J. (2019, January 9\u201315). Quantifying generalization in reinforcement learning. Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA."},{"key":"ref_15","unstructured":"Ma, G., Wang, Z., Yuan, Z., Wang, X., Yuan, B., and Tao, D. (2022). 
A comprehensive survey of data augmentation in visual reinforcement learning. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Hansen, N., and Wang, X. (June, January 30). Generalization in reinforcement learning by soft data augmentation. Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9561103"},{"key":"ref_17","unstructured":"Tassa, Y., Doron, Y., Muldal, A., Erez, T., Li, Y., Casas, D.D., Budden, D., Abdolmaleki, A., Merel, J., and Lefrancq, A. (2018). Deepmind control suite. arXiv."},{"key":"ref_18","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018, January 10\u201315). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Doersch, C., Gupta, A., and Efros, A.A. (2015, January 7\u201313). Unsupervised visual representation learning by context prediction. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.167"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.neucom.2021.10.110","article-title":"Graph-based few-shot learning with transformed feature propagation and optimal class allocation","volume":"470","author":"Zhang","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ding, B., Zhang, R., Xu, L., Liu, G., Yang, S., Liu, Y., and Zhang, Q. (2023). U2D2 Net: Unsupervised Unified Image Dehazing and Denoising Network for Single Hazy Image Enhancement. IEEE Trans. Multimed., 1\u201316.","DOI":"10.1109\/TMM.2023.3263078"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2020, January 13\u201319). 
Momentum contrast for unsupervised visual representation learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"ref_23","unstructured":"Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020, January 13\u201318). A simple framework for contrastive learning of visual representations. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_24","first-page":"21271","article-title":"Bootstrap your own latent-a new approach to self-supervised learning","volume":"33","author":"Grill","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_25","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wu, Z., Xiong, Y., Yu, S.X., and Lin, D. (2018, January 18\u201323). Unsupervised feature learning via non-parametric instance discrimination. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00393"},{"key":"ref_27","unstructured":"Oord, A.V., Li, Y., and Vinyals, O. (2018). Representation learning with contrastive predictive coding. arXiv."},{"key":"ref_28","first-page":"8626","article-title":"Randomized prior functions for deep reinforcement learning","volume":"31","author":"Osband","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_29","unstructured":"Burda, Y., Edwards, H., Storkey, A., and Klimov, O. (2018). Exploration by random network distillation. arXiv."},{"key":"ref_30","unstructured":"Lee, K., Lee, K., Shin, J., and Lee, H. (2019). Network randomization: A simple technique for generalization in deep reinforcement learning. arXiv."},{"key":"ref_31","unstructured":"Glorot, X., and Bengio, Y. (2010, January 13\u201315). 
Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, Sardinia, Italy."},{"key":"ref_32","unstructured":"Hansen, N., Jangir, R., Sun, Y., Aleny\u00e0, G., Abbeel, P., Efros, A.A., Pinto, L., and Wang, X. (2020). Self-supervised policy adaptation during deployment. arXiv."},{"key":"ref_33","first-page":"19884","article-title":"Reinforcement learning with augmented data","volume":"33","author":"Laskin","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_34","unstructured":"Kostrikov, I., Yarats, D., and Fergus, R. (2020). Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/10\/4946\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:39:25Z","timestamp":1760125165000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/10\/4946"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,21]]},"references-count":34,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2023,5]]}},"alternative-id":["s23104946"],"URL":"https:\/\/doi.org\/10.3390\/s23104946","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,21]]}}}