{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:06:41Z","timestamp":1760231201692,"version":"build-2065373602"},"reference-count":80,"publisher":"MDPI AG","issue":"17","license":[{"start":{"date-parts":[[2022,8,29]],"date-time":"2022-08-29T00:00:00Z","timestamp":1661731200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Institute of Information & communications Technology Planning & Evaluation (IITP)","award":["2021-0-02068","2022R1A2C201270611"],"award-info":[{"award-number":["2021-0-02068","2022R1A2C201270611"]}]},{"name":"National Research Foundation of Korea (NRF)","award":["2021-0-02068","2022R1A2C201270611"],"award-info":[{"award-number":["2021-0-02068","2022R1A2C201270611"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In an attempt to overcome the limitations of reward-driven representation learning in vision-based reinforcement learning (RL), an unsupervised learning framework referred to as the visual pretraining via contrastive predictive model (VPCPM) is proposed to learn the representations detached from the policy learning. Our method enables the convolutional encoder to perceive the underlying dynamics through a pair of forward and inverse models under the supervision of the contrastive loss, thus resulting in better representations. In experiments with a diverse set of vision control tasks, by initializing the encoders with VPCPM, the performance of state-of-the-art vision-based RL algorithms is significantly boosted, with 44% and 10% improvement for RAD and DrQ at 100 steps, respectively. In comparison to the prior unsupervised methods, the performance of VPCPM matches or outperforms all the baselines. 
We further demonstrate that the learned representations successfully generalize to the new tasks that share a similar observation and action space.<\/jats:p>","DOI":"10.3390\/s22176504","type":"journal-article","created":{"date-parts":[[2022,8,30]],"date-time":"2022-08-30T01:37:55Z","timestamp":1661823475000},"page":"6504","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9488-7463","authenticated-orcid":false,"given":"Tung M.","family":"Luu","sequence":"first","affiliation":[{"name":"School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea"}]},{"given":"Thang","family":"Vu","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3533-4054","authenticated-orcid":false,"given":"Thanh","family":"Nguyen","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0756-7179","authenticated-orcid":false,"given":"Chang D.","family":"Yoo","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_2","unstructured":"Jaderberg, M., Mnih, V., Czarnecki, W.M., Schaul, T., Leibo, 
J.Z., Silver, D., and Kavukcuoglu, K. (2017, January 24\u201326). Reinforcement learning with unsupervised auxiliary tasks. Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France."},{"key":"ref_3","unstructured":"Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., and Dunning, I. (2018, January 10\u201315). Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden."},{"key":"ref_4","first-page":"1334","article-title":"End-to-end training of deep visuomotor policies","volume":"17","author":"Levine","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_5","unstructured":"Lee, A.X., Nagabandi, A., Abbeel, P., and Levine, S. (2020, January 6\u201312). Stochastic latent actor-critic: Deep reinforcement learning with a latent variable model. Proceedings of the 34th Advances in Neural Information Processing Systems, NeurIPS 2020, Online."},{"key":"ref_6","unstructured":"Kalashnikov, D., Irpan, A., Pastor, P., Ibarz, J., Herzog, A., Jang, E., Quillen, D., Holly, E., Kalakrishnan, M., and Vanhoucke, V. (2018, January 29\u201331). Scalable deep reinforcement learning for vision-based robotic manipulation. Proceedings of the 2nd Conference on Robot Learning, CoRL 2018, Z\u00fcrich, Switzerland."},{"key":"ref_7","unstructured":"Akkaya, I., Andrychowicz, M., Chociej, M., Litwin, M., McGrew, B., Petron, A., Paino, A., Plappert, M., Powell, G., and Ribas, R. (2019). Solving rubik\u2019s cube with a robot hand. arXiv."},{"key":"ref_8","unstructured":"Julian, R., Swanson, B., Sukhatme, G.S., Levine, S., Finn, C., and Hausman, K. (2020, January 16\u201318). Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning. 
Proceedings of the 4th Conference on Robot Learning, CoRL 2020, Online."},{"key":"ref_9","unstructured":"Liu, H., and Abbeel, P. (2021, January 6\u201314). Behavior from the void: Unsupervised active pre-training. Proceedings of the 35th Advances in Neural Information Processing Systems, NeurIPS 2021, Online."},{"key":"ref_10","unstructured":"Shah, R., and Kumar, V. (2021, January 18\u201324). Rrl: Resnet as representation for reinforcement learning. Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Online."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Lange, S., and Riedmiller, M. (2010, January 18\u201323). Deep auto-encoder neural networks in reinforcement learning. Proceedings of the International Joint Conference on Neural Networks, IJCNN 2010, Barcelona, Spain.","DOI":"10.1109\/IJCNN.2010.5596468"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., and Abbeel, P. (2016, January 16\u201321). Deep spatial autoencoders for visuomotor learning. Proceedings of the International Conference on Robotics and Automation, ICRA 2016, Stockholm, Sweden.","DOI":"10.1109\/ICRA.2016.7487173"},{"key":"ref_13","unstructured":"Nair, A.V., Pong, V., Dalal, M., Bahl, S., Lin, S., and Levine, S. (2018, January 3\u20138). Visual reinforcement learning with imagined goals. Proceedings of the 32nd Advances in Neural Information Processing Systems, NeurIPS 2018, Montreal, QC, Canada."},{"key":"ref_14","unstructured":"Srinivas, A., Laskin, M., and Abbeel, P. (2020, January 12\u201318). Curl: Contrastive unsupervised representations for reinforcement learning. Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Online."},{"key":"ref_15","unstructured":"Stooke, A., Lee, K., Abbeel, P., and Laskin, M. (2021, January 18\u201324). Decoupling representation learning from reinforcement learning. 
Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Online."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"100022","DOI":"10.1016\/j.simpa.2020.100022","article-title":"dm_control: Software and tasks for continuous control","volume":"6","author":"Tunyasuvunakool","year":"2020","journal-title":"Softw. Impacts"},{"key":"ref_17","unstructured":"Laskin, M., Lee, K., Stooke, A., Pinto, L., Abbeel, P., and Srinivas, A. (2020, January 6\u201312). Reinforcement Learning with Augmented Data. Proceedings of the 34th Advances in Neural Information Processing Systems, NeurIPS 2020, Online."},{"key":"ref_18","unstructured":"Yarats, D., Kostrikov, I., and Fergus, R. (2021, January 3\u20137). Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels. Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Online."},{"key":"ref_19","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20138). Imagenet classification with deep convolutional neural networks. Proceedings of the 26th Advances in Neural Information Processing Systems, NeurIPS 2012, Lake Tahoe, NV, USA."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wojke, N., Bewley, A., and Paulus, D. (2017, January 17\u201320). Simple online and realtime tracking with a deep association metric. Proceedings of the 24th IEEE International Conference on Image Processing, ICIP 2017, Beijing, China.","DOI":"10.1109\/ICIP.2017.8296962"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wojke, N., and Bewley, A. (2018, January 12\u201315). Deep cosine metric learning for person re-identification. 
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00087"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1343","DOI":"10.1016\/j.apenergy.2017.12.002","article-title":"Using machine learning techniques for occupancy-prediction-based cooling control in office buildings","volume":"211","author":"Peng","year":"2018","journal-title":"Appl. Energy"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_24","unstructured":"Vu, T., Jang, H., Pham, T.X., and Yoo, C. (2019, January 8\u201314). Cascade rpn: Delving into high-quality region proposal network with adaptive convolution. Proceedings of the 33rd Advances in Neural Information Processing Systems, NeurIPS 2019, Vancouver, BC, Canada."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Vu, T., Kang, H., and Yoo, C.D. (2021, January 2\u20139). Scnet: Training inference sample consistency for instance segmentation. Proceedings of the 35th Association for the Advancement of Artificial Intelligence, AAAI 2021, Online.","DOI":"10.1609\/aaai.v35i3.16374"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Jiang, L., Zhao, H., Shi, S., Liu, S., Fu, C.W., and Jia, J. (2020, January 14\u201319). Pointgroup: Dual-set point grouping for 3d instance segmentation. Proceedings of the IEEE\/CVF Computer Vision and Pattern Recognition, CVPR 2020, Online.","DOI":"10.1109\/CVPR42600.2020.00492"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chen, S., Fang, J., Zhang, Q., Liu, W., and Wang, X. (2021, January 11\u201317). Hierarchical aggregation for 3d instance segmentation. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, ICCV 2021, Online.","DOI":"10.1109\/ICCV48922.2021.01518"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Vu, T., Kim, K., Luu, T.M., Nguyen, X.T., and Yoo, C.D. (2022, January 19\u201324). SoftGroup for 3D Instance Segmentation on 3D Point Clouds. Proceedings of the IEEE\/CVF Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00273"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Choy, C.B., Xu, D., Gwak, J., Chen, K., and Savarese, S. (2016, January 8\u201316). 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. Proceedings of the 14th European Conference on Computer Vision, ECCV 2016, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_38"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Rosinol, A., Sattler, T., Pollefeys, M., and Carlone, L. (2019, January 20\u201324). Incremental visual-inertial 3d mesh generation with structural regularities. Proceedings of the IEEE International Conference on Robotics and Automation, ICRA 2019, Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794456"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1016\/j.optlaseng.2019.06.011","article-title":"High-accuracy multi-camera reconstruction enhanced by adaptive point cloud correction algorithm","volume":"122","author":"Chen","year":"2019","journal-title":"Opt. Lasers Eng."},{"key":"ref_32","unstructured":"Zhang, F., Leitner, J., Milford, M., Upcroft, B., and Corke, P. (2015). Towards vision-based deep reinforcement learning for robotic motion control. arXiv."},{"key":"ref_33","unstructured":"Ebert, F., Finn, C., Dasari, S., Xie, A., Lee, A., and Levine, S. (2018). Visual foresight: Model-based deep reinforcement learning for vision-based robotic control. 
arXiv."},{"key":"ref_34","unstructured":"Beattie, C., Leibo, J.Z., Teplyashin, D., Ward, T., Wainwright, M., K\u00fcttler, H., Lefrancq, A., Green, S., Vald\u00e9s, V., and Sadik, A. (2016). Deepmind lab. arXiv."},{"key":"ref_35","unstructured":"Kingma, D.P., and Welling, M. (2013). Auto-encoding variational bayes. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Yarats, D., Zhang, A., Kostrikov, I., Amos, B., Pineau, J., and Fergus, R. (2021, January 2\u20139). Improving sample efficiency in model-free reinforcement learning from images. Proceedings of the 35th Association for the Advancement of Artificial Intelligence, AAAI 2021, Online.","DOI":"10.1609\/aaai.v35i12.17276"},{"key":"ref_37","unstructured":"Agrawal, P., Nair, A.V., Abbeel, P., Malik, J., and Levine, S. (2016, January 5\u201310). Learning to poke by poking: Experiential learning of intuitive physics. Proceedings of the 30th Advances in Neural Information Processing Systems, NeurIPS 2016, Barcelona, Spain."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Pathak, D., Agrawal, P., Efros, A.A., and Darrell, T. (2017, January 6\u201311). Curiosity-driven exploration by self-supervised prediction. Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, Australia.","DOI":"10.1109\/CVPRW.2017.70"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Pathak, D., Mahmoudieh, P., Luo, G., Agrawal, P., Chen, D., Shentu, Y., Shelhamer, E., Malik, J., Efros, A.A., and Darrell, T. (2018, January 18\u201322). Zero-shot visual imitation. 
Proceedings of the IEEE\/CVF Computer Vision and Pattern Recognition Workshop, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00278"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1007\/s13218-015-0356-1","article-title":"Autonomous learning of state representations for control: An emerging field aims to autonomously learn state representations for reinforcement learning agents from their real-world sensor observations","volume":"29","author":"Springenberg","year":"2015","journal-title":"K\u00fcnstl. Intell."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1016\/j.neunet.2018.07.006","article-title":"State representation learning for control: An overview","volume":"108","author":"Lesort","year":"2018","journal-title":"Neural Netw."},{"key":"ref_42","unstructured":"Oord, A.v.d., Li, Y., and Vinyals, O. (2018). Representation learning with contrastive predictive coding. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2020, January 14\u201319). Momentum contrast for unsupervised visual representation learning. Proceedings of the IEEE\/CVF Computer Vision and Pattern Recognition, CVPR 2020, Online.","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"ref_44","unstructured":"Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020, January 12\u201318). A simple framework for contrastive learning of visual representations. Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Online."},{"key":"ref_45","unstructured":"Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., and Bachman, P. (2021, January 3\u20137). Data-efficient reinforcement learning with self-predictive representations. Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Online."},{"key":"ref_46","unstructured":"Lee, K.H., Fischer, I., Liu, A., Guo, Y., Lee, H., Canny, J., and Guadarrama, S. 
(2020, January 6\u201312). Predictive information accelerates learning in rl. Proceedings of the 34th Advances in Neural Information Processing Systems, NeurIPS 2020, Online."},{"key":"ref_47","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 19\u201324). Asynchronous methods for deep reinforcement learning. Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York, NY, USA."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1613\/jair.3912","article-title":"The arcade learning environment: An evaluation platform for general agents","volume":"47","author":"Bellemare","year":"2013","journal-title":"J. Artif. Intell. Res."},{"key":"ref_49","unstructured":"Anand, A., Racah, E., Ozair, S., Bengio, Y., C\u00f4t\u00e9, M.A., and Hjelm, R.D. (2019, January 8\u201314). Unsupervised state representation learning in atari. Proceedings of the 33rd Advances in Neural Information Processing Systems, NeurIPS 2019, Vancouver, BC, Canada."},{"key":"ref_50","unstructured":"Grill, J.B., Strub, F., Altch\u00e9, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch, C., Pires, B.A., Guo, Z.D., and Azar, M.G. (2020, January 6\u201312). Bootstrap your own latent: A new approach to self-supervised learning. Proceedings of the 34th Advances in Neural Information Processing Systems, NeurIPS 2020, Online."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Fischer, I. (2020). The conditional entropy bottleneck. Entropy, 22.","DOI":"10.3390\/e22090999"},{"key":"ref_52","unstructured":"Zhang, A., McAllister, R., Calandra, R., Gal, Y., and Levine, S. (2021, January 3\u20137). Learning invariant representations for reinforcement learning without reconstruction. 
Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Online."},{"key":"ref_53","unstructured":"Agarwal, R., Machado, M.C., Castro, P.S., and Bellemare, M.G. (2021, January 3\u20137). Contrastive behavioral similarity embeddings for generalization in reinforcement learning. Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Online."},{"key":"ref_54","unstructured":"Ferns, N., and Precup, D. (2014, January 23\u201327). Bisimulation Metrics are Optimal Value Functions. Proceedings of the 30th Association for Uncertainty in Artificial Intelligence, UAI 2014, Quebec City, QC, Canada."},{"key":"ref_55","unstructured":"Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2020, January 15). Improving Language Understanding by Generative Pre-Training. Available online: https:\/\/openai.com\/blog\/language-unsupervised."},{"key":"ref_56","unstructured":"Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2020, January 15). Language models are unsupervised multitask learners. OpenAI Blog 2019. Available online: https:\/\/d4mucfpksywv.cloudfront.net\/better-language-models\/language-models.pdf."},{"key":"ref_57","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). Bert: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 17th North American Chapter of the Association for Computational Linguistics, NAACL 2019, Minneapolis, MN, USA."},{"key":"ref_58","unstructured":"Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S. (2021, January 18\u201324). Barlow twins: Self-supervised learning via redundancy reduction. Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Online."},{"key":"ref_59","unstructured":"Bardes, A., Ponce, J., and LeCun, Y. (2022, January 25\u201329). Vicreg: Variance-invariance-covariance regularization for self-supervised learning. 
Proceedings of the 10th International Conference on Learning Representations, ICLR 2022, Online."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Devin, C., Abbeel, P., Darrell, T., and Levine, S. (2018, January 21\u201325). Deep object-centric representations for generalizable robot learning. Proceedings of the IEEE International Conference on Robotics and Automation, ICRA 2018, Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8461196"},{"key":"ref_61","unstructured":"Pathak, D., Gandhi, D., and Gupta, A. (2019, January 10\u201315). Self-supervised exploration via disagreement. Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA."},{"key":"ref_62","unstructured":"Burda, Y., Edwards, H., Storkey, A., and Klimov, O. (2019, January 6\u20139). Exploration by random network distillation. Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA."},{"key":"ref_63","unstructured":"Aubret, A., Matignon, L., and Hassas, S. (2019). A survey on intrinsic motivation in reinforcement learning. arXiv."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Nguyen, T., Luu, T.M., Vu, T., and Yoo, C.D. (October, January 27). Sample-efficient reinforcement learning representation learning with curiosity contrastive forward dynamics model. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, IROS 2021, Online.","DOI":"10.1109\/IROS51168.2021.9636536"},{"key":"ref_65","unstructured":"Laskin, M., Yarats, D., Liu, H., Lee, K., Zhan, A., Lu, K., Cang, C., Pinto, L., and Abbeel, P. (2021, January 6\u201314). URLB: Unsupervised reinforcement learning benchmark. Proceedings of the 35th Advances in Neural Information Processing Systems, NeurIPS 2021, Online."},{"key":"ref_66","unstructured":"Yarats, D., Fergus, R., Lazaric, A., and Pinto, L. (2021, January 18\u201324). Reinforcement learning with prototypical representations. 
Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Online."},{"key":"ref_67","unstructured":"Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., and Salakhutdinov, R. (2019). Efficient exploration via state marginal matching. arXiv."},{"key":"ref_68","unstructured":"Eysenbach, B., Gupta, A., Ibarz, J., and Levine, S. (May, January 30). Diversity is all you need: Learning skills without a reward function. Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada."},{"key":"ref_69","unstructured":"Hansen, S., Dabney, W., Barreto, A., Van de Wiele, T., Warde-Farley, D., and Mnih, V. (May, January 26). Fast task inference with variational intrinsic successor features. Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Online."},{"key":"ref_70","unstructured":"Liu, H., and Abbeel, P. (2021, January 18\u201324). Aps: Active pretraining with successor features. Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Online."},{"key":"ref_71","first-page":"301","article-title":"Nearest neighbor estimates of entropy","volume":"23","author":"Singh","year":"2003","journal-title":"Am. J. Math. Manag. Sci."},{"key":"ref_72","unstructured":"Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., and Joulin, A. (2020, January 6\u201312). Unsupervised learning of visual features by contrasting cluster assignments. Proceedings of the 34th Advances in Neural Information Processing Systems, NeurIPS 2020, Online."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). 
Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_74","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/S0004-3702(98)00023-X","article-title":"Planning and acting in partially observable stochastic domains","volume":"101","author":"Kaelbling","year":"1998","journal-title":"Artif. Intell."},{"key":"ref_75","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018, January 10\u201315). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden."},{"key":"ref_76","unstructured":"Ziebart, B.D. (2010). Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, Carnegie Mellon University."},{"key":"ref_77","unstructured":"Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., and Davidson, J. (2019, January 10\u201315). Learning latent dynamics for planning from pixels. Proceedings of the International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA."},{"key":"ref_78","unstructured":"Hafner, D., Lillicrap, T., Ba, J., and Norouzi, M. (May, January 26). Dream to Control: Learning Behaviors by Latent Imagination. Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Online."},{"key":"ref_79","unstructured":"Kingma, D.P., and Ba, J. (2014, January 14\u201316). Adam: A Method for Stochastic Optimization. Proceedings of the 5th International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada."},{"key":"ref_80","unstructured":"Shu, R., Nguyen, T., Chow, Y., Pham, T., Than, K., Ghavamzadeh, M., Ermon, S., and Bui, H. (2020, January 12\u201318). Predictive coding for locally-linear control. 
Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Online."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/17\/6504\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:19:39Z","timestamp":1760141979000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/17\/6504"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,29]]},"references-count":80,"journal-issue":{"issue":"17","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["s22176504"],"URL":"https:\/\/doi.org\/10.3390\/s22176504","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,8,29]]}}}