{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T14:05:34Z","timestamp":1756994734291,"version":"3.41.2"},"reference-count":50,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,1,12]],"date-time":"2023-01-12T00:00:00Z","timestamp":1673481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Robot. AI"],"abstract":"<jats:p>Reinforcement Learning has been shown to have a great potential for robotics. It demonstrated the capability to solve complex manipulation and locomotion tasks, even by learning end-to-end policies that operate directly on visual input, removing the need for custom perception systems. However, for practical robotics applications, its scarce sample efficiency, the need for huge amounts of resources, data, and computation time can be an insurmountable obstacle. One potential solution to this sample efficiency issue is the use of simulated environments. However, the discrepancy in visual and physical characteristics between reality and simulation, namely the sim-to-real gap, often significantly reduces the real-world performance of policies trained within a simulator. In this work we propose a sim-to-real technique that trains a Soft-Actor Critic agent together with a decoupled feature extractor and a latent-space dynamics model. The decoupled nature of the method allows to independently perform the sim-to-real transfer of feature extractor and control policy, and the presence of the dynamics model acts as a constraint on the latent representation when finetuning the feature extractor on real-world data. 
We show how this architecture allows a trained agent to be transferred from simulation to reality without retraining or finetuning the control policy, using real-world data only to adapt the feature extractor. By avoiding training the control policy in the real domain, we remove the need to apply Reinforcement Learning to real-world data; instead, we focus only on the unsupervised training of the feature extractor, considerably reducing real-world experience collection requirements. We evaluate the method on sim-to-sim and sim-to-real transfer of a policy for table-top robotic object pushing. We demonstrate that the method can adapt to considerable variations in the task observations, such as changes in point of view, colors, and lighting, all while substantially reducing the training time with respect to policies trained directly in the real world.<\/jats:p>","DOI":"10.3389\/frobt.2022.1067502","type":"journal-article","created":{"date-parts":[[2023,1,12]],"date-time":"2023-01-12T05:32:32Z","timestamp":1673501552000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Sim-to-real via latent prediction: Transferring visual non-prehensile manipulation policies"],"prefix":"10.3389","volume":"9","author":[{"given":"Carlo","family":"Rizzardo","sequence":"first","affiliation":[]},{"given":"Fei","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Darwin","family":"Caldwell","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,1,12]]},"reference":[{"key":"B1","first-page":"4243","article-title":"Using simulation and domain adaptation to improve efficiency of deep robotic grasping","author":"Bousmalis","year":"2018"},{"key":"B2","first-page":"3722","article-title":"Unsupervised pixel-level domain adaptation with generative adversarial networks","author":"Bousmalis","year":"2017"},{"key":"B3","article-title":"Deep reinforcement 
learning in a handful of trials using probabilistic dynamics models","volume-title":"Advances in Neural Information Processing Systems, NeurIPS 2018","author":"Chua","year":"2018"},{"key":"B5","first-page":"1180","article-title":"Unsupervised domain adaptation by backpropagation","author":"Ganin","year":"2015"},{"article-title":"From variational to deterministic autoencoders","year":"2020","author":"Ghosh","key":"B6"},{"article-title":"Learning invariant feature spaces to transfer skills with reinforcement learning","year":"2017","author":"Gupta","key":"B7"},{"key":"B8","first-page":"1861","article-title":"Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor","author":"Haarnoja","year":""},{"volume-title":"Soft actor-critic algorithms and applications","year":"","author":"Haarnoja","key":"B9"},{"key":"B11","first-page":"2555","article-title":"Learning latent dynamics for planning from pixels","volume-title":"International conference on machine learning, ICML 2019","author":"Hafner","year":"2019"},{"article-title":"Dream to control: Learning behaviors by latent imagination","year":"2020","author":"Hafner","key":"B10"},{"article-title":"Mastering atari with discrete world models","year":"2021","author":"Hafner","key":"B12"},{"key":"B13","first-page":"9474","article-title":"Neuralsim: Augmenting differentiable simulators with neural networks","author":"Heiden","year":"2021"},{"key":"B14","first-page":"1989","article-title":"Cycada: Cycle-consistent adversarial domain adaptation","author":"Hoffman","year":"2018"},{"key":"B15","first-page":"1314","article-title":"Searching for mobilenetv3","author":"Howard","year":"2019"},{"key":"B16","first-page":"12627","article-title":"Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-canonical adaptation networks","author":"James","year":"2019"},{"article-title":"Adam: A method for stochastic 
optimization","year":"2015","author":"Kingma","key":"B17"},{"article-title":"Auto-encoding variational bayes","year":"2014","author":"Kingma","key":"B18"},{"key":"B19","first-page":"2149","article-title":"Design and use paradigms for gazebo, an open-source multi-robot simulator","author":"Koenig","year":"2004"},{"volume-title":"Image augmentation is all you need: Regularizing deep reinforcement learning from pixels","year":"2020","author":"Kostrikov","key":"B20"},{"key":"B21","first-page":"19884","article-title":"Reinforcement learning with augmented data","volume":"33","author":"Laskin","year":"","journal-title":"Adv. neural Inf. Process. Syst."},{"key":"B49","first-page":"5639","article-title":"CURL: contrastive unsupervised representations for reinforcement learning","volume":"119","author":"Laskin","year":"","journal-title":"Proceed. Machine Learning Res."},{"key":"B22","first-page":"741","article-title":"Stochastic latent actor-critic: Deep reinforcement learning with a latent variable model","author":"Lee","year":"2020"},{"key":"B23","first-page":"97","article-title":"Learning transferable features with deep adaptation networks","author":"Long","year":"2015"},{"key":"B24","first-page":"1928","article-title":"Asynchronous methods for deep reinforcement learning","author":"Mnih","year":"2016"},{"key":"B25","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"nature"},{"article-title":"Neural posterior domain randomization","year":"2021","author":"Muratore","key":"B26"},{"key":"B27","first-page":"1101","article-title":"Deep dynamics models for learning dexterous manipulation","author":"Nagabandi","year":"2019"},{"volume-title":"Massively parallel methods for deep reinforcement learning","year":"2015","author":"Nair","key":"B28"},{"key":"B4","unstructured":"Nvidia isaac sim\n            Nvidia\n          
2020"},{"volume-title":"Solving rubik\u2019s cube with a robot hand","year":"2019","author":"OpenAI","key":"B29"},{"key":"B30","first-page":"1","article-title":"Sim-to-real transfer of robotic control with dynamics randomization","author":"Peng","year":"2017"},{"key":"B31","doi-asserted-by":"crossref","DOI":"10.15607\/RSS.2018.XIV.008","article-title":"Asymmetric actor critic for image-based robot learning","volume-title":"Proceedings of Robotics: Science and Systems, R:SS 2018","author":"Pinto","year":"2018"},{"key":"B32","first-page":"5445","article-title":"Online bayessim for combined simulator parameter inference and policy improvement","author":"Possas","year":"2020"},{"key":"B33","first-page":"1","article-title":"Stable-baselines3: Reliable reinforcement learning implementations","volume":"22","author":"Raffin","year":"2021","journal-title":"J. Mach. Learn. Res."},{"key":"B34","doi-asserted-by":"crossref","DOI":"10.15607\/RSS.2019.XV.029","article-title":"Bayessim: Adaptive domain randomization via probabilistic inference for robotics simulators","author":"Ramos","year":"2019"},{"key":"B35","first-page":"1278","article-title":"Stochastic backpropagation and approximate inference in deep generative models","author":"Rezende","year":"2014"},{"key":"B36","first-page":"91","article-title":"Learning to walk in minutes using massively parallel deep reinforcement learning","author":"Rudin","year":"2021"},{"key":"B37","doi-asserted-by":"publisher","first-page":"1711","DOI":"10.1109\/lra.2018.2801939","article-title":"Nonprehensile dynamic manipulation: A survey","volume":"3","author":"Ruggiero","year":"2018","journal-title":"IEEE Robotics Automation Lett."},{"volume-title":"Proximal policy optimization algorithms","year":"2017","author":"Schulman","key":"B38"},{"volume-title":"Curl: Contrastive unsupervised representations for reinforcement 
learning","year":"2020","author":"Srinivas","key":"B39"},{"key":"B40","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1007\/978-3-319-49409-8_35","article-title":"Deep coral: Correlation alignment for deep domain adaptation","volume-title":"Computer Vision\u2014ECCV 2016 Workshops","author":"Sun","year":"2016"},{"key":"B41","first-page":"23","article-title":"Domain randomization for transferring deep neural networks from simulation to the real world","author":"Tobin","year":"2017"},{"key":"B42","first-page":"688","volume-title":"Adapting deep visuomotor representations with weak pairwise constraints","author":"Tzeng","year":"2016"},{"volume-title":"Towards adapting deep visuomotor representations from simulated to real environments","year":"2015","author":"Tzeng","key":"B43"},{"key":"B44","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2017.316","article-title":"Adversarial discriminative domain adaptation","author":"Tzeng","year":"2017"},{"volume-title":"Deep domain confusion: Maximizing for domain invariance","year":"2014","author":"Tzeng","key":"B45"},{"volume-title":"Unity","year":"2020","key":"B46"},{"article-title":"Mastering visual continuous control: Improved data-augmented reinforcement learning","year":"2022","author":"Yarats","key":"B47"},{"article-title":"Image augmentation is all you need: Regularizing deep reinforcement learning from pixels","year":"","author":"Yarats","key":"B50"},{"key":"B48","first-page":"10674","article-title":"Improving sample efficiency in model-free reinforcement learning from images","author":"Yarats","year":""}],"container-title":["Frontiers in Robotics and 
AI"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2022.1067502\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,12]],"date-time":"2023-01-12T05:32:54Z","timestamp":1673501574000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2022.1067502\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,12]]},"references-count":50,"alternative-id":["10.3389\/frobt.2022.1067502"],"URL":"https:\/\/doi.org\/10.3389\/frobt.2022.1067502","relation":{},"ISSN":["2296-9144"],"issn-type":[{"type":"electronic","value":"2296-9144"}],"subject":[],"published":{"date-parts":[[2023,1,12]]},"article-number":"1067502"}}