{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,25]],"date-time":"2026-01-25T18:44:14Z","timestamp":1769366654688,"version":"3.49.0"},"reference-count":209,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T00:00:00Z","timestamp":1756857600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T00:00:00Z","timestamp":1756857600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/W00206X\/1"],"award-info":[{"award-number":["EP\/W00206X\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering"],"published-print":{"date-parts":[[2026,1]]},"abstract":"<jats:p>Reinforcement Learning (RL) has been considered a promising method to enable the automation of contact-rich manipulation tasks, which can increase capabilities for industrial automation. RL facilitates autonomous agents\u2019 learning to solve environments with complex dynamics with little human intervention, making it easier to implement control strategies for contact-rich tasks compared to traditional control approaches. Further, RL-based robotic control has the potential to transfer policies between task variations, significantly improving scalability compared to existing methods. However, RL is currently inviable for wider adoption due to its relatively high implementation costs and safety issues, so current research has been focused on addressing these issues. This paper comprehensively reviewed recently developed techniques to improve cost and safety for RL in contact-rich robotic manipulation. Techniques were organized by their approach, and their impact was analysed. It was found that current research efforts have significantly improved the cost and safety of RL-based control for contact-rich tasks, but further improvements can be made by progressing research towards improving knowledge transfer between tasks, improving inter-robot policy transfer and facilitating real-world and continual RL. The identified directions for further research set the stage for future developments in more versatile and cost-effective RL-based control for contact-rich robotic manipulation in future industrial automation applications.<\/jats:p>","DOI":"10.1177\/09596518251350353","type":"journal-article","created":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T12:46:52Z","timestamp":1756903612000},"page":"3-35","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Towards cost-effective and safe contact-rich robotic manipulation with reinforcement learning: A review of techniques for future industrial automation"],"prefix":"10.1177","volume":"240","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-9385-5592","authenticated-orcid":false,"given":"Anselmo","family":"Parnada","sequence":"first","affiliation":[{"name":"School of Engineering, University of Birmingham, Birmingham, West Midlands, UK"}]},{"given":"Mo","family":"Qu","sequence":"additional","affiliation":[{"name":"School of Engineering, University of Birmingham, Birmingham, West Midlands, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5623-7491","authenticated-orcid":false,"given":"Marco","family":"Castellani","sequence":"additional","affiliation":[{"name":"School of Engineering, University of Birmingham, Birmingham, West Midlands, UK"}]},{"given":"Hyung","family":"Jin Chang","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Birmingham, Birmingham, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9640-0871","authenticated-orcid":false,"given":"Yongjing","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Engineering, University of Birmingham, Birmingham, West Midlands, UK"}]}],"member":"179","published-online":{"date-parts":[[2025,9,3]]},"reference":[{"issue":"30","key":"e_1_3_2_2_2","first-page":"1","article-title":"A review of robot learning for manipulation: challenges, representations, and algorithms","volume":"22","author":"Kroemer O","year":"2021","unstructured":"Kroemer O, Niekum S, Konidaris G. A review of robot learning for manipulation: challenges, representations, and algorithms. J Mach Learn Res 2021; 22(30): 1\u201382.","journal-title":"J Mach Learn Res"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.3390\/app12020937"},{"issue":"3","key":"e_1_3_2_4_2","first-page":"135","article-title":"Reinforcement learning for assembly robots: a review","volume":"15","author":"Stan L","year":"2020","unstructured":"Stan L, Nicolescu AF, Pup\u0103z\u0103 C. Reinforcement learning for assembly robots: a review. Proc Manuf Syst 2020; 15(3): 135\u2013146.","journal-title":"Proc Manuf Syst"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00170-022-10680-8"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1146\/annurev-control-060117-104848"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.abd9461"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMECH.2017.2671342"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIE.2018.2838069"},{"key":"e_1_3_2_10_2","first-page":"279","article-title":"Q-learning","volume":"8","author":"Watkins CJCH","year":"1992","unstructured":"Watkins CJCH, Dayan P. Q-learning. Mach Learn 1992; 8: 279\u2013292.","journal-title":"Mach Learn"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.153.3731.34"},{"key":"e_1_3_2_12_2","unstructured":"Mnih V Kavukcuoglu K Silver D et al. Playing Atari with deep reinforcement learning. 2013 ArXiv preprint arXiv:1312.5602 2013."},{"key":"e_1_3_2_13_2","first-page":"20","article-title":"Alphastar: mastering the real-time strategy game Starcraft II","volume":"2","author":"Vinyals O","year":"2019","unstructured":"Vinyals O, Babuschkin I, Chung J, et al. Alphastar: mastering the real-time strategy game Starcraft II. DeepMind blog 2019; 2: 20.","journal-title":"DeepMind blog"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-63519-4"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aar6404"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCI.2019.2901089"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2012.6252823"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/11552246_35"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04772-5_11"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913495721"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.3390\/robotics10030105"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/s43154-020-00021-6"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.3390\/robotics10010022"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2023.3280161"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2022.104224"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.rcim.2022.102517"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793485"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS40897.2019.8968201"},{"key":"e_1_3_2_29_2","unstructured":"Zhu Y Wong J Mandlekar A. Roberto Mart\u2019in-Mart\u2019in. Robosuite: a modular simulation framework and benchmark for robot learning. ArXiv abs\/2009.12293 2020."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1098\/rsos.240956"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.1998.712192"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1512\/iumj.1957.6.56038"},{"issue":"5","key":"e_1_3_2_33_2","first-page":"1054","article-title":"A notation for Markov decision processes","volume":"9","author":"Thomas PS.","year":"1998","unstructured":"Thomas PS. A notation for Markov decision processes. IEEE Trans Neural Netw 1998; 9(5): 1054. ArXiv, abs\/1512.09075, 2015.","journal-title":"IEEE Trans Neural Netw"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1631\/FITEE.1900533"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2743240"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3207346"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/IRC.2019.00120"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2018.07.006"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-13823-7_31"},{"key":"e_1_3_2_40_2","unstructured":"Gu S Yang L Du Y et al. A review of safe reinforcement learning: methods theory and applications. ArXiv abs\/2205.10330 2022."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.matpr.2021.10.009"},{"key":"e_1_3_2_42_2","unstructured":"Malik D Li Y Ravikumar P. When is generalizable reinforcement learning tractable? ArXiv abs\/2101.00300 2021."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341714"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aat8414"},{"issue":"39","key":"e_1_3_2_45_2","first-page":"1","article-title":"End-to-end training of deep visuomotor policies","volume":"17","author":"Levine S","year":"2015","unstructured":"Levine S, Finn C, Darrell T, et al. End-to-end training of deep visuomotor policies. J Mach Learn Res 2015; 17(39): 1\u201339.","journal-title":"J Mach Learn Res"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561646"},{"key":"e_1_3_2_47_2","unstructured":"Jin P Lin YB Tan Y et al. Multi-modal fusion in contact-rich precise tasks via hierarchical policy learning. ArXiv abs\/2202.08401 2022."},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA46639.2022.9812019"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.conengprac.2021.104957"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2023.3295236"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2023.3310505"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/WRCSARA60131.2023.10261820"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLA58977.2023.00155"},{"key":"e_1_3_2_54_2","unstructured":"Kamijo T Ramirez-Alpizar IG. Enrique Coronado and Gentiane Venture. Tactile-based active inference for force-controlled peg-in-hole insertions. ArXiv preprint arXiv:2309.15681 2023."},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-023-05156-5"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS55552.2023.10341657"},{"key":"e_1_3_2_57_2","unstructured":"Ferrandis JDA Moura J Vijayakumar S. Learning visuotactile estimation and control for non-prehensile manipulation under occlusions. ArXiv preprint arXiv:2412.13157 2024."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2024.3412630"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS58592.2024.10801575"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2025.3547647"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2024.104832"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2018.2868859"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561891"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3061374"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.3390\/app12063181"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.rcim.2024.102896"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8794127"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3150024"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS51168.2021.9636547"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561162"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2020.3020065"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561619"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3060389"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS51168.2021.9636176"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2021.777363"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1002\/aisy.202100095"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2023.3261759"},{"key":"e_1_3_2_78_2","volume-title":"8th annual conference on robot learning (CoRL 2024)","author":"Sleiman JP","unstructured":"Sleiman JP, Mittal M, Hutter M. Guided reinforcement learning for robust multi-contact loco-manipulation. In: 8th annual conference on robot learning (CoRL 2024), Munich, Germany, 2024."},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmsy.2023.11.008"},{"key":"e_1_3_2_80_2","first-page":"159","article-title":"Deep reinforcement learning-based variable impedance control for grinding workpieces with complex geometry","volume":"45","author":"Li Y","year":"2025","unstructured":"Li Y, Wang Y, Li Z, et al. Deep reinforcement learning-based variable impedance control for grinding workpieces with complex geometry. Robot Intell Autom 2025; 45: 159\u2013172.","journal-title":"Robot Intell Autom"},{"key":"e_1_3_2_81_2","unstructured":"Florensa C Held D Wulfmeier M et al. Reverse curriculum generation for reinforcement learning. ArXiv abs\/1707.05300 2017."},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2019.2928212"},{"key":"e_1_3_2_83_2","first-page":"7505","volume-title":"2020 IEEE international conference on robotics and automation (ICRA)","author":"Florensa C","year":"2020","unstructured":"Florensa C, Tremblay J, Ratliff ND, et al. Guided uncertainty-aware policy optimization: combining learning and model-based strategies for sample-efficient policy learning. In: 2020 IEEE international conference on robotics and automation (ICRA), Paris, France, pp.7505\u20137512, 2020."},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/AIM.2019.8868579"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3010739"},{"key":"e_1_3_2_86_2","unstructured":"Ulmer M Aljalbout E Schwarz S et al. Learning robotic manipulation skills using an adaptive force-impedance action space. ArXiv abs\/2110.09904 2021."},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA46639.2022.9811986"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS47612.2022.9981185"},{"key":"e_1_3_2_89_2","doi-asserted-by":"crossref","unstructured":"Tang B Lin MA Akinola I et al. Industreal: transferring contact-rich assembly tasks from simulation to reality. ArXiv preprint arXiv:2305.17110 2023.","DOI":"10.15607\/RSS.2023.XIX.039"},{"key":"e_1_3_2_90_2","doi-asserted-by":"publisher","DOI":"10.1080\/00207543.2024.2318488"},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIE.2020.3038072"},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA46639.2022.9812140"},{"key":"e_1_3_2_93_2","unstructured":"Yang Q Johannes Andreas S Stoyanov T. Transferring knowledge for reinforcement learning in contact-rich manipulation. ArXiv abs\/2210.02891 2022."},{"key":"e_1_3_2_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/ROBIO55434.2022.10011996"},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3187276"},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2023.3308061"},{"key":"e_1_3_2_97_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2023.3298195"},{"key":"e_1_3_2_98_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10161283"},{"key":"e_1_3_2_99_2","unstructured":"Nghia V Pham QC. Reinforcement learning with parameterized manipulation primitives for robotic assembly. ArXiv preprint arXiv:2306.06679 2023."},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10610846"},{"key":"e_1_3_2_101_2","unstructured":"Sun J Curtis A You Y et al. Hierarchical hybrid learning for long-horizon contact-rich robotic assembly. ArXiv preprint arXiv:2409.16451 2024."},{"key":"e_1_3_2_102_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-024-10164-6"},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10610567"},{"key":"e_1_3_2_104_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2025.3547272"},{"key":"e_1_3_2_105_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9340848"},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561232"},{"key":"e_1_3_2_107_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCES54031.2021.9686128"},{"key":"e_1_3_2_108_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA46639.2022.9812312"},{"key":"e_1_3_2_109_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3184805"},{"key":"e_1_3_2_110_2","first-page":"4019","volume-title":"2016 IEEE\/RSJ international conference on intelligent robots and systems (IROS)","author":"Fu J","year":"2015","unstructured":"Fu J, Levine S, Abbeel P. One-shot learning of manipulation skills with online dynamics adaptation and neural network priors. In: 2016 IEEE\/RSJ international conference on intelligent robots and systems (IROS), pp.4019\u20134026, 2015."},{"key":"e_1_3_2_111_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2015.7138994"},{"key":"e_1_3_2_112_2","volume-title":"International conference on machine learning","author":"Chebotar Y","year":"2017","unstructured":"Chebotar Y, Hausman K, Marvin Z, et al. Combining model-based and model-free updates for trajectory-centric reinforcement learning. In: International conference on machine learning, Sydney, 2017."},{"key":"e_1_3_2_113_2","doi-asserted-by":"publisher","DOI":"10.1007\/s41315-020-00138-z"},{"key":"e_1_3_2_114_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3070252"},{"key":"e_1_3_2_115_2","doi-asserted-by":"publisher","DOI":"10.1109\/HUMANOIDS47582.2021.9555680"},{"key":"e_1_3_2_116_2","doi-asserted-by":"publisher","DOI":"10.1109\/ROBIO54168.2021.9739258"},{"key":"e_1_3_2_117_2","doi-asserted-by":"publisher","DOI":"10.1109\/SII52469.2022.9708826"},{"key":"e_1_3_2_118_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793789"},{"key":"e_1_3_2_119_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341260"},{"key":"e_1_3_2_120_2","doi-asserted-by":"crossref","unstructured":"Beltran-Hernandez CC Petit D Ramirez-Alpizar IG et al. Variable compliance control for robotic peg-in-hole assembly: a deep reinforcement learning approach. ArXiv abs\/2008.10224 2020.","DOI":"10.3390\/app10196923"},{"key":"e_1_3_2_121_2","doi-asserted-by":"publisher","DOI":"10.1109\/MMAR55195.2022.9874304"},{"key":"e_1_3_2_122_2","unstructured":"Beltran-Hernandez CC Petit D Ramirez-Alpizar IG et al. Accelerating robot learning of contact-rich manipulations: a curriculum learning study. ArXiv abs\/2204.12844 2022."},{"key":"e_1_3_2_123_2","first-page":"1645","volume-title":"Conference on robot learning","author":"Church A","year":"2022","unstructured":"Church A, Lloyd J, Lepora NF, et al. Tactile sim-to-real policy transfer via real-to-sim image translation. In: Conference on robot learning, Auckland, New Zealand, pp.1645\u20131654. PMLR, 2022."},{"key":"e_1_3_2_124_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCDS.2023.3237734"},{"key":"e_1_3_2_125_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2022.104321"},{"key":"e_1_3_2_126_2","doi-asserted-by":"publisher","DOI":"10.3389\/fnbot.2023.1271607"},{"key":"e_1_3_2_127_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS55552.2023.10341866"},{"key":"e_1_3_2_128_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2024.3432644"},{"key":"e_1_3_2_129_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS58592.2024.10801764"},{"key":"e_1_3_2_130_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2024.3352969"},{"key":"e_1_3_2_131_2","doi-asserted-by":"publisher","DOI":"10.3390\/app122211610"},{"key":"e_1_3_2_132_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_133_2","unstructured":"Curran WJ Brys T Taylor ME et al. Using PCA to efficiently represent state spaces. ArXiv abs\/1505.00322 2015."},{"key":"e_1_3_2_134_2","unstructured":"Sinha S Bharadhwaj H Srinivas A et al. D2rl: deep dense architectures in reinforcement learning. ArXiv abs\/2010.09163 2020."},{"key":"e_1_3_2_135_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2800101"},{"key":"e_1_3_2_136_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11831-023-09884-2"},{"key":"e_1_3_2_137_2","unstructured":"Kingma DP Welling M. Auto-Encoding Variational Bayes. 2022. https:\/\/arxiv.org\/abs\/1312.6114"},{"key":"e_1_3_2_138_2","doi-asserted-by":"publisher","DOI":"10.1145\/3465055"},{"issue":"3","key":"e_1_3_2_139_2","first-page":"331368","article-title":"Attention mechanisms in computer vision: a survey","volume":"8","author":"Guo MH","year":"2022","unstructured":"Guo MH, Xu TX, Liu JJ, et al. Attention mechanisms in computer vision: a survey. Comput Vis Media 2022; 8(3): 331368.","journal-title":"Comput Vis Media"},{"key":"e_1_3_2_140_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-36802-9_25"},{"key":"e_1_3_2_141_2","doi-asserted-by":"publisher","DOI":"10.1109\/CoG57401.2023.10333243"},{"key":"e_1_3_2_142_2","unstructured":"Sorokin I Seleznev A Pavlov M et al. Deep attention recurrent q-network. ArXiv preprint arXiv:1512.01693 2015."},{"key":"e_1_3_2_143_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-30164-8_731"},{"key":"e_1_3_2_144_2","volume-title":"International Conference on Machine Learning","author":"Ng AY","unstructured":"Ng AY, Russell S. Algorithms for inverse reinforcement learning. In: International Conference on Machine Learning, Boston, MA, USA, 2000."},{"key":"e_1_3_2_145_2","unstructured":"Levine F L S Learning robust rewards with adversarial inverse reinforcement learning. ArXiv abs\/1710.11248 2017."},{"key":"e_1_3_2_146_2","doi-asserted-by":"crossref","unstructured":"Hussein A Gaber MM Elyan E et al. Imitation learning: a survey of learning methods. ACM Comput Surv 2018; 50(2). 1\u201335. https:\/\/arxiv.org\/abs\/1710.11248","DOI":"10.1145\/3054912"},{"key":"e_1_3_2_147_2","doi-asserted-by":"publisher","DOI":"10.1162\/NECO_a_00393"},{"key":"e_1_3_2_148_2","doi-asserted-by":"publisher","DOI":"10.2514\/2.4231"},{"key":"e_1_3_2_149_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-022-01611-x"},{"key":"e_1_3_2_150_2","doi-asserted-by":"publisher","DOI":"10.1145\/3099564.3099567"},{"key":"e_1_3_2_151_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS40897.2019.8967995"},{"key":"e_1_3_2_152_2","doi-asserted-by":"publisher","DOI":"10.1109\/JRA.1987.1087068"},{"key":"e_1_3_2_153_2","doi-asserted-by":"publisher","DOI":"10.1115\/DSCC2014-6201"},{"key":"e_1_3_2_154_2","doi-asserted-by":"publisher","DOI":"10.23919\/ACC.1984.4788393"},{"key":"e_1_3_2_155_2","doi-asserted-by":"publisher","DOI":"10.1115\/1.2897725"},{"key":"e_1_3_2_156_2","doi-asserted-by":"publisher","DOI":"10.1115\/1.3427095"},{"key":"e_1_3_2_157_2","doi-asserted-by":"publisher","DOI":"10.1109\/70.246048"},{"key":"e_1_3_2_158_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364918768950"},{"key":"e_1_3_2_159_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364911402527"},{"key":"e_1_3_2_160_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.arcontrol.2017.02.002"},{"key":"e_1_3_2_161_2","unstructured":"Ratliff ND Issac J Kappler D. Riemannian motion policies. ArXiv abs\/1801.02854 2018."},{"key":"e_1_3_2_162_2","doi-asserted-by":"publisher","DOI":"10.1145\/3453160"},{"key":"e_1_3_2_163_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS47612.2022.9981440"},{"key":"e_1_3_2_164_2","unstructured":"Caelan Reed G Chitnis R Holladay R et al. Integrated task and motion planning. ArXiv abs\/2010.01083 2020."},{"key":"e_1_3_2_165_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2014.6906922"},{"key":"e_1_3_2_166_2","unstructured":"Shi L Lim JJ Youngwoon L. Skill-based model-based reinforcement learning. ArXiv abs\/2207.07560 2022."},{"key":"e_1_3_2_167_2","unstructured":"Pertsch K Lee Y Lim JJ. Accelerating reinforcement learning with learned skill priors. In: Conference on robot learning Cambridge MA 2020."},{"key":"e_1_3_2_168_2","first-page":"299","volume-title":"7th international joint conference on Autonomous agents and multiagent systems-Volume 1","author":"Jong NK","year":"2008","unstructured":"Jong NK, Hester T, Peter S. The utility of temporal abstraction in reinforcement learning. In: 7th international joint conference on Autonomous agents and multiagent systems-Volume 1, Estoril, Portugal, 2008, pp. 299\u2013306."},{"key":"e_1_3_2_169_2","doi-asserted-by":"publisher","DOI":"10.3390\/make4010009"},{"key":"e_1_3_2_170_2","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho J","year":"2020","unstructured":"Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Adv Neural Inform Process Syst 2020; 33: 6840\u20136851.","journal-title":"Adv Neural Inform Process Syst"},{"key":"e_1_3_2_171_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani A","year":"2017","unstructured":"Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Adv Neural Inform Process Syst 2017; 30.","journal-title":"Adv Neural Inform Process Syst"},{"key":"e_1_3_2_172_2","first-page":"5149","article-title":"Meta-learning in neural networks: a survey","volume":"44","author":"Hospedales T","year":"2022","unstructured":"Hospedales T, Antoniou A, Micaelli P, et al. Meta-learning in neural networks: a survey. IEEE Trans Pattern Anal Mach Intell 2022; 44: 5149\u20135169.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"e_1_3_2_173_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2013.2262913"},{"key":"e_1_3_2_174_2","unstructured":"Beck JA Vuorio R. Liu E. Z et al. A survey of meta-reinforcement learning. ArXiv abs\/2301.08028 2023."},{"key":"e_1_3_2_175_2","unstructured":"Wang JX Kurth-Nelson Z Soyer H et al. Learning to reinforcement learn. ArXiv abs\/1611.05763 2016."},{"key":"e_1_3_2_176_2","unstructured":"Duan Y Schulman J Chen X et al. Rl2: fast reinforcement learning via slow reinforcement learning. ArXiv abs\/1611.02779 2016."},{"key":"e_1_3_2_177_2","volume-title":"International conference on machine learning","author":"Rakelly K","year":"2019","unstructured":"Rakelly K, Zhou A, Quillen D, et al. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In: International conference on machine learning, Long Beach, CA, 2019."},{"key":"e_1_3_2_178_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.28.1.1"},{"key":"e_1_3_2_179_2","volume-title":"International Conference on Machine Learning","author":"Chelsea Finn P","year":"2017","unstructured":"Chelsea Finn P, Abbeel S, Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning, Sydney, Australia, 2017."},{"key":"e_1_3_2_180_2","unstructured":"Prat A Johns E. Peril: probabilistic embeddings for hybrid meta-reinforcement and imitation learning. 2020."},{"key":"e_1_3_2_181_2","doi-asserted-by":"publisher","DOI":"10.1561\/2200000086"},{"key":"e_1_3_2_182_2","unstructured":"Kurutach T Clavera I Duan Y et al. Model-ensemble trust-region policy optimization. ArXiv abs\/1802.10592 2018."},{"key":"e_1_3_2_183_2","volume-title":"Conference on robot learning","author":"Fazeli N","year":"2017","unstructured":"Fazeli N, Zapolsky S, Drumwright E, et al. Learning data-efficient rigid-body contact models: case study of planar impact. In: Conference on robot learning, Mountain View, CA, 2017."},{"key":"e_1_3_2_184_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-28619-4_41"},{"key":"e_1_3_2_185_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10339-011-0404-1"},{"key":"e_1_3_2_186_2","unstructured":"Charpentier B Senanayake R Kochenderfer MJ et al. Disentangling epistemic and aleatoric uncertainty in reinforcement learning. ArXiv abs\/2206.01558 2022."},{"key":"e_1_3_2_187_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSTCC50638.2020.9259716"},{"key":"e_1_3_2_188_2","volume-title":"Conference on Robot Learning","author":"Clavera I","unstructured":"Clavera I, Rothfuss J, Schulman J, et al. Model-based reinforcement learning via meta-policy optimization. In: Conference on Robot Learning, Sinaia, Romania, 2018."},{"key":"e_1_3_2_189_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmaa.2023.127096"},{"key":"e_1_3_2_190_2","volume-title":"Advances in Neural Information Processing Systems","author":"Williams C","year":"1995","unstructured":"Williams C, Rasmussen C. Gaussian processes for regression. In: Touretzky D, Mozer MC, Hasselmo M (eds) Advances in Neural Information Processing Systems, MIT Press, Vol. 8, 1995."},{"key":"e_1_3_2_191_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-7488-4_196"},{"key":"e_1_3_2_192_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2015.01.010"},{"key":"e_1_3_2_193_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAI.2021.3054609"},{"key":"e_1_3_2_194_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561081"},{"key":"e_1_3_2_195_2","unstructured":"Nagabandi A Clavera I Simin L et al. Learning to adapt in dynamic real-world environments through meta-reinforcement learning. ArXiv: Learning 2018."},{"key":"e_1_3_2_196_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8460528"},{"key":"e_1_3_2_197_2","doi-asserted-by":"publisher","DOI":"10.1109\/SSCI47803.2020.9308468"},{"key":"e_1_3_2_198_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8202133"},{"key":"e_1_3_2_199_2","first-page":"8973","volume-title":"2019 international conference on robotics and automation (ICRA)","author":"Chebotar Y","year":"2018","unstructured":"Chebotar Y, Handa A, Makoviychuk V, et al. Closing the sim-to-real loop: adapting simulation randomization with real world experience. In: 2019 international conference on robotics and automation (ICRA), Vancouver, British Columbia, Canada, 2018, pp.8973\u20138979."},{"key":"e_1_3_2_200_2","unstructured":"Farahani A Voghoei S Rasheed KM et al. A brief review of domain adaptation. ArXiv abs\/2010.03978 2020."},{"key":"e_1_3_2_201_2","doi-asserted-by":"publisher","DOI":"10.1145\/3422622"},{"key":"e_1_3_2_202_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01117"},{"key":"e_1_3_2_203_2","first-page":"11920","volume-title":"International Conference on Machine Learning","author":"Yarats D","year":"2021","unstructured":"Yarats D, Fergus R, Alessandro L, et al. Reinforcement learning with prototypical representations. In: International Conference on Machine Learning, Vienna, Austria, 2021, pp. 11920\u201311931."},{"key":"e_1_3_2_204_2","unstructured":"Tomar M Zhang A Calandra R et al. Model-invariant state abstractions for model-based reinforcement learning. ArXiv abs\/2102.09850 2021."},{"key":"e_1_3_2_205_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2023.3242831"},{"key":"e_1_3_2_206_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-021-05961-4"},{"key":"e_1_3_2_207_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561384"},{"key":"e_1_3_2_208_2","doi-asserted-by":"crossref","unstructured":"Smith LM Cao Y Levine S. Grow your limits: continuous improvement with real-world RL for robotic locomotion. ArXiv abs\/2310.17634 2023.","DOI":"10.1109\/ICRA57147.2024.10610485"},{"key":"e_1_3_2_209_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3367329"},{"key":"e_1_3_2_210_2","volume-title":"Synthesis lectures on artificial intelligence and machine learning","author":"Chen Z","year":"2018","unstructured":"Chen Z, Liu B. Lifelong machine learning. In: Synthesis lectures on artificial intelligence and machine learning. Springer International Publishing, Cham. 2018."}],"container-title":["Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/09596518251350353","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/09596518251350353","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/09596518251350353","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,25]],"date-time":"2026-01-25T13:38:11Z","timestamp":1769348291000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/09596518251350353"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,3]]},"references-count":209,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1]]}},"alternative-id":["10.1177\/09596518251350353"],"URL":"https:\/\/doi.org\/10.1177\/09596518251350353","relation":{},"ISSN":["0959-6518","2041-3041"],"issn-type":[{"value":"0959-6518","type":"print"},{"value":"2041-3041","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,3]]}}}