{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T20:19:27Z","timestamp":1777407567269,"version":"3.51.4"},"reference-count":49,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2024,1,24]],"date-time":"2024-01-24T00:00:00Z","timestamp":1706054400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotica"],"published-print":{"date-parts":[[2024,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Reinforcement learning (RL) has been successfully applied to a wealth of robot manipulation tasks and continuous control problems. However, it is still limited to industrial applications and suffers from three major challenges: sample inefficiency, real data collection, and the gap between simulator and reality. In this paper, we focus on the practical application of RL for robot assembly in the real world. We apply enlightenment learning to improve the proximal policy optimization, an on-policy model-free actor-critic reinforcement learning algorithm, to train an agent in Cartesian space using the proprioceptive information. We introduce enlightenment learning incorporated via pretraining, which is beneficial to reduce the cost of policy training and improve the effectiveness of the policy. A human-like assembly trajectory is generated through a two-step method with segmenting objects by locations and iterative closest point for pretraining. We also design a sim-to-real controller to correct the error while transferring to reality. We set up the environment in the MuJoCo simulator and demonstrated the proposed method on the recently established The National Institute of Standards and Technology (NIST)\u00a0gear assembly benchmark. The paper introduces a unique framework that enables a robot to learn assembly tasks efficiently using limited real-world samples by leveraging simulations and visual demonstrations. The comparative experiment results indicate that our approach surpasses other baseline methods in terms of training speed, success rate, and efficiency.<\/jats:p>","DOI":"10.1017\/s0263574724000092","type":"journal-article","created":{"date-parts":[[2024,1,24]],"date-time":"2024-01-24T05:18:04Z","timestamp":1706073484000},"page":"1074-1093","source":"Crossref","is-referenced-by-count":10,"title":["One-shot sim-to-real transfer policy for robotic assembly via reinforcement learning with visual demonstration"],"prefix":"10.1017","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3855-0588","authenticated-orcid":false,"given":"Ruihong","family":"Xiao","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5255-5559","authenticated-orcid":false,"given":"Chenguang","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yiming","family":"Jiang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hui","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2024,1,24]]},"reference":[{"key":"S0263574724000092_ref20","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1017\/S0263574700017379","article-title":"Path finding and grasp planning for robotic assembly","volume":"12","author":"Lee","year":"1994","journal-title":"Robotica"},{"key":"S0263574724000092_ref36","doi-asserted-by":"crossref","unstructured":"[36] He, K. , Gkioxari, G. , Doll\u00e1r, P. and Girshick, R. , \u201cMask R-CNN\u201d Proceedings of the IEEE International Conference on Computer Vision (2017) pp. 2961\u20132969.","DOI":"10.1109\/ICCV.2017.322"},{"key":"S0263574724000092_ref38","doi-asserted-by":"crossref","unstructured":"[38] Zakharov, S. , Shugurov, I. and Ilic, S. , \u201cDpod: 6D Pose Object Detector and Refiner\u201d Proceedings of the IEEE\/CVF International Conference on Computer Vision (2019) pp. 1941\u20131950.","DOI":"10.1109\/ICCV.2019.00203"},{"key":"S0263574724000092_ref16","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1108\/RIA-10-2022-0248","article-title":"Efficient experience replay architecture for offline reinforcement learning","volume":"43","author":"Zhang","year":"2023","journal-title":"Robot. Intell. Automat."},{"key":"S0263574724000092_ref44","doi-asserted-by":"crossref","unstructured":"[44] Arndt, K. , Hazara, M. , Ghadirzadeh, A. and Kyrki, V. , \u201cMeta Reinforcement Learning for Sim-to-Real Domain Adaptation\u201d 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE (2020) pp. 2725\u20132731.","DOI":"10.1109\/ICRA40945.2020.9196540"},{"key":"S0263574724000092_ref45","volume-title":"Advances in Neural Information Processing Systems","volume":"30","author":"Li","year":"2017"},{"key":"S0263574724000092_ref24","doi-asserted-by":"crossref","first-page":"439","DOI":"10.23919\/JSEE.2023.000051","article-title":"A review of mobile robot motion planning methods: From classical motion planning workflows to reinforcement learning-based architectures","volume":"34","author":"Dong","year":"2023","journal-title":"J. Syst. Eng. Electron."},{"key":"S0263574724000092_ref10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11633-022-1390-8","article-title":"Brain-inspired intelligent robotics: Theoretical analysis and systematic application","volume":"20","author":"Qiao","year":"2023","journal-title":"Mach. Intell. Res."},{"key":"S0263574724000092_ref28","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1108\/AA-11-2020-0168","article-title":"Dynamic movement primitives based cloud robotic skill learning for point and non-point obstacle avoidance","volume":"41","author":"Lu","year":"2021","journal-title":"Assembly Autom."},{"key":"S0263574724000092_ref29","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1007\/s11633-022-1346-z","article-title":"Dynamic movement primitives based robot skills learning","volume":"20","author":"Kong","year":"2023","journal-title":"Mach. Intell. Res."},{"key":"S0263574724000092_ref48","doi-asserted-by":"crossref","first-page":"51416","DOI":"10.1109\/ACCESS.2021.3068769","article-title":"A review of physics simulators for robotic applications","volume":"9","author":"Collins","year":"2021","journal-title":"IEEE Access"},{"key":"S0263574724000092_ref15","unstructured":"[15] Vecerik, M. , Hester, T. , Scholz, J. , Wang, F. , Pietquin, O. , Piot, B. , Heess, N. , Roth\u00f6rl, T. , Lampe, T. and Riedmiller, M. , \u201cLeveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards\u201d (2017), arXiv preprint arXiv: 1707.08817."},{"key":"S0263574724000092_ref3","unstructured":"[3] Lillicrap, T. P. , Hunt, J. J. , Pritzel, A. , Heess, N. , Erez, T. , Tassa, Y. , Silver, D. and Wierstra, D. , \u201cContinuous control with deep reinforcement learning\u201d (2015), arXiv preprint arXiv: 1509.02971."},{"key":"S0263574724000092_ref5","unstructured":"[5] Haarnoja, T. , Zhou, A. , Hartikainen, K. , Tucker, G. , Ha, S. , Tan, J. , Kumar, V. , Zhu, H. , Gupta, A. , Abbeel, P. and Levine, S. \u201cSoft actor-critic algorithms and applications\u201d (2018), arXiv preprint arXiv: 1812.05905."},{"key":"S0263574724000092_ref19","doi-asserted-by":"crossref","first-page":"3306","DOI":"10.1017\/S0263574722000200","article-title":"Robot assembly theory and simulation of circular-rectangular compound peg-in-hole","volume":"40","author":"Wu","year":"2022","journal-title":"Robotica"},{"key":"S0263574724000092_ref30","doi-asserted-by":"crossref","unstructured":"[30] Hester, T. , Vecerik, M. , Pietquin, O. , Lanctot, M. , Schaul, T. , Piot, B. , Sendonaris, A. , Dulac-Arnold, G. , Osband, I. , Agapiou, J. P. , J. Z. Leibo and A. Gruslys, \u201cLearning from demonstrations for real world reinforcement learning\u201d (2017).","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"S0263574724000092_ref27","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1007\/4-431-31381-8_23","volume-title":"Adaptive Motion of Animals and Machines","author":"Schaal","year":"2006"},{"key":"S0263574724000092_ref14","doi-asserted-by":"crossref","unstructured":"[14] Rajeswaran, A. , Kumar, V. , Gupta, A. , Vezzani, G. , Schulman, J. , Todorov, E. and Levine, S. , \u201cLearning complex dexterous manipulation with deep reinforcement learning and demonstrations\u201d (2017), arXiv preprint arXiv: 1709.10087.","DOI":"10.15607\/RSS.2018.XIV.049"},{"key":"S0263574724000092_ref23","unstructured":"[23] Kuffner, J. J. and LaValle, S. M. , \u201cRRT-Connect: An Efficient Approach to Single-Query Path Planning\u201d Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065), IEEE, vol. 2 (2000) pp. 995\u20131001."},{"key":"S0263574724000092_ref26","doi-asserted-by":"crossref","unstructured":"[26] Theodorou, E. , Buchli, J. and Schaal, S. , \u201cReinforcement Learning of Motor Skills in High Dimensions: A Path Integral Approach\u201d 2010 IEEE International Conference on Robotics and Automation, IEEE (2010) pp. 2397\u20132403.","DOI":"10.1109\/ROBOT.2010.5509336"},{"key":"S0263574724000092_ref34","doi-asserted-by":"crossref","unstructured":"[34] He, Y. , Sun, W. , Huang, H. , Liu, J. , Fan, H. and Sun, J. , \u201cPvn3d: A Deep Point-wise 3D Keypoints Voting Network for 6dof Pose Estimation\u201d Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (2020) pp. 11632\u201311641.","DOI":"10.1109\/CVPR42600.2020.01165"},{"key":"S0263574724000092_ref49","doi-asserted-by":"crossref","first-page":"2427","DOI":"10.1007\/s11042-019-08302-9","article-title":"Robotic grasp detection based on image processing and random forest","volume":"79","author":"Zhang","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"S0263574724000092_ref4","unstructured":"[4] Schulman, J. , Wolski, F. , Dhariwal, P. , Radford, A. and Klimov, O. , \u201cProximal policy optimization algorithms\u201d (2017), arXiv preprint arXiv: 1707.06347."},{"key":"S0263574724000092_ref9","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1108\/RIA-01-2023-0002","article-title":"A novel human-robot skill transfer method for contact-rich manipulation task","volume":"43","author":"Dong","year":"2023","journal-title":"Robot. Intell. Automat."},{"key":"S0263574724000092_ref11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11432-022-3606-1","article-title":"Improving performance of robots using human-inspired approaches: A survey","volume":"65","author":"Qiao","year":"2022","journal-title":"Sci. China Inf. Sci."},{"key":"S0263574724000092_ref31","first-page":"12348","article-title":"Stable-baselines3: Reliable reinforcement learning implementations","volume":"22","author":"Raffin","year":"2021","journal-title":"J. Mach. Learn. Res."},{"key":"S0263574724000092_ref43","doi-asserted-by":"crossref","unstructured":"[43] Tobin, J. , Fong, R. , Ray, A. , Schneider, J. , Zaremba, W. and Abbeel, P. , \u201cDomain Randomization for Transferring Deep Neural Networks from Simulation to the Real World\u201d 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE (2017) pp. 23\u201330,","DOI":"10.1109\/IROS.2017.8202133"},{"key":"S0263574724000092_ref12","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1098\/rstb.2002.1258","article-title":"Computational approaches to motor learning by imitation","volume":"358","author":"Schaal","year":"2003","journal-title":"Philos. Trans. R. Soc. Lond.. Ser. B Biol. Sci."},{"key":"S0263574724000092_ref6","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1007\/s11633-022-1347-y","article-title":"A survey on recent advances and challenges in reinforcement learning methods for task-oriented dialogue policy learning,","volume":"20","author":"Kwan","journal-title":"Mach. Intell. Res."},{"key":"S0263574724000092_ref13","doi-asserted-by":"crossref","unstructured":"[13] Wen, B. , Lian, W. , Bekris, K. and Schaal, S. , \u201cYou only demonstrate once: Category-level manipulation from single visual demonstration\u201d (2022), arXiv preprint arXiv: 2201.12716.","DOI":"10.15607\/RSS.2022.XVIII.044"},{"key":"S0263574724000092_ref18","doi-asserted-by":"crossref","first-page":"883","DOI":"10.1109\/LRA.2020.2965869","article-title":"Benchmarking protocols for evaluating small parts robotic assembly systems","volume":"5","author":"Kimble","year":"2020","journal-title":"IEEE Robot. Automat. Lett."},{"key":"S0263574724000092_ref33","doi-asserted-by":"crossref","first-page":"2093","DOI":"10.1016\/j.neucom.2017.10.034","article-title":"Robot teaching by teleoperation based on visual interaction and extreme learning machine","volume":"275","author":"Xu","year":"2018","journal-title":"Neurocomputing"},{"key":"S0263574724000092_ref42","doi-asserted-by":"crossref","first-page":"1015","DOI":"10.1017\/S0263574722001230","article-title":"Zero-Shot sim-to-real transfer of reinforcement learning framework for robotics manipulation with demonstration and force feedback,","volume":"41","author":"Chen","year":"2022","journal-title":"Robotica"},{"key":"S0263574724000092_ref17","doi-asserted-by":"crossref","unstructured":"[17] Zhao, W. , Queralta, J. P. and Westerlund, T. , \u201cSim-to-real transfer in deep reinforcement learning for robotics: a survey\u201d 2020 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE (2020) pp. 737\u2013744.","DOI":"10.1109\/SSCI47803.2020.9308468"},{"key":"S0263574724000092_ref47","doi-asserted-by":"crossref","unstructured":"[47] Todorov, E. , Erez, T. and Tassa, Y. , \u201cMujoco: A Physics Engine for Model-based Control\u201d 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems, IEEE (2012) pp. 5026\u20135033.","DOI":"10.1109\/IROS.2012.6386109"},{"key":"S0263574724000092_ref25","doi-asserted-by":"crossref","unstructured":"[25] Inoue, T. , De Magistris, G. , Munawar, A. , Yokoya, T. and Tachibana, R. , \u201cDeep Reinforcement Learning for High Precision Assembly Tasks\u201d 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE (2017) pp. 819\u2013825.","DOI":"10.1109\/IROS.2017.8202244"},{"key":"S0263574724000092_ref46","unstructured":"[46] Coumans, E. and Bai, Y. , \u201cPybullet, a python module for physics simulation for games, robotics and machine learning (2016)."},{"key":"S0263574724000092_ref21","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1109\/TSSC.1968.300136","article-title":"A formal basis for the heuristic determination of minimum cost paths","volume":"4","author":"Hart","year":"1968","journal-title":"IEEE Trans. Syst. Sci. Cybern."},{"key":"S0263574724000092_ref8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1017\/S0263574723000607","article-title":"Reinforcement learning with modified exploration strategy for mobile robot path planning","volume":"41","author":"Khlif","year":"2023","journal-title":"Robotica"},{"key":"S0263574724000092_ref39","doi-asserted-by":"crossref","unstructured":"[39] Rusu, R. B. , Blodow, N. , Marton, Z. C. and Beetz, M. , \u201cAligning Point Cloud Views Using Persistent Feature Histograms\u201d 2008 IEEE\/RSJ International Conference on Intelligent Robots and Systems, IEEE (2008) pp. 3384\u20133391.","DOI":"10.1109\/IROS.2008.4650967"},{"key":"S0263574724000092_ref41","first-page":"586","article-title":"Method for Registration of 3-D Shapes,","volume":"1611","author":"Besl","year":"1992","journal-title":"Sensor Fusion IV: Control Paradigms and Data Structures."},{"key":"S0263574724000092_ref35","doi-asserted-by":"crossref","first-page":"6526","DOI":"10.1109\/LRA.2022.3174261","article-title":"E2EK: End-to-end regression network based on keypoint for 6d pose estimation","volume":"7","author":"Lin","year":"2022","journal-title":"IEEE Robot. Automat. Lett."},{"key":"S0263574724000092_ref1","article-title":"Hierarchical multiobjective heuristic for pcb assembly optimization in a beam-head surface mounter,","volume":"52","author":"Gao","journal-title":"IEEE Trans. Cybernet."},{"key":"S0263574724000092_ref7","doi-asserted-by":"crossref","first-page":"2718","DOI":"10.1109\/TMECH.2019.2945135","article-title":"A survey of methods and strategies for high-precision robotic grasping and assembly tasks\u2013some new trends","volume":"24","author":"Li","year":"2019","journal-title":"IEEE\/ASME Trans. Mechatron."},{"key":"S0263574724000092_ref2","unstructured":"[2] Fujimoto, S. , Hoof, H. and Meger, D. , \u201cAddressing Function Approximation Error in Actor-Critic Methods\u201d International Conference on Machine Learning, PMLR (2018) pp. 1587\u20131596."},{"key":"S0263574724000092_ref40","doi-asserted-by":"crossref","unstructured":"[40] Rusu, R. B. , Blodow, N. and Beetz, M. , \u201cFast Point Feature Histograms (FPFH) for 3D Registration\u201d 2009 IEEE International Conference on Robotics and Automation, IEEE (2009) pp. 3212\u20133217.","DOI":"10.1109\/ROBOT.2009.5152473"},{"key":"S0263574724000092_ref32","doi-asserted-by":"crossref","unstructured":"[32] Xu, Y. , Yang, C. , Liu, X. and Li, Z. , \u201cA Novel Robot Teaching System Based on Mixed Reality\u201d 2018 3rd International Conference on Advanced Robotics and Mechatronics (ICARM), IEEE (2018) pp. 250\u2013255,","DOI":"10.1109\/ICARM.2018.8610861"},{"key":"S0263574724000092_ref37","first-page":"17721","volume-title":"Advances in Neural Information Processing Systems","volume":"33","author":"Wang","year":"2020"},{"key":"S0263574724000092_ref22","doi-asserted-by":"crossref","unstructured":"[22] Stentz, A. , \u201cOptimal and Efficient Path Planning for Partially-known Environments\u201d Proceedings of the 1994 IEEE international conference on robotics and automation, IEEE (1994) pp. 3310\u20133317,","DOI":"10.1109\/ROBOT.1994.351061"}],"container-title":["Robotica"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0263574724000092","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,9]],"date-time":"2024-11-09T00:00:23Z","timestamp":1731110423000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0263574724000092\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,24]]},"references-count":49,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,4]]}},"alternative-id":["S0263574724000092"],"URL":"https:\/\/doi.org\/10.1017\/s0263574724000092","relation":{},"ISSN":["0263-5747","1469-8668"],"issn-type":[{"value":"0263-5747","type":"print"},{"value":"1469-8668","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,24]]}}}