{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T13:49:07Z","timestamp":1767707347959,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2019,12,16]],"date-time":"2019-12-16T00:00:00Z","timestamp":1576454400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003130","name":"Fonds Wetenschappelijk Onderzoek","doi-asserted-by":"publisher","award":["Yves"],"award-info":[{"award-number":["Yves"]}],"id":[{"id":"10.13039\/501100003130","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Flanders Make","award":["Proud"],"award-info":[{"award-number":["Proud"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>The assembly industry is shifting more towards customizable products, or requiring assembly of small batches. This requires a lot of reprogramming, which is expensive because a specialized engineer is required. It would be an improvement if untrained workers could help a cobot to learn an assembly sequence by giving advice. Learning an assembly sequence is a hard task for a cobot, because the solution space increases drastically when the complexity of the task increases. This work introduces a novel method where human knowledge is used to reduce this solution space, and as a result increases the learning speed. The method proposed is the IRL-PBRS method, which uses Interactive Reinforcement Learning (IRL) to learn from human advice in an interactive way, and uses Potential Based Reward Shaping (PBRS), in a simulated environment, to focus learning on a smaller part of the solution space. The method was compared in simulation to two other feedback strategies. The results show that IRL-PBRS converges more quickly to a valid assembly sequence policy and does this with the fewest human interactions. Finally, a use case is presented where participants were asked to program an assembly task. Here, the results show that IRL-PBRS learns quickly enough to keep up with advice given by a user, and is able to adapt online to a changing knowledge base.<\/jats:p>","DOI":"10.3390\/robotics8040104","type":"journal-article","created":{"date-parts":[[2019,12,17]],"date-time":"2019-12-17T02:59:01Z","timestamp":1576551541000},"page":"104","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Accelerating Interactive Reinforcement Learning by Human Advice for an Assembly Task by a Cobot"],"prefix":"10.3390","volume":"8","author":[{"given":"Joris","family":"De Winter","sequence":"first","affiliation":[{"name":"Departement of Mechanical Engineering, Vrije Universiteit Brussel: Pleinlaan 2, 1000 Brussel, Belgium"},{"name":"Flanders Make vzw, Celestijnenlaan 300, 3001 Leuven, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Albert","family":"De Beir","sequence":"additional","affiliation":[{"name":"Departement of Mechanical Engineering, Vrije Universiteit Brussel: Pleinlaan 2, 1000 Brussel, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ilias","family":"El Makrini","sequence":"additional","affiliation":[{"name":"Departement of Mechanical Engineering, Vrije Universiteit Brussel: Pleinlaan 2, 1000 Brussel, Belgium"},{"name":"Flanders Make vzw, Celestijnenlaan 300, 3001 Leuven, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Greet","family":"Van de Perre","sequence":"additional","affiliation":[{"name":"Departement of Mechanical Engineering, Vrije Universiteit Brussel: Pleinlaan 2, 1000 Brussel, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ann","family":"Now\u00e9","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Lab, Vrije Universiteit Brussel: Pleinlaan 2, 1000 Brussel, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4881-9341","authenticated-orcid":false,"given":"Bram","family":"Vanderborght","sequence":"additional","affiliation":[{"name":"Departement of Mechanical Engineering, Vrije Universiteit Brussel: Pleinlaan 2, 1000 Brussel, Belgium"},{"name":"Flanders Make vzw, Celestijnenlaan 300, 3001 Leuven, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,12,16]]},"reference":[{"key":"ref_1","unstructured":"Group, B.C. (2019, December 16). The Robotics Revolution: The Next Great Leap In Manufacturing. Available online: https:\/\/www.bcg.com\/publications\/2015\/lean-manufacturing-innovation-robotics-revolution-next-great-leap-manufacturing.aspx."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1109\/MRA.2010.936952","article-title":"Imitation and Reinforcement Learning","volume":"17","author":"Kober","year":"2010","journal-title":"IEEE Robot. Autom. Mag."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1109\/MRA.2018.2815947","article-title":"Working with Walt: How a Cobot Was Developed and Inserted on an Auto Assembly Line","volume":"25","author":"Elprama","year":"2018","journal-title":"IEEE Robot. Autom. Mag."},{"key":"ref_4","first-page":"103","article-title":"Task allocation for improved ergonomics in Human-Robot Collaborative Assembly","volume":"20","author":"Merckaert","year":"2019","journal-title":"Interact. Stud."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"122","DOI":"10.3390\/robotics2030122","article-title":"Reinforcement Learning in Robotics: Applications and Real-World Challenges","volume":"2","author":"Kormushev","year":"2013","journal-title":"Robotics"},{"key":"ref_6","unstructured":"Skoglund, A. (2009). Programming by Demonstration of Robot Manipulators. [Ph.D. Thesis, Orebro University]."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1016\/j.robot.2006.01.003","article-title":"Robust trajectory learning and approximation for robot programming by demonstration","volume":"54","author":"Aleotti","year":"2006","journal-title":"Robot. Auton. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11370-015-0187-9","article-title":"A tutorial on task-parameterized movement learning and retrieval","volume":"9","author":"Calinon","year":"2016","journal-title":"Intell. Serv. Robot."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"706","DOI":"10.1109\/TCYB.2015.2414277","article-title":"Learning Trajectories for Robot Programing by Demonstration Using a Coordinated Mixture of Factor Analyzers","volume":"46","author":"Field","year":"2016","journal-title":"IEEE Trans. Cybern."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Melchior, N.A., and Simmons, R. (2012, January 7\u201312). Graph-based trajectory planning through programming by demonstration. Proceedings of the 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Vilamoura, Portugal.","DOI":"10.1109\/IROS.2012.6386101"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhu, Z., and Hu, H. (2018). Robot Learning from Demonstration in Robotic Assembly: A Survey. Robotics, 7.","DOI":"10.3390\/robotics7020017"},{"key":"ref_12","unstructured":"(2019, September 30). Emika Franka Panda Interface. Available online: https:\/\/www.franka.de\/panda\/."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ahmadzadeh, S.R., Paikan, A., Mastrogiovanni, F., Natale, L., Kormushev, P., and Caldwell, D.G. (2015, January 26\u201330). Learning symbolic representations of actions from human demonstrations. Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA.","DOI":"10.1109\/ICRA.2015.7139728"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"254","DOI":"10.5772\/55640","article-title":"Spatial Programming for Industrial Robots through Task Demonstration","volume":"10","author":"Lambrecht","year":"2013","journal-title":"Int. J. Adv.Robot. Syst."},{"key":"ref_15","unstructured":"Stenmark, M., and Topp, E.A. (2016). From Demonstrations to Skills for High-level Programming of Industrial Robots. AAAI Fall Symposium Series 2016, AAAI Press."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, J., Wang, Y., and Xiong, R. (2016, January 18\u201320). Industrial robot programming by demonstration. Proceedings of the 2016 International Conference on Advanced Robotics and Mechatronics (ICARM), Macau, China.","DOI":"10.1109\/ICARM.2016.7606936"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"524","DOI":"10.1108\/IR-02-2016-0058","article-title":"Learning of assembly constraints by demonstration and active exploration","volume":"43","author":"Kramberger","year":"2016","journal-title":"Ind. Robot Int. J."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mollard, Y., Munzer, T., Baisero, A., Toussaint, M., and Lopes, M. (October, January 28). Robot programming from demonstration, feedback and transfer. Proceedings of the 2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany.","DOI":"10.1109\/IROS.2015.7353615"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/j.patrec.2017.03.015","article-title":"Supervised autonomy for online learning in human-robot interaction","volume":"99","author":"Senft","year":"2017","journal-title":"Pattern Recognit. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Knox, W.B., and Stone, P. (2009, January 1\u20134). Interactively Shaping Agents via Human Reinforcement: The TAMER Framework. Proceedings of the Fifth International Conference on Knowledge Capture, Redondo Beach, CA, USA.","DOI":"10.1145\/1597735.1597738"},{"key":"ref_21","unstructured":"Thomaz, A.L., and Breazeal, C. (2006). Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, AAAI Press."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Thomas, G., Chien, M., Tamar, A., Ojea, J., and Abbeel, P. (2018, January 21\u201325). Learning Robotic Assembly from CAD. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia.","DOI":"10.1109\/ICRA.2018.8460696"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Cruz, F., Twiefel, J., Magg, S., Weber, C., and Wermter, S. (2015, January 12\u201317). Interactive reinforcement learning through speech guidance in a domestic scenario. Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland.","DOI":"10.1109\/IJCNN.2015.7280477"},{"key":"ref_24","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Introduction to Reinforcement Learning, MIT Press. [1st ed.]."},{"key":"ref_25","unstructured":"Watkins, C. (1989). Learning From Delayed Rewards. [Ph.D. Thesis, King\u2019s College]."},{"key":"ref_26","unstructured":"B.F. Skinner Foundation (1938). The Behavior of Organisms: An Experimental Analysis, B.F. Skinner Foundation."},{"key":"ref_27","unstructured":"Ng, A.Y., Harada, D., and Russell, S.J. (1999, January 27\u201330). Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia."},{"key":"ref_28","unstructured":"Devlin, S., and Kudenko, D. (2012, January 4\u20138). Dynamic Potential-based Reward Shaping. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, Valencia, Spain."},{"key":"ref_29","first-page":"227","article-title":"Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition","volume":"13","author":"Dietterich","year":"2000","journal-title":"J. Artif. Int. Res."},{"key":"ref_30","unstructured":"Gao, Y., and Toni, F. (2015, January 25\u201331). Potential Based Reward Shaping for Hierarchical Reinforcement Learning. Proceedings of the 24th International Conference on Artificial Intelligence, Buenos Aires, Argentina."},{"key":"ref_31","unstructured":"Gilbreth, F.B., and Carey, E.G. (1948). Cheaper by the Dozen, Thomas Y. Crowell Company."},{"key":"ref_32","first-page":"1","article-title":"Learning Rates for Q-learning","volume":"5","author":"Mansour","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Collins, K., Palmer, A.J., and Rathmill, K. (1985). The Development of a European Benchmark for the Comparison of Assembly Robot Programming Systems. Robot Technology and Applications, Springer.","DOI":"10.1007\/978-3-662-02440-9_18"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kelley, J.F. (1983, January 12\u201315). An Empirical Methodology for Writing User-Friendly Natural Language Computer Applications. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Boston, MA, USA.","DOI":"10.1145\/800045.801609"}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/8\/4\/104\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:42:48Z","timestamp":1760190168000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/8\/4\/104"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,16]]},"references-count":34,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2019,12]]}},"alternative-id":["robotics8040104"],"URL":"https:\/\/doi.org\/10.3390\/robotics8040104","relation":{},"ISSN":["2218-6581"],"issn-type":[{"type":"electronic","value":"2218-6581"}],"subject":[],"published":{"date-parts":[[2019,12,16]]}}}