{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T03:21:07Z","timestamp":1740108067212,"version":"3.37.3"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"23","license":[{"start":{"date-parts":[[2022,12,5]],"date-time":"2022-12-05T00:00:00Z","timestamp":1670198400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,5]],"date-time":"2022-12-05T00:00:00Z","timestamp":1670198400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Deutsches Forschungszentrum f\u00fcr K\u00fcnstliche Intelligenz GmbH (DFKI)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Reinforcement learning (RL) has become widely adopted in robot control. Despite many successes, one major persisting problem can be very low data efficiency. One solution is interactive feedback, which has been shown to speed up RL considerably. As a result, there is an abundance of different strategies, which are, however, primarily tested on discrete grid-world and small scale optimal control scenarios. In the literature, there is no consensus about which feedback frequency is optimal or at which time the feedback is most beneficial. To resolve these discrepancies we isolate and quantify the effect of feedback frequency in robotic tasks with continuous state and action spaces. The experiments encompass inverse kinematics learning for robotic manipulator arms of different complexity. We show that seemingly contradictory reported phenomena occur at different complexity levels. Furthermore, our results suggest that no single ideal feedback frequency exists. Rather that feedback frequency should be changed as the agent\u2019s proficiency in the task increases.<\/jats:p>","DOI":"10.1007\/s00521-022-07949-0","type":"journal-article","created":{"date-parts":[[2022,12,5]],"date-time":"2022-12-05T05:05:31Z","timestamp":1670216731000},"page":"16931-16943","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Quantifying the effect of feedback frequency in interactive reinforcement learning for robotic tasks"],"prefix":"10.1007","volume":"35","author":[{"given":"Daniel","family":"Harnack","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Julie","family":"Pivin-Bachler","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1164-5579","authenticated-orcid":false,"given":"Nicol\u00e1s","family":"Navarro-Guerrero","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,12,5]]},"reference":[{"issue":"7676","key":"7949_CR1","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354\u2013359. https:\/\/doi.org\/10.1038\/nature24270","journal-title":"Nature"},{"key":"7949_CR2","doi-asserted-by":"crossref","unstructured":"Arzate\u00a0Cruz C, Igarashi T (2020) a survey on interactive reinforcement learning: design principles and open challenges. In: ACM designing interactive systems conference (DIS). Eindhoven, The Netherlands: Association for Computing Machinery; p. 1195\u20131209","DOI":"10.1145\/3357236.3395525"},{"key":"7949_CR3","unstructured":"Tan M (1997) Multi-agent reinforcement learning: independent vs. cooperative agents. In: Readings in agents. Morgan Kaufmann Publishers Inc.. p. 487\u2013494"},{"issue":"1","key":"7949_CR4","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1007\/s10458-019-09430-0","volume":"34","author":"FL Da Silva","year":"2019","unstructured":"Da Silva FL, Warnell G, Costa AHR, Stone P (2019) Agents teaching agents: a survey on inter-agent transfer learning. Auton Agents Multi-Agent Syst 34(1):9. https:\/\/doi.org\/10.1007\/s10458-019-09430-0","journal-title":"Auton Agents Multi-Agent Syst"},{"key":"7949_CR5","unstructured":"Ng AY, Harada D, Russell SJ (1999) Policy invariance under reward transformations: theory and application to reward shaping. In: International conference on machine learning (ICML). vol. Sixteenth. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; p. 278\u2013287"},{"key":"7949_CR6","unstructured":"Griffith S, Subramanian K, Scholz J, Isbell C, Thomaz AL (2013) Policy shaping: integrating human feedback with reinforcement learning. In: International conference on neural information processing systems (NIPS). vol.\u00a02. Lake Tahoe, NV, USA: Curran Associates, Inc.; p. 2625\u20132633"},{"key":"7949_CR7","doi-asserted-by":"publisher","DOI":"10.17185\/duepublico\/40718","author":"C Stahlhut","year":"2015","unstructured":"Stahlhut C, Navarro-Guerrero N, Weber C, Wermter S (2015) Interaction in reinforcement learning reduces the need for finely tuned hyperparameters in complex tasks. Kogn Syst. https:\/\/doi.org\/10.17185\/duepublico\/40718","journal-title":"Kogn Syst"},{"issue":"3","key":"7949_CR8","doi-asserted-by":"publisher","first-page":"520","DOI":"10.1037\/xge0000569","volume":"148","author":"MK Ho","year":"2019","unstructured":"Ho MK, Cushman F, Littman ML, Austerweil JL (2019) People teach with rewards and punishments as communication, not reinforcements. J Exp Psychol: Gen 148(3):520\u2013549. https:\/\/doi.org\/10.1037\/xge0000569","journal-title":"J Exp Psychol: Gen"},{"issue":"6\u20137","key":"7949_CR9","doi-asserted-by":"publisher","first-page":"716","DOI":"10.1016\/j.artint.2007.09.009","volume":"172","author":"AL Thomaz","year":"2008","unstructured":"Thomaz AL, Breazeal C (2008) Teachable robots: understanding human teaching behavior to build more effective robot learners. Artif Intell 172(6\u20137):716\u2013737. https:\/\/doi.org\/10.1016\/j.artint.2007.09.009","journal-title":"Artif Intell"},{"key":"7949_CR10","doi-asserted-by":"crossref","unstructured":"Loftin R, MacGlashan J, Peng B, Taylor M, Littman M, Huang J, et\u00a0al. (2014) A strategy-aware technique for learning behaviors from discrete human feedback. In: AAAI conference on artificial intelligence. vol.\u00a028 of AAAI Technical Track: Humans and AI. Qu\u00e9bec City, Qu\u00e9bec, Canada: Association for the Advancement of Artificial Intelligence. p. 937\u2013943","DOI":"10.1609\/aaai.v28i1.8839"},{"key":"7949_CR11","doi-asserted-by":"crossref","unstructured":"Knox WB, Stone P (2012) Reinforcement learning from human reward: discounting in episodic tasks. In: IEEE international symposium on robot and human interactive communication (RO-MAN). Paris, France. p. 878\u2013885","DOI":"10.1109\/ROMAN.2012.6343862"},{"issue":"1","key":"7949_CR12","doi-asserted-by":"publisher","first-page":"13","DOI":"10.3390\/biomimetics6010013","volume":"6","author":"A Bignold","year":"2021","unstructured":"Bignold A, Cruz F, Dazeley R, Vamplew P, Foale C (2021) An evaluation methodology for interactive reinforcement learning with simulated users. Biomimetics 6(1):13. https:\/\/doi.org\/10.3390\/biomimetics6010013","journal-title":"Biomimetics"},{"issue":"1","key":"7949_CR13","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1080\/09540091.2014.885279","volume":"26","author":"ME Taylor","year":"2014","unstructured":"Taylor ME, Carboni N, Fachantidis A, Vlahavas I, Torrey L (2014) Reinforcement learning agents providing advice in complex video games. Connect Sci 26(1):45\u201363. https:\/\/doi.org\/10.1080\/09540091.2014.885279","journal-title":"Connect Sci"},{"key":"7949_CR14","doi-asserted-by":"crossref","unstructured":"Cruz F, W\u00fcppen P, Magg S, Fazrie A, Wermter S (2017) Agent-advising approaches in an interactive reinforcement learning scenario. In: Joint IEEE international conference on development and learning and epigenetic robotics (ICDL-EpiRob). Lisbon, Portugal. p. 209\u2013214","DOI":"10.1109\/DEVLRN.2017.8329809"},{"key":"7949_CR15","doi-asserted-by":"crossref","unstructured":"Suay HB, Chernova S (2011) Effect of human guidance and state space size on interactive reinforcement learning. In: IEEE international symposium on robot and human interactive communication (RO-MAN). Atlanta, GA, USA. p. 1\u20136","DOI":"10.1109\/ROMAN.2011.6005223"},{"key":"7949_CR16","unstructured":"Stahlhut C, Navarro-Guerrero N, Weber C, Wermter S (2015) Interaction is more beneficial in complex reinforcement learning problems than in simple ones. In: 4. Interdisziplin\u00e4rer workshop kognitive systeme: mensch, teams, systeme und automaten. Bielefeld, Germany. p. 142\u2013150"},{"key":"7949_CR17","doi-asserted-by":"crossref","unstructured":"Mill\u00e1n-Arias C, Fernandes B, Cruz F, Dazeley R, Fernandes S (2020) Robust approach for continuous interactive reinforcement learning. In: International conference on human-agent interaction (HAI). vol. 8th. Virtual Event USA: Association for Computing Machinery. p. 278\u2013280","DOI":"10.1145\/3406499.3418769"},{"issue":"4","key":"7949_CR18","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1109\/TCDS.2016.2543839","volume":"8","author":"F Cruz","year":"2016","unstructured":"Cruz F, Magg S, Weber C, Wermter S (2016) Training agents with interactive reinforcement learning and contextual affordances. IEEE Trans Cogn Dev Syst 8(4):271\u2013284. https:\/\/doi.org\/10.1109\/TCDS.2016.2543839","journal-title":"IEEE Trans Cogn Dev Syst"},{"issue":"2","key":"7949_CR19","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1007\/s10846-013-0015-4","volume":"77","author":"N Kofinas","year":"2015","unstructured":"Kofinas N, Orfanoudakis E, Lagoudakis MG (2015) Complete analytical forward and inverse kinematics for the NAO humanoid robot. J Intell Robot Syst 77(2):251\u2013264. https:\/\/doi.org\/10.1007\/s10846-013-0015-4","journal-title":"J Intell Robot Syst"},{"issue":"1","key":"7949_CR20","doi-asserted-by":"publisher","first-page":"14588","DOI":"10.1016\/j.ifacol.2017.08.2108","volume":"50","author":"D Busson","year":"2017","unstructured":"Busson D, Bearee R, Olabi A (2017) Task-oriented rigidity optimization for 7 DoF redundant manipulators. IFAC-PapersOnLine. 50(1):14588\u201314593. https:\/\/doi.org\/10.1016\/j.ifacol.2017.08.2108","journal-title":"IFAC-PapersOnLine."},{"key":"7949_CR21","doi-asserted-by":"publisher","first-page":"10","DOI":"10.3389\/fnbot.2017.00010","volume":"11","author":"N Navarro-Guerrero","year":"2017","unstructured":"Navarro-Guerrero N, Lowe R, Wermter S (2017) Improving robot motor learning with negatively valenced reinforcement signals. Front Neurorobotics 11:10. https:\/\/doi.org\/10.3389\/fnbot.2017.00010","journal-title":"Front Neurorobotics"},{"key":"7949_CR22","doi-asserted-by":"crossref","unstructured":"Navarro-Guerrero N, Lowe R, Wermter S (2017) The effects on adaptive behaviour of negatively valenced signals in reinforcement learning. In: Joint IEEE international conference on development and learning and epigenetic robotics (ICDL-EpiRob). Lisbon, Portugal. p. 148\u2013155","DOI":"10.1109\/DEVLRN.2017.8329800"},{"key":"7949_CR23","doi-asserted-by":"crossref","unstructured":"van Hasselt H, Wiering MA (2007) Reinforcement learning in continuous action spaces. In: IEEE symposium on approximate dynamic programming and reinforcement learning (ADPRL). Honolulu, HI, USA. p. 272\u2013279","DOI":"10.1109\/ADPRL.2007.368199"},{"key":"7949_CR24","unstructured":"Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International conference on learning representations (ICLR). 3rd. San Diego, CA, USA. p.\u00a015"},{"issue":"56","key":"7949_CR25","first-page":"1633","volume":"10","author":"ME Taylor","year":"2009","unstructured":"Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 10(56):1633\u20131685","journal-title":"J Mach Learn Res"},{"key":"7949_CR26","doi-asserted-by":"crossref","unstructured":"Bergstra J, Yamins D, Cox DD (2013) Hyperopt: a python library for optimizing the hyperparameters of machine learning algorithms. In: Python in science conference (SciPy). Austin, TX, USA. p. 13\u201320","DOI":"10.25080\/Majora-8b375195-003"},{"key":"7949_CR27","unstructured":"Lyle C, Rowland M, Dabney W (2022) Understanding and preventing capacity loss in reinforcement learning. In: International conference on learning representations (ICLR). vol. 10th. Virtual Event. p.\u00a012"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-022-07949-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-022-07949-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-022-07949-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,12]],"date-time":"2023-07-12T19:04:38Z","timestamp":1689188678000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-022-07949-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,5]]},"references-count":27,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["7949"],"URL":"https:\/\/doi.org\/10.1007\/s00521-022-07949-0","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"type":"print","value":"0941-0643"},{"type":"electronic","value":"1433-3058"}],"subject":[],"published":{"date-parts":[[2022,12,5]]},"assertion":[{"value":"2 March 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 October 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 December 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This article does not contain any studies with human participants or animals performed by any of the authors.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}},{"value":"Not applicable","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication\/Informed consent"}}]}}