{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:24:39Z","timestamp":1750220679304,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":20,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,11,10]],"date-time":"2020-11-10T00:00:00Z","timestamp":1604966400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Universidad Central de Chile","award":["CIP2018009"],"award-info":[{"award-number":["CIP2018009"]}]},{"name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior","award":["001"],"award-info":[{"award-number":["001"]}]},{"name":"National Council for Scientific and Technological Development","award":["432818\/2018-9"],"award-info":[{"award-number":["432818\/2018-9"]}]},{"name":"Prysmian Group","award":["001\/2017"],"award-info":[{"award-number":["001\/2017"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,11,10]]},"DOI":"10.1145\/3406499.3418769","type":"proceedings-article","created":{"date-parts":[[2020,11,2]],"date-time":"2020-11-02T21:20:21Z","timestamp":1604352021000},"page":"278-280","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["A Robust Approach for Continuous Interactive Reinforcement Learning"],"prefix":"10.1145","author":[{"given":"Cristian","family":"Mill\u00e1n-Arias","sequence":"first","affiliation":[{"name":"Universidade de Pernambuco, Recife, Pernambuco, Brazil"}]},{"given":"Bruno","family":"Fernandes","sequence":"additional","affiliation":[{"name":"Universidade de Pernambuco, Recife, Pernambuco, Brazil"}]},{"given":"Francisco","family":"Cruz","sequence":"additional","affiliation":[{"name":"Deakin University &amp; Universidad Central de 
Chile, Geelong, Australia"}]},{"given":"Richard","family":"Dazeley","sequence":"additional","affiliation":[{"name":"Deakin University, Geelong, VIC, Australia"}]},{"given":"Sergio","family":"Fernandes","sequence":"additional","affiliation":[{"name":"Universidade de Pernambuco, Recife, Pernambuco, Brazil"}]}],"member":"320","published-online":{"date-parts":[[2020,11,10]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_3_2_2_1_1","DOI":"10.1145\/3309772.3309801"},{"key":"e_1_3_2_2_2_1","volume-title":"Anderson","author":"Barto Andrew G.","year":"1983","unstructured":"Andrew G. Barto , Richard S. Sutton , and Charles W . Anderson . 1983 . Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics , Vol. SMC-13 , 5 (1983), 834--846. https:\/\/doi.org\/10.1109\/TSMC.1983.6313077 10.1109\/TSMC.1983.6313077 Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. 1983. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-13, 5 (1983), 834--846. https:\/\/doi.org\/10.1109\/TSMC.1983.6313077"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_3_1","DOI":"10.1109\/TCDS.2016.2543839"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_4_1","DOI":"10.1109\/IJCNN.2018.8489237"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_5_1","DOI":"10.1109\/DEVLRN.2017.8329809"},{"volume-title":"15th AAAI. AAAI, Wisconsin, 761--768.","author":"Dearden Richard","unstructured":"Richard Dearden , Nir Friedman , and Stuart Russell . 1998. Bayesian Q-learning . In 15th AAAI. AAAI, Wisconsin, 761--768. Richard Dearden, Nir Friedman, and Stuart Russell. 1998. Bayesian Q-learning. In 15th AAAI. AAAI, Wisconsin, 761--768.","key":"e_1_3_2_2_6_1"},{"key":"e_1_3_2_2_7_1","volume-title":"Thomaz","author":"Griffith Shane","year":"2013","unstructured":"Shane Griffith , Kaushik Subramanian , Jonathan Scholz , Charles L. 
Isbell , and Andrea L . Thomaz . 2013 . Policy Shaping : Integrating Human Feedback with Reinforcement Learning. In Advances in Neural Information Processing Systems 26 (NIPS 2013). NIPS, Lake Tahoe , 2625--2633. https:\/\/papers.nips.cc\/paper\/5187-policy-shaping-integrating-human-feedback-with-reinforcement-learning Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell, and Andrea L. Thomaz. 2013. Policy Shaping: Integrating Human Feedback with Reinforcement Learning. In Advances in Neural Information Processing Systems 26 (NIPS 2013). NIPS, Lake Tahoe, 2625--2633. https:\/\/papers.nips.cc\/paper\/5187-policy-shaping-integrating-human-feedback-with-reinforcement-learning"},{"key":"e_1_3_2_2_8_1","volume-title":"Technical Report 513. WRIGHT LAB WRIGHT-PATTERSON AFB OH. 14 pages.","author":"Harry Klopf A.","year":"1993","unstructured":"A. Harry Klopf and L. C. Baird . 1993 . Reinforcement learning with high-dimensional, continuous actions. Technical Report 513. WRIGHT LAB WRIGHT-PATTERSON AFB OH. 14 pages. A. Harry Klopf and L. C. Baird. 1993. Reinforcement learning with high-dimensional, continuous actions. Technical Report 513. WRIGHT LAB WRIGHT-PATTERSON AFB OH. 14 pages."},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_9_1","DOI":"10.1145\/1597735.1597738"},{"key":"e_1_3_2_2_10_1","volume-title":"Proceedings European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. ESANN","author":"Mill\u00e1n Cristian","year":"2019","unstructured":"Cristian Mill\u00e1n , Bruno Fernandes , and Francisco Cruz . 2019 . Human feedback in continuous actor-critic reinforcement learning . In Proceedings European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. ESANN , Bruges (Belgium), 661--666. Cristian Mill\u00e1n, Bruno Fernandes, and Francisco Cruz. 2019. Human feedback in continuous actor-critic reinforcement learning. 
In Proceedings European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. ESANN, Bruges (Belgium), 661--666."},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_11_1","DOI":"10.1162\/0899766053011528"},{"key":"e_1_3_2_2_12_1","first-page":"278","article-title":"Policy invariance under reward transformations: Theory and application to reward shaping","volume":"99","author":"Ng Andrew Y.","year":"1999","unstructured":"Andrew Y. Ng , Daishi Harada , and Stuart Russell . 1999 . Policy invariance under reward transformations: Theory and application to reward shaping . ICML , Vol. 99 (1999), 278 -- 287 . Andrew Y. Ng, Daishi Harada, and Stuart Russell. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. ICML, Vol. 99 (1999), 278--287.","journal-title":"ICML"},{"key":"e_1_3_2_2_13_1","volume-title":"Between Instruction and Reward: Human-Prompted Switching. In AAAI Fall Symposium: Robots Learning Interactively from Human Teachers. AAAI","author":"Pilarski PM","year":"2012","unstructured":"PM Pilarski and RS Sutton . 2012 . Between Instruction and Reward: Human-Prompted Switching. In AAAI Fall Symposium: Robots Learning Interactively from Human Teachers. AAAI , Arlington, Virginia, 46--52. PM Pilarski and RS Sutton. 2012. Between Instruction and Reward: Human-Prompted Switching. In AAAI Fall Symposium: Robots Learning Interactively from Human Teachers. AAAI, Arlington, Virginia, 46--52."},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_14_1","DOI":"10.1109\/ICEEE.2018.8533946"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_15_1","DOI":"10.1109\/IROS.2006.282223"},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_16_1","DOI":"10.1109\/ROMAN.2011.6005223"},{"key":"e_1_3_2_2_17_1","volume-title":"Barto","author":"Sutton Richard S.","year":"1998","unstructured":"Richard S. Sutton and Andrew G . Barto . 1998 . Reinforcement Learning : An Introduction. MIT press , Cambridge, Massachusetts. Richard S. 
Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. MIT press, Cambridge, Massachusetts."},{"doi-asserted-by":"publisher","key":"e_1_3_2_2_18_1","DOI":"10.5555\/3009657.3009806"},{"key":"e_1_3_2_2_19_1","volume-title":"RO-MAN 2007 - The 16th IEEE International Symposium on Robot and Human Interactive Communication. IEEE","author":"Andrea","year":"2007","unstructured":"Andrea L. Thomaz and Cynthia Breazeal. 2007. Asymmetric Interpretations of Positive and Negative Human Feedback for a Social Learning Agent . In RO-MAN 2007 - The 16th IEEE International Symposium on Robot and Human Interactive Communication. IEEE , Jeju, South Korea, 720--725. https:\/\/doi.org\/10.1109\/ROMAN.2007.4415180 10.1109\/ROMAN.2007.4415180 Andrea L. Thomaz and Cynthia Breazeal. 2007. Asymmetric Interpretations of Positive and Negative Human Feedback for a Social Learning Agent. In RO-MAN 2007 - The 16th IEEE International Symposium on Robot and Human Interactive Communication. IEEE, Jeju, South Korea, 720--725. https:\/\/doi.org\/10.1109\/ROMAN.2007.4415180"},{"key":"e_1_3_2_2_20_1","volume-title":"Reinforcement Learning in Continuous Action Spaces. In 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. IEEE","author":"Hasselt Hado Van","year":"2007","unstructured":"Hado Van Hasselt and Marco A. Wiering . 2007 . Reinforcement Learning in Continuous Action Spaces. In 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. IEEE , Honolulu, HI, USA, 272--279. https:\/\/doi.org\/10.1109\/ADPRL.2007.368199 10.1109\/ADPRL.2007.368199 Hado Van Hasselt and Marco A. Wiering. 2007. Reinforcement Learning in Continuous Action Spaces. In 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. IEEE, Honolulu, HI, USA, 272--279. 
https:\/\/doi.org\/10.1109\/ADPRL.2007.368199"}],"event":{"sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"acronym":"HAI '20","name":"HAI '20: 8th International Conference on Human-Agent Interaction","location":"Virtual Event USA"},"container-title":["Proceedings of the 8th International Conference on Human-Agent Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3406499.3418769","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3406499.3418769","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:03:18Z","timestamp":1750197798000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3406499.3418769"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,10]]},"references-count":20,"alternative-id":["10.1145\/3406499.3418769","10.1145\/3406499"],"URL":"https:\/\/doi.org\/10.1145\/3406499.3418769","relation":{},"subject":[],"published":{"date-parts":[[2020,11,10]]},"assertion":[{"value":"2020-11-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}