{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T07:35:33Z","timestamp":1776929733691,"version":"3.51.2"},"reference-count":25,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2017,12,6]],"date-time":"2017-12-06T00:00:00Z","timestamp":1512518400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>In this article, we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance and the importance of reward discounting in advising. The experiments show that the best performers are not always the best teachers and reveal the non-trivial importance of the coefficient of variation (CV) as a statistic for choosing policies that generate advice. The CV statistic relates variance to the corresponding mean. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel reinforcement learning algorithm capable of learning when to advise or not. The proposed algorithm is able to advise even when it does not have knowledge of the student\u2019s intended action and needs significantly less training time compared to previous learning approaches. Finally, in this article, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning.<\/jats:p>","DOI":"10.3390\/make1010002","type":"journal-article","created":{"date-parts":[[2017,12,6]],"date-time":"2017-12-06T11:29:36Z","timestamp":1512559776000},"page":"21-42","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":42,"title":["Learning to Teach Reinforcement Learning Agents"],"prefix":"10.3390","volume":"1","author":[{"given":"Anestis","family":"Fachantidis","sequence":"first","affiliation":[{"name":"Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece"}]},{"given":"Matthew","family":"Taylor","sequence":"additional","affiliation":[{"name":"Borealis AI, University of Alberta, CCIS 3-232, Edmonton, AB T6G 2M9, Canada"}]},{"given":"Ioannis","family":"Vlahavas","sequence":"additional","affiliation":[{"name":"Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece"}]}],"member":"1968","published-online":{"date-parts":[[2017,12,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning, An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_2","first-page":"1633","article-title":"Transfer Learning for Reinforcement Learning Domains: A Survey","volume":"10","author":"Taylor","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Lazaric, A. (2012). Transfer in reinforcement learning: A framework and a survey. Reinforcement Learning, Springer.","DOI":"10.1007\/978-3-642-27645-3_5"},{"key":"ref_4","unstructured":"Zhan, Y., and Taylor, M.E. (2015, January 12\u201314). Online Transfer Learning in Reinforcement Learning Domains. Proceedings of the AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA), Arlington, VA, USA."},{"key":"ref_5","unstructured":"Zimmer, M., Viappiani, P., and Weng, P. (2014, January 5\u20139). Teacher-Student Framework: A Reinforcement Learning Approach. Proceedings of the AAMAS Workshop Autonomous Robots and Multirobot Systems, Paris, France."},{"key":"ref_6","unstructured":"Torrey, L., and Taylor, M. (2013, January 6\u201310). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems, International Foundation for Autonomous Agents and Multiagent Systems, Saint Paul, MN, USA."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_8","first-page":"2125","article-title":"Transfer Learning via Inter-Task Mappings for Temporal Difference Learning","volume":"8","author":"Taylor","year":"2007","journal-title":"J. Mach. Learn. Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1007\/s13748-012-0026-6","article-title":"Learning domain structure through probabilistic policy reuse in reinforcement learning","volume":"2","author":"Veloso","year":"2013","journal-title":"Prog. Artif. Intell."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Torrey, L., Walker, T., Shavlik, J., and Maclin, R. (2005). Using advice to transfer knowledge acquired in one reinforcement learning task to another. Proceedings of the Sixteenth European Conference on Machine Learning (ECML\u201905), Porto, Portugal, 2 October 2005, Springer.","DOI":"10.1007\/11564096_40"},{"key":"ref_11","unstructured":"Taylor, M.E., Jong, N.K., and Stone, P. (2008, January 15\u201319). Transferring Instances for Model-Based Reinforcement Learning. Proceedings of the European Conference on Machine Learning, Antwerp, Belgium."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Rohlfshagen, P., and Lucas, S.M. (2011, January 5\u20138). Ms Pac-Man versus ghost team CEC 2011 competition. Proceedings of the 2011 IEEE Congress on Evolutionary Computation (CEC), New Orleans, LA, USA.","DOI":"10.1109\/CEC.2011.5949599"},{"key":"ref_13","unstructured":"Stroock, D. (2004). An Introduction to Markov Processes, Springer. Graduate Texts in Mathematics."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Schwartz, A. (1993, January 27\u201329). A reinforcement learning method for maximizing undiscounted rewards. Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, USA.","DOI":"10.1016\/B978-1-55860-307-3.50045-9"},{"key":"ref_15","unstructured":"White, A., Modayil, J., and Sutton, R.S. (2014, January 27\u201328). Surprise and curiosity for big data robotics. Proceedings of the AAAI-14 Workshop on Sequential Decision-Making with Big Data, Quebec City, QC, Canada."},{"key":"ref_16","unstructured":"(2017, December 01). Overlapping Confidence Intervals and Statistical Significance. Available online: https:\/\/www.cscu.cornell.edu\/news\/statnews\/stnews73.pdf."},{"key":"ref_17","unstructured":"Rummery, G., and Niranjan, M. (1994). On-Line Q-Learning Using Connectionist Systems, Engineering Department, Cambridge University. Technical Report CUED\/F-INFENG-RT 116."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1096","DOI":"10.1080\/01621459.1955.10501294","article-title":"A Multiple Comparison Procedure for Comparing Several Treatments with a Control","volume":"50","author":"Dunnett","year":"1955","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chakraborty, D., and Sen, S. (2006). Teaching new teammates. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, Hakodate, Japan, 8\u201312 May 2006, ACM.","DOI":"10.1145\/1160633.1160757"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Stone, P., Kaminka, G.A., Kraus, S., and Rosenschein, J.S. (2010, January 11\u201315). Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination. Proceedings of the Twenty-Fourth Conference on Artificial Intelligence, Atlanta, GA, USA.","DOI":"10.1609\/aaai.v24i1.7529"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1007\/BF00992699","article-title":"Self-improving reactive agents based on reinforcement learning, planning and teaching","volume":"8","author":"Lin","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_22","unstructured":"Clouse, J.A. (1996). On Integrating Apprentice Learning and Reinforcement Learning. [Ph.D. Thesis, University of Massachusetts Amherst]."},{"key":"ref_23","unstructured":"Amir, O., Kamar, E., Kolobov, A., and Grosz, B.J. (2016, January 9\u201315). Interactive teaching strategies for agent training. Proceedings of the International Joint Conferences on Artificial Intelligence, New York, NY, USA."},{"key":"ref_24","unstructured":"Da Silva, F.L., Glatt, R., and Costa, A.H.R. (2017, January 8\u201312). Simultaneously Learning and Advising in Multiagent Reinforcement Learning. Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, S\u00e3o Paulo, Brazil."},{"key":"ref_25","unstructured":"Holzinger, A., Plass, M., Holzinger, K., Crisan, G.C., Pintea, C.M., and Palade, V. (arXiv, 2017). A glass-box interactive machine learning approach for solving np-hard problems with the human-in-the-loop, arXiv."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/2\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:52:52Z","timestamp":1760208772000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,6]]},"references-count":25,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["make1010002"],"URL":"https:\/\/doi.org\/10.3390\/make1010002","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,12,6]]}}}