{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:16:23Z","timestamp":1740122183314,"version":"3.37.3"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T00:00:00Z","timestamp":1672272000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T00:00:00Z","timestamp":1672272000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Agent Multi-Agent Syst"],"published-print":{"date-parts":[[2023,6]]},"DOI":"10.1007\/s10458-022-09595-1","type":"journal-article","created":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T16:04:06Z","timestamp":1672329846000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Learning by reusing previous advice: a memory-based teacher\u2013student framework"],"prefix":"10.1007","volume":"37","author":[{"given":"Changxi","family":"Zhu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi","family":"Cai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1908-1344","authenticated-orcid":false,"given":"Shuyue","family":"Hu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ho-fung","family":"Leung","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dickson K. W.","family":"Chiu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,12,29]]},"reference":[{"key":"9595_CR1","unstructured":"Akiyama, H. (2012). Helios team base code."},{"key":"9595_CR2","unstructured":"Amir, O., Kamar, E., Kolobov, A., & Grosz, B.\u00a0J. (2016). Interactive teaching strategies for agent training. In Proceedings of the twenty-fifth international joint conference on artificial intelligence (IJCAI) (pp. 804\u2013811)."},{"key":"9595_CR3","unstructured":"Barto, A.\u00a0G., Thomas, P.\u00a0S., & Sutton, R.\u00a0S. (2017). Some recent applications of reinforcement learning. In Proceedings of the 18th Yale workshop on adaptive and learning systems."},{"key":"9595_CR4","doi-asserted-by":"crossref","unstructured":"Brys, T., Now\u00e9, A., Kudenko, D., Taylor, M. (2014). Combining multiple correlated reward and shaping signals by measuring confidence. In Proceedings of 28th AAAI conference on artificial intelligence (pp. 1687\u20131693).","DOI":"10.1609\/aaai.v28i1.8998"},{"issue":"2","key":"9595_CR5","doi-asserted-by":"publisher","first-page":"3293","DOI":"10.1016\/j.eswa.2008.01.055","volume":"36","author":"DKW Chiu","year":"2009","unstructured":"Chiu, D. K. W., Leung, H. F., & Lam, K. M. (2009). On the making of service recommendations: An action theory based on utility, reputation, and risk attitude. Expert Systems with Applications, 36(2), 3293\u20133301.","journal-title":"Expert Systems with Applications"},{"key":"9595_CR6","unstructured":"Claus, C., & Boutilier C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In The national conference on artificial intelligence (pp. 746\u2013752)"},{"key":"9595_CR7","unstructured":"Clouse, J. A. (1996). On integrating apprentice learning and reinforcement learning. PhD thesis, University of Massachusetts"},{"key":"9595_CR8","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1613\/jair.1.11396","volume":"64","author":"FL da Silva","year":"2019","unstructured":"da Silva, F. L., & Costa, A. H. R. (2019). A survey on transfer learning for multiagent reinforcement learning systems. Journal of Artificial Intelligence Research, 64, 645\u2013703.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9595_CR9","unstructured":"da\u00a0Silva F. L., Glatt, R., & Costa, A. H. R. (2017). Simultaneously learning and advising in multiagent reinforcement learning. In Proceedings of the 16th international conference on autonomous agents and multiagent systems (pp. 1100\u20131108)."},{"key":"9595_CR10","doi-asserted-by":"crossref","unstructured":"Felipe\u00a0Leno da\u00a0Silva, Pablo Hernandez-Leal, Bilal Kartal, and Taylor, M. E. (2020) Uncertainty-aware action advising for deep reinforcement learning agents. In The Thirty-Fourth AAAI Conference on Artificial Intelligence (pp.5792\u20135799","DOI":"10.1609\/aaai.v34i04.6036"},{"key":"9595_CR11","unstructured":"Felipe\u00a0Leno da\u00a0Silva, Matthew\u00a0E. Taylor, and Anna Helena\u00a0Reali Costa (2018) Autonomously reusing knowledge in multiagent reinforcement learning. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (pp.5487\u20135493"},{"key":"9595_CR12","doi-asserted-by":"publisher","first-page":"21","DOI":"10.3390\/make1010002","volume":"1","author":"A Fachantidis","year":"2017","unstructured":"Fachantidis, A., Taylor, M. E., & Vlahavas, I. P. (2017). Learning to teach reinforcement learning agents. Machine Learning and Knowledge Extraction, 1, 21\u201342.","journal-title":"Machine Learning and Knowledge Extraction"},{"key":"9595_CR13","doi-asserted-by":"crossref","unstructured":"Ilhan, E., Gow, J., & Liebana, D. P. (2019) Teaching on a budget in multi-agent deep reinforcement learning. arXiv:1905.01357","DOI":"10.1109\/CIG.2019.8847988"},{"issue":"1","key":"9595_CR14","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1109\/TCIAIG.2012.2188528","volume":"4","author":"S Karakovskiy","year":"2012","unstructured":"Karakovskiy, S., & Togelius, J. (2012). The Mario AI benchmark and competitions. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 55\u201367.","journal-title":"IEEE Transactions on Computational Intelligence and AI in Games"},{"key":"9595_CR15","first-page":"73","volume":"18","author":"H Kitano","year":"1997","unstructured":"Kitano, H., Asada, M., Kuniyoshi, Y., Noda, I., Osawa, E., & Matsubara, H. (1997). Robocup: A challenge problem for AI. AI Magazine, 18, 73\u201385.","journal-title":"AI Magazine"},{"key":"9595_CR16","doi-asserted-by":"publisher","first-page":"1238","DOI":"10.1177\/0278364913495721","volume":"32","author":"J Kober","year":"2013","unstructured":"Kober, J., Bagnell, J.A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32, 1238\u20131274.","journal-title":"The International Journal of Robotics Research"},{"key":"9595_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1017\/S0269888912000057","volume":"27","author":"L Matignon","year":"2012","unstructured":"Matignon, L., Laurent, G. J., & Le Fort-Piat, N. (2012). Independent reinforcement learners in cooperative markov games: A survey regarding coordination problems. Knowledge Engineering Review, 27, 1\u201331.","journal-title":"Knowledge Engineering Review"},{"key":"9595_CR18","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-28929-8","volume-title":"A concise introduction to decentralized POMDPs","author":"FA Oliehoek","year":"2016","unstructured":"Oliehoek, F. A., & Amato, C. (2016). A concise introduction to decentralized POMDPs (1st ed.). Springer: New York.","edition":"1"},{"key":"9595_CR19","doi-asserted-by":"crossref","unstructured":"Omidshafiei, S., Kim, D.-K., Liu, M., Tesauro, G., Riemer, M., Amato, C., Campbell, M., How, J.\u00a0P. (2019). Learning to teach in cooperative multiagent reinforcement learning. In The thirty-third AAAI conference on artificial intelligence (pp. 6128\u20136136).","DOI":"10.1609\/aaai.v33i01.33016128"},{"key":"9595_CR20","unstructured":"Rummery, G. A., & Niranjan, M. (1994). On-line q-learning using connectionist systems. Technical report cued\/f-infeng\/tr 166, Cambridge University Engineering Department."},{"key":"9595_CR21","doi-asserted-by":"crossref","unstructured":"Sherstov, A.\u00a0A., & Stone, P. (2005). Function approximation via tile coding: Automating parameter choice. In Proceedings symposium on abstraction, reformulation, and approximation (SARA-05), Edinburgh, Scotland, UK","DOI":"10.1007\/11527862_14"},{"key":"9595_CR22","unstructured":"Suay H.\u00a0B., Brys T., Taylor, M. E., & Chernova S. (2016). Learning from demonstration for shaping through inverse reinforcement learning. In Proceedings of the 2016 international conference on autonomous agents and multiagent systems) (pp. 429\u2013437)."},{"key":"9595_CR23","volume-title":"Reinforcement Learning: An Introduction","author":"RS Sutton","year":"1998","unstructured":"Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction (1st ed.). Cambridge, MA: MIT Press.","edition":"1"},{"issue":"1","key":"9595_CR24","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1080\/09540091.2014.885279","volume":"26","author":"ME Taylor","year":"2014","unstructured":"Taylor, M. E., Carboni, N., Fachantidis, A., Vlahavas, I. P., & Torrey, L. (2014). Reinforcement learning agents providing advice in complex video games. Connection Science, 26(1), 45\u201363.","journal-title":"Connection Science"},{"key":"9595_CR25","unstructured":"Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. In Proceedings of 12th the international conference on autonomous agents and multiagent systems (pp. 1053\u20131060)."},{"key":"9595_CR26","unstructured":"Wang, Y., Lu, W., Hao, J., Wei, J., & Leung, H. f. (2018). Efficient convention emergence through decoupled reinforcement social learning with teacher\u2013student mechanism. In Proceedings of the 17th international conference on autonomous agents and multiagent systems (pp. 795\u2013803)."},{"key":"9595_CR27","doi-asserted-by":"crossref","unstructured":"Wang, Z., & Taylor. M. E. (2017). Improving reinforcement learning with confidence-based demonstrations. In Proceedings of the twenty-sixth international joint conference on artificial intelligence (IJCAI) (pp. 3027\u20133033).","DOI":"10.24963\/ijcai.2017\/422"},{"key":"9595_CR28","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"CJCH Watkins","year":"1992","unstructured":"Watkins, C. J. C. H., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279\u2013292.","journal-title":"Machine Learning"},{"key":"9595_CR29","unstructured":"Zhan, Y., Bou-Ammar, H., & Taylor, M. E. (2016). Theoretically-grounded policy advice from multiple teachers in reinforcement learning settings with applications to negative transfer. In Proceedings of the twenty-fifth international joint conference on artificial intelligence (pp. 2315\u20132321)."},{"key":"9595_CR30","unstructured":"Zhu, C., Cai, Y., Leung, H.-f., & Hu, S. (2020). Learning by reusing previous advice in teacher\u2013student paradigm. In: A. El\u00a0Fallah Seghrouchni, G. Sukthankar, B. An, and N. Yorke-Smith (Eds.), Proceedings of the 19th international conference on autonomous agents and multiagent systems, AAMAS \u201920, Auckland, New Zealand, May 9\u201313, 2020. International foundation for autonomous agents and multiagent systems, 2020 (pp. 1674\u20131682)."},{"key":"9595_CR31","unstructured":"Zimmer, M., Viappiani, P. & Weng, P. (2014). Teacher\u2013student framework: A reinforcement learning approach. In AAMAS workshop."}],"container-title":["Autonomous Agents and Multi-Agent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-022-09595-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10458-022-09595-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-022-09595-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,11]],"date-time":"2023-05-11T07:42:38Z","timestamp":1683790958000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10458-022-09595-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,29]]},"references-count":31,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["9595"],"URL":"https:\/\/doi.org\/10.1007\/s10458-022-09595-1","relation":{},"ISSN":["1387-2532","1573-7454"],"issn-type":[{"type":"print","value":"1387-2532"},{"type":"electronic","value":"1573-7454"}],"subject":[],"published":{"date-parts":[[2022,12,29]]},"assertion":[{"value":"10 December 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 December 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"14"}}