{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,6]],"date-time":"2025-03-06T14:40:10Z","timestamp":1741272010445,"version":"3.38.0"},"reference-count":57,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T00:00:00Z","timestamp":1736208000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T00:00:00Z","timestamp":1736208000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2025,3]]},"DOI":"10.1007\/s00521-024-10829-4","type":"journal-article","created":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T04:05:31Z","timestamp":1736222731000},"page":"5945-5956","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Do as you teach: a multi-teacher approach to self-play in deep reinforcement learning"],"prefix":"10.1007","volume":"37","author":[{"given":"Chaitanya","family":"Kharyal","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0251-1851","authenticated-orcid":false,"given":"Sai Krishna","family":"Gottipati","sequence":"additional","affiliation":[]},{"given":"Tanmay Kumar","family":"Sinha","sequence":"additional","affiliation":[]},{"given":"Fatemeh","family":"Abdollahi","sequence":"additional","affiliation":[]},{"given":"Srijita","family":"Das","sequence":"additional","affiliation":[]},{"given":"Matthew 
E.","family":"Taylor","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,7]]},"reference":[{"key":"10829_CR1","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y, Lillicrap TP, Hui F, Sifre L, Driessche G, Graepel T, Hassabis D (2017) Mastering the game of go without human knowledge. Nature 550:354\u2013359","journal-title":"Nature"},{"key":"10829_CR2","first-page":"1","volume":"113","author":"MK Janjua","year":"2023","unstructured":"Janjua MK, Shah H, White M, Miahi E, Machado MC, White A (2023) Gvfs in the real world: making predictions online for water treatment. Machine Learn 113:1\u201331","journal-title":"Machine Learn"},{"key":"10829_CR3","doi-asserted-by":"crossref","unstructured":"Tang C, Abbatematteo B, Hu J, Chandra R, Mart\u00edn-Mart\u00edn R, Stone P (2024). Deep reinforcement learning for robotics: A survey of real-world successes. arXiv preprint arXiv:2408.03539","DOI":"10.1146\/annurev-control-030323-022510"},{"issue":"4","key":"10829_CR4","doi-asserted-by":"publisher","first-page":"4394","DOI":"10.1109\/LRA.2019.2932575","volume":"4","author":"SK Gottipati","year":"2019","unstructured":"Gottipati SK, Seo K, Bhatt D, Mai V, Murthy K, Paull L (2019) Deep active localization. IEEE Rob Autom Lett 4(4):4394\u20134401. https:\/\/doi.org\/10.1109\/LRA.2019.2932575","journal-title":"IEEE Rob Autom Lett"},{"key":"10829_CR5","unstructured":"Gottipati S.K, Sattarov B, Niu S, Pathak Y, Wei H, Liu S, Blackburn S, Thomas K, Coley C, Tang J, et al (2020). Learning to navigate the synthetically accessible chemical space using reinforcement learning. 
In: international conference on machine learning, pp 3668\u20133679"},{"key":"10829_CR6","first-page":"142","volume":"35","author":"SK Gottipati","year":"2021","unstructured":"Gottipati SK, Pathak Y, Sattarov B, Nuttall R, Amini M, Taylor ME, Chandar S et al (2021) Towered actor critic for handling multiple action types in reinforcement learning for drug discovery. Proc Conf Artif Intell 35:142\u2013150","journal-title":"Proc Conf Artif Intell"},{"key":"10829_CR7","unstructured":"Kaelbling L.P (1993). Learning to achieve goals. In: international joint conference on artificial intelligence. https:\/\/api.semanticscholar.org\/CorpusID:5538688"},{"key":"10829_CR8","unstructured":"Schaul T, Horgan D, Gregor K, Silver D (2015). Universal value function approximators. In: international conference on machine learning. pp 1312\u20131320"},{"key":"10829_CR9","unstructured":"Pitis S, Chan H, Zhao S, Stadie B, Ba J (2020). Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. In: international conference on machine learning, 7750\u20137761"},{"key":"10829_CR10","first-page":"26532","volume":"35","author":"W Ding","year":"2022","unstructured":"Ding W, Lin H, Li B, Zhao D (2022) Generalizing goal-conditioned reinforcement learning with Variational causal reasoning. Adv Neural Inform Proc Syst 35:26532\u201326548","journal-title":"Adv Neural Inform Proc Syst"},{"key":"10829_CR11","first-page":"99","volume-title":"Keeping your distance: solving sparse reward tasks using self-balancing shaped rewards","author":"A Trott","year":"2019","unstructured":"Trott A, Zheng S, Xiong C, Socher R (2019) Keeping your distance: solving sparse reward tasks using self-balancing shaped rewards. Curran Associates Inc., Red Hook, NY, USA, p 99"},{"key":"10829_CR12","unstructured":"Wu Y, Tucker G, Nachum O (2018). The laplacian in rl: Learning representations with efficient approximations. 
arXiv preprint arXiv:1810.04586"},{"key":"10829_CR13","first-page":"8622","volume":"34","author":"I Durugkar","year":"2021","unstructured":"Durugkar I, Tec M, Niekum S, Stone P (2021) Adversarial intrinsic motivation for reinforcement learning. Adv Neural Inform Process Syst 34:8622\u20138636","journal-title":"Adv Neural Inform Process Syst"},{"key":"10829_CR14","unstructured":"Campero A, Raileanu R, K\u00fcttler H, Tenenbaum J.B, Rockt\u00e4schel T, Grefenstette E (2021) Learning with amigo: adversarially motivated intrinsic goals"},{"key":"10829_CR15","unstructured":"Florensa C, Held D, Geng X, Abbeel P (2018). Automatic goal generation for reinforcement learning agents. In: international conference on machine learning. pp 1515\u20131528"},{"key":"10829_CR16","unstructured":"Sukhbaatar S, Lin Z, Kostrikov I, Synnaeve G, Szlam A, Fergus R (2018). Intrinsic motivation and automatic curricula via asymmetric self-play"},{"key":"10829_CR17","unstructured":"Kharyal C, Sinha T, Gottipati S.K, Abdollahi F, Das S, Taylor M.E (2023). Do as you teach: A multi-teacher approach to self-play in deep reinforcement learning. In: proceedings of the 2023 international conference on autonomous agents and multiagent systems. pp 2457\u20132459"},{"key":"10829_CR18","unstructured":"Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods"},{"key":"10829_CR19","unstructured":"Lillicrap T.P, Hunt J.J, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: Bengio, Y., LeCun, Y. (eds.) 4th international conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference track proceedings. 
http:\/\/arxiv.org\/abs\/1509.02971"},{"key":"10829_CR20","unstructured":"OpenAI O, Plappert M, Sampedro R, Xu T, Akkaya I, Kosaraju V, Welinder P, D\u2019Sa R, Petron A, O.\u00a0Pinto H.P, Paino A, Noh H, Weng L, Yuan Q, Chu C, Zaremba W (2021) Asymmetric self-play for automatic goal discovery in robotic manipulation"},{"key":"10829_CR21","unstructured":"Yang T, Tang H, Bai C, Liu J, Hao J, Meng Z, Liu P (2021) Exploration in deep reinforcement learning: a comprehensive survey. arXiv preprint arXiv:2109.06668"},{"key":"10829_CR22","first-page":"1471","volume":"29","author":"M Bellemare","year":"2016","unstructured":"Bellemare M, Srinivasan S, Ostrovski G, Schaul T, Saxton D, Munos R (2016) Unifying count-based exploration and intrinsic motivation. Adv Neural Inf Process Syst 29:1471\u20131479","journal-title":"Adv Neural Inf Process Syst"},{"key":"10829_CR23","unstructured":"Burda Y, Edwards H, Storkey A, Klimov O (2018) Exploration by random network distillation. arXiv preprint arXiv:1810.12894"},{"key":"10829_CR24","unstructured":"Schirp J (2024). Exploring with attention: comparing exploration strategies on sparse reward environments"},{"key":"10829_CR25","unstructured":"Mohamed S, Rezende D.J (2015) Variational information maximisation for intrinsically motivated reinforcement learning. arXiv preprint arXiv:1509.08731"},{"key":"10829_CR26","unstructured":"Houthooft R, Chen X, Duan Y, Schulman J, De\u00a0Turck F, Abbeel P (2016). Vime: Variational information maximizing exploration. Adv Neural Inf Proc Syst 29"},{"key":"10829_CR27","doi-asserted-by":"crossref","unstructured":"Schmidhuber J (1991). A possibility for implementing curiosity and boredom in model-building neural controllers. 
In: proceedings of the international conference on simulation of adaptive behavior: from animals to animats, pp 222\u2013227","DOI":"10.7551\/mitpress\/3115.003.0030"},{"key":"10829_CR28","doi-asserted-by":"crossref","unstructured":"Pathak D, Agrawal P, Efros A.A, Darrell T (2017) Curiosity-driven exploration by self-supervised prediction. In: international conference on machine learning, pp 2778\u20132787","DOI":"10.1109\/CVPRW.2017.70"},{"key":"10829_CR29","unstructured":"Raileanu R, Rockt\u00e4schel T (2020) Ride: rewarding impact-driven exploration for procedurally-generated environments. arXiv preprint arXiv:2002.12292"},{"key":"10829_CR30","unstructured":"Osband I, Van\u00a0Roy B, Wen Z (2016). Generalization and exploration via randomized value functions. In: international conference on machine learning"},{"key":"10829_CR31","unstructured":"Janz D, Hron J, Mazur P, Hofmann K, Hern\u00e1ndez-Lobato J.M, Tschiatschek S (2019). Successor uncertainties: exploration and uncertainty in temporal difference learning. Adv Neural Inf Proc Syst"},{"key":"10829_CR32","unstructured":"Metelli AM, Likmeta A, Restelli M (2019) Propagating uncertainty in reinforcement learning via wasserstein barycenters. Adv Neural Inf Proc Syst"},{"key":"10829_CR33","doi-asserted-by":"publisher","first-page":"318","DOI":"10.1007\/978-981-97-7244-5_21","volume-title":"Web and big data","author":"S Zhu","year":"2024","unstructured":"Zhu S, Zhang K, Pan H (2024) Reinforcement learning from clip. In: Zhang W, Tung A, Zheng Z, Yang Z, Wang X, Guo H (eds) Web and big data. Springer, Singapore, pp 318\u2013329"},{"key":"10829_CR34","unstructured":"Bauza M, Chen JE, Dalibard V, Gileadi N, Hafner R, Martins MF, Moore J, Pevceviciute R, Laurens A, Rao D et al (2024) Demostart: demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots. 
arXiv preprint arXiv:2409.06613"},{"key":"10829_CR35","unstructured":"Clark E, Ryu K, Mehr N (2024) Adaptive teaching in heterogeneous agents: balancing surprise in sparse reward scenarios. arXiv preprint arXiv:2405.14199"},{"key":"10829_CR36","unstructured":"Oh J, Guo Y, Singh S, Lee H (2018) Self-imitation learning. In: international conference on machine learning, pp 3878\u20133887"},{"key":"10829_CR37","unstructured":"Zha D, Ma W, Yuan L, Hu X, Liu J (2021) Rank the episodes: a simple approach for exploration in procedurally-generated environments. ICLR"},{"key":"10829_CR38","unstructured":"Hartikainen K, Geng X, Haarnoja T, Levine S (2020) Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: 8th international conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30. https:\/\/openreview.net\/forum?id=H1lmhaVtvr"},{"issue":"181","key":"10829_CR39","first-page":"1","volume":"21","author":"S Narvekar","year":"2020","unstructured":"Narvekar S, Peng B, Leonetti M, Sinapov J, Taylor ME, Stone P (2020) Curriculum learning for reinforcement learning domains: a framework and survey. J Mach Learn Res 21(181):1\u201350","journal-title":"J Mach Learn Res"},{"key":"10829_CR40","unstructured":"Kharyal C, Gottipati S.K, Sinha T.K, Das S, Taylor M.E (2024). Glide-rl: grounded language instruction through demonstration in rl. arXiv preprint arXiv:2401.02991"},{"key":"10829_CR41","unstructured":"Du Y, Abbeel P, Grover A (2022). It takes four to tango: Multiagent selfplay for automatic curriculum generation. In: 10th international conference on learning representations, ICLR 2022, pp 1515\u20131528"},{"key":"10829_CR42","unstructured":"Campero A, Raileanu R, K\u00fcttler H, Tenenbaum J.B, Rockt\u00e4schel T, Grefenstette E (2021). Learning with amigo: Adversarially motivated intrinsic goals. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7. 
https:\/\/openreview.net\/forum?id=ETBc_MIMgoX"},{"key":"10829_CR43","doi-asserted-by":"crossref","unstructured":"Wang V.H, Pajarinen J, Wang T, K\u00e4m\u00e4r\u00e4inen J.-K (2023). State-conditioned adversarial subgoal generation. In: proceedings of the AAAI conference on artificial intelligence, 37, 10184\u201310191","DOI":"10.1609\/aaai.v37i8.26213"},{"key":"10829_CR44","unstructured":"Racaniere S, Lampinen A, Santoro A, Reichert D, Firoiu V, Lillicrap T (2019) Automated curriculum generation through setter-solver interactions. In: international conference on learning representations"},{"key":"10829_CR45","unstructured":"Zhang S, Cao Z, Sadigh D, Sui Y (2021) Confidence-aware imitation learning from demonstrations with varying optimality. In: NeurIPS"},{"key":"10829_CR46","unstructured":"Cheng C.-A, Kolobov A, Agarwal A (2020) Policy improvement from multiple experts. NeurIPS"},{"key":"10829_CR47","unstructured":"Cao X, Luo F.-M, Ye J, Xu T, Zhang Z, Yu Y (2024) Limited preference aided imitation learning from imperfect demonstrations. In: forty-first international conference on machine learning. https:\/\/openreview.net\/forum?id=PAbkWU0KDG"},{"key":"10829_CR48","doi-asserted-by":"crossref","unstructured":"Li G, M\u00fcller M, Casser V, Smith N, Michels D.L, Ghanem B (2019) Oil: observational imitation learning. In: robotics: science and systems","DOI":"10.15607\/RSS.2019.XV.005"},{"key":"10829_CR49","doi-asserted-by":"crossref","unstructured":"Li M, Wei Y, Kudenko D (2019) Two-level q-learning: learning from conflict demonstrations. The Knowledge Engineering Review","DOI":"10.1017\/S0269888919000092"},{"key":"10829_CR50","unstructured":"Kurenkov A, Mandlekar A, Martin-Martin R, Savarese S, Garg A (2019) Ac-teach: a bayesian actor-critic method for policy learning with an ensemble of suboptimal teachers. 
CoRL"},{"key":"10829_CR51","unstructured":"Gimelfarb M, Sanner S, Lee C-G (2018) Reinforcement learning with multiple experts: a bayesian model combination approach. NeurIPS"},{"issue":"1","key":"10829_CR52","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1613\/jair.1.11396","volume":"64","author":"FL Da Silva","year":"2019","unstructured":"Da Silva FL, Costa AHR (2019) A survey on transfer learning for multiagent reinforcement learning systems. J Artif Int Res 64(1):645\u2013703. https:\/\/doi.org\/10.1613\/jair.1.11396","journal-title":"J Artif Int Res"},{"key":"10829_CR53","unstructured":"Coumans E, Bai Y (2016\u20132021) PyBullet, a Python module for physics simulation for games, robotics and machine learning. http:\/\/pybullet.org"},{"key":"10829_CR54","unstructured":"Cideron G, Pierrot T, Perrin N, Beguir K, Sigaud O (2020) QD-RL: efficient mixing of quality and diversity in reinforcement learning. CoRR abs\/2006.08505 2006.08505"},{"key":"10829_CR55","first-page":"5923","volume":"19","author":"M Masood","year":"2019","unstructured":"Masood M, Doshi-Velez F (2019) Diversity-inducing policy gradient: using maximum mean discrepancy to find a set of diverse policies. Proc Int Conf Artif Intell 19:5923\u20135929","journal-title":"Proc Int Conf Artif Intell"},{"key":"10829_CR56","unstructured":"Eysenbach B, Gupta A, Ibarz J, Levine S (2019). Diversity is all you need: Learning skills without a reward function. In: international conference on learning representations"},{"key":"10829_CR57","unstructured":"Kingma D.P, Ba J (2017). 
Adam: a method for stochastic optimization"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10829-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-024-10829-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10829-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,6]],"date-time":"2025-03-06T13:56:28Z","timestamp":1741269388000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-024-10829-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,7]]},"references-count":57,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["10829"],"URL":"https:\/\/doi.org\/10.1007\/s00521-024-10829-4","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"type":"print","value":"0941-0643"},{"type":"electronic","value":"1433-3058"}],"subject":[],"published":{"date-parts":[[2025,1,7]]},"assertion":[{"value":"22 May 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 October 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}