{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:49:15Z","timestamp":1760168955254,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":43,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,7,12]],"date-time":"2023-07-12T00:00:00Z","timestamp":1689120000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1815886"],"award-info":[{"award-number":["IIS-1815886"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000181","name":"Air Force Office of Scientific Research","doi-asserted-by":"publisher","award":["FA9550-19-1-0195"],"award-info":[{"award-number":["FA9550-19-1-0195"]}],"id":[{"id":"10.13039\/100000181","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,7,15]]},"DOI":"10.1145\/3583131.3590428","type":"proceedings-article","created":{"date-parts":[[2023,7,12]],"date-time":"2023-07-12T19:40:19Z","timestamp":1689190819000},"page":"402-410","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Novelty Seeking Multiagent Evolutionary Reinforcement Learning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6279-9266","authenticated-orcid":false,"given":"Ayhan Alp","family":"Aydeniz","sequence":"first","affiliation":[{"name":"Oregon State University, Corvallis, Oregon, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9888-178X","authenticated-orcid":false,"given":"Robert","family":"Loftin","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-3809-7257","authenticated-orcid":false,"given":"Kagan","family":"Tumer","sequence":"additional","affiliation":[{"name":"Oregon State University, Corvallis, Oregon, United States of America"}]}],"member":"320","published-online":{"date-parts":[[2023,7,12]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2330163.2330306"},{"key":"e_1_3_2_1_2_1","volume-title":"Autonomous Agents and Multi-Agent Systems Conference.","author":"Agogino Adrian K","year":"2004","unstructured":"Adrian K Agogino and Kagan Tumer . 2004 . Unifying temporal and structural credit assignment problems . In Autonomous Agents and Multi-Agent Systems Conference. Adrian K Agogino and Kagan Tumer. 2004. Unifying temporal and structural credit assignment problems. In Autonomous Agents and Multi-Agent Systems Conference."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3520304.3529035"},{"key":"e_1_3_2_1_4_1","unstructured":"Adri\u00e0 Puigdom\u00e8nech Badia Pablo Sprechmann Alex Vitvitskyi Daniel Guo Bilal Piot Steven Kapturowski Olivier Tieleman Mart\u00edn Arjovsky Alexander Pritzel Andew Bolt etal 2020. Never give up: Learning directed exploration strategies. arXiv preprint arXiv:2002.06038 (2020).  Adri\u00e0 Puigdom\u00e8nech Badia Pablo Sprechmann Alex Vitvitskyi Daniel Guo Bilal Piot Steven Kapturowski Olivier Tieleman Mart\u00edn Arjovsky Alexander Pritzel Andew Bolt et al. 2020. Never give up: Learning directed exploration strategies. arXiv preprint arXiv:2002.06038 (2020)."},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the Thirteenth Yale Workshop on Adaptive and Learning Systems. Citeseer, 113--118","author":"Barto Andrew G","year":"2005","unstructured":"Andrew G Barto and Ozg\u00fcr Simsek . 2005 . Intrinsic motivation for reinforcement learning systems . In Proceedings of the Thirteenth Yale Workshop on Adaptive and Learning Systems. Citeseer, 113--118 . Andrew G Barto and Ozg\u00fcr Simsek. 2005. Intrinsic motivation for reinforcement learning systems. In Proceedings of the Thirteenth Yale Workshop on Adaptive and Learning Systems. Citeseer, 113--118."},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of the 3rd International Conference on Development and Learning","author":"Barto Andrew G","year":"2004","unstructured":"Andrew G Barto , Satinder Singh , Nuttapong Chentanez , 2004 . Intrinsically motivated learning of hierarchical collections of skills . In Proceedings of the 3rd International Conference on Development and Learning . Piscataway, NJ, 112--19. Andrew G Barto, Satinder Singh, Nuttapong Chentanez, et al. 2004. Intrinsically motivated learning of hierarchical collections of skills. In Proceedings of the 3rd International Conference on Development and Learning. Piscataway, NJ, 112--19."},{"key":"e_1_3_2_1_7_1","volume-title":"Unifying count-based exploration and intrinsic motivation. Advances in neural information processing systems 29","author":"Bellemare Marc","year":"2016","unstructured":"Marc Bellemare , Sriram Srinivasan , Georg Ostrovski , Tom Schaul , David Saxton , and Remi Munos . 2016. Unifying count-based exploration and intrinsic motivation. Advances in neural information processing systems 29 ( 2016 ). Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Remi Munos. 2016. Unifying count-based exploration and intrinsic motivation. Advances in neural information processing systems 29 (2016)."},{"key":"e_1_3_2_1_8_1","first-page":"13550","article-title":"Heuristic-guided reinforcement learning","volume":"34","author":"Cheng Ching-An","year":"2021","unstructured":"Ching-An Cheng , Andrey Kolobov , and Adith Swaminathan . 2021 . Heuristic-guided reinforcement learning . Advances in Neural Information Processing Systems 34 (2021), 13550 -- 13563 . Ching-An Cheng, Andrey Kolobov, and Adith Swaminathan. 2021. Heuristic-guided reinforcement learning. Advances in Neural Information Processing Systems 34 (2021), 13550--13563.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-018-9800-z"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1082473.1082599"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3321707.3321804"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/2615731.2615761"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3520304.3529062"},{"key":"e_1_3_2_1_14_1","volume-title":"Diversity is all you need: Learning skills without a reward function. arXiv preprint arXiv:1802.06070","author":"Eysenbach Benjamin","year":"2018","unstructured":"Benjamin Eysenbach , Abhishek Gupta , Julian Ibarz , and Sergey Levine . 2018. Diversity is all you need: Learning skills without a reward function. arXiv preprint arXiv:1802.06070 ( 2018 ). Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. 2018. Diversity is all you need: Learning skills without a reward function. arXiv preprint arXiv:1802.06070 (2018)."},{"key":"e_1_3_2_1_15_1","volume-title":"Eneko Osaba, and Iztok Fister.","author":"Fister Iztok","year":"2018","unstructured":"Iztok Fister , Andres Iglesias , Akemi Galvez , Javier Del Ser , Eneko Osaba, and Iztok Fister. 2018 . Using novelty search in differential evolution. In Highlights of Practical Applications of Agents, Multi-Agent Systems, and Complexity: The PAAMS Collection: International Workshops of PAAMS 2018, Toledo, Spain, June 20--22, 2018, Proceedings 16. Springer , 534--542. Iztok Fister, Andres Iglesias, Akemi Galvez, Javier Del Ser, Eneko Osaba, and Iztok Fister. 2018. Using novelty search in differential evolution. In Highlights of Practical Applications of Agents, Multi-Agent Systems, and Complexity: The PAAMS Collection: International Workshops of PAAMS 2018, Toledo, Spain, June 20--22, 2018, Proceedings 16. Springer, 534--542."},{"volume-title":"Evolutionary computation: toward a new philosophy of machine intelligence","author":"Fogel David B","key":"e_1_3_2_1_16_1","unstructured":"David B Fogel . 2006. Evolutionary computation: toward a new philosophy of machine intelligence . Vol. 1 . John Wiley & Sons . David B Fogel. 2006. Evolutionary computation: toward a new philosophy of machine intelligence. Vol. 1. John Wiley & Sons."},{"key":"e_1_3_2_1_17_1","volume-title":"Ex2: Exploration with exemplar models for deep reinforcement learning. Advances in neural information processing systems 30","author":"Fu Justin","year":"2017","unstructured":"Justin Fu , John Co-Reyes , and Sergey Levine . 2017. Ex2: Exploration with exemplar models for deep reinforcement learning. Advances in neural information processing systems 30 ( 2017 ). Justin Fu, John Co-Reyes, and Sergey Levine. 2017. Ex2: Exploration with exemplar models for deep reinforcement learning. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_18_1","volume-title":"International conference on machine learning. PMLR, 1587--1596","author":"Fujimoto Scott","year":"2018","unstructured":"Scott Fujimoto , Herke Hoof , and David Meger . 2018 . Addressing function approximation error in actor-critic methods . In International conference on machine learning. PMLR, 1587--1596 . Scott Fujimoto, Herke Hoof, and David Meger. 2018. Addressing function approximation error in actor-critic methods. In International conference on machine learning. PMLR, 1587--1596."},{"key":"e_1_3_2_1_19_1","volume-title":"Novelty-driven cooperative coevolution. Evolutionary computation 25, 2","author":"Gomes Jorge","year":"2017","unstructured":"Jorge Gomes , Pedro Mariano , and Anders Lyhne Christensen . 2017. Novelty-driven cooperative coevolution. Evolutionary computation 25, 2 ( 2017 ), 275--307. Jorge Gomes, Pedro Mariano, and Anders Lyhne Christensen. 2017. Novelty-driven cooperative coevolution. Evolutionary computation 25, 2 (2017), 275--307."},{"key":"e_1_3_2_1_20_1","volume-title":"International conference on machine learning. PMLR","author":"Haarnoja Tuomas","year":"2018","unstructured":"Tuomas Haarnoja , Aurick Zhou , Pieter Abbeel , and Sergey Levine . 2018 . Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor . In International conference on machine learning. PMLR , 1861--1870. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning. PMLR, 1861--1870."},{"key":"e_1_3_2_1_21_1","unstructured":"Tuomas Haarnoja Aurick Zhou Kristian Hartikainen George Tucker Sehoon Ha Jie Tan Vikash Kumar Henry Zhu Abhishek Gupta Pieter Abbeel etal 2018. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018).  Tuomas Haarnoja Aurick Zhou Kristian Hartikainen George Tucker Sehoon Ha Jie Tan Vikash Kumar Henry Zhu Abhishek Gupta Pieter Abbeel et al. 2018. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)."},{"volume-title":"Robot Soccer World Cup","author":"Kalyanakrishnan Shivaram","key":"e_1_3_2_1_22_1","unstructured":"Shivaram Kalyanakrishnan and Peter Stone . 2009. Learning complementary multiagent behaviors: A case study . In Robot Soccer World Cup . Springer , 153--165. Shivaram Kalyanakrishnan and Peter Stone. 2009. Learning complementary multiagent behaviors: A case study. In Robot Soccer World Cup. Springer, 153--165."},{"key":"e_1_3_2_1_23_1","volume-title":"Evolution-guided policy gradient in reinforcement learning. Advances in Neural Information Processing Systems 31","author":"Khadka Shauharda","year":"2018","unstructured":"Shauharda Khadka and Kagan Tumer . 2018. Evolution-guided policy gradient in reinforcement learning. Advances in Neural Information Processing Systems 31 ( 2018 ). Shauharda Khadka and Kagan Tumer. 2018. Evolution-guided policy gradient in reinforcement learning. Advances in Neural Information Processing Systems 31 (2018)."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8206234"},{"key":"e_1_3_2_1_25_1","volume-title":"Roulette-wheel selection via stochastic acceptance. Physica A: Statistical Mechanics and its Applications 391, 6","author":"Lipowski Adam","year":"2012","unstructured":"Adam Lipowski and Dorota Lipowska . 2012. Roulette-wheel selection via stochastic acceptance. Physica A: Statistical Mechanics and its Applications 391, 6 ( 2012 ), 2193--2196. Adam Lipowski and Dorota Lipowska. 2012. Roulette-wheel selection via stochastic acceptance. Physica A: Statistical Mechanics and its Applications 391, 6 (2012), 2193--2196."},{"key":"e_1_3_2_1_26_1","volume-title":"Proceedings of the Second International Conference on Multi-Agent Systems. 181--188","author":"Liu Jyi-Shane","year":"1996","unstructured":"Jyi-Shane Liu and Katia P Sycara . 1996 . Multiagent coordination in tightly coupled task scheduling . In Proceedings of the Second International Conference on Multi-Agent Systems. 181--188 . Jyi-Shane Liu and Katia P Sycara. 1996. Multiagent coordination in tightly coupled task scheduling. In Proceedings of the Second International Conference on Multi-Agent Systems. 181--188."},{"key":"e_1_3_2_1_27_1","unstructured":"Robert Loftin Aadirupa Saha Sam Devlin and Katja Hofmann. 2021. Strategically efficient exploration in competitive multi-agent reinforcement learning. In Uncertainty in Artificial Intelligence. PMLR 1587--1596.  Robert Loftin Aadirupa Saha Sam Devlin and Katja Hofmann. 2021. Strategically efficient exploration in competitive multi-agent reinforcement learning. In Uncertainty in Artificial Intelligence. PMLR 1587--1596."},{"key":"e_1_3_2_1_28_1","volume-title":"Convergent temporal-difference learning with arbitrary smooth function approximation. Advances in neural information processing systems 22","author":"Maei Hamid","year":"2009","unstructured":"Hamid Maei , Csaba Szepesvari , Shalabh Bhatnagar , Doina Precup , David Silver , and Richard S Sutton . 2009. Convergent temporal-difference learning with arbitrary smooth function approximation. Advances in neural information processing systems 22 ( 2009 ). Hamid Maei, Csaba Szepesvari, Shalabh Bhatnagar, Doina Precup, David Silver, and Richard S Sutton. 2009. Convergent temporal-difference learning with arbitrary smooth function approximation. Advances in neural information processing systems 22 (2009)."},{"key":"e_1_3_2_1_29_1","volume-title":"International Conference on Machine Learning. PMLR, 6651--6660","author":"Majumdar Somdeb","year":"2020","unstructured":"Somdeb Majumdar , Shauharda Khadka , Santiago Miret , Stephen McAleer , and Kagan Tumer . 2020 . Evolutionary reinforcement learning for sample-efficient multiagent coordination . In International Conference on Machine Learning. PMLR, 6651--6660 . Somdeb Majumdar, Shauharda Khadka, Santiago Miret, Stephen McAleer, and Kagan Tumer. 2020. Evolutionary reinforcement learning for sample-efficient multiagent coordination. In International Conference on Machine Learning. PMLR, 6651--6660."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1080\/095281398146806"},{"key":"e_1_3_2_1_31_1","volume-title":"Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909","author":"Mouret Jean-Baptiste","year":"2015","unstructured":"Jean-Baptiste Mouret and Jeff Clune . 2015. Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909 ( 2015 ). Jean-Baptiste Mouret and Jeff Clune. 2015. Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909 (2015)."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2012.67"},{"volume-title":"A concise introduction to decentralized POMDPs","author":"Oliehoek Frans A","key":"e_1_3_2_1_33_1","unstructured":"Frans A Oliehoek and Christopher Amato . 2016. A concise introduction to decentralized POMDPs . Springer . Frans A Oliehoek and Christopher Amato. 2016. A concise introduction to decentralized POMDPs. Springer."},{"key":"e_1_3_2_1_34_1","volume-title":"International conference on machine learning. PMLR, 2721--2730","author":"Ostrovski Georg","year":"2017","unstructured":"Georg Ostrovski , Marc G Bellemare , A\u00e4ron Oord , and R\u00e9mi Munos . 2017 . Count-based exploration with neural density models . In International conference on machine learning. PMLR, 2721--2730 . Georg Ostrovski, Marc G Bellemare, A\u00e4ron Oord, and R\u00e9mi Munos. 2017. Count-based exploration with neural density models. In International conference on machine learning. PMLR, 2721--2730."},{"key":"e_1_3_2_1_35_1","volume-title":"CEM-RL: Combining evolutionary and gradient-based methods for policy search. arXiv preprint arXiv:1810.01222","author":"Pourchot Alo\u00efs","year":"2018","unstructured":"Alo\u00efs Pourchot and Olivier Sigaud . 2018. CEM-RL: Combining evolutionary and gradient-based methods for policy search. arXiv preprint arXiv:1810.01222 ( 2018 ). Alo\u00efs Pourchot and Olivier Sigaud. 2018. CEM-RL: Combining evolutionary and gradient-based methods for policy search. arXiv preprint arXiv:1810.01222 (2018)."},{"key":"e_1_3_2_1_36_1","volume-title":"Quality diversity: A new frontier for evolutionary computation. Frontiers in Robotics and AI","author":"Pugh Justin K","year":"2016","unstructured":"Justin K Pugh , Lisa B Soros , and Kenneth O Stanley . 2016. Quality diversity: A new frontier for evolutionary computation. Frontiers in Robotics and AI ( 2016 ), 40. Justin K Pugh, Lisa B Soros, and Kenneth O Stanley. 2016. Quality diversity: A new frontier for evolutionary computation. Frontiers in Robotics and AI (2016), 40."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2016.7759651"},{"key":"e_1_3_2_1_38_1","volume-title":"A mathematical theory of communication. The Bell system technical journal 27, 3","author":"Shannon Claude Elwood","year":"1948","unstructured":"Claude Elwood Shannon . 1948. A mathematical theory of communication. The Bell system technical journal 27, 3 ( 1948 ), 379--423. Claude Elwood Shannon. 1948. A mathematical theory of communication. The Bell system technical journal 27, 3 (1948), 379--423."},{"key":"e_1_3_2_1_39_1","volume-title":"Yan Duan, John Schulman, Filip DeTurck, and Pieter Abbeel.","author":"Tang Haoran","year":"2017","unstructured":"Haoran Tang , Rein Houthooft , Davis Foote , Adam Stooke , OpenAI Xi Chen , Yan Duan, John Schulman, Filip DeTurck, and Pieter Abbeel. 2017 . # exploration: A study of count-based exploration for deep reinforcement learning. Advances in neural information processing systems 30 (2017). Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, OpenAI Xi Chen, Yan Duan, John Schulman, Filip DeTurck, and Pieter Abbeel. 2017. # exploration: A study of count-based exploration for deep reinforcement learning. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/544741.544832"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/1402298.1402315"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/1558013.1558099"},{"key":"e_1_3_2_1_43_1","volume-title":"Aaai","volume":"8","author":"Ziebart Brian D","year":"2008","unstructured":"Brian D Ziebart , Andrew L Maas , J Andrew Bagnell , Anind K Dey , 2008 . Maximum entropy inverse reinforcement learning .. In Aaai , Vol. 8 . Chicago, IL, USA, 1433--1438. Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, Anind K Dey, et al. 2008. Maximum entropy inverse reinforcement learning.. In Aaai, Vol. 8. Chicago, IL, USA, 1433--1438."}],"event":{"name":"GECCO '23: Genetic and Evolutionary Computation Conference","sponsor":["SIGEVO ACM Special Interest Group on Genetic and Evolutionary Computation"],"location":"Lisbon Portugal","acronym":"GECCO '23"},"container-title":["Proceedings of the Genetic and Evolutionary Computation Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3583131.3590428","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3583131.3590428","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:03Z","timestamp":1750178823000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3583131.3590428"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,12]]},"references-count":43,"alternative-id":["10.1145\/3583131.3590428","10.1145\/3583131"],"URL":"https:\/\/doi.org\/10.1145\/3583131.3590428","relation":{},"subject":[],"published":{"date-parts":[[2023,7,12]]},"assertion":[{"value":"2023-07-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}