{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T18:28:59Z","timestamp":1767637739388,"version":"3.48.0"},"reference-count":18,"publisher":"Maximum Academic Press","issue":"1","license":[{"start":{"date-parts":[[2016,2,11]],"date-time":"2016-02-11T00:00:00Z","timestamp":1455148800000},"content-version":"unspecified","delay-in-days":41,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The Knowledge Engineering Review"],"published-print":{"date-parts":[[2016,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Learning automata are reinforcement learners belonging to the class of policy iterators. They have already been shown to exhibit nice convergence properties in a wide range of discrete action game settings. Recently, a new formulation for a continuous action reinforcement learning automata (CARLA) was proposed. In this paper, we study the behavior of these CARLA in continuous action games and propose a novel method for coordinated exploration of the joint-action space. Our method allows a team of independent learners, using CARLA, to find the optimal joint action in common interest settings. We first show that independent agents using CARLA will converge to a local optimum of the continuous action game. We then introduce a method for coordinated exploration which allows the team of agents to find the global optimum of the game. We validate our approach in a number of experiments.<\/jats:p>","DOI":"10.1017\/s026988891500020x","type":"journal-article","created":{"date-parts":[[2016,2,11]],"date-time":"2016-02-11T20:33:12Z","timestamp":1455222792000},"page":"77-95","source":"Crossref","is-referenced-by-count":1,"title":["A reinforcement learning approach to coordinate exploration with limited communication in continuous action games"],"prefix":"10.48130","volume":"31","author":[{"given":"Abdel","family":"Rodr\u00edguez","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Vrancx","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ricardo","family":"Grau","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ann","family":"Now\u00e9","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"27968","published-online":{"date-parts":[[2016,2,11]]},"reference":[{"key":"S026988891500020X_ref18","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2008.08.017"},{"key":"S026988891500020X_ref17","unstructured":"Verbeeck K. 2004. Coordinated Exploration in Multi-Agent Reinforcement Learning. PhD thesis, Vrije Universiteit Brussel, Faculteit Wetenschappen, DINF, Computational Modeling Lab, September."},{"key":"S026988891500020X_ref16","doi-asserted-by":"publisher","DOI":"10.1007\/s00199-008-0338-8"},{"volume-title":"Foundations of the Theory of Learning Systems","year":"1973","author":"Tsypkin","key":"S026988891500020X_ref15"},{"volume-title":"Adaptation and Learning in Automatic systems","year":"1971","author":"Tsypkin","key":"S026988891500020X_ref14"},{"key":"S026988891500020X_ref12","first-page":"1345","article-title":"The behavior of finite automata in random media","volume":"22","author":"Tsetlin","year":"1961","journal-title":"Avtomatika i Telemekhanika"},{"key":"S026988891500020X_ref11","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4419-9052-5"},{"key":"S026988891500020X_ref10","unstructured":"Rodr\u00edguez A. , Grau R. & Now\u00e9 A. 2011. Continuous action reinforcement learning automata. Performance and convergence. In Proceedings of the Third International Conference on Agents and Artificial Intelligence, Filipe, J. & Fred, A. (eds). SciTePress, 473\u2013478."},{"key":"S026988891500020X_ref9","doi-asserted-by":"crossref","DOI":"10.1063\/1.3056709","volume-title":"Modern Probability Theory And Its Applications","author":"Parzen","year":"1960"},{"key":"S026988891500020X_ref8","unstructured":"Kapetanakis S. , Kudenko D. & Strens M. 2003. Learning to coordinate using commitment sequences in cooperative multiagent-systems. In Proceedings of the Third Symposium on Adaptive Agents and Multiagent Systems (AAMAS-03), 2004."},{"key":"S026988891500020X_ref7","doi-asserted-by":"publisher","DOI":"10.1016\/S0957-4158(97)00003-2"},{"key":"S026988891500020X_ref6","doi-asserted-by":"publisher","DOI":"10.1016\/S0967-0661(99)00141-0"},{"key":"S026988891500020X_ref2","doi-asserted-by":"crossref","unstructured":"Castelletti A. , Pianosi F. & Restelli M. 2012. Tree-based fitted Q-iteration for multi-objective Markov decision problems. In IJCNN, 1\u20138. IEEE.","DOI":"10.1109\/IJCNN.2012.6252759"},{"key":"S026988891500020X_ref1","doi-asserted-by":"publisher","DOI":"10.1037\/14496-000"},{"volume-title":"Theories of Learning","year":"1966","author":"Hilgard","key":"S026988891500020X_ref5"},{"key":"S026988891500020X_ref13","first-page":"1210","article-title":"The behavior of finite automata in random media","volume":"22","author":"Tsetlin","year":"1962","journal-title":"Avtomatika i Telemekhanika"},{"key":"S026988891500020X_ref3","unstructured":"Claus C. & Boutilier C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of National Conference on Artificial Intelligence (AAAI-98), 746\u2013752."},{"key":"S026988891500020X_ref4","doi-asserted-by":"publisher","DOI":"10.1037\/10757-000"}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S026988891500020X","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T14:42:06Z","timestamp":1767624126000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S026988891500020X\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,1]]},"references-count":18,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2016,1]]}},"alternative-id":["S026988891500020X"],"URL":"https:\/\/doi.org\/10.1017\/s026988891500020x","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"type":"print","value":"0269-8889"},{"type":"electronic","value":"1469-8005"}],"subject":[],"published":{"date-parts":[[2016,1]]}}}