{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:16:10Z","timestamp":1750306570284,"version":"3.41.0"},"reference-count":24,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2014,12,8]],"date-time":"2014-12-08T00:00:00Z","timestamp":1417996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100006754","name":"U.S. Army Research Laboratory","doi-asserted-by":"publisher","award":["W911NF-11-1-0124"],"award-info":[{"award-number":["W911NF-11-1-0124"]}],"id":[{"id":"10.13039\/100006754","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Auton. Adapt. Syst."],"published-print":{"date-parts":[[2015,1,14]]},"abstract":"<jats:p>Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, as well as local information. Prevalent solution techniques are centralized and model based\u2014limitations that we address by distributed reinforcement learning (RL). We particularly favor alternate learning, where agents alternately learn best responses to each other, which appears to outperform concurrent RL. However, alternate learning requires an initial policy. We propose two principled approaches to generating informed initial policies: a naive approach that lays the foundation for a more sophisticated approach. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, staking a claim to being an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best response learning seeded with such policies quickly learns high-quality policies as well.<\/jats:p>","DOI":"10.1145\/2668130","type":"journal-article","created":{"date-parts":[[2014,12,16]],"date-time":"2014-12-16T13:39:54Z","timestamp":1418737194000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Reinforcement Learning of Informed Initial Policies for Decentralized Planning"],"prefix":"10.1145","volume":"9","author":[{"given":"Landon","family":"Kraemer","sequence":"first","affiliation":[{"name":"University of Southern Mississippi, Hattiesburg, Mississippi"}]},{"given":"Bikramjit","family":"Banerjee","sequence":"additional","affiliation":[{"name":"University of Southern Mississippi, Hattiesburg, Mississippi"}]}],"member":"320","published-online":{"date-parts":[[2014,12,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS'09)","author":"Amato Christopher","year":"2009","unstructured":"Christopher Amato, Jilles Steeve Dibangoye, and Shlomo Zilberstein. 2009. Incremental policy generation for finite-horizon DEC-POMDPs. In Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS'09). 2--9."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems. 593--600","author":"Amato Christopher","year":"2009","unstructured":"Christopher Amato and Shlomo Zilberstein. 2009. Achieving goals in decentralized POMDPs. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems. 593--600."},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 27th AAAI Conference on Artificial Intelligence. 88--94","author":"Banerjee Bikramjit","year":"2013","unstructured":"Bikramjit Banerjee. 2013. Pruning for Monte Carlo distributed reinforcement learning in decentralized POMDPs. In Proceedings of the 27th AAAI Conference on Artificial Intelligence. 88--94."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 26th AAAI Conference on Artificial Intelligence. 1256--1262","author":"Banerjee Bikramjit","year":"2012","unstructured":"Bikramjit Banerjee, Jeremy Lyle, Landon Kraemer, and Rajesh Yellamraju. 2012. Sample bounded distributed reinforcement learning for decentralized POMDPs. In Proceedings of the 26th AAAI Conference on Artificial Intelligence. 1256--1262."},{"doi-asserted-by":"publisher","key":"e_1_2_1_5_1","DOI":"10.1287\/moor.27.4.819.297"},{"key":"e_1_2_1_6_1","volume-title":"MARL Toolbox Ver. 1.3. Retrieved","author":"Busoniu Lucian","year":"2014","unstructured":"Lucian Busoniu. 2010. MARL Toolbox Ver. 1.3. Retrieved November 3, 2014, from http:\/\/busoniu.net\/repository.php."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 10th National Conference on Artificial Intelligence. 183--188","author":"Chrisman Lonnie","year":"1992","unstructured":"Lonnie Chrisman. 1992. Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the 10th National Conference on Artificial Intelligence. 183--188."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 15th National Conference on Artificial Intelligence. 746--752","author":"Claus Caroline","year":"1998","unstructured":"Caroline Claus and Craig Boutilier. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th National Conference on Artificial Intelligence. 746--752."},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1016\/S0004-3702(98)00023-X"},{"volume-title":"Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG'05)","author":"Kok Jelle R.","unstructured":"Jelle R. Kok, Pieter Jan't Hoen, Bram Bakker, and Nikos A. Vlassis. 2005. Utile coordination: Learning interdependencies among cooperative agents. In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG'05). 29--36.","key":"e_1_2_1_10_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_12_1","DOI":"10.1016\/j.artint.2011.05.001"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI'99)","author":"Meuleau Nicolas","year":"1999","unstructured":"Nicolas Meuleau, Leonid Peshkin, Kee-Eung Kim, and Leslie Pack Kaelbling. 1999. Learning finite-state controllers for partially observable environments. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI'99). 427--436."},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI'03)","author":"Nair Ranjit","year":"2003","unstructured":"Ranjit Nair, Milind Tambe, Makoto Yokoo, David Pynadath, and Stacy Marsella. 2003. Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI'03). 705--711."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI'07)","author":"Seuken Sven","year":"2007","unstructured":"Sven Seuken and Shlomo Zilberstein. 2007. Improved memory-bounded dynamic programming for decentralized POMDPs. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI'07). 344--351."},{"doi-asserted-by":"publisher","key":"e_1_2_1_16_1","DOI":"10.1007\/11564096_35"},{"key":"e_1_2_1_18_1","volume-title":"Dec-POMDP Problem Domains. Retrieved","author":"Spaan Matthijs T. J.","year":"2014","unstructured":"Matthijs T. J. Spaan. 2013. Dec-POMDP Problem Domains. Retrieved November 3, 2014, from http:\/\/masplan.org\/problem_domains."},{"doi-asserted-by":"publisher","key":"e_1_2_1_19_1","DOI":"10.5555\/2283696.2283739"},{"key":"e_1_2_1_20_1","volume-title":"Barto","author":"Sutton Richard","year":"1998","unstructured":"Richard Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS'09)","author":"Varakantham Pradeep","year":"2009","unstructured":"Pradeep Varakantham, Jun Young Kwak, Matthew E. Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. 2009. Exploiting coordination locales in distributed POMDPs via social model shaping. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS'09). http:\/\/dblp.uni-trier.de\/db\/conf\/aips\/icaps2009.html#VarakanthamKTMST09."},{"volume-title":"A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence","author":"Vlassis Nikos","unstructured":"Nikos Vlassis. 2003. A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence. Morgan and Claypool Publishers.","key":"e_1_2_1_22_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_23_1","DOI":"10.1007\/BF00992698"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI'10)","author":"Wu Feng","year":"2010","unstructured":"Feng Wu, Shlomo Zilberstein, and Xiaoping Chen. 2010. Rollout sampling policy iteration for decentralized POMDPs. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI'10). 666--673."},{"volume-title":"Strategic Learning and Its Limits","author":"Young H. Peyton","unstructured":"H. Peyton Young. 2004. Strategic Learning and Its Limits. Oxford University Press.","key":"e_1_2_1_25_1"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI'11)","author":"Zhang Chongjie","year":"2011","unstructured":"Chongjie Zhang and Victor Lesser. 2011. Coordinated multi-agent reinforcement learning in networked distributed POMDPs. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI'11). 764--770."}],"container-title":["ACM Transactions on Autonomous and Adaptive Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2668130","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2668130","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:22Z","timestamp":1750227202000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2668130"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12,8]]},"references-count":24,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,1,14]]}},"alternative-id":["10.1145\/2668130"],"URL":"https:\/\/doi.org\/10.1145\/2668130","relation":{},"ISSN":["1556-4665","1556-4703"],"issn-type":[{"type":"print","value":"1556-4665"},{"type":"electronic","value":"1556-4703"}],"subject":[],"published":{"date-parts":[[2014,12,8]]},"assertion":[{"value":"2013-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-12-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}