{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T02:06:12Z","timestamp":1769825172872,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":63,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,6,2]],"date-time":"2023-06-02T00:00:00Z","timestamp":1685664000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["CCF1565235,CCF1918421"],"award-info":[{"award-number":["CCF1565235,CCF1918421"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,2]]},"DOI":"10.1145\/3564246.3585099","type":"proceedings-article","created":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T17:34:20Z","timestamp":1684258460000},"page":"349-362","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Planning and Learning in Partially Observable Systems via Filter Stability"],"prefix":"10.1145","author":[{"given":"Noah","family":"Golowich","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]},{"given":"Ankur","family":"Moitra","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]},{"given":"Dhruv","family":"Rohatgi","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,6,2]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Ioannis Anagnostides Gabriele Farina and Tuomas Sandholm. 2022. Near-Optimal Phi-Regret Learning in Extensive-Form Games. \t\t\t\t  Ioannis Anagnostides Gabriele Farina and Tuomas Sandholm. 2022. Near-Optimal Phi-Regret Learning in Extensive-Form Games."},{"key":"e_1_3_2_1_2_1","article-title":"Tensor Decompositions for Learning Latent Variable Models","volume":"15","author":"Anandkumar Animashree","year":"2014","unstructured":"Animashree Anandkumar , Rong Ge , Daniel Hsu , Sham M. Kakade , and Matus Telgarsky . 2014 . Tensor Decompositions for Learning Latent Variable Models . J. Mach. Learn. Res. , 15 , 1 (2014), jan, 2773\u20132832. issn:1532-4435 Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. 2014. Tensor Decompositions for Learning Latent Variable Models. J. Mach. Learn. Res., 15, 1 (2014), jan, 2773\u20132832. issn:1532-4435","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of the 25th Annual Conference on Learning Theory, Shie Mannor, Nathan Srebro, and Robert C. Williamson (Eds.) (Proceedings of Machine Learning Research","volume":"33","author":"Anandkumar Animashree","unstructured":"Animashree Anandkumar , Daniel Hsu , and Sham M. Kakade . 2012. A Method of Moments for Mixture Models and Hidden Markov Models . In Proceedings of the 25th Annual Conference on Learning Theory, Shie Mannor, Nathan Srebro, and Robert C. Williamson (Eds.) (Proceedings of Machine Learning Research , Vol. 23). PMLR, Edinburgh, Scotland. 33 .1\u201333.34. Animashree Anandkumar, Daniel Hsu, and Sham M. Kakade. 2012. A Method of Moments for Mixture Models and Hidden Markov Models. In Proceedings of the 25th Annual Conference on Learning Theory, Shie Mannor, Nathan Srebro, and Robert C. Williamson (Eds.) (Proceedings of Machine Learning Research, Vol. 23). PMLR, Edinburgh, Scotland. 33.1\u201333.34."},{"key":"e_1_3_2_1_4_1","volume-title":"International Conference on Machine Learning. 2859\u20132867","author":"Arora Sanjeev","year":"2016","unstructured":"Sanjeev Arora , Rong Ge , Frederic Koehler , Tengyu Ma , and Ankur Moitra . 2016 . Provable algorithms for inference in topic models . In International Conference on Machine Learning. 2859\u20132867 . Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, and Ankur Moitra. 2016. Provable algorithms for inference in topic models. In International Conference on Machine Learning. 2859\u20132867."},{"key":"e_1_3_2_1_5_1","volume-title":"Annales de l","author":"Atar Rami","unstructured":"Rami Atar and Ofer Zeitouni . 1997. Exponential stability for nonlinear filtering . In Annales de l \u2019 Institut Henri Poincare (B) Probability and Statistics . 33, 697\u2013725. Rami Atar and Ofer Zeitouni. 1997. Exponential stability for nonlinear filtering. In Annales de l\u2019Institut Henri Poincare (B) Probability and Statistics. 33, 697\u2013725."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0363012994272046"},{"key":"e_1_3_2_1_7_1","volume-title":"Conference on Learning Theory. 193\u2013256","author":"Azizzadenesheli Kamyar","year":"2016","unstructured":"Kamyar Azizzadenesheli , Alessandro Lazaric , and Animashree Anandkumar . 2016 . Reinforcement learning of POMDPs using spectral methods . In Conference on Learning Theory. 193\u2013256 . Kamyar Azizzadenesheli, Alessandro Lazaric, and Animashree Anandkumar. 2016. Reinforcement learning of POMDPs using spectral methods. In Conference on Learning Theory. 193\u2013256."},{"key":"e_1_3_2_1_8_1","volume-title":"Conference on Learning Theory. 742\u2013778","author":"Bhaskara Aditya","year":"2014","unstructured":"Aditya Bhaskara , Moses Charikar , and Aravindan Vijayaraghavan . 2014 . Uniqueness of tensor decompositions with applications to polynomial identifiability . In Conference on Learning Theory. 742\u2013778 . Aditya Bhaskara, Moses Charikar, and Aravindan Vijayaraghavan. 2014. Uniqueness of tensor decompositions with applications to polynomial identifiability. In Conference on Learning Theory. 742\u2013778."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Aditya Bhaskara Aidao Chen Aidan Perreault and Aravindan Vijayaraghavan. 2019. Smoothed Analysis in Unsupervised Learning via Decoupling. 582\u2013610. \t\t\t\t  Aditya Bhaskara Aidao Chen Aidan Perreault and Aravindan Vijayaraghavan. 2019. Smoothed Analysis in Unsupervised Learning via Decoupling. 582\u2013610.","DOI":"10.1109\/FOCS.2019.00043"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/2074094.2074099"},{"key":"e_1_3_2_1_11_1","volume-title":"Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 359, 6374","author":"Brown Noam","year":"2018","unstructured":"Noam Brown and Tuomas Sandholm . 2018. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 359, 6374 ( 2018 ), 418\u2013424. Noam Brown and Tuomas Sandholm. 2018. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 359, 6374 (2018), 418\u2013424."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(95)00158-1"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.1996.571080"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Sitan Chen Frederic Koehler Ankur Moitra and Morris Yau. 2021. Kalman Filtering with Adversarial Corruptions. arXiv preprint arXiv:2111.06395. \t\t\t\t  Sitan Chen Frederic Koehler Ankur Moitra and Morris Yau. 2021. Kalman Filtering with Adversarial Corruptions. arXiv preprint arXiv:2111.06395.","DOI":"10.1145\/3519935.3520050"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1516512.1516516"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1137\/070699652"},{"key":"e_1_3_2_1_17_1","unstructured":"Eyal Even-Dar Sham M Kakade and Yishay Mansour. 2007. The Value of Observation for Monitoring Dynamic Systems.. In IJCAI. 2474\u20132479. \t\t\t\t  Eyal Even-Dar Sham M Kakade and Yishay Mansour. 2007. The Value of Observation for Monitoring Dynamic Systems.. In IJCAI. 2474\u20132479."},{"key":"e_1_3_2_1_18_1","volume-title":"POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning. CoRR, abs\/2001.04032","author":"Futoma Joseph","year":"2020","unstructured":"Joseph Futoma , Michael C. Hughes , and Finale Doshi-Velez . 2020 . POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning. CoRR, abs\/2001.04032 (2020), arxiv:2001.04032. Joseph Futoma, Michael C. Hughes, and Finale Doshi-Velez. 2020. POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning. CoRR, abs\/2001.04032 (2020), arxiv:2001.04032."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Neha Priyadarshini Garg David Hsu and Wee Sun Lee. 2019. DESPOT-Alpha: Online POMDP Planning with Large State and Observation Spaces.. In Robotics: Science and Systems. \t\t\t\t  Neha Priyadarshini Garg David Hsu and Wee Sun Lee. 2019. DESPOT-Alpha: Online POMDP Planning with Large State and Observation Spaces.. In Robotics: Science and Systems.","DOI":"10.15607\/RSS.2019.XV.006"},{"key":"e_1_3_2_1_20_1","unstructured":"Noah Golowich Ankur Moitra and Dhruv Rohatgi. 2022. Learning in Observable POMDPs without Computationally Intractable Oracles. \t\t\t\t  Noah Golowich Ankur Moitra and Dhruv Rohatgi. 2022. Learning in Observable POMDPs without Computationally Intractable Oracles."},{"key":"e_1_3_2_1_21_1","unstructured":"Zhaohan Daniel Guo Shayan Doroudi and Emma Brunskill. 2016. A pac rl algorithm for episodic pomdps. In Artificial Intelligence and Statistics. 510\u2013518. \t\t\t\t  Zhaohan Daniel Guo Shayan Doroudi and Emma Brunskill. 2016. A pac rl algorithm for episodic pomdps. In Artificial Intelligence and Statistics. 510\u2013518."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/2074094.2074119"},{"key":"e_1_3_2_1_23_1","series-title":"In 2015 aaai fall symposium series","volume-title":"Deep recurrent q-learning for partially observable mdps","author":"Hausknecht Matthew","unstructured":"Matthew Hausknecht and Peter Stone . 2015. Deep recurrent q-learning for partially observable mdps . In 2015 aaai fall symposium series . Matthew Hausknecht and Peter Stone. 2015. Deep recurrent q-learning for partially observable mdps. In 2015 aaai fall symposium series."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622262.1622264"},{"key":"e_1_3_2_1_25_1","volume-title":"Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial intelligence in medicine, 18, 3","author":"Hauskrecht Milos","year":"2000","unstructured":"Milos Hauskrecht and Hamish Fraser . 2000. Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial intelligence in medicine, 18, 3 ( 2000 ), 221\u2013244. Milos Hauskrecht and Hamish Fraser. 2000. Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial intelligence in medicine, 18, 3 (2000), 221\u2013244."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.2000.1727"},{"key":"e_1_3_2_1_27_1","unstructured":"Tommi Jaakkola Satinder P Singh and Michael I Jordan. 1995. Reinforcement learning algorithm for partially observable Markov decision problems. Advances in neural information processing systems 345\u2013352. \t\t\t\t  Tommi Jaakkola Satinder P Singh and Michael I Jordan. 1995. Reinforcement learning algorithm for partially observable Markov decision problems. Advances in neural information processing systems 345\u2013352."},{"key":"e_1_3_2_1_28_1","unstructured":"Chi Jin Sham M Kakade Akshay Krishnamurthy and Qinghua Liu. 2020. Sample-efficient reinforcement learning of undercomplete POMDPs. arXiv preprint arXiv:2006.12484. \t\t\t\t  Chi Jin Sham M Kakade Akshay Krishnamurthy and Qinghua Liu. 2020. Sample-efficient reinforcement learning of undercomplete POMDPs. arXiv preprint arXiv:2006.12484."},{"key":"e_1_3_2_1_29_1","volume-title":"Planning and acting in partially observable stochastic domains. Artificial intelligence, 101, 1-2","author":"Kaelbling Leslie Pack","year":"1998","unstructured":"Leslie Pack Kaelbling , Michael L Littman , and Anthony R Cassandra . 1998. Planning and acting in partially observable stochastic domains. Artificial intelligence, 101, 1-2 ( 1998 ), 99\u2013134. Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. 1998. Planning and acting in partially observable stochastic domains. Artificial intelligence, 101, 1-2 (1998), 99\u2013134."},{"key":"e_1_3_2_1_30_1","unstructured":"Daniel Kane Sihan Liu Shachar Lovett and Gaurav Mahajan. 2022. Computational-statistical gaps in reinforcement learning. arXiv preprint arXiv:2202.05444. \t\t\t\t  Daniel Kane Sihan Liu Shachar Lovett and Gaurav Mahajan. 2022. Computational-statistical gaps in reinforcement learning. arXiv preprint arXiv:2202.05444."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcss.2007.04.013"},{"key":"e_1_3_2_1_32_1","unstructured":"Tadashi Kozuno Pierre MENARD Remi Munos and Michal Valko. 2021. Learning in two-player zero-sum partially observable Markov games with perfect recall. In Advances in Neural Information Processing Systems A. Beygelzimer Y. Dauphin P. Liang and J. Wortman Vaughan (Eds.). https:\/\/openreview.net\/forum?id=1LLemKrsgQp \t\t\t\t  Tadashi Kozuno Pierre MENARD Remi Munos and Michal Valko. 2021. Learning in two-player zero-sum partially observable Markov games with perfect recall. In Advances in Neural Information Processing Systems A. Beygelzimer Y. Dauphin P. Liang and J. Wortman Vaughan (Eds.). https:\/\/openreview.net\/forum?id=1LLemKrsgQp"},{"key":"e_1_3_2_1_33_1","unstructured":"Akshay Krishnamurthy Alekh Agarwal and John Langford. 2016. PAC reinforcement learning with rich observations. arXiv preprint arXiv:1602.02722. \t\t\t\t  Akshay Krishnamurthy Alekh Agarwal and John Langford. 2016. PAC reinforcement learning with rich observations. arXiv preprint arXiv:1602.02722."},{"key":"e_1_3_2_1_34_1","unstructured":"Jeongyeol Kwon Yonathan Efroni Constantine Caramanis and Shie Mannor. 2021. Reinforcement Learning in Reward-Mixing MDPs. arXiv preprint arXiv:2110.03743. \t\t\t\t  Jeongyeol Kwon Yonathan Efroni Constantine Caramanis and Shie Mannor. 2021. Reinforcement Learning in Reward-Mixing MDPs. arXiv preprint arXiv:2110.03743."},{"key":"e_1_3_2_1_35_1","unstructured":"Jeongyeol Kwon Yonathan Efroni Constantine Caramanis and Shie Mannor. 2021. RL for Latent MDPs: Regret Guarantees and a Lower Bound. arXiv preprint arXiv:2102.04939. \t\t\t\t  Jeongyeol Kwon Yonathan Efroni Constantine Caramanis and Shie Mannor. 2021. RL for Latent MDPs: Regret Guarantees and a Lower Bound. arXiv preprint arXiv:2102.04939."},{"key":"e_1_3_2_1_36_1","volume-title":"Memoryless policies: Theoretical limitations and practical results. From animals to animats, 3","author":"Littman Michael L","year":"1994","unstructured":"Michael L Littman . 1994. Memoryless policies: Theoretical limitations and practical results. From animals to animats, 3 ( 1994 ), 238\u2013245. Michael L Littman. 1994. Memoryless policies: Theoretical limitations and practical results. From animals to animats, 3 (1994), 238\u2013245."},{"key":"e_1_3_2_1_37_1","unstructured":"Qinghua Liu Alan Chung Csaba Szepesv\u00e1ri and Chi Jin. 2022. When Is Partially Observable Reinforcement Learning Not Scary? arXiv preprint arXiv:2204.08967. \t\t\t\t  Qinghua Liu Alan Chung Csaba Szepesv\u00e1ri and Chi Jin. 2022. When Is Partially Observable Reinforcement Learning Not Scary? arXiv preprint arXiv:2204.08967."},{"key":"e_1_3_2_1_38_1","unstructured":"Qinghua Liu Csaba Szepesv\u00e1ri and Chi Jin. 2022. Sample-Efficient Reinforcement Learning of Partially Observable Markov Games. \t\t\t\t  Qinghua Liu Csaba Szepesv\u00e1ri and Chi Jin. 2022. Sample-Efficient Reinforcement Learning of Partially Observable Markov Games."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622394.1622398"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1214\/20-ECP333"},{"key":"e_1_3_2_1_41_1","volume-title":"Mitter and Irvin Cemil Schick","author":"Sanjoy","year":"1992","unstructured":"Sanjoy K. Mitter and Irvin Cemil Schick . 1992 . Point Estimation, Stochastic Approximation , and Robust Kalman Filtering. In Systems, Models and Feedback: Theory and Applications . Sanjoy K. Mitter and Irvin Cemil Schick. 1992. Point Estimation, Stochastic Approximation, and Robust Kalman Filtering. In Systems, Models and Feedback: Theory and Applications."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1080\/00207178008961063"},{"key":"e_1_3_2_1_43_1","unstructured":"Tianwei Ni Benjamin Eysenbach and Ruslan Salakhutdinov. 2021. Recurrent Model-Free RL is a Strong Baseline for Many POMDPs. arXiv preprint arXiv:2110.05038. \t\t\t\t  Tianwei Ni Benjamin Eysenbach and Ruslan Salakhutdinov. 2021. Recurrent Model-Free RL is a Strong Baseline for Many POMDPs. arXiv preprint arXiv:2110.05038."},{"key":"e_1_3_2_1_44_1","volume-title":"The complexity of Markov decision processes. Mathematics of operations research, 12, 3","author":"Papadimitriou Christos H","year":"1987","unstructured":"Christos H Papadimitriou and John N Tsitsiklis . 1987. The complexity of Markov decision processes. Mathematics of operations research, 12, 3 ( 1987 ), 441\u2013450. Christos H Papadimitriou and John N Tsitsiklis. 1987. The complexity of Markov decision processes. Mathematics of operations research, 12, 3 (1987), 441\u2013450."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.2078"},{"key":"e_1_3_2_1_46_1","series-title":"Lecture Notes for ECE563 (UIUC) and, 6","volume-title":"Lecture notes on information theory","author":"Polyanskiy Yury","year":"2012","unstructured":"Yury Polyanskiy and Yihong Wu. 2014. Lecture notes on information theory . Lecture Notes for ECE563 (UIUC) and, 6 , 2012 -2016 (2014), 7. Yury Polyanskiy and Yihong Wu. 2014. Lecture notes on information theory. Lecture Notes for ECE563 (UIUC) and, 6, 2012-2016 (2014), 7."},{"key":"e_1_3_2_1_47_1","unstructured":"Pascal Poupart and Craig Boutilier. 2004. VDCBPI: an Approximate Scalable Algorithm for Large POMDPs.. In NIPS. 1081\u20131088. \t\t\t\t  Pascal Poupart and Craig Boutilier. 2004. VDCBPI: an Approximate Scalable Algorithm for Large POMDPs.. In NIPS. 1081\u20131088."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622673.1622690"},{"key":"e_1_3_2_1_49_1","first-page":"1667","article-title":"Exponential family PCA for belief compression in POMDPs","volume":"15","author":"Roy Nicholas","year":"2002","unstructured":"Nicholas Roy and Geoffrey J Gordon . 2002 . Exponential family PCA for belief compression in POMDPs . Advances in Neural Information Processing Systems , 15 (2002), 1667 \u2013 1674 . Nicholas Roy and Geoffrey J Gordon. 2002. Exponential family PCA for belief compression in POMDPs. Advances in Neural Information Processing Systems, 15 (2002), 1667\u20131674.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_50_1","unstructured":"\u00c4 Schick. 1989.\n  Robust recursive estimation of the state of a discrete-time stochastic linear dynamic system in the presence of heavy-tailed observation noise\n  . Ph.D. Dissertation. \n  Massachusetts Institute of Technology\n  . \t\t\t\t  \u00c4 Schick. 1989. Robust recursive estimation of the state of a discrete-time stochastic linear dynamic system in the presence of heavy-tailed observation noise. Ph.D. Dissertation. Massachusetts Institute of Technology."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Irvin C Schick and Sanjoy K Mitter. 1994. Robust recursive estimation in the presence of heavy-tailed observation noise. The annals of statistics 1045\u20131080. \t\t\t\t  Irvin C Schick and Sanjoy K Mitter. 1994. Robust recursive estimation in the presence of heavy-tailed observation noise. The annals of statistics 1045\u20131080.","DOI":"10.1214\/aos\/1176325511"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.39.10.1953"},{"key":"e_1_3_2_1_53_1","volume-title":"Advances in Neural Information Processing Systems","author":"Sharan Vatsal","unstructured":"Vatsal Sharan , Sham M Kakade , Percy S Liang , and Gregory Valiant . 2017. Learning Overcomplete HMMs . In Advances in Neural Information Processing Systems , I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 30, Curran Associates, Inc. . Vatsal Sharan, Sham M Kakade, Percy S Liang, and Gregory Valiant. 2017. Learning Overcomplete HMMs. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 30, Curran Associates, Inc.."},{"key":"e_1_3_2_1_54_1","volume-title":"Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations","author":"Shoham Yoav","year":"2009","unstructured":"Yoav Shoham and Kevin Leyton-Brown . 2009 . Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations . Cambridge University Press , Cambridge, UK . isbn:978-0-521-89943-7 Yoav Shoham and Kevin Leyton-Brown. 2009. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, Cambridge, UK. isbn:978-0-521-89943-7"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.705429"},{"key":"e_1_3_2_1_56_1","unstructured":"David Silver and Joel Veness. 2010. Monte-Carlo planning in large POMDPs. \t\t\t\t  David Silver and Joel Veness. 2010. Monte-Carlo planning in large POMDPs."},{"key":"e_1_3_2_1_57_1","unstructured":"Trey Smith and Reid Simmons. 2012. Heuristic search value iteration for POMDPs. arXiv preprint arXiv:1207.4166. \t\t\t\t  Trey Smith and Reid Simmons. 2012. Heuristic search value iteration for POMDPs. arXiv preprint arXiv:1207.4166."},{"key":"e_1_3_2_1_58_1","volume-title":"DESPOT: Online POMDP planning with regularization. Advances in neural information processing systems, 26","author":"Somani Adhiraj","year":"2013","unstructured":"Adhiraj Somani , Nan Ye , David Hsu , and Wee Sun Lee . 2013 . DESPOT: Online POMDP planning with regularization. Advances in neural information processing systems, 26 (2013), 1772\u20131780. Adhiraj Somani, Nan Ye, David Hsu, and Wee Sun Lee. 2013. DESPOT: Online POMDP planning with regularization. Advances in neural information processing systems, 26 (2013), 1772\u20131780."},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622519.1622525"},{"key":"e_1_3_2_1_60_1","first-page":"775","article-title":"Approximate planning in POMDPs with macro-actions","volume":"16","author":"Theocharous Georgios","year":"2003","unstructured":"Georgios Theocharous and Leslie Kaelbling . 2003 . Approximate planning in POMDPs with macro-actions . Advances in Neural Information Processing Systems , 16 (2003), 775 \u2013 782 . Georgios Theocharous and Leslie Kaelbling. 2003. Approximate planning in POMDPs with macro-actions. Advances in Neural Information Processing Systems, 16 (2003), 775\u2013782.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_61_1","volume-title":"Observability and nonlinear filtering. Probability theory and related fields, 145, 1-2","author":"Handel Ramon Van","year":"2009","unstructured":"Ramon Van Handel . 2009. Observability and nonlinear filtering. Probability theory and related fields, 145, 1-2 ( 2009 ), 35\u201374. Ramon Van Handel. 2009. Observability and nonlinear filtering. Probability theory and related fields, 145, 1-2 (2009), 35\u201374."},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1214\/08-AAP576"},{"key":"e_1_3_2_1_63_1","unstructured":"Yi Xiong Ningyuan Chen Xuefeng Gao and Xiang Zhou. 2021. Sublinear regret for learning POMDPs. arXiv preprint arXiv:2107.03635. \t\t\t\t  Yi Xiong Ningyuan Chen Xuefeng Gao and Xiang Zhou. 2021. Sublinear regret for learning POMDPs. arXiv preprint arXiv:2107.03635."}],"event":{"name":"STOC '23: 55th Annual ACM Symposium on Theory of Computing","location":"Orlando FL USA","acronym":"STOC '23","sponsor":["SIGACT ACM Special Interest Group on Algorithms and Computation Theory"]},"container-title":["Proceedings of the 55th Annual ACM Symposium on Theory of Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585099","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3564246.3585099","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3564246.3585099","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:10:33Z","timestamp":1750295433000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585099"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,2]]},"references-count":63,"alternative-id":["10.1145\/3564246.3585099","10.1145\/3564246"],"URL":"https:\/\/doi.org\/10.1145\/3564246.3585099","relation":{},"subject":[],"published":{"date-parts":[[2023,6,2]]},"assertion":[{"value":"2023-06-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}