{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T09:40:33Z","timestamp":1767174033474,"version":"build-2238731810"},"reference-count":93,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2014,3,14]],"date-time":"2014-03-14T00:00:00Z","timestamp":1394755200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/2.0"},{"start":{"date-parts":[[2014,3,14]],"date-time":"2014-03-14T00:00:00Z","timestamp":1394755200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Adapt Syst Model"],"published-print":{"date-parts":[[2014,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Purpose<\/jats:title>\n                    <jats:p>Sampling an action according to the probability that the action is believed to be the optimal one is sometimes called Thompson sampling.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>Although mostly applied to bandit problems, Thompson sampling can also be used to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution over actions can then be constructed by a Bayesian superposition of the policies weighted by their posterior probability of being optimal.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here we discuss two important features of this approach. 
First, we show in how far such generalized Thompson sampling can be regarded as an optimal strategy under limited information processing capabilities that constrain the sampling complexity of the decision-making process. Second, we show how such Thompson sampling can be extended to solve causal inference problems when interacting with an environment in a sequential fashion.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/2194-3206-2-2","type":"journal-article","created":{"date-parts":[[2014,3,14]],"date-time":"2014-03-14T14:08:55Z","timestamp":1394806135000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Generalized Thompson sampling for sequential decision-making and causal inference"],"prefix":"10.1186","volume":"2","author":[{"given":"Pedro A","family":"Ortega","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel A","family":"Braun","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2014,3,14]]},"reference":[{"key":"18_CR1","volume-title":"JMLR: Workshop and Conference Proceedings vol 23 (2012) 39.1\u201339.26. 25th Annual Conference on Learning Theory","author":"S Agrawal","year":"2011","unstructured":"Agrawal S, Goyal N: Analysis of Thompson sampling for the multi-armed bandit problem. JMLR: Workshop and Conference Proceedings vol 23 (2012) 39.1\u201339.26. 
25th Annual Conference on Learning Theory 2011."},{"key":"18_CR2","first-page":"19","volume-title":"Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence","author":"J Asmuth","year":"2009","unstructured":"Asmuth J, Li L, Littman ML, Nouri A, Wingate D: A Bayesian s+ in reinforcement learning. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. UAI \u201909, Arlington, Virginia, United States: AUAI Press; 2009:19\u201326."},{"key":"18_CR3","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1023\/A:1013689704352","volume":"47","author":"P Auer","year":"2002","unstructured":"Auer P, Cesa-Bianchi N, Fisher P: Finite-time analysis of the multiarmed bandit problem. Machine Learning 2002, 47: 235\u2013256. 10.1023\/A:1013689704352","journal-title":"Machine Learning"},{"issue":"1-2","key":"18_CR4","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1006\/game.1997.0585","volume":"21","author":"RJ Aumann","year":"1997","unstructured":"Aumann RJ: Rationality and bounded rationality. Games and Econ Behavior 1997, 21(1\u20132):2\u201314. 10.1006\/game.1997.0585","journal-title":"Games and Econ Behavior"},{"issue":"4","key":"18_CR5","doi-asserted-by":"publisher","first-page":"625","DOI":"10.2307\/1266635","volume":"8","author":"G Box","year":"1966","unstructured":"Box G: Use and abuse of regression. Technometrics 1966, 8(4):625\u2013629. 10.2307\/1266635","journal-title":"Technometrics"},{"key":"18_CR6","first-page":"103","volume-title":"The 7th Conference on Informatics in Control, Automation and Robotics, Volume 3","author":"DA Braun","year":"2010","unstructured":"Braun DA, Ortega PA: A minimum relative entropy principle for adaptive control in linear quadratic regulators. 
The 7th Conference on Informatics in Control, Automation and Robotics, Volume 3 2010, 103\u2013108."},{"key":"18_CR7","first-page":"202","volume-title":"IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning","author":"DA Braun","year":"2011","unstructured":"Braun DA, Ortega PA, Theodorou E, Schaal S: Path integral control and bounded rationality. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning 2011, 202\u2013209."},{"key":"18_CR8","volume-title":"A note on the Bayesian regret of Thompson sampling with an arbitrary prior","author":"S Bubeck","year":"2013","unstructured":"Bubeck S, Liu CY: A note on the Bayesian regret of Thompson sampling with an arbitrary prior. 2013. arXiv:1304.5758"},{"key":"18_CR9","volume-title":"Behavioral Game Theory: Experiments in Strategic Interaction","author":"C Camerer","year":"2003","unstructured":"Camerer C: Behavioral Game Theory: Experiments in Strategic Interaction. Princeton: Princeton University Press; 2003."},{"key":"18_CR10","volume-title":"Neural Information Processing Systems 25 (NIPS)","author":"F Cao","year":"2012","unstructured":"Cao F, Ray S: Bayesian hierarchical reinforcement learning. Neural Information Processing Systems 25 (NIPS) 2012."},{"issue":"3","key":"18_CR11","doi-asserted-by":"publisher","first-page":"1516","DOI":"10.1214\/13-AOS1119","volume":"41","author":"O Capp\u00e9","year":"2013","unstructured":"Capp\u00e9 O, Garivier A, Maillard OA, Munos R, Stoltz G: Kullback-Leibler upper confidence bounds for optimal sequential allocation. Ann Stat 2013, 41(3):1516\u20131541. 10.1214\/13-AOS1119","journal-title":"Ann Stat"},{"key":"18_CR12","first-page":"2249","volume-title":"NIPS","author":"O Chapelle","year":"2011","unstructured":"Chapelle O, Li L: An empirical evaluation of Thompson sampling. 
NIPS 2011, 2249\u20132257."},{"key":"18_CR13","first-page":"761","volume-title":"AAAI \u201998\/IAAI \u201998: Proceedings of the fifteenth national\/tenth conference on Artificial intelligence\/Innovative applications of artificial intelligence","author":"R Dearden","year":"1998","unstructured":"Dearden R, Friedman N, Russell S: Bayesian Q-learning. In AAAI \u201998\/IAAI \u201998: Proceedings of the fifteenth national\/tenth conference on Artificial intelligence\/Innovative applications of artificial intelligence. Menlo Park, CA, US: American Association for Artificial Intelligence; 1998:761\u2013768."},{"key":"18_CR14","volume-title":"IEEE Conference on Decision and Control","author":"C Dimitrakakis","year":"2013","unstructured":"Dimitrakakis C: Monte-Carlo utility estimates for Bayesian reinforcement learning. IEEE Conference on Decision and Control 2013."},{"key":"18_CR15","first-page":"684","volume-title":"Proceedings of The 30th International Conference on Machine Learning","author":"C Dimitrakakis","year":"2013","unstructured":"Dimitrakakis C, Tziortziotis N: ABC reinforcement learning. Proceedings of The 30th International Conference on Machine Learning 2013, 684\u2013692."},{"key":"18_CR16","volume-title":"PhD thesis","author":"M Duff","year":"2002","unstructured":"Duff M: Optimal learning: computational procedures for Bayes-adaptive Markov decision processes. PhD thesis. 2002. [Director-Andrew Barto]"},{"key":"18_CR17","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1016\/j.tics.2009.04.005","volume":"13","author":"K Friston","year":"2009","unstructured":"Friston K: The free-energy principle: a rough guide to the brain? Trends in Cognitive Science 2009, 13: 293\u2013301. 
10.1016\/j.tics.2009.04.005","journal-title":"Trends in Cognitive Science"},{"key":"18_CR18","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1038\/nrn2787","volume":"11","author":"K Friston","year":"2010","unstructured":"Friston K: The free-energy principle: a unified brain theory? Nat Rev Neurosci 2010, 11: 127\u2013138. 10.1038\/nrn2787","journal-title":"Nat Rev Neurosci"},{"key":"18_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1162\/neco.1992.4.1.1","volume":"4","author":"S Geman","year":"1992","unstructured":"Geman S, Bienenstock E, Doursat R: Neural networks and the bias\/variance dilemma. Neural Comput 1992, 4: 1\u201358. 10.1162\/neco.1992.4.1.1","journal-title":"Neural Comput"},{"key":"18_CR20","volume-title":"Bounded Rationality: The Adaptive Toolbox","author":"G Gigerenzer","year":"2001","unstructured":"Gigerenzer G, Selten R: Bounded Rationality: The Adaptive Toolbox. Cambridge, MA: MIT Press; 2001."},{"key":"18_CR21","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1111\/j.2517-6161.1979.tb01068.x","volume":"41","author":"J Gittins","year":"1979","unstructured":"Gittins J: Bandit processes and dynamic allocation indices. J R Stat Soc Ser B, Methodological 1979, 41: 148\u2013177.","journal-title":"J R Stat Soc Ser B, Methodological"},{"key":"18_CR22","volume-title":"Causation, Prediction, and Search, 2nd edition","author":"C Glymour","year":"2000","unstructured":"Glymour C, Spirtes P, Scheines R: Causation, Prediction, and Search, 2nd edition. Cambridge, Massachusetts, USA: MIT Press; 2000."},{"key":"18_CR23","first-page":"25","volume-title":"Proceedings of the Twenty-Seventh International Conference on Machine Learning","author":"T Graepel","year":"2010","unstructured":"Graepel T, Qui\u00f1onero Candela J, Borchert T, Herbrich R: Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft\u2019s Bing search engine. 
Proceedings of the Twenty-Seventh International Conference on Machine Learning 2010, 25\u201326."},{"key":"18_CR24","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1109\/ICMLA.2008.67","volume-title":"Proceedings of the 2008 Seventh International Conference on Machine Learning and Applications: ICMLA \u201908","author":"OC Granmo","year":"2008","unstructured":"Granmo OC: A Bayesian learning automaton for solving two-armed bernoulli bandit problems. Proceedings of the 2008 Seventh International Conference on Machine Learning and Applications: ICMLA \u201908 2008, 23\u201330."},{"issue":"2","key":"18_CR25","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1108\/17563781011049179","volume":"3","author":"OC Granmo","year":"2010","unstructured":"Granmo OC: Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton. Int J Intell Comput Cybernetics 2010, 3(2):207\u2013234. 10.1108\/17563781011049179","journal-title":"Int J Intell Comput Cybernetics"},{"issue":"4","key":"18_CR26","doi-asserted-by":"publisher","first-page":"479","DOI":"10.1007\/s10489-012-0346-z","volume":"38","author":"OC Granmo","year":"2013","unstructured":"Granmo OC, Glimsdal S: Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore game. Applied intelligence 2013, 38(4):479\u2013488. 10.1007\/s10489-012-0346-z","journal-title":"Applied intelligence"},{"key":"18_CR27","doi-asserted-by":"publisher","first-page":"661","DOI":"10.1037\/a0017201","volume":"116","author":"TL Griffiths","year":"2009","unstructured":"Griffiths TL, Tenenbaum JB: Theory-based causal induction. Psychological Rev 2009, 116: 661\u2013716.","journal-title":"Psychological Rev"},{"issue":"3","key":"18_CR28","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1287\/mnsc.14.3.159","volume":"14","author":"J Harsanyi","year":"1967","unstructured":"Harsanyi J: Games with incomplete information played by \u201cBayesian\u201d players. 
Management Sci 1967, 14(3):159\u2013182. 10.1287\/mnsc.14.3.159","journal-title":"Management Sci"},{"key":"18_CR29","first-page":"141","volume":"19","author":"D Heckerman","year":"1999","unstructured":"Heckerman D, Meek C, Cooper G: A Bayesian approach to causal discovery. Computation, causation, and discovery 1999, 19: 141\u2013166.","journal-title":"Computation, causation, and discovery"},{"issue":"4","key":"18_CR30","doi-asserted-by":"publisher","first-page":"717","DOI":"10.1037\/a0017187","volume":"116","author":"A Howes","year":"2009","unstructured":"Howes A, Lewis RL, Vera A: Rational adaptation under task and processing constraints: implications for testing theories of cognition and action. Psychological Rev 2009, 116(4):717\u2013751.","journal-title":"Psychological Rev"},{"key":"18_CR31","volume-title":"Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability","author":"M Hutter","year":"2004","unstructured":"Hutter M: Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Berlin: Springer; 2004."},{"key":"18_CR32","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1111\/j.1756-8765.2010.01125.x","volume":"3","author":"CP Janssen","year":"2011","unstructured":"Janssen CP, Brumby DP, Dowell J, Chater N, Howes A: Identifying optimum performance trade-offs using a cognitively bounded rational analysis model of discretionary task interleaving. Topics in Cognitive Sci 2011, 3: 123\u2013139. 10.1111\/j.1756-8765.2010.01125.x","journal-title":"Topics in Cognitive Sci"},{"issue":"10","key":"18_CR33","doi-asserted-by":"publisher","first-page":"5168","DOI":"10.1109\/TIT.2010.2060095","volume":"56","author":"D Janzing","year":"2010","unstructured":"Janzing D, Sch\u00f6lkopf B: Causal inference using the algorithmic Markov condition. 
IEEE Trans Inf Theor 2010, 56(10):5168\u20135194.","journal-title":"IEEE Trans Inf Theor"},{"key":"18_CR34","volume-title":"Maximum entropy and Bayesian methods in inverse problems","author":"E Jaynes","year":"1985","unstructured":"Jaynes E: Entropy and search theory. In Maximum entropy and Bayesian methods in inverse problems. Heidelberg: Springer-Verlag; 1985."},{"issue":"4","key":"18_CR35","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1093\/jopart\/mug028","volume":"13","author":"BD Jones","year":"2003","unstructured":"Jones BD: Bounded rationality and political science: lessons from public administration and public policy. J Public Administration Res Theory 2003, 13(4):395\u2013412. 10.1093\/jopart\/mug028","journal-title":"J Public Administration Res Theory"},{"issue":"5","key":"18_CR36","doi-asserted-by":"publisher","first-page":"1449","DOI":"10.1257\/000282803322655392","volume":"93","author":"D Kahneman","year":"2003","unstructured":"Kahneman D: Maps of bounded rationality: psychology for behavioral economics. Am Econ Rev 2003, 93(5):1449\u20131475. 10.1257\/000282803322655392","journal-title":"Am Econ Rev"},{"key":"18_CR37","doi-asserted-by":"publisher","first-page":"200201","DOI":"10.1103\/PhysRevLett.95.200201","volume":"95","author":"H Kappen","year":"2005","unstructured":"Kappen H: A linear theory for control of non-linear stochastic systems. Phys Rev Lett 2005, 95: 200201.","journal-title":"Phys Rev Lett"},{"key":"18_CR38","first-page":"1","volume":"1","author":"H Kappen","year":"2012","unstructured":"Kappen H, G\u00f3mez V, Opper M: Optimal control as a graphical model inference problem. Machine Learn 2012, 1: 1\u201311.","journal-title":"Machine Learn"},{"key":"18_CR39","first-page":"199","volume-title":"ALT, Volume 7568 of, Lecture Notes in Computer Science","author":"E Kaufmann","year":"2012","unstructured":"Kaufmann E, Korda N, Munos R: Thompson sampling: an asymptotically optimal finite-time analysis. 
In ALT, Volume 7568 of, Lecture Notes in Computer Science. Edited by: Bshouty NH, Stoltz G, Vayatis N, Zeugmann T. Heidelberg, Germany: Springer; 2012:199\u2013213."},{"key":"18_CR40","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781107359987","volume-title":"Equilibrium States in Ergodic Theory","author":"G Keller","year":"1998","unstructured":"Keller G: Equilibrium States in Ergodic Theory. London Mathematical Society Student Texts: Cambridge University Press; 1998."},{"key":"18_CR41","volume-title":"Bounded rationality: The adaptive toolbox","author":"G Klein","year":"2001","unstructured":"Klein G: The fiction of optimization. In Bounded rationality: The adaptive toolbox. Edited by: Gigerenzer G, Selten R. Cambridge, Massachusetts, USA: MIT Press; 2001."},{"key":"18_CR42","first-page":"1448","volume-title":"Advances in Neural Information Processing Systems","author":"N Korda","year":"2013","unstructured":"Korda N, Kaufmann E, Munos R: Thompson sampling for 1-dimensional exponential family bandits. Advances in Neural Information Processing Systems 2013, 1448\u20131456."},{"key":"18_CR43","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1016\/0196-8858(85)90002-8","volume":"6","author":"T Lai","year":"1985","unstructured":"Lai T, Robbins H: Asymptotically efficient adaptive allocation rules. Adv Appl Math 1985, 6: 4\u201322.","journal-title":"Adv Appl Math"},{"key":"18_CR44","volume-title":"PhD thesis, Department of Informatics, University of Lugano","author":"S Legg","year":"2008","unstructured":"Legg S: Machine super intelligence. PhD thesis, Department of Informatics, University of Lugano 2008."},{"key":"18_CR45","doi-asserted-by":"crossref","unstructured":"Lewis R, Howes A, Singh S: Computational rationality: linking mechanism and behavior through bounded utility maximization. 
Topics in Cognitive Science 2014, (in press)","DOI":"10.1111\/tops.12086"},{"key":"18_CR46","doi-asserted-by":"publisher","first-page":"42","DOI":"10.2307\/136022","volume":"28","author":"B Lipman","year":"1995","unstructured":"Lipman B: Information processing and bounded rationality: a survey. Canadian J Econ 1995, 28: 42\u201367. 10.2307\/136022","journal-title":"Canadian J Econ"},{"key":"18_CR47","volume-title":"Information Theory, Inference, and Learning Algorithms","author":"D MacKay","year":"2003","unstructured":"MacKay D: Information Theory, Inference, and Learning Algorithms. Cambridge, UK: Cambridge University Press; 2003."},{"key":"18_CR48","volume-title":"Bayesian Decision Problems and Markov Chains","author":"J Martin","year":"1967","unstructured":"Martin J: Bayesian Decision Problems and Markov Chains. Publications in Operations Research, Wiley; 1967."},{"key":"18_CR49","volume-title":"Technical Report 11:02. Statistics Group, Department of Mathematics","author":"B May","year":"2011","unstructured":"May B, Leslie D: Simulation studies in optimistic Bayesian sampling in contextual-bandit problems. In Technical Report 11:02. Statistics Group, Department of Mathematics. Bristol, UK: University of Bristol; 2011."},{"key":"18_CR50","first-page":"2069","volume":"98888","author":"BC May","year":"2012","unstructured":"May BC, Korda N, Lee A, Leslie DS: Optimistic Bayesian sampling in contextual-bandit problems. J Mach Learn Res 2012, 98888: 2069\u20132106.","journal-title":"J Mach Learn Res"},{"key":"18_CR51","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1023\/A:1009905800005","volume":"1","author":"R McKelvey","year":"1998","unstructured":"McKelvey R, Palfrey TR: Quantal response equilibria for extensive form games. 
Experimental Econ 1998, 1: 9\u201341.","journal-title":"Experimental Econ"},{"key":"18_CR52","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1006\/game.1995.1023","volume":"10","author":"RD McKelvey","year":"1995","unstructured":"McKelvey RD, Palfrey TR: Quantal response equilibria for normal form games. Games and Econ Behavior 1995, 10: 6\u201338. 10.1006\/game.1995.1023","journal-title":"Games and Econ Behavior"},{"key":"18_CR53","volume-title":"Thompson sampling in switching environments with Bayesian online change point detection","author":"J Mellor","year":"2013","unstructured":"Mellor J, Shapiro J: Thompson sampling in switching environments with Bayesian online change point detection. 2013. arXiv:1302.3721"},{"key":"18_CR54","volume-title":"PhD thesis, Department of Engineering, University of Cambridge, UK","author":"PA Ortega","year":"2011a","unstructured":"Ortega PA: A unified framework for resource-bounded autonomous agents interacting with unknown environments. PhD thesis, Department of Engineering, University of Cambridge, UK 2011a."},{"key":"18_CR55","volume-title":"NIPS Workshop on Philosophy and Machine Learning, Granada","author":"PA Ortega","year":"2011","unstructured":"Ortega PA: Bayesian causal induction. NIPS Workshop on Philosophy and Machine Learning, Granada 2011."},{"key":"18_CR56","volume-title":"Proceedings of the third conference on general artificial intelligence","author":"PA Ortega","year":"2010a","unstructured":"Ortega PA, Braun DA: A Bayesian rule for adaptive control based on causal interventions. In Proceedings of the third conference on general artificial intelligence. Paris, France: Atlantis Press; 2010a."},{"key":"18_CR57","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1613\/jair.3062","volume":"38","author":"PA Ortega","year":"2010b","unstructured":"Ortega PA, Braun DA: A minimum relative entropy principle for learning and acting. 
J Artif Intell Res 2010b, 38: 475\u2013511.","journal-title":"J Artif Intell Res"},{"key":"18_CR58","first-page":"115","volume-title":"Proceedings of the Third Conference on Artificial General Intelligence","author":"PA Ortega","year":"2010c","unstructured":"Ortega PA, Braun DA: A conversion between utility and information. In Proceedings of the Third Conference on Artificial General Intelligence. Paris, France: Atlantis Press; 2010c:115\u2013120."},{"key":"18_CR59","first-page":"269","volume-title":"Lecture notes on artificial intelligence, Volume 6830","author":"PA Ortega","year":"2011","unstructured":"Ortega PA, Braun DA: Information, utility and bounded rationality. In Lecture notes on artificial intelligence, Volume 6830. Heidelberg, Germany: Springer-Verlag; 2011:269\u2013274."},{"key":"18_CR60","volume-title":"European Workshop for Reinforcement Learning","author":"PA Ortega","year":"2012a","unstructured":"Ortega PA, Braun DA: Free energy and the generalized optimality equations for sequential decision making. In European Workshop for Reinforcement Learning. Edinburgh, UK; 2012a."},{"key":"18_CR61","volume-title":"NIPS Workshop on Information in Perception and Action","author":"PA Ortega","year":"2012b","unstructured":"Ortega PA, Braun DA: Adaptive coding of actions and observations. NIPS Workshop on Information in Perception and Action 2012b."},{"key":"18_CR62","doi-asserted-by":"publisher","first-page":"2153","DOI":"10.1098\/rspa.2012.0683","volume":"469","author":"PA Ortega","year":"2013","unstructured":"Ortega PA, Braun DA: Thermodynamics as a theory of decision-making with information-processing costs. 
Proc R Soc A: Mathematical, Physical and Engineering Science 2013, 469: 2153.","journal-title":"Proc R Soc A: Mathematical, Physical and Engineering Science"},{"key":"18_CR63","first-page":"3003","volume-title":"Advances in Neural Information Processing Systems","author":"I Osband","year":"2013","unstructured":"Osband I, Russo D, Roy BV: (More) efficient reinforcement learning via posterior sampling. Advances in Neural Information Processing Systems 2013, 3003\u20133011."},{"key":"18_CR64","volume-title":"A Course in Game Theory","author":"MJ Osborne","year":"1999","unstructured":"Osborne MJ, Rubinstein A: A Course in Game Theory. Cambridge, Massachusetts, USA: MIT Press; 1999."},{"key":"18_CR65","volume-title":"Causality: Models, Reasoning, and Inference","author":"J Pearl","year":"2000","unstructured":"Pearl J: Causality: Models, Reasoning, and Inference. Cambridge, UK: Cambridge University Press; 2000."},{"key":"18_CR66","volume-title":"AAAI","author":"J Peters","year":"2010","unstructured":"Peters J, M\u00fclling K, Altun Y: Relative entropy policy search. AAAI 2010."},{"key":"18_CR67","volume-title":"Proceedings of Robotics: Science and Systems","author":"K Rawlik","year":"2012","unstructured":"Rawlik K, Toussaint M, Vijayakumar S: On stochastic optimal control and reinforcement learning by approximate inference. In Proceedings of Robotics: Science and Systems. Sydney, Australia: ; 2012."},{"key":"18_CR68","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/4702.001.0001","volume-title":"Modeling Bounded Rationality","author":"A Rubinstein","year":"1998","unstructured":"Rubinstein A: Modeling Bounded Rationality. Cambridge, Massachusetts, USA: MIT Press; 1998."},{"key":"18_CR69","first-page":"950","volume-title":"Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence","author":"S Russell","year":"1995","unstructured":"Russell S: Rationality and Intelligence. 
In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Edited by: Mellish C. Englewood Cliffs, New Jersey, USA: Prentice-Hall; 1995:950\u2013957."},{"key":"18_CR70","volume-title":"Artificial Intelligence: A Modern Approach, 1st edition","author":"S Russell","year":"1995","unstructured":"Russell S, Norvig P: Artificial Intelligence: A Modern Approach, 1st edition. Prentice-Hall: Englewood Cliffs, NJ; 1995."},{"key":"18_CR71","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1613\/jair.133","volume":"3","author":"S Russell","year":"1995","unstructured":"Russell S, Subramanian D: Provably bounded-optimal agents. J Artif Intell Res 1995, 3: 575\u2013609.","journal-title":"J Artif Intell Res"},{"key":"18_CR72","volume-title":"Learning to optimize via posterior sampling","author":"D Russo","year":"2013","unstructured":"Russo D, Roy BV: Learning to optimize via posterior sampling. 2013. arXiv:abs\/1301.2609"},{"issue":"6","key":"18_CR73","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1016\/S1364-6613(99)01327-3","volume":"3","author":"S Schaal","year":"1999","unstructured":"Schaal S: Is imitation learning the route to humanoid robots? Trends in cognitive sciences 1999, 3(6):233\u2013242. 10.1016\/S1364-6613(99)01327-3","journal-title":"Trends in cognitive sciences"},{"key":"18_CR74","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1002\/asmb.874","volume":"26","author":"S Scott","year":"2010","unstructured":"Scott S: A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry 2010, 26: 639\u2013658. 10.1002\/asmb.874","journal-title":"Applied Stochastic Models in Business and Industry"},{"key":"18_CR75","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/1403.001.0001","volume-title":"The Art of Causal Conjecture","author":"G Shafer","year":"1996","unstructured":"Shafer G: The Art of Causal Conjecture. 
Cambridge, Massachusetts, USA: MIT Press; 1996."},{"key":"18_CR76","first-page":"2003","volume":"7","author":"S Shimizu","year":"2006","unstructured":"Shimizu S, Hoyer PO, Hyv\u00e4rinen A, Kerminen A: A Linear Non-Gaussian Acyclic Model for Causal Discovery. J Mach Learn Res 2006, 7: 2003\u20132030.","journal-title":"J Mach Learn Res"},{"issue":"2","key":"18_CR77","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1037\/h0042769","volume":"63","author":"HA Simon","year":"1956","unstructured":"Simon HA: Rational choice and the structure of the environment. Psychological Rev 1956, 63(2):129\u2013138.","journal-title":"Psychological Rev"},{"key":"18_CR78","first-page":"161","volume-title":"Decision and Organization","author":"HA Simon","year":"1972","unstructured":"Simon HA: Theories of bounded rationality. In Decision and Organization. Edited by: McGuire CB, Radner R. Amsterdam: North-Holland Publishing; 1972:161\u2013176."},{"key":"18_CR79","volume-title":"Models of Bounded Rationality. Cambridge","author":"Simon H A","year":"1984","unstructured":"Simon H A: Models of Bounded Rationality. Cambridge. Cambridge, Massachusetts, USA: MIT Press; 1984."},{"key":"18_CR80","doi-asserted-by":"publisher","DOI":"10.1093\/acprof:oso\/9780195398717.001.0001","volume-title":"Bounded Rationality and Industrial Organization","author":"R Spiegler","year":"2011","unstructured":"Spiegler R: Bounded Rationality and Industrial Organization. Oxford: Oxford University Press; 2011."},{"key":"18_CR81","volume-title":"Theory of Optimal Search","author":"L Stone","year":"1998","unstructured":"Stone L: Theory of Optimal Search. New York: Academic Press; 1998."},{"key":"18_CR82","volume-title":"Proceedings of the Seventeenth International Conference on Machine Learning","author":"M Strens","year":"2000","unstructured":"Strens M: A Bayesian framework for reinforcement learning. 
Proceedings of the Seventeenth International Conference on Machine Learning 2000."},{"key":"18_CR83","volume-title":"Reinforcement Learning: An Introduction","author":"R Sutton","year":"1998","unstructured":"Sutton R, Barto A: Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998."},{"key":"18_CR84","first-page":"3137","volume":"11","author":"E Theodorou","year":"2010","unstructured":"Theodorou E, Buchli J, Schaal S: A generalized path integral approach to reinforcement learning. J Mach Learn Res 2010, 11: 3137\u20133181.","journal-title":"J Mach Learn Res"},{"issue":"3\/4","key":"18_CR85","doi-asserted-by":"publisher","first-page":"285","DOI":"10.2307\/2332286","volume":"25","author":"WR Thompson","year":"1933","unstructured":"Thompson WR: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 1933, 25(3\/4):285\u2013294. 10.2307\/2332286","journal-title":"Biometrika"},{"key":"18_CR86","doi-asserted-by":"publisher","first-page":"601","DOI":"10.1007\/978-1-4419-1452-1_19","volume-title":"Perception-reason-action cycle: Models, algorithms and systems","author":"N Tishby","year":"2011","unstructured":"Tishby N, Polani D: Information theory of decisions and actions. In Perception-reason-action cycle: Models, algorithms and systems. Edited by: Vassilis T Hussain, Vassilis T Hussain . Heidelberg: Springer-Verlag; 2011:601\u2013636."},{"key":"18_CR87","first-page":"1369","volume-title":"Advances in Neural Information Processing Systems, Volume 19","author":"E Todorov","year":"2006","unstructured":"Todorov E: Linearly solvable Markov decision problems. Advances in Neural Information Processing Systems, Volume 19 2006, 1369\u20131376."},{"key":"18_CR88","doi-asserted-by":"publisher","first-page":"11478","DOI":"10.1073\/pnas.0710743106","volume":"106","author":"E Todorov","year":"2009","unstructured":"Todorov E: Efficient computation of optimal actions. 
Proceedings of the National Academy of Sciences USA 2009, 106: 11478\u201311483. 10.1073\/pnas.0710743106","journal-title":"Proceedings of the National Academy of Sciences USA"},{"key":"18_CR89","volume-title":"Cover tree Bayesian reinforcement learning","author":"N Tziortziotis","year":"2013a","unstructured":"Tziortziotis N, Dimitrakakis C, Blekas K: Cover tree Bayesian reinforcement learning. 2013a. arXiv: 1305.1809"},{"key":"18_CR90","first-page":"1721","volume-title":"Proceedings of the Twenty-Third international joint conference on Artificial Intelligence","author":"N Tziortziotis","year":"2013","unstructured":"Tziortziotis N, Dimitrakakis C, Blekas K: Linear Bayesian reinforcement learning. In Proceedings of the Twenty-Third international joint conference on Artificial Intelligence. AAAI Press; 2013:1721\u20131728."},{"key":"18_CR91","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1145\/1062261.1062335","volume-title":"Proceedings of the 2nd ACM conference on Computing frontiers","author":"P Vitanyi","year":"2005","unstructured":"Vitanyi P: Time, space, and energy in reversible computing. Proceedings of the 2nd ACM conference on Computing frontiers 2005, 435\u2013444."},{"key":"18_CR92","volume-title":"Complex Engineering Systems","author":"DH Wolpert","year":"2004","unstructured":"Wolpert DH: Information theory - the bridge connecting bounded rational game theory and statistical physics. In Complex Engineering Systems. New York, USA: Perseus Books; 2004."},{"key":"18_CR93","volume-title":"PhD thesis, Department of Artificial Intelligence, University of Edinburgh","author":"J Wyatt","year":"1997","unstructured":"Wyatt J: Exploration and inference in learning from reinforcement. 
PhD thesis, Department of Artificial Intelligence, University of Edinburgh 1997."}],"updated-by":[{"DOI":"10.1186\/s40294-014-0004-x","type":"erratum","label":"Erratum","source":"publisher","updated":{"date-parts":[[2014,10,1]],"date-time":"2014-10-01T00:00:00Z","timestamp":1412121600000}}],"container-title":["Complex Adaptive Systems Modeling"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/2194-3206-2-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/2194-3206-2-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/2194-3206-2-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,24]],"date-time":"2024-05-24T20:53:10Z","timestamp":1716583990000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1186\/2194-3206-2-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,3,14]]},"references-count":93,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,12]]}},"alternative-id":["18"],"URL":"https:\/\/doi.org\/10.1186\/2194-3206-2-2","relation":{},"ISSN":["2194-3206"],"issn-type":[{"value":"2194-3206","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,3,14]]},"assertion":[{"value":"11 November 2013","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 February 2014","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 March 2014","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"2"}}