{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T13:46:28Z","timestamp":1774446388978,"version":"3.50.1"},"reference-count":47,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2024,11,26]],"date-time":"2024-11-26T00:00:00Z","timestamp":1732579200000},"content-version":"vor","delay-in-days":330,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62376073"],"award-info":[{"award-number":["62376073"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["International Journal of Intelligent Systems"],"published-print":{"date-parts":[[2024,1]]},"abstract":"<jats:p>Counterfactual regret minimization (CFR) is an effective algorithm for solving extensive\u2010form games with imperfect information (IIEGs). However, CFR is only allowed to be applied in known environments, where the transition function of the chance player and the reward function of the terminal node in IIEGs are known. In uncertain situations, such as reinforcement learning (RL) problems, CFR is not applicable. Thus, applying CFR in unknown environments is a significant challenge that can also address some difficulties in the real world. Currently, advanced solutions require more interactions with the environment and are limited by large single\u2010sampling variances to narrow the gap with the real environment. In this paper, we propose a method that combines CFR with information gain to compute the Nash equilibrium (NE) of IIEGs with unknown environments. We use a curiosity\u2010driven approach to explore unknown environments and minimize the discrepancy between uncertain and real environments. In addition, by incorporating information into the reward, the average strategy calculated by CFR can be directly implemented as the interaction policy with the environment, thereby improving the exploration efficiency of our method in uncertain environments. Through experiments on standard testbeds such as Kuhn poker and Leduc poker, our method significantly reduces the number of interactions with the environment compared to the different baselines and computes a more accurate approximate NE within the same number of interaction rounds.<\/jats:p>","DOI":"10.1155\/int\/9482323","type":"journal-article","created":{"date-parts":[[2024,11,26]],"date-time":"2024-11-26T16:05:16Z","timestamp":1732637116000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Combining Counterfactual Regret Minimization With Information Gain to Solve Extensive Games With Unknown Environments"],"prefix":"10.1155","volume":"2024","author":[{"given":"Chen","family":"Qiu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3512-0649","authenticated-orcid":false,"given":"Xuan","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianzi","family":"Ma","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yaojun","family":"Wen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6611-2046","authenticated-orcid":false,"given":"Jiajia","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","published-online":{"date-parts":[[2024,11,26]]},"reference":[{"key":"e_1_2_12_1_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.36.1.48"},{"key":"e_1_2_12_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1785414.1785439"},{"key":"e_1_2_12_3_2","article-title":"A Course in Game Theory","author":"Osborne M. J.","year":"1994","journal-title":"1 of MIT Press Books"},{"key":"e_1_2_12_4_2","first-page":"1729","article-title":"Regret Minimization in Games With Incomplete Information","volume":"20","author":"Zinkevich M.","year":"2007","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_12_5_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v26i1.8241"},{"key":"e_1_2_12_6_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aao1733"},{"key":"e_1_2_12_7_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton R. S.","year":"2018"},{"key":"e_1_2_12_8_2","unstructured":"ZhouY. LiJ. andZhuJ. Posterior Sampling for Multi-Agent Reinforcement Learning: Solving Extensive Games with Imperfect Information 2020."},{"key":"e_1_2_12_9_2","unstructured":"HouthooftR. ChenX. DuanY. SchulmanJ. De TurckF. andAbbeelP. VIME: Variational Information Maximizing Exploration 2016 Curran Associates Inc 1117\u20131125."},{"key":"e_1_2_12_10_2","unstructured":"TammelinO. BurchN. JohansonM. andBowlingM. Solving Heads-Up Limit Texas Hold\u2019em 2015 AAAI Press 645\u2013652."},{"key":"e_1_2_12_11_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33011829"},{"key":"e_1_2_12_12_2","unstructured":"OsbandI. AslanidesJ. andCassirerA. Randomized Prior Functions for Deep Reinforcement Learning 2018 Curran Associates Inc 8626\u20138638."},{"key":"e_1_2_12_13_2","unstructured":"BurdaY. EdwardsH. StorkeyA. andKlimovO. Exploration by Random Network Distillation 2019."},{"key":"e_1_2_12_14_2","unstructured":"CiosekK. FortuinV. TomiokaR. HofmannK. andTurnerR. Conservative Uncertainty Estimation by Fitting Prior Networks 2020."},{"key":"e_1_2_12_15_2","unstructured":"LakshminarayananB. PritzelA. andBlundellC. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles 2017 Curran Associates Inc 6405\u20136416."},{"key":"e_1_2_12_16_2","unstructured":"GalY.andGhahramaniZ. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning 2016 JMLR.org 1050\u20131059."},{"key":"e_1_2_12_17_2","first-page":"1929","article-title":"Dropout: A Simple Way to Prevent Neural Networks From Overfitting","volume":"15","author":"Srivastava N.","year":"2014","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_12_18_2","unstructured":"GalY. HronJ. andKendallA. Concrete Dropout 2017 Curran Associates Inc 3584\u20133593."},{"key":"e_1_2_12_19_2","article-title":"Implicit Weight Uncertainty in Neural Networks","author":"Pawlowski N.","year":"2017","journal-title":"CoRR"},{"key":"e_1_2_12_20_2","unstructured":"BrosseN. MoulinesE. andDurmusA. The Promises and Pitfalls of Stochastic Gradient Langevin Dynamics 2018 Curran Associates Inc 8278\u20138288."},{"key":"e_1_2_12_21_2","volume-title":"Stochastic Backpropagation and Approximate Inference in Deep Generative Models","author":"Rezende D. J.","year":"2014"},{"key":"e_1_2_12_22_2","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-24412-4_17","volume-title":"Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits","author":"Carpentier A.","year":"2011"},{"key":"e_1_2_12_23_2","doi-asserted-by":"publisher","DOI":"10.1038\/NATURE14236"},{"key":"e_1_2_12_24_2","doi-asserted-by":"publisher","DOI":"10.1049\/trit.2020.0024"},{"key":"e_1_2_12_25_2","doi-asserted-by":"crossref","unstructured":"ShojaeeG. K.andMashhadiH. R. Optimistic Initial Value Analysis in a Greedy Selection Approach to MAB Problems 2017 419\u2013424.","DOI":"10.1109\/ICCKE.2017.8167915"},{"key":"e_1_2_12_26_2","unstructured":"SilverD. LeverG. HeessN. DegrisT. WierstraD. andRiedmillerM. Deterministic Policy Gradient Algorithms 2014 I\u2013387\u2013I\u2013395."},{"key":"e_1_2_12_27_2","doi-asserted-by":"publisher","DOI":"10.1049\/rpg2.12782"},{"key":"e_1_2_12_28_2","doi-asserted-by":"publisher","DOI":"10.1049\/cit2.12195"},{"key":"e_1_2_12_29_2","unstructured":"OsbandI.andVan RoyB. Why Is Posterior Sampling Better Than Optimism for Reinforcement Learning? 2017 2701\u20132710."},{"key":"e_1_2_12_30_2","unstructured":"ChapelleO.andLiL. An Empirical Evaluation of Thompson Sampling 2011 Curran Associates Inc 2249\u20132257."},{"key":"e_1_2_12_31_2","unstructured":"RussoD.andRoyB. V. Learning to Optimize via Information-Directed Sampling 2014 MIT Press 1583\u20131591."},{"key":"e_1_2_12_32_2","unstructured":"MohamedS.andRezendeD. J. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning 2015 MIT Press 2125\u20132133."},{"key":"e_1_2_12_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12064-011-0142-z"},{"key":"e_1_2_12_34_2","unstructured":"GravesA. Practical Variational Inference for Neural Networks 2011 Curran Associates Inc 2348\u20132356."},{"key":"e_1_2_12_35_2","unstructured":"BlundellC. CornebiseJ. KavukcuogluK. andWierstraD. Weight Uncertainty in Neural Networks 2015 JMLR.org 1613\u20131622."},{"key":"e_1_2_12_36_2","unstructured":"BrownN. LererA. GrossS. andSandholmT. Deep Counterfactual Regret Minimization 2019 PMLR 793\u2013802."},{"key":"e_1_2_12_37_2","doi-asserted-by":"publisher","DOI":"10.1006\/inco.1994.1009"},{"key":"e_1_2_12_38_2","unstructured":"ChaudhuriK. FreundY. andHsuD. A Parameter-Free Hedging Algorithm 2009 Curran Associates Inc 297\u2013305."},{"key":"e_1_2_12_39_2","unstructured":"QiuC. WangX. MaT. WenY. andZhangJ. Combining Counterfactual Regret Minimization With Information Gain to Solve Extensive Games With Imperfect Information 2021 https:\/\/arxiv.org\/abs\/2110.07892."},{"key":"e_1_2_12_40_2","unstructured":"BrownN.andSandholmT. Regret-Based Pruning in Extensive-Form Games 2015 MIT Press 1972\u20131980."},{"key":"e_1_2_12_41_2","unstructured":"LiH. HuK. ZhangS. QiY. andSongL. Double Neural Counterfactual Regret Minimization 2019."},{"key":"e_1_2_12_42_2","doi-asserted-by":"publisher","DOI":"10.1111\/1468-0262.00153"},{"key":"e_1_2_12_43_2","unstructured":"SoutheyF. BowlingM. LarsonB.et al. Bayes\u2019 Bluff: Opponent Modelling in Poker 2005 AUAI Press 550\u2013558."},{"key":"e_1_2_12_44_2","doi-asserted-by":"crossref","unstructured":"KuhnH. W. A Simplified Two-Person Poker 1951 Princeton University Press.","DOI":"10.1515\/9781400881727-010"},{"key":"e_1_2_12_45_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-006-0143-1"},{"key":"e_1_2_12_46_2","unstructured":"HeinrichJ. LanctotM. andSilverD. Fictitious Self-Play in Extensive-Form Games 2015 JMLR.org 805\u2013813."},{"key":"e_1_2_12_47_2","unstructured":"LanctotM. WaughK. ZinkevichM. andBowlingM. Monte Carlo Sampling for Regret Minimization in Extensive Games 2009 Curran Associates Inc 1078\u20131086."}],"container-title":["International Journal of Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/int\/9482323","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,26]],"date-time":"2024-11-26T16:05:27Z","timestamp":1732637127000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/int\/9482323"}},"subtitle":[],"editor":[{"given":"Yu-an","family":"Tan","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2024,1]]},"references-count":47,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1]]}},"alternative-id":["10.1155\/int\/9482323"],"URL":"https:\/\/doi.org\/10.1155\/int\/9482323","archive":["Portico"],"relation":{},"ISSN":["0884-8173","1098-111X"],"issn-type":[{"value":"0884-8173","type":"print"},{"value":"1098-111X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1]]},"assertion":[{"value":"2023-12-12","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-26","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"9482323"}}