{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,3]],"date-time":"2022-04-03T06:26:53Z","timestamp":1648967213495},"reference-count":11,"publisher":"World Scientific Pub Co Pte Lt","issue":"06","funder":[{"name":"Innovation Center of Novel Software Technology and Industrialization, National Natural Science Foundation of China"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Patt. Recogn. Artif. Intell."],"published-print":{"date-parts":[[2018,6]]},"abstract":"<jats:p> In reinforcement learning (RL), the exploration\/exploitation (E\/E) dilemma is a crucial issue: the agent must balance exploring the environment to find more profitable actions against exploiting the best empirical actions for the current state. We focus on the single trajectory RL problem, where an agent interacts with a partially unknown MDP over single trajectories, and try to deal with the E\/E dilemma in this setting. Given the reward function, we try to find a good E\/E strategy to address the MDPs under some MDP distribution. This is achieved by selecting, from a large set of candidate strategies, the strategy that performs best in mean over a potential MDP distribution, which is done by exploiting single trajectories drawn from many MDPs. In this paper, we mainly make the following contributions: (1) We discuss the strategy-selector algorithm based on formula set and polynomial function. (2) We provide the theoretical and experimental regret analysis of the learned strategy under a given MDP distribution. (3) We compare these methods with the \u201cstate-of-the-art\u201d Bayesian RL method experimentally. 
<\/jats:p>","DOI":"10.1142\/s0218001418590097","type":"journal-article","created":{"date-parts":[[2017,10,19]],"date-time":"2017-10-19T09:13:27Z","timestamp":1508404407000},"page":"1859009","source":"Crossref","is-referenced-by-count":1,"title":["Single Trajectory Learning: Exploration Versus Exploitation"],"prefix":"10.1142","volume":"32","author":[{"given":"Qiming","family":"Fu","sequence":"first","affiliation":[{"name":"College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215000, Jiangsu, P. R. China"},{"name":"Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, P. R. China"},{"name":"Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, P. R. China"}]},{"given":"Quan","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Soochow University, Suzhou 215000, Jiangsu, P. R. China"},{"name":"Key Laboratory of Symbolic Computation and Knowledge, Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P. R. China"},{"name":"Collaborative Innovation Center of Novel Software, Technology and Industrialization Nanjing, Jiangsu 210000, P. R. China"}]},{"given":"Shan","family":"Zhong","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Changshu Institute of Technology, Suzhou, Jiangsu 215500, P. R. China"}]},{"given":"Heng","family":"Luo","sequence":"additional","affiliation":[{"name":"College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215000, Jiangsu, P. R. China"},{"name":"Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, P. R. 
China"},{"name":"Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, P. R. China"}]},{"given":"Hongjie","family":"Wu","sequence":"additional","affiliation":[{"name":"College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215000, Jiangsu, P. R. China"},{"name":"Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, P. R. China"}]},{"given":"Jianping","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215000, Jiangsu, P. R. China"},{"name":"Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, P. R. China"},{"name":"Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, P. R. China"}]}],"member":"219","published-online":{"date-parts":[[2018,2,21]]},"reference":[{"issue":"7","key":"S0218001418590097BIB002","first-page":"1429","volume":"42","author":"Bo W.","year":"2014","journal-title":"Acta Electron. Sin."},{"key":"S0218001418590097BIB003","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0157088"},{"key":"S0218001418590097BIB009","first-page":"457","volume":"19","author":"Ghavamzadeh M.","year":"2007","journal-title":"Adv. Neural Inf. Process. Syst."},{"issue":"66","key":"S0218001418590097BIB010","first-page":"1","volume":"17","author":"Ghavamzadeh M.","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"S0218001418590097BIB011","doi-asserted-by":"publisher","DOI":"10.1561\/2200000049"},{"issue":"3","key":"S0218001418590097BIB013","first-page":"1","volume":"32","author":"Hishinuma T.","year":"2017","journal-title":"Comput. 
Intell."},{"issue":"1","key":"S0218001418590097BIB015","first-page":"4354","volume":"17","author":"Klenske E. D.","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"S0218001418590097BIB019","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007541107674"},{"key":"S0218001418590097BIB020","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15880-3_19"},{"key":"S0218001418590097BIB023","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-014-0565-6"},{"key":"S0218001418590097BIB024","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2543219"}],"container-title":["International Journal of Pattern Recognition and Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218001418590097","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,6]],"date-time":"2019-08-06T21:04:45Z","timestamp":1565125485000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0218001418590097"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,2,21]]},"references-count":11,"journal-issue":{"issue":"06","published-online":{"date-parts":[[2018,2,21]]},"published-print":{"date-parts":[[2018,6]]}},"alternative-id":["10.1142\/S0218001418590097"],"URL":"https:\/\/doi.org\/10.1142\/s0218001418590097","relation":{},"ISSN":["0218-0014","1793-6381"],"issn-type":[{"value":"0218-0014","type":"print"},{"value":"1793-6381","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,2,21]]}}}