{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,10,12]],"date-time":"2022-10-12T17:16:02Z","timestamp":1665594962908},"reference-count":29,"publisher":"MIT Press","issue":"11","content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>An agent in a nonstationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive alternative to transform a nonstationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical context may introduce spurious relationships or lack a convenient representation of crucial information. In order to address these issues, we propose an approach that learns to represent the relevant context for a decision based solely on the raw history of interactions between the agent and the environment. This approach relies on a combination of features extracted by recurrent neural networks with a contextual linear bandit algorithm based on posterior sampling. Our experiments on a diverse selection of contextual and noncontextual nonstationary problems show that our recurrent approach consistently outperforms its feedforward counterpart, which requires handcrafted historical contexts, while being more widely applicable than conventional nonstationary bandit algorithms. Although it is very difficult to provide theoretical performance guarantees for our new approach, we also prove a novel regret bound for linear posterior sampling with measurement error that may serve as a foundation for future theoretical work.<\/jats:p>","DOI":"10.1162\/neco_a_01539","type":"journal-article","created":{"date-parts":[[2022,9,16]],"date-time":"2022-09-16T19:41:50Z","timestamp":1663357310000},"page":"2232-2272","update-policy":"http:\/\/dx.doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":0,"title":["Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits"],"prefix":"10.1162","volume":"34","author":[{"given":"Aditya","family":"Ramesh","sequence":"first","affiliation":[{"name":"Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, Lugano 6962, Switzerland"},{"name":"Universit\u00e0 della Svizzera italiana, Lugano 6962, Switzerland"},{"name":"Scuola universitaria professionale della Svizzera italiana, Lugano 6962, Switzerland aditya.ramesh@idsia.ch"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paulo","family":"Rauber","sequence":"additional","affiliation":[{"name":"Queen Mary University of London, London E1 4 NS, U.K. p.rauber@qmul.ac.uk"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michelangelo","family":"Conserva","sequence":"additional","affiliation":[{"name":"Queen Mary University of London, London E1 4 NS, U.K. m.conserva@qmul.ac.uk"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J\u00fcrgen","family":"Schmidhuber","sequence":"additional","affiliation":[{"name":"Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, Lugano 6962, Switzerland"},{"name":"Universit\u00e0 della Svizzera italiana, Lugano 6962, Switzerland"},{"name":"Scuola universitaria professionale della Svizzera italiana, Lugano 6962, Switzerland"},{"name":"NNAISENSE, Lugano 6900, Switzerland"},{"name":"King Abdullah University of Science and Technology, Thuwal, Saudi Arabia juergen@idsia.ch"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2022,10,7]]},"reference":[{"key":"2022101216311344600_B1","first-page":"2312","volume-title":"Advances in neural information processing systems","author":"Abbasi-Yadkori","year":"2011"},{"issue":"2","key":"2022101216311344600_B2","doi-asserted-by":"publisher","first-page":"5165","DOI":"10.1214\/17-EJS1341SI","article-title":"Linear Thompson sampling revisited","volume":"11","author":"Abeille","year":"2017","journal-title":"Electronic Journal of Statistics"},{"key":"2022101216311344600_B3","author":"Agrawal","year":"2012","journal-title":"Thompson sampling for contextual bandits with linear payoffs."},{"key":"2022101216311344600_B4","first-page":"127","article-title":"Thompson sampling for contextual bandits with linear payoffs","author":"Agrawal","year":"2013","journal-title":"Proceedings of the International Conference on Machine Learning"},{"key":"2022101216311344600_B5","first-page":"397","article-title":"Using confidence bounds for exploitation-exploration trade-offs","volume":"3","author":"Auer","year":"2002","journal-title":"Journal of Machine Learning Research"},{"key":"2022101216311344600_B6","volume-title":"Pattern recognition and machine learning","author":"Bishop","year":"2013"},{"key":"2022101216311344600_B7","author":"Bouneffouf","year":"2019","journal-title":"A survey on practical applications of multi-armed and contextual bandits."},{"key":"2022101216311344600_B8","first-page":"418","article-title":"Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit","author":"Cao","year":"2019","journal-title":"Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics"},{"key":"2022101216311344600_B9","author":"Carpentier","year":"2020","journal-title":"The elliptical potential lemma revisited."},{"key":"2022101216311344600_B10","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511546921","volume-title":"Prediction, learning, and games","author":"Cesa-Bianchi","year":"2006"},{"key":"2022101216311344600_B11","first-page":"1079","article-title":"Learning to optimize under non-stationarity","volume-title":"Proceedings of Machine Learning Research","author":"Cheung","year":"2019"},{"key":"2022101216311344600_B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3124749.3124752","article-title":"Attribution modeling in- creases efficiency of bidding in display advertising","volume-title":"Proceedings of the ADKDD'17","author":"Diemert","year":"2017"},{"key":"2022101216311344600_B13","author":"Dua","year":"2017","journal-title":"UCI machine learning repository"},{"key":"2022101216311344600_B14","author":"Duan","year":"2016","journal-title":"RL2: Fast reinforcement learning via slow reinforcement learning"},{"key":"2022101216311344600_B15","first-page":"1","article-title":"Short-term memory mechanisms in neural network learning of robot navigation tasks: A case study","author":"Freire","year":"2009","journal-title":"Proceedings of the 6th Latin American Robotics Symposium"},{"key":"2022101216311344600_B16","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1007\/978-3-642-24412-4_16","article-title":"On upper-confidence bound policies for switching bandit problems","volume-title":"Proceedings of the International Conference on Algorithmic Learning Theory","author":"Garivier","year":"2011"},{"issue":"10","key":"2022101216311344600_B17","doi-asserted-by":"publisher","first-page":"2451","DOI":"10.1162\/089976600300015015","article-title":"Learning to forget: Continual prediction with LSTM","volume":"12","author":"Gers","year":"2000","journal-title":"Neural Computation"},{"issue":"8","key":"2022101216311344600_B18","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2022101216311344600_B19","first-page":"71","article-title":"Randomized exploration for non-stationary stochastic linear bandits","volume-title":"Proceedings of the Conference on Uncertainty in Artificial Intelligence","author":"Kim","year":"2020"},{"key":"2022101216311344600_B20","article-title":"Adam: A method for stochastic optimization","volume-title":"Proceedings of the 3rd International Conference on Learning Representations","author":"Kingma","year":"2014"},{"key":"2022101216311344600_B21","volume-title":"Bandit algorithms","author":"Lattimore","year":"2019"},{"key":"2022101216311344600_B22","author":"LeCun","year":"2010","journal-title":"MNIST handwritten digit database."},{"key":"2022101216311344600_B23","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v32i1.11746","article-title":"A change-detection based framework for piecewise-stationary multi-armed bandit problem","volume-title":"Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence","author":"Liu","year":"2018"},{"key":"2022101216311344600_B24","article-title":"Deep Bayesian bandits showdown: An empirical comparison of Bayesian deep networks for Thompson sampling","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Riquelme","year":"2018"},{"key":"2022101216311344600_B25","author":"Russac","year":"2020","journal-title":"Algorithms for non-stationary generalized linear bandits"},{"key":"2022101216311344600_B26","article-title":"Weighted linear bandits for non-stationary environments","author":"Russac","year":"2019","journal-title":"Proceedings of the 33rd Conference on Neural Information Processing Systems"},{"issue":"3\/4","key":"2022101216311344600_B27","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1093\/biomet\/25.3-4.285","article-title":"On the likelihood that one unknown probability exceeds another in view of the evidence of two samples","volume":"25","author":"Thompson","year":"1933","journal-title":"Biometrika"},{"key":"2022101216311344600_B28","author":"Wang","year":"2016","journal-title":"Learning to reinforcement learn."},{"key":"2022101216311344600_B29","article-title":"Neural Thompson sampling","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Zhang","year":"2021"}],"container-title":["Neural Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/neco\/article-pdf\/34\/11\/2232\/2048426\/neco_a_01539.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/neco\/article-pdf\/34\/11\/2232\/2048426\/neco_a_01539.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,12]],"date-time":"2022-10-12T16:32:26Z","timestamp":1665592346000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/34\/11\/2232\/112951\/Recurrent-Neural-Linear-Posterior-Sampling-for"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,7]]},"references-count":29,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2022,10,7]]},"published-print":{"date-parts":[[2022,10,7]]}},"URL":"https:\/\/doi.org\/10.1162\/neco_a_01539","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,11]]},"published":{"date-parts":[[2022,10,7]]}}}