{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T16:56:24Z","timestamp":1773161784960,"version":"3.50.1"},"reference-count":52,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2021,3,23]],"date-time":"2021-03-23T00:00:00Z","timestamp":1616457600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time-step t, based on the past rewards received by the environment. This formulation implicitly assumes that the expected payoff for each action is kept stationary by the environment through time. Nevertheless, in many real-world applications this assumption does not hold and the agent has to face a non-stationary environment, that is, with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data streaming is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to contrast concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f(.). In particular, we proposed a pessimistic (f=min), an optimistic (f=max), as well as an averaged (f=mean) version of the f-dsw TS algorithm. 
A rich set of numerical experiments is performed to evaluate the f-dsw TS algorithm compared to both stationary and non-stationary state-of-the-art TS baselines. We exploited synthetic environments (both randomly-generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden\/abrupt, incremental, gradual and increasing\/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework\u2014a prediction task on crimes in the city of Baltimore, a classification task on insects species, a recommendation task on local web-news, and a time-series analysis on microbial organisms in the tropical air ecosystem. The f-dsw TS approach emerges as the best performing MAB algorithm. At least one of the versions of f-dsw TS performs better than the baselines in synthetic environments, proving the robustness of f-dsw TS under different concept drift types. Moreover, the pessimistic version (f=min) results as the most effective in all real-world tasks.<\/jats:p>","DOI":"10.3390\/e23030380","type":"journal-article","created":{"date-parts":[[2021,3,23]],"date-time":"2021-03-23T14:08:16Z","timestamp":1616508496000},"page":"380","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0235-0421","authenticated-orcid":false,"given":"Emanuele","family":"Cavenaghi","sequence":"first","affiliation":[{"name":"Dipartimento di Informatica, Sistemistica e Comunicazione, University of Milano-Bicocca, 20126 Milano, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9983-2330","authenticated-orcid":false,"given":"Gabriele","family":"Sottocornola","sequence":"additional","affiliation":[{"name":"Facolt\u00e0 di Scienze e Tecnologie Informatiche, Free University of Bozen-Bolzano, 39100 
Bolzano, Italy"}]},{"given":"Fabio","family":"Stella","sequence":"additional","affiliation":[{"name":"Dipartimento di Informatica, Sistemistica e Comunicazione, University of Milano-Bicocca, 20126 Milano, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4805-5516","authenticated-orcid":false,"given":"Markus","family":"Zanker","sequence":"additional","affiliation":[{"name":"Facolt\u00e0 di Scienze e Tecnologie Informatiche, Free University of Bozen-Bolzano, 39100 Bolzano, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2021,3,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1090\/S0002-9904-1952-09620-8","article-title":"Some aspects of the sequential design of experiments","volume":"58","author":"Robbins","year":"1952","journal-title":"Bull. Am. Math. Soc."},{"key":"ref_2","unstructured":"Berry, D.A., and Fristedt, B. (1985). Bandit Problems: Sequential Allocation of Experiments (Monographs on Statistics and Applied Probability), Chapman Hall."},{"key":"ref_3","unstructured":"Kuleshov, V., and Precup, D. (2014). Algorithms for multi-armed bandit problems. arXiv."},{"key":"ref_4","first-page":"199","article-title":"Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges","volume":"30","author":"Villar","year":"2015","journal-title":"Stat. Sci. A Rev. J. Inst. Math. Stat."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ding, W., Qin, T., Zhang, X.D., and Liu, T.Y. (2013, January 14\u201318). Multi-Armed Bandit with Budget Constraint and Variable Costs. Proceedings of the AAAI Conference on Artificial Intelligence, Bellevue, WA, USA.","DOI":"10.1609\/aaai.v27i1.8637"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1287\/mksc.2016.1023","article-title":"Customer acquisition via display advertising using multi-armed bandit experiments","volume":"36","author":"Schwartz","year":"2017","journal-title":"Mark. 
Sci."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Le Ny, J., Dahleh, M., and Feron, E. (2008, January 11\u201313). Multi-UAV dynamic routing with partial observations using restless bandit allocation indices. Proceedings of the 2008 American Control Conference, Seattle, WA, USA.","DOI":"10.1109\/ACC.2008.4587156"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1002\/asmb.874","article-title":"A Modern Bayesian Look at the Multi-Armed Bandit","volume":"26","author":"Scott","year":"2010","journal-title":"Appl. Stoch. Model. Bus. Ind."},{"key":"ref_9","unstructured":"Chapelle, O., and Li, L. (2011, January 12\u201314). An Empirical Evaluation of Thompson Sampling. Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS\u201911), Granada, Spain."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Li, L., Chu, W., Langford, J., and Schapire, R.E. (2010, January 26\u201330). A Contextual-Bandit Approach to Personalized News Article Recommendation. Proceedings of the 19th International Conference on World Wide Web (WWW\u201910), Raleigh, NC, USA.","DOI":"10.1145\/1772690.1772758"},{"key":"ref_11","unstructured":"Benedetto, G.D., Bellini, V., and Zappella, G. (2020). A Linear Bandit for Seasonal Environments. arXiv."},{"key":"ref_12","unstructured":"Besbes, O., Gur, Y., and Zeevi, A. (2014, January 8\u201313). Stochastic multi-armed-bandit problem with non-stationary rewards. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2523813","article-title":"A survey on concept drift adaptation","volume":"46","author":"Gama","year":"2014","journal-title":"ACM Comput. Surv. CSUR"},{"key":"ref_14","unstructured":"Zliobaite, I. (2010). Learning under Concept Drift: An Overview. 
arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1613\/jair.1.11407","article-title":"Sliding-Window Thompson Sampling for Non-Stationary Settings","volume":"68","author":"Trovo","year":"2020","journal-title":"J. Artif. Intell. Res."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Slivkins, A. (2019). Introduction to Multi-Armed Bandits. arXiv.","DOI":"10.1561\/9781680836219"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1054","DOI":"10.2307\/1427934","article-title":"Sample mean based index policies with O (log n) regret for the multi-armed bandit problem","volume":"27","author":"Agrawal","year":"1995","journal-title":"Adv. Appl. Probab."},{"key":"ref_18","first-page":"397","article-title":"Using confidence bounds for exploitation-exploration trade-offs","volume":"3","author":"Auer","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1093\/biomet\/25.3-4.285","article-title":"On the likelihood that one unknown probability exceeds another in view of the evidence of two samples","volume":"25","author":"Thompson","year":"1933","journal-title":"Biometrika"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"450","DOI":"10.2307\/2371219","article-title":"On the theory of apportionment","volume":"57","author":"Thompson","year":"1935","journal-title":"Am. J. Math."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Russo, D., Roy, B.V., Kazerouni, A., Osband, I., and Wen, Z. (2017). A Tutorial on Thompson Sampling. arXiv.","DOI":"10.1561\/9781680834710"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1023\/A:1013689704352","article-title":"Finite-time analysis of the multiarmed bandit problem","volume":"47","author":"Auer","year":"2002","journal-title":"Mach. Learn."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Kaufmann, E., Korda, N., and Munos, R. 
(2012, January 29\u201331). Thompson sampling: An asymptotically optimal finite-time analysis. Proceedings of the International Conference on Algorithmic Learning Theory, Lyon, France.","DOI":"10.1007\/978-3-642-34106-9_18"},{"key":"ref_24","unstructured":"Agrawal, S., and Goyal, N. (2012, January 25\u201327). Analysis of thompson sampling for the multi-armed bandit problem. Proceedings of the Conference on Learning Theory, Edinburgh, UK."},{"key":"ref_25","first-page":"2442","article-title":"An Information-Theoretic Analysis of Thompson Sampling","volume":"17","author":"Russo","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1287\/opre.2017.1663","article-title":"Learning to Optimize via Information-Directed Sampling","volume":"66","author":"Russo","year":"2018","journal-title":"Oper. Res."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chow, S.C., and Chang, M. (2006). Adaptive Design Methods in Clinical Trials, CRC Press. [2nd ed.].","DOI":"10.1201\/9781584887775"},{"key":"ref_28","unstructured":"Srinivas, N., Krause, A., Kakade, S., and Seeger, M. (2010). Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design, Omnipress."},{"key":"ref_29","unstructured":"Brochu, E., Brochu, T., and de Freitas, N. (2010, January 2\u20134). A Bayesian Interactive Optimization Approach to Procedural Animation Design. Proceedings of the 2010 ACM SIGGRAPH\/Eurographics Symposium on Computer Animation (SCA\u201910), Madrid, Spain."},{"key":"ref_30","first-page":"2346","article-title":"Learning under concept drift: A review","volume":"31","author":"Lu","year":"2018","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"\u017dliobait\u0117, I., Pechenizkiy, M., and Gama, J. (2016). An overview of concept drift applications. 
Big Data Analysis: New Algorithms for a New Society, Springer.","DOI":"10.1007\/978-3-319-26989-4_4"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1002\/sam.10054","article-title":"Adaptive Concept Drift Detection","volume":"2","author":"Dries","year":"2009","journal-title":"Stat. Anal. Data Min."},{"key":"ref_33","unstructured":"Klinkenberg, R., and Joachims, T. (July, January 29). Detecting Concept Drift with Support Vector Machines. Proceedings of the Seventeenth International Conference on Machine Learning (ICML\u201900), Stanford, CA, USA."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Corruble, V., Takeda, M., and Suzuki, E. (2007). Detecting Concept Drift Using Statistical Testing. Discovery Science, Springer.","DOI":"10.1007\/978-3-540-75488-6"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1517","DOI":"10.1109\/TNN.2011.2160459","article-title":"Incremental Learning of Concept Drift in Nonstationary Environments","volume":"22","author":"Elwell","year":"2011","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_36","unstructured":"Raj, V., and Kalyani, S. (2017). Taming non-stationary bandits: A Bayesian approach. arXiv."},{"key":"ref_37","unstructured":"Garivier, A., and Moulines, E. (2008). On upper-confidence bound policies for non-stationary bandit problems. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Fouch\u00e9, E., Komiyama, J., and B\u00f6hm, K. (2019, January 4\u20138). Scaling multi-armed bandit algorithms. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA.","DOI":"10.1145\/3292500.3330862"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Bifet, A., and Gavalda, R. (2007, January 26\u201328). Learning from time-changing data with adaptive windowing.
Proceedings of the 2007 SIAM International Conference on Data Mining, Minneapolis, MN, USA.","DOI":"10.1137\/1.9781611972771.42"},{"key":"ref_40","unstructured":"Hartland, C., Gelly, S., Baskiotis, N., Teytaud, O., and Sebag, M. (2006, January 8). Multi-armed bandit, dynamic environments and meta-bandits. Proceedings of the NIPS-2006 Workshop, Online Trading between Exploration and Exploitation, Whistler, BC, Canada."},{"key":"ref_41","unstructured":"Kaufmann, E., Capp\u00e9, O., and Garivier, A. (2012, January 21\u201323). On Bayesian upper confidence bounds for bandit problems. Proceedings of the Artificial Intelligence and Statistics, Canary Islands, Spain."},{"key":"ref_42","first-page":"2069","article-title":"Optimistic Bayesian sampling in contextual-bandit problems","volume":"13","author":"May","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_43","unstructured":"Mellor, J., and Shapiro, J. (May, January 29). Thompson sampling in switching environments with Bayesian online change detection. Proceedings of the Artificial Intelligence and Statistics, Scottsdale, AZ, USA."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Liu, F., Lee, J., and Shroff, N. (2017). A change-detection based framework for piecewise-stationary multi-armed bandit problem. arXiv.","DOI":"10.1609\/aaai.v32i1.11746"},{"key":"ref_45","unstructured":"Besson, L., and Kaufmann, E. (2019). The generalized likelihood ratio test meets klucb: An improved algorithm for piece-wise non-stationary bandits. arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1007\/s10458-019-09419-9","article-title":"Expertise drift in referral networks","volume":"33","author":"KhudaBukhsh","year":"2019","journal-title":"Auton. Agents Multi-Agent Syst."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"St-Pierre, D.L., and Liu, J. (2014, January 6\u201311). Differential evolution algorithm applied to non-stationary bandit problem. 
Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC), Beijing, China.","DOI":"10.1109\/CEC.2014.6900505"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1007\/s41060-017-0050-5","article-title":"The non-stationary stochastic multi-armed bandit problem","volume":"3","author":"Allesiardo","year":"2017","journal-title":"Int. J. Data Sci. Anal."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Souza, V., Reis, D.M.d., Maletzke, A.G., and Batista, G.E. (2020). Challenges in Benchmarking Stream Learning Algorithms with Real-world Data. arXiv.","DOI":"10.1007\/s10618-020-00698-5"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Sottocornola, G., Symeonidis, P., and Zanker, M. (2018, January 23\u201327). Session-Based News Recommendations. Companion Proceedings of The Web Conference 2018 (WWW\u201918), Lyon, France.","DOI":"10.1145\/3184558.3191582"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1145\/2133806.2133826","article-title":"Probabilistic Topic Models","volume":"55","author":"Blei","year":"2012","journal-title":"Commun. ACM"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"23299","DOI":"10.1073\/pnas.1908493116","article-title":"Microbial communities in the tropical air ecosystem follow a precise diel cycle","volume":"116","author":"Gusareva","year":"2019","journal-title":"Proc. Natl. Acad. Sci.
USA"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/3\/380\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:39:53Z","timestamp":1760161193000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/3\/380"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,23]]},"references-count":52,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["e23030380"],"URL":"https:\/\/doi.org\/10.3390\/e23030380","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,23]]}}}