{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:39:37Z","timestamp":1760240377682,"version":"build-2065373602"},"reference-count":31,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,5,24]],"date-time":"2019-05-24T00:00:00Z","timestamp":1558656000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>In many statistical and machine learning applications, without-replacement sampling is considered superior to with-replacement sampling. In some cases, this has been proven, and in others the heuristic is so intuitively attractive that it is taken for granted. In reinforcement learning, many count-based exploration strategies are justified by reliance on the aforementioned heuristic. This paper will detail the non-intuitive discovery that when measuring the goodness of an exploration strategy by the stochastic shortest path to a goal state, there is a class of processes for which an action selection strategy based on without-replacement sampling of actions can be worse than with-replacement sampling. Specifically, the expected time until a specified goal state is first reached can be provably larger under without-replacement sampling. Numerical experiments describe the frequency and severity of this inferiority.<\/jats:p>","DOI":"10.3390\/make1020041","type":"journal-article","created":{"date-parts":[[2019,5,24]],"date-time":"2019-05-24T11:20:46Z","timestamp":1558696846000},"page":"698-714","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Exploration Using Without-Replacement Sampling of Actions Is Sometimes Inferior"],"prefix":"10.3390","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4793-7883","authenticated-orcid":false,"given":"Stephen W.","family":"Carden","sequence":"first","affiliation":[{"name":"Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA 30460, USA"}]},{"given":"S. Dalton","family":"Walker","sequence":"additional","affiliation":[{"name":"Air Force Material Command, Robins Air Force Base, GA 31098, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,5,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Pathical, S., and Serpen, G. (2010, January 11\u201314). Comparison of subsampling techniques for random subspace ensembles. Proceedings of the 2010 International Conference on Machine Learning and Cybernetics, Qingdao, China.","DOI":"10.1109\/ICMLC.2010.5581032"},{"key":"ref_2","first-page":"981","article-title":"Sampling Methods for the Nystr\u00f6m Method","volume":"13","author":"Kumar","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_3","unstructured":"Leen, T.K., Dietterich, T.G., and Tresp, V. (2001). Using the Nystr\u00f6m Method to Speed Up Kernel Machines. Advances in Neural Information Processing Systems 13, MIT Press."},{"key":"ref_4","unstructured":"Schneider, M. (2016, January 9\u201311). Probability Inequalities for Kernel Embeddings in Sampling without Replacement. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Cadiz, Spain."},{"key":"ref_5","first-page":"335","article-title":"Machine Learning with Data Dependent Hypothesis Classes","volume":"2","author":"Cannon","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Feng, X., Kumar, A., Recht, B., and R\u00e9, C. (2012). Towards a Unified Architecture for in-RDBMS Analytics. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD\u201912, ACM.","DOI":"10.1145\/2213836.2213874"},{"key":"ref_7","unstructured":"G\u00fcrb\u00fczbalaban, M., Ozdaglar, A., and Parrilo, P. (2015). Why Random Reshuffling Beats Stochastic Gradient Descent. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Meng, Q., Chen, W., Wang, Y., Ma, Z.M., and Liu, T.Y. (2019). Convergence analysis of distributed stochastic gradient descent with shuffling. Neurocomputing.","DOI":"10.1016\/j.neucom.2019.01.037"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ying, B., Yuan, K., Vlaski, S., and Sayed, A.H. (2019). Stochastic Learning Under Random Reshuffling with Constant Step-sizes. IEEE Trans. Signal Process., 67.","DOI":"10.1109\/TSP.2018.2878551"},{"key":"ref_10","unstructured":"Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., and Garnett, R. (2016). Without-Replacement Sampling for Stochastic Gradient Methods. Advances in Neural Information Processing Systems 29, Curran Associates, Inc."},{"key":"ref_11","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Introduction to Reinforcement Learning, MIT Press. [1st ed.]."},{"key":"ref_12","unstructured":"Puterman, M. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley-Interscience."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"674","DOI":"10.1109\/9.580874","article-title":"An Analysis of Temporal-Difference Learning with Function Approximation","volume":"42","author":"Tsitsiklis","year":"1997","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1613\/jair.4271","article-title":"Convergence of a Q-learning Variant for Continuous States and Actions","volume":"49","author":"Carden","year":"2014","journal-title":"J. Artif. Intell. Res."},{"key":"ref_16","unstructured":"Wunder, M., Littman, M., and Babes, M. (2010, January 21\u201324). Classes of Multiagent Q-learning Dynamics with \u03f5-greedy Exploration. Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tijsma, A.D., Drugan, M.M., and Wiering, M.A. (2016, January 6\u20139). Comparing exploration strategies for Q-learning in random stochastic mazes. Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece.","DOI":"10.1109\/SSCI.2016.7849366"},{"key":"ref_18","unstructured":"Kaelbling, L. (1990). Learning in Embedded Systems. [Ph.D. Thesis, Stanford University]."},{"key":"ref_19","first-page":"397","article-title":"Using Confidence Bounds for Exploitation-Exploration Trade-offs","volume":"3","author":"Auer","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"ref_20","unstructured":"Porter, B., and Mooney, R. (1990). Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. Machine Learning Proceedings 1990, Morgan Kaufmann."},{"key":"ref_21","unstructured":"Thrun, S.B. (1992). Efficient Exploration in Reinforcement Learning, Carnegie-Mellon University. Technical Report CMU-CS-92-102."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Martin, J., Sasikumar, S.N., Everitt, T., and Hutter, M. (2017, January 19\u201325). Count-Based Exploration in Feature Space for Reinforcement Learning. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, Melbourne, Australia.","DOI":"10.24963\/ijcai.2017\/344"},{"key":"ref_23","unstructured":"Xu, Z.X., Chen, X.L., Cao, L., and Li, C.X. (2017, January 28\u201330). A study of count-based exploration and bonus for reinforcement learning. Proceedings of the 2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China."},{"key":"ref_24","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). Learning. Advances in Neural Information Processing Systems 30, Curran Associates, Inc."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1023\/A:1017984413808","article-title":"Near-Optimal Reinforcement Learning in Polynomial Time","volume":"49","author":"Kearns","year":"2002","journal-title":"Mach. Learn."},{"key":"ref_26","unstructured":"Kolobov, A., and Weld, D.S. (2012). A Theory of Goal-oriented MDPs with Dead Ends. Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, UAI\u201912, AUAI Press."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1287\/moor.16.3.580","article-title":"An Analysis of Stochastic Shortest Path Problems","volume":"16","author":"Bertsekas","year":"1991","journal-title":"Math. Oper. Res."},{"key":"ref_28","unstructured":"Biler, P., and Witkowski, A. (1990). Problems in Mathematical Analysis, CRC Press."},{"key":"ref_29","unstructured":"Resnick, S.I. (1992). Adventures in Stochastic Processes, Birkh\u00e4user."},{"key":"ref_30","unstructured":"R Core Team (2017). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing. Available online: https:\/\/www.R-project.org\/."},{"key":"ref_31","unstructured":"Trevizan, F.W., Teichteil-K\u00f6nigsberg, F., and Thi\u00e9baux, S. (2017). Efficient solutions for Stochastic Shortest Path Problems with Dead Ends. Proceedings of the Conference on Uncertainty in Artificial Intelligence, AUAI Press."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/2\/41\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:54:55Z","timestamp":1760187295000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/2\/41"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,24]]},"references-count":31,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["make1020041"],"URL":"https:\/\/doi.org\/10.3390\/make1020041","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2019,5,24]]}}}