{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:44:03Z","timestamp":1760060643600,"version":"build-2065373602"},"reference-count":21,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T00:00:00Z","timestamp":1757462400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union within the framework of the National Laboratory for Autonomous Systems","award":["RRF-2.3.1-21-2022-00002","TKP2021","EK\u00d6P-24-4-I-BME-150"],"award-info":[{"award-number":["RRF-2.3.1-21-2022-00002","TKP2021","EK\u00d6P-24-4-I-BME-150"]}]},{"name":"Ministry of Innovation and Technology of Hungary from the National Research, Development and Innovation Fund","award":["RRF-2.3.1-21-2022-00002","TKP2021","EK\u00d6P-24-4-I-BME-150"],"award-info":[{"award-number":["RRF-2.3.1-21-2022-00002","TKP2021","EK\u00d6P-24-4-I-BME-150"]}]},{"name":"Ministry of Culture and Innovation of Hungary from the National Research, Development and Innovation Fund","award":["RRF-2.3.1-21-2022-00002","TKP2021","EK\u00d6P-24-4-I-BME-150"],"award-info":[{"award-number":["RRF-2.3.1-21-2022-00002","TKP2021","EK\u00d6P-24-4-I-BME-150"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Curriculum Learning (CL) is a potent field in Machine Learning that provides several excellent techniques for enhancing the performance of the training process given the same data points, regardless of the training method used. In this research, we propose a novel Monte Carlo Tree Search (MCTS)-based technique that enhances model performance, articulating the utilization of MCTS in Curriculum Learning. The proposed approach leverages MCTS to optimize the sequence of batches during the training process. First, we demonstrate the application of our method in Reinforcement Learning, where sparse rewards often diminish convergence and deteriorate performance. By leveraging the strategic planning and exploration capabilities of MCTS, our method systematically identifies and selects trajectories that are more informative and have a higher potential to enhance policy improvement. This MCTS-guided batch optimization focuses the learning process on valuable experiences, accelerating convergence and improving overall performance. We evaluate our approach on standard RL benchmarks, demonstrating that it outperforms conventional batch selection methods regarding learning speed and policy effectiveness. The results highlight the potential of combining MCTS with CL to optimize batch selection, offering a promising direction for future research in efficient Reinforcement Learning.<\/jats:p>","DOI":"10.3390\/make7030098","type":"journal-article","created":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T12:04:55Z","timestamp":1757505895000},"page":"98","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["MCTS-Based Policy Improvement for Reinforcement Learning"],"prefix":"10.3390","volume":"7","author":[{"given":"Gy\u00f6rgy","family":"Csipp\u00e1n","sequence":"first","affiliation":[{"name":"Department of Control for Transportation and Vehicle Systems, Faculty of Transportation Engineering and Vehicle Engineering, Budapest University of Technology and Economics, H-1111 Budapest, Hungary"},{"name":"Asura Technologies Ltd., H-1122 Budapest, Hungary"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8651-2018","authenticated-orcid":false,"given":"Istv\u00e1n","family":"P\u00e9ter","sequence":"additional","affiliation":[{"name":"Department of Control for Transportation and Vehicle Systems, Faculty of Transportation Engineering and Vehicle Engineering, Budapest University of Technology and Economics, H-1111 Budapest, Hungary"},{"name":"Asura Technologies Ltd., H-1122 Budapest, Hungary"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2178-2921","authenticated-orcid":false,"given":"B\u00e1lint","family":"K\u0151v\u00e1ri","sequence":"additional","affiliation":[{"name":"Department of Control for Transportation and Vehicle Systems, Faculty of Transportation Engineering and Vehicle Engineering, Budapest University of Technology and Economics, H-1111 Budapest, Hungary"},{"name":"Asura Technologies Ltd., H-1122 Budapest, Hungary"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1487-9672","authenticated-orcid":false,"given":"Tam\u00e1s","family":"B\u00e9csi","sequence":"additional","affiliation":[{"name":"Department of Control for Transportation and Vehicle Systems, Faculty of Transportation Engineering and Vehicle Engineering, Budapest University of Technology and Economics, H-1111 Budapest, Hungary"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"104463","DOI":"10.1109\/ACCESS.2024.3434965","article-title":"Distributed highway control: A cooperative reinforcement learning-based approach","volume":"12","author":"Aradi","year":"2024","journal-title":"IEEE Access"},{"key":"ref_2","first-page":"94","article-title":"Linear Parameter Varying and Reinforcement Learning Approaches for Trajectory Tracking Controller of Autonomous Vehicles","volume":"53","author":"Vu","year":"2025","journal-title":"Period. Polytech. Transp. Eng."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2407","DOI":"10.1109\/LRA.2019.2901898","article-title":"Robot motion planning in learned latent spaces","volume":"4","author":"Ichter","year":"2019","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","article-title":"Mastering atari, go, chess and shogi by planning with a learned model","volume":"588","author":"Schrittwieser","year":"2020","journal-title":"Nature"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1038\/s41586-022-05172-4","article-title":"Discovering faster matrix multiplication algorithms with reinforcement learning","volume":"610","author":"Fawzi","year":"2022","journal-title":"Nature"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1038\/s41586-023-06004-9","article-title":"Faster sorting algorithms discovered using deep reinforcement learning","volume":"618","author":"Mankowitz","year":"2023","journal-title":"Nature"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1051","DOI":"10.1007\/s00607-022-01141-x","article-title":"An intelligent resource management method in SDN based fog computing using reinforcement learning","volume":"106","author":"Anoushee","year":"2024","journal-title":"Computing"},{"key":"ref_9","first-page":"282","article-title":"Bandit Based Monte-Carlo Planning","volume":"Volume 4212","author":"Scheffer","year":"2006","journal-title":"Machine Learning: ECML 2006"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_11","unstructured":"Anthony, T., Tian, Z., and Barber, D. (2017). Thinking Fast and Slow with Deep Learning and Tree Search. arXiv."},{"key":"ref_12","unstructured":"Guez, A., Weber, T., Antonoglou, I., Simonyan, K., Vinyals, O., Wierstra, D., Munos, R., and Silver, D. (2018). Learning to Search with MCTSnets. arXiv."},{"key":"ref_13","unstructured":"Schaul, T., Quan, J., Antonoglou, I., and Silver, D. (2016). Prioritized Experience Replay. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zha, D., Lai, K.H., Zhou, K., and Hu, X. (2019). Experience Replay Optimization. arXiv.","DOI":"10.24963\/ijcai.2019\/589"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009, January 14\u201318). Curriculum learning. Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553380"},{"key":"ref_16","unstructured":"Graves, A., Bellemare, M.G., Menick, J., Munos, R., and Kavukcuoglu, K. (2017). Automated Curriculum Learning for Neural Networks. arXiv."},{"key":"ref_17","first-page":"7382","article-title":"Curriculum learning for reinforcement learning domains: A framework and survey","volume":"21","author":"Narvekar","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_18","unstructured":"Wang, L., Xu, Z., Stone, P., and Xiao, X. (2024). Grounded curriculum learning. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"78342","DOI":"10.1109\/ACCESS.2024.3406768","article-title":"Automatic curriculum determination for deep reinforcement learning in reconfigurable robots","volume":"12","author":"Karni","year":"2024","journal-title":"IEEE Access"},{"key":"ref_20","unstructured":"Irandoust, S., Durand, T., Rakhmangulova, Y., Zi, W., and Hajimirsadeghi, H. (2022, January 2). Training a Vision Transformer from scratch in less than 24 hours with 1 GPU. Proceedings of the Has It Trained Yet? NeurIPS 2022 Workshop, New Orleans, LA, USA."},{"key":"ref_21","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/98\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:43:02Z","timestamp":1760035382000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/98"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,10]]},"references-count":21,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["make7030098"],"URL":"https:\/\/doi.org\/10.3390\/make7030098","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2025,9,10]]}}}