{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T04:35:16Z","timestamp":1780461316404,"version":"3.54.1"},"reference-count":30,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2022,12,6]],"date-time":"2022-12-06T00:00:00Z","timestamp":1670284800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61806221"],"award-info":[{"award-number":["61806221"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["WDZC20225250403"],"award-info":[{"award-number":["WDZC20225250403"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Defense Scientific Research Program","award":["61806221"],"award-info":[{"award-number":["61806221"]}]},{"name":"National Defense Scientific Research Program","award":["WDZC20225250403"],"award-info":[{"award-number":["WDZC20225250403"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>With the continuous development of deep reinforcement learning in intelligent control, combining automatic curriculum learning and deep reinforcement learning can improve the training performance and efficiency of algorithms from easy to difficult. Most existing automatic curriculum learning algorithms perform curriculum ranking through expert experience and a single network, which has the problems of difficult curriculum task ranking and slow convergence speed. In this paper, we propose a curriculum reinforcement learning method based on K-Fold Cross Validation that can estimate the relativity score of task curriculum difficulty. Drawing lessons from the human concept of curriculum learning from easy to difficult, this method divides automatic curriculum learning into a curriculum difficulty assessment stage and a curriculum sorting stage. Through parallel training of the teacher model and cross-evaluation of task sample difficulty, the method can better sequence curriculum learning tasks. Finally, simulation comparison experiments were carried out in two types of multi-agent experimental environments. The experimental results show that the automatic curriculum learning method based on K-Fold cross-validation can improve the training speed of the MADDPG algorithm, and at the same time has a certain generality for multi-agent deep reinforcement learning algorithm based on the replay buffer mechanism.<\/jats:p>","DOI":"10.3390\/e24121787","type":"journal-article","created":{"date-parts":[[2022,12,7]],"date-time":"2022-12-07T02:18:48Z","timestamp":1670379528000},"page":"1787","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Curriculum Reinforcement Learning Based on K-Fold Cross Validation"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5599-5856","authenticated-orcid":false,"given":"Zeyang","family":"Lin","sequence":"first","affiliation":[{"name":"Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jun","family":"Lai","sequence":"additional","affiliation":[{"name":"Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiliang","family":"Chen","sequence":"additional","affiliation":[{"name":"Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lei","family":"Cao","sequence":"additional","affiliation":[{"name":"Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jun","family":"Wang","sequence":"additional","affiliation":[{"name":"Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Foglino, F., Christakou, C.C., and Gutierrez, R.L. (2019). Curriculum learning for cumulative return maximization. arXiv.","DOI":"10.24963\/ijcai.2019\/320"},{"key":"ref_2","unstructured":"Mnih, V., Kavukcuoglu, K., and Silver, D. (2013). Playing atari with deep reinforcement learning. arXiv."},{"key":"ref_3","first-page":"12602","article-title":"Curriculum-guided hindsight experience replay","volume":"19","author":"Fang","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_5","unstructured":"Palmer, G., Tuyls, K., and Bloembergen, D. (2017). Lenient multi-agent deep reinforcement learning. arXiv."},{"key":"ref_6","unstructured":"Sunehag, P., Lever, G., and Gruslys, A. (2017). Value-decomposition networks for cooperative multi-agent learning. arXiv."},{"key":"ref_7","unstructured":"Rashid, T., Samvelyan, M., and Schroeder, C. (2018, January 10\u201315). Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden."},{"key":"ref_8","unstructured":"Hausknecht, M., and Stone, P. (2015, January 17\u201321). Deep recurrent q-learning for partially observable mdps. Proceedings of the 2015 AAAI Fall Symposium Series, Arlington, VA, USA."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Foerster, J., Farquhar, G., and Afouras, T. (2018, January 5\u20139). Counterfactual multi-agent policy gradients. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Li, S. (2020, January 22\u201325). Multi-agent deep deterministic policy gradient for traffic signal control on urban road network. Proceedings of the 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA), Melbourne, Australia.","DOI":"10.1109\/AEECA49918.2020.9213523"},{"key":"ref_11","unstructured":"Yu, C., Velu, A., and Vinitsky, E. (2021). The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Shi, D., Guo, X., and Liu, Y. (2022). Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning. Entropy, 24.","DOI":"10.3390\/e24060774"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Portelas, R., Colas, C., and Weng, L. (2020). Automatic curriculum learning for deep rl: A short survey. arXiv.","DOI":"10.24963\/ijcai.2020\/671"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Bengio, Y., Louradour, J., and Collobert, R. (2009, January 14\u201318). Curriculum learning. Proceedings of the 26th Annual International Conference on Machine Learning (ICML), Quebec, MT, Canada.","DOI":"10.1145\/1553374.1553380"},{"key":"ref_15","unstructured":"Schaul, T., Quan, J., and Antonoglou, I. (2015). Prioritized experience replay. arXiv."},{"key":"ref_16","unstructured":"Sutton, R.S., and Barto, A.G. (2014). Learning to execute. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1038\/nature20101","article-title":"Hybrid computing using a neural network with dynamic external memory","volume":"538","author":"Graves","year":"2016","journal-title":"Nature"},{"key":"ref_18","unstructured":"Silva, F.L.D., and Costa, A.H.R. (2018, January 8\u201312). Object-oriented curriculum generation for reinforcement learning. Proceedings of the 17th International Conference on Autonomous Agents and Multi-Agent Systems, New York, NY, USA."},{"key":"ref_19","first-page":"36","article-title":"Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems","volume":"34","author":"Chen","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_20","unstructured":"Weinshall, D., Cohen, G., and Amir, D. (2018, January 10\u201315). Curriculum learning by transfer learning: Theory and experiments with deep networks. Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA."},{"key":"ref_21","first-page":"12151","article-title":"Safe reinforcement learning via curriculum induction","volume":"33","author":"Turchetta","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3732","DOI":"10.1109\/TNNLS.2019.2934906","article-title":"Teacher\u2013student curriculum learning","volume":"31","author":"Matiisen","year":"2019","journal-title":"IEEE. Trans. Neural Net. Learn. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Narvekar, S., and Stone, P. (2018). Learning curriculum policies for reinforcement learning. arXiv.","DOI":"10.24963\/ijcai.2017\/757"},{"key":"ref_24","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lei, W., Wen, H., and Wu, J. (2021). MADDPG-based security situational awareness for smart grid with intelligent edge. Appl. Sci., 11.","DOI":"10.3390\/app11073101"},{"key":"ref_26","unstructured":"Fedus, W., Ramachandran, P., and Agarwal, R. (2020, January 13\u201318). Revisiting fundamentals of experience replay. Proceedings of the International Conference on Machine Learning (ICML), Virtual Event."},{"key":"ref_27","unstructured":"Portelas, R., Colas, C., and Hofmann, K. (2020, January 8\u201313). Teacher algorithms for curriculum learning of deep rl in continuously parameterized environments. Proceedings of the Conference on Robot Learning (PMLR), San Diego, CA, USA."},{"key":"ref_28","first-page":"154","article-title":"Self-paced learning for latent variable models","volume":"23","author":"Kumar","year":"2010","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_29","unstructured":"Florensa, C., Held, D., and Geng, X. (2018, January 10\u201315). Automatic goal generation for reinforcement learning agents. Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden."},{"key":"ref_30","first-page":"133","article-title":"Multi-agent actor-critic for mixed cooperative-competitive environments","volume":"30","author":"Lowe","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/12\/1787\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:35:06Z","timestamp":1760146506000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/12\/1787"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,6]]},"references-count":30,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["e24121787"],"URL":"https:\/\/doi.org\/10.3390\/e24121787","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,6]]}}}