{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T21:42:21Z","timestamp":1773870141766,"version":"3.50.1"},"reference-count":36,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:00:00Z","timestamp":1773792000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>This study proposes Advisory Board Reinforcement Learning (AdvB-RL), a cooperative reinforcement-learning framework that integrates multiple advisory neural networks to guide policy optimization. Unlike conventional single-agent architectures, AdvB-RL maintains a set of independently trained advisory networks that contribute to action selection through a dynamic aggregation mechanism. This design preserves diverse experiential knowledge while improving learning stability and the exploration\u2013exploitation balance. The framework is evaluated on three benchmark control tasks, namely LunarLander-v2, CartPole-v1, and MountainCar-v0, using advisory board sizes of 1, 5, and 10 members against a Double Deep Q-Network (DDQN) baseline. The best-performing configuration, 10 AdvB, achieved 270.02 \u00b1 24.74 on LunarLander-v2 versus 227.92 \u00b1 86.02 for DDQN, 497.79 \u00b1 5.18 on CartPole-v1 versus 304.37 \u00b1 144.04, and \u2212103.16 \u00b1 15.46 on MountainCar-v0 versus \u2212130.71 \u00b1 31.64, indicating higher returns together with markedly lower variability. Across the three environments, these results show that increasing the number of advisory members improves both reward consistency and overall robustness, with the 10-member setting providing the strongest performance. Within the tested configurations, the advisory board mechanism remains computationally feasible, while preliminary experiments beyond 10 advisors show diminishing returns relative to added complexity. Overall, AdvB-RL provides a robust and modular alternative to single-policy reinforcement learning for adaptive cooperative control.<\/jats:p>","DOI":"10.3390\/a19030230","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T14:14:05Z","timestamp":1773843245000},"page":"230","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Multi-Agent Advisory Board Reinforcement Learning Framework for Adaptive Cooperative Control"],"prefix":"10.3390","volume":"19","author":[{"given":"Onur","family":"Osman","sequence":"first","affiliation":[{"name":"Department of Electric Electronics Engineering, \u0130stanbul Topkapi University, 34087 Istanbul, T\u00fcrkiye"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5562-6367","authenticated-orcid":false,"given":"Tolga Kudret","family":"Karaca","sequence":"additional","affiliation":[{"name":"Department of Industrial Engineering, \u0130stanbul Topkapi University, 34087 Istanbul, T\u00fcrkiye"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5295-1631","authenticated-orcid":false,"given":"Bahar","family":"Yalcin Kavus","sequence":"additional","affiliation":[{"name":"Quality Coordination Office, \u0130zmir Katip \u00c7elebi University, 35620 Izmir, T\u00fcrkiye"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gokalp","family":"Tulum","sequence":"additional","affiliation":[{"name":"Department of Electric Electronics Engineering, \u0130stanbul Topkapi University, 34087 Istanbul, T\u00fcrkiye"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5064-2181","authenticated-orcid":false,"given":"Sajjad","family":"Nematzadeh","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, \u0130stanbul Topkapi University, 34087 Istanbul, T\u00fcrkiye"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,18]]},"reference":[{"key":"ref_1","first-page":"3285","article-title":"Deep Learning for Autonomous Lunar Landing","volume":"167","author":"Furfaro","year":"2018","journal-title":"Adv. Astronaut. Sci."},{"key":"ref_2","first-page":"187","article-title":"Comparison of Three Deep Reinforcement Learning Algorithms for Solving the Lunar Lander Problem","volume":"Volume 180","author":"Ahmad","year":"2024","journal-title":"Proceedings of the 2023 International Conference on Data Science, Advanced Algorithm and Intelligent Computing (DAI 2023)"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000086","article-title":"Model-Based Reinforcement Learning: A Survey","volume":"16","author":"Moerland","year":"2023","journal-title":"FNT Mach. Learn."},{"key":"ref_4","first-page":"43","article-title":"Powered Landing Guidance Algorithms Using Reinforcement Learning Methods for Lunar Lander Case","volume":"19","author":"Nugroho","year":"2021","journal-title":"Indones. J. Aerosp."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Shah, S., and Yao, N. (2023). Deep Reinforcement Learning for Unpredictability-Induced Rewards to Handle Spacecraft Landing. Proceedings of the 2023 13th International Conference on Information Science and Technology (ICIST), IEEE.","DOI":"10.1109\/ICIST59754.2023.10367162"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1474","DOI":"10.1109\/TCDS.2022.3221805","article-title":"Multitask Neuroevolution for Reinforcement Learning with Long and Short Episodes","volume":"15","author":"Zhang","year":"2023","journal-title":"IEEE Trans. Cogn. Dev. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"352","DOI":"10.1016\/j.actaastro.2021.01.015","article-title":"Lunar Human Landing System Architecture Tradespace Modeling","volume":"181","author":"Latyshev","year":"2021","journal-title":"Acta Astronaut."},{"key":"ref_8","unstructured":"Lu, Y., Squillante, M.S., and Wu, C.W. (2019). A Control-Model-Based Approach for Reinforcement Learning. arXiv."},{"key":"ref_9","unstructured":"Gou, S.Z., and Liu, Y. (2019). DQN with Model-Based Exploration: Efficient Learning on Environments with Sparse Rewards. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Guttulsrud, H., Sandnes, M., and Shrestha, R. (2023). Solving the Lunar Lander Problem with Multiple Uncertainties Using a Deep Q-Learning Based Short-Term Memory Agent. Proceedings of the 2023 12th International Conference on Computing and Pattern Recognition, ACM.","DOI":"10.1145\/3633637.3633641"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2223","DOI":"10.1109\/TNNLS.2020.3044196","article-title":"An Off-Policy Trust Region Policy Optimization Method with Monotonic Improvement Guarantee for Deep Reinforcement Learning","volume":"33","author":"Meng","year":"2022","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"60296","DOI":"10.1109\/ACCESS.2021.3074535","article-title":"Deep Deterministic Policy Gradient Based on Double Network Prioritized Experience Replay","volume":"9","author":"Kang","year":"2021","journal-title":"IEEE Access"},{"key":"ref_13","unstructured":"Dollen, D.V. (2017). Investigating Reinforcement Learning Agents For. arXiv."},{"key":"ref_14","unstructured":"Van Hasselt, H. (2010, January 1). Double Q-Learning. Proceedings of the 24th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Van Hasselt, H., Guez, A., and Silver, D. (2016, January 12\u201317). Deep Reinforcement Learning with Double Q-Learning. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"ref_16","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv."},{"key":"ref_17","unstructured":"Cini, A., D\u2019Eramo, C., Peters, J., and Alippi, C. (2022). Deep Reinforcement Learning with Weighted Q-Learning. arXiv."},{"key":"ref_18","first-page":"347","article-title":"Fly4SmartCity: A Cloud Robotics Service for Smart City Applications","volume":"8","author":"Ermacora","year":"2016","journal-title":"J. Assoc. Inf. Syst."},{"key":"ref_19","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-Level Control through Deep Reinforcement Learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_21","unstructured":"Yu, X. (2026, March 10). Deep Q-Learning on Lunar Lander Game. Available online: https:\/\/www.researchgate.net\/publication\/333145451_Deep_Q-Learning_on_Lunar_Lander_Game."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Gadgil, S., Xin, Y., and Xu, C. (2020). Solving The Lunar Lander Problem under Uncertainty Using Reinforcement Learning. Proceedings of the 2020 SoutheastCon, IEEE.","DOI":"10.1109\/SoutheastCon44009.2020.9368267"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1120","DOI":"10.1016\/j.procs.2023.10.623","article-title":"Data-Efficient Reinforcement Learning with Data Augmented Episodic Memory","volume":"227","author":"Rusdyputra","year":"2023","journal-title":"Procedia Comput. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Kubo, Y., Chalmers, E., and Luczak, A. (2022). Combining Backpropagation with Equilibrium Propagation to Improve an Actor-Critic Reinforcement Learning Framework. Front. Comput. Neurosci., 16.","DOI":"10.3389\/fncom.2022.980613"},{"key":"ref_25","unstructured":"Hafiz, A.M., and Bhat, G.M. (2020). Deep Q-Network Based Multi-Agent Reinforcement Learning with Binary Action Agents. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"995","DOI":"10.1093\/comjnl\/bxz066","article-title":"Deep Reinforcement Learning with Adaptive Update Target Combination","volume":"63","author":"Xu","year":"2020","journal-title":"Comput. J."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ma, J., Ning, D., Zhang, C., and Liu, S. (2022). Fresher Experience Plays a More Important Role in Prioritized Experience Replay. Appl. Sci., 12.","DOI":"10.3390\/app122312489"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"6654","DOI":"10.1109\/TNNLS.2022.3212273","article-title":"Stochastic Integrated Actor\u2013Critic for Deep Reinforcement Learning","volume":"35","author":"Zheng","year":"2024","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1713","DOI":"10.1007\/s00521-021-06270-6","article-title":"Discrete-to-Deep Reinforcement Learning Methods","volume":"34","author":"Kurniawan","year":"2022","journal-title":"Neural Comput. Appl."},{"key":"ref_30","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Advances in Neural Information Processing Systems 30, Curran Associates, Inc."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"129384","DOI":"10.1016\/j.neucom.2025.129384","article-title":"Cooperative Multi-Agent Actor-Critic Approach Using Adaptive Value Decomposition and Parallel Training for Traffic Network Flow Control","volume":"623","author":"Zhang","year":"2025","journal-title":"Neurocomputing"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"104528","DOI":"10.1016\/j.trc.2024.104528","article-title":"Cooperative Traffic Signal Control Through a Counterfactual Multi-Agent Actor\u2013Critic with Scheduler","volume":"160","author":"Song","year":"2024","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4615","DOI":"10.1109\/TASE.2026.3660669","article-title":"Event-Driven Prescribed Optimal Disturbance Rejection for Dynamic Positioning of Ships via Reinforcement Learning","volume":"23","author":"Gao","year":"2026","journal-title":"IEEE Trans. Autom. Sci. Eng."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"4109","DOI":"10.1109\/TITS.2024.3520328","article-title":"Prescribed Performance-Based Optimal Formation Control for USVs with Position Constraints and Yaw Angle Time-Varying Partial Constraints","volume":"26","author":"Cao","year":"2024","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Scholz, J., Weber, C., Hafez, M.B., and Wermter, S. (2021). Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision. Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), IEEE.","DOI":"10.1109\/IJCNN52387.2021.9534023"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, R., Shang, Z., Zheng, C., Li, H., Liang, Q., and Cui, Y. (2022). Dynamic Policy Programming with Descending Regularization for Efficient Reinforcement Learning Control. Proceedings of the 2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), IEEE.","DOI":"10.1109\/PRAI55851.2022.9904283"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/19\/3\/230\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T14:21:30Z","timestamp":1773843690000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/19\/3\/230"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,18]]},"references-count":36,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["a19030230"],"URL":"https:\/\/doi.org\/10.3390\/a19030230","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,18]]}}}