{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T10:19:21Z","timestamp":1778149161979,"version":"3.51.4"},"reference-count":36,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,2,20]],"date-time":"2022-02-20T00:00:00Z","timestamp":1645315200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62001067"],"award-info":[{"award-number":["62001067"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Pre-research Fund Project","award":["61405180409"],"award-info":[{"award-number":["61405180409"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>This paper studies the problem of distributed spectrum\/channel access for cognitive radio-enabled unmanned aerial vehicles (CUAVs) that overlay upon primary channels. Under the framework of cooperative spectrum sensing and opportunistic transmission, a one-shot optimization problem for channel allocation, aiming to maximize the expected cumulative weighted reward of multiple CUAVs, is formulated. To handle the uncertainty due to the lack of prior knowledge about the primary user activities as well as the lack of the channel-access coordinator, the original problem is cast into a competition and cooperation hybrid multi-agent reinforcement learning (CCH-MARL) problem in the framework of Markov game (MG). Then, a value-iteration-based RL algorithm, which features upper confidence bound-Hoeffding (UCB-H) strategy searching, is proposed by treating each CUAV as an independent learner (IL). To address the curse of dimensionality, the UCB-H strategy is further extended with a double deep Q-network (DDQN). Numerical simulations show that the proposed algorithms are able to efficiently converge to stable strategies, and significantly improve the network performance when compared with the benchmark algorithms such as the vanilla Q-learning and DDQN algorithms.<\/jats:p>","DOI":"10.3390\/s22041651","type":"journal-article","created":{"date-parts":[[2022,2,21]],"date-time":"2022-02-21T08:34:47Z","timestamp":1645432487000},"page":"1651","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Multi-Agent Reinforcement Learning for Joint Cooperative Spectrum Sensing and Channel Access in Cognitive UAV Networks"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1856-8337","authenticated-orcid":false,"given":"Weiheng","family":"Jiang","sequence":"first","affiliation":[{"name":"Communication Measurement and Control Center, Chongqing University, Chongqing 400044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1641-2467","authenticated-orcid":false,"given":"Wanxin","family":"Yu","sequence":"additional","affiliation":[{"name":"Communication Measurement and Control Center, Chongqing University, Chongqing 400044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7500-8723","authenticated-orcid":false,"given":"Wenbo","family":"Wang","sequence":"additional","affiliation":[{"name":"Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tiancong","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,2,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ye, L., Zhang, Y., Li, Y., and Han, S. (2020, January 15\u201319). A Dynamic Cluster Head Selecting Algorithm for UAV Ad Hoc Networks. Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus.","DOI":"10.1109\/IWCMC48107.2020.9148458"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"4175","DOI":"10.1109\/TCOMM.2020.2986289","article-title":"Cellular UAV-to-device communications: Trajectory design and mode selection by multi-agent deep reinforcement learning","volume":"68","author":"Wu","year":"2020","journal-title":"IEEE Trans. Commun."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"12418","DOI":"10.1109\/TVT.2020.3028301","article-title":"Impact of UAV rotation on MIMO channel characterization for air-to-ground communication systems","volume":"69","author":"Ma","year":"2020","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Jingnan, L., Pengfei, L., and Kai, L. (2017, January 27\u201329). Research on UAV communication network topology based on small world network model. Proceedings of the 2017 IEEE International Conference on Unmanned Systems (ICUS), Beijing, China.","DOI":"10.1109\/ICUS.2017.8278386"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"3391","DOI":"10.1109\/TII.2020.2987421","article-title":"Reinforcement learning-based multislot double-threshold spectrum sensing with Bayesian fusion for industrial big spectrum data","volume":"17","author":"Liu","year":"2020","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1109\/JCN.2019.000052","article-title":"Reinforcement learning enabled cooperative spectrum sensing in cognitive radio networks","volume":"22","author":"Ning","year":"2020","journal-title":"J. Commun. Netw."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1768","DOI":"10.1109\/JIOT.2018.2882532","article-title":"An efficient wideband spectrum sensing algorithm for unmanned aerial vehicle communication networks","volume":"6","author":"Xu","year":"2018","journal-title":"IEEE Internet Things J."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"5711","DOI":"10.1109\/TVT.2019.2909167","article-title":"UAV-based 3D spectrum sensing in spectrum-heterogeneous networks","volume":"68","author":"Shen","year":"2019","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Nie, R., Xu, W., Zhang, Z., Zhang, P., Pan, M., and Lin, J. (2019, January 20\u201324). Max-min distance clustering based distributed cooperative spectrum sensing in cognitive UAV networks. Proceedings of the ICC 2019-2019 IEEE International Conference on Communications (ICC), Shanghai, China.","DOI":"10.1109\/ICC.2019.8761421"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1016\/j.comcom.2019.01.010","article-title":"CogMOR-MAC: A cognitive multi-channel opportunistic reservation MAC for multi-UAVs ad hoc networks","volume":"136","author":"Feng","year":"2019","journal-title":"Comput. Commun."},{"key":"ref_11","first-page":"948","article-title":"Throughput optimization for cognitive UAV networks: A three-dimensional-location-aware approach","volume":"9","author":"Liang","year":"2020","journal-title":"IEEE Wirel. Commun. Lett."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1104","DOI":"10.1049\/iet-map.2018.6129","article-title":"3D non-stationary geometry-based multi-input multi-output channel model for UAV-ground communication systems","volume":"13","author":"Zhu","year":"2019","journal-title":"IET Microw. Antennas Propag."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"4533","DOI":"10.1109\/TAES.2020.3003104","article-title":"Ultra-Wideband Air-to-Ground Propagation Channel Characterization in an Open Area","volume":"56","author":"Khawaja","year":"2020","journal-title":"IEEE Trans. Aerosp. Electron. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"858","DOI":"10.1109\/JSTSP.2013.2259797","article-title":"Multiagent reinforcement learning based spectrum sensing policies for cognitive radio networks","volume":"7","author":"Lunden","year":"2013","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"9181","DOI":"10.1109\/TVT.2016.2520983","article-title":"Joint spectrum sensing and resource allocation scheme in cognitive radio networks with spectrum sensing data falsification attack","volume":"65","author":"Chen","year":"2016","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"118898","DOI":"10.1109\/ACCESS.2019.2937108","article-title":"Multi-agent deep reinforcement learning-based cooperative spectrum sensing with upper confidence bound exploration","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1109\/TCCN.2020.2982895","article-title":"Deep reinforcement learning for dynamic spectrum sensing and aggregation in multi-channel wireless networks","volume":"6","author":"Li","year":"2020","journal-title":"IEEE Trans. Cogn. Commun. Netw."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1778","DOI":"10.1109\/LWC.2020.3004687","article-title":"Coordination Graph-Based Deep Reinforcement Learning for Cooperative Spectrum Sensing under Correlated Fading","volume":"9","author":"Cai","year":"2020","journal-title":"IEEE Wirel. Commun. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1237","DOI":"10.1007\/s11276-012-0530-4","article-title":"Reinforcement learning for cooperative sensing gain in cognitive radio ad hoc networks","volume":"19","author":"Lo","year":"2013","journal-title":"Wirel. Netw."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1016\/j.aeue.2018.07.029","article-title":"Distributed cooperative spectrum sensing based on reinforcement learning in cognitive radio networks","volume":"94","author":"Zhang","year":"2018","journal-title":"AEU-Int. J. Electron. Commun."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1337","DOI":"10.1109\/TNSM.2020.3000274","article-title":"Energy-efficient resource allocation in cognitive radio networks under cooperative multi-agent model-free reinforcement learning schemes","volume":"17","author":"Kaur","year":"2020","journal-title":"IEEE Trans. Netw. Serv. Manag."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Nobar, S.K., Ahmed, M.H., Morgan, Y., and Mahmoud, S. (IEEE Trans. Cogn. Commun. Netw., 2021). Resource Allocation in Cognitive Radio-Enabled UAV Communication, IEEE Trans. Cogn. Commun. Netw., in press.","DOI":"10.1109\/TCCN.2021.3103531"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"729","DOI":"10.1109\/TWC.2019.2935201","article-title":"Multi-agent reinforcement learning-based resource allocation for UAV networks","volume":"19","author":"Cui","year":"2019","journal-title":"IEEE Trans. Wirel. Commun."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1109\/MCOM.2016.7470932","article-title":"Designing and implementing future aerial communication networks","volume":"54","author":"Chandrasekharan","year":"2016","journal-title":"IEEE Commun. Mag."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Chen, Z., and Qiu, R.C. (2011, January 17\u201320). Cooperative spectrum sensing using q-learning with experimental validation. Proceedings of the 2011 Proceedings of IEEE Southeastcon, Nashville, TN, USA.","DOI":"10.1109\/SECON.2011.5752975"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3006","DOI":"10.1109\/TWC.2010.080610.100317","article-title":"Efficient cooperative spectrum sensing with minimum overhead in cognitive radio","volume":"9","author":"Han","year":"2010","journal-title":"IEEE Trans. Wirel. Commun."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Abdi, N., Yazdian, E., and Hoseini, A.M.D. (2015, January 29). Optimum number of secondary users in cooperative spectrum sensing methods based on random matrix theory. Proceedings of the 2015 5th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran.","DOI":"10.1109\/ICCKE.2015.7365844"},{"key":"ref_29","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhang, K., Yang, Z., and Ba\u015far, T. (2021). Multi-agent reinforcement learning: A selective overview of theories and algorithms. Handbook of Reinforcement Learning and Control, Springer.","DOI":"10.1007\/978-3-030-60990-0_12"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1441","DOI":"10.1109\/TMC.2012.112","article-title":"E-MiLi: Energy-minimizing idle listening in wireless networks","volume":"11","author":"Zhang","year":"2012","journal-title":"IEEE Trans. Mob. Comput."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_33","unstructured":"Filar, J., and Vrieze, K. (2012). Competitive Markov Decision Processes, Springer Science & Business Media."},{"key":"ref_34","unstructured":"Jin, C., Allen-Zhu, Z., Bubeck, S., and Jordan, M.I. (2018). Is Q-learning provably efficient?. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Van Hasselt, H., Guez, A., and Silver, D. (2016, January 12\u201317). Deep reinforcement learning with double q-learning. Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"ref_36","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/4\/1651\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:23:23Z","timestamp":1760135003000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/4\/1651"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,20]]},"references-count":36,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["s22041651"],"URL":"https:\/\/doi.org\/10.3390\/s22041651","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,20]]}}}