{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T16:56:56Z","timestamp":1762102616481,"version":"build-2065373602"},"reference-count":45,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2021,4,23]],"date-time":"2021-04-23T00:00:00Z","timestamp":1619136000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100016047","name":"Science Fund of the Republic of Serbia","doi-asserted-by":"publisher","award":["6524745, AI-DECIDE"],"award-info":[{"award-number":["6524745, AI-DECIDE"]}],"id":[{"id":"10.13039\/501100016047","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we propose a new algorithm for distributed spectrum sensing and channel selection in cognitive radio networks based on consensus. The algorithm operates within a multi-agent reinforcement learning scheme. The proposed consensus strategy, implemented over a directed, typically sparse, time-varying low-bandwidth communication network, enforces collaboration between the agents in a completely decentralized and distributed way. The motivation for the proposed approach comes directly from typical cognitive radio networks\u2019 practical scenarios, where such a decentralized setting and distributed operation is of essential importance. Specifically, the proposed setting provides all the agents, in unknown environmental and application conditions, with viable network-wide information. Hence, a set of participating agents becomes capable of successful calculation of the optimal joint spectrum sensing and channel selection strategy even if the individual agents are not. The proposed algorithm is, by its nature, scalable and robust to node and link failures. The paper presents a detailed discussion and analysis of the algorithm\u2019s characteristics, including the effects of denoising, the possibility of organizing coordinated actions, and the convergence rate improvement induced by the consensus scheme. The results of extensive simulations demonstrate the high effectiveness of the proposed algorithm, and that its behavior is close to the centralized scheme even in the case of sparse neighbor-based inter-node communication.<\/jats:p>","DOI":"10.3390\/s21092970","type":"journal-article","created":{"date-parts":[[2021,4,25]],"date-time":"2021-04-25T02:12:57Z","timestamp":1619316777000},"page":"2970","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Distributed Spectrum Management in Cognitive Radio Networks by Consensus-Based Reinforcement Learning"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5050-6331","authenticated-orcid":false,"given":"Dejan","family":"Da\u0161i\u0107","sequence":"first","affiliation":[{"name":"Artificial Intelligence Department, Vlatacom Institute, 11070 Belgrade, Serbia"},{"name":"Faculty of Technical Sciences, Singidunum University, 11000 Belgrade, Serbia"},{"name":"COPELABS, Universidade Lus\u00f3fona de Humanidades e Tecnologias, 1749-024 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2763-2564","authenticated-orcid":false,"given":"Nemanja","family":"Ili\u0107","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Department, Vlatacom Institute, 11070 Belgrade, Serbia"},{"name":"Department of Information Technologies, College of Applied Technical Sciences, 37000 Kru\u0161evac, Serbia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4362-6201","authenticated-orcid":false,"given":"Miljan","family":"Vu\u010deti\u0107","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Department, Vlatacom Institute, 11070 Belgrade, Serbia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7945-5571","authenticated-orcid":false,"given":"Miroslav","family":"Peri\u0107","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Department, Vlatacom Institute, 11070 Belgrade, Serbia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7315-8739","authenticated-orcid":false,"given":"Marko","family":"Beko","sequence":"additional","affiliation":[{"name":"Instituto de Telecomunica\u00e7\u00f5es, Instituto Superior T\u00e9cnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal"},{"name":"Faculty of Information Technology and Engineering, University Union Nikola Tesla, 11158 Belgrade, Serbia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9064-7059","authenticated-orcid":false,"given":"Milo\u0161 S.","family":"Stankovi\u0107","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Department, Vlatacom Institute, 11070 Belgrade, Serbia"},{"name":"Faculty of Technical Sciences, Singidunum University, 11000 Belgrade, Serbia"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Lo, B.F., and Akyildiz, I.F. (2010, January 26\u201330). Reinforcement learning-based cooperative sensing in cognitive radio ad hoc networks. Proceedings of the 21st Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Istanbul, Turkey.","DOI":"10.1109\/PIMRC.2010.5671686"},{"key":"ref_2","first-page":"1849","article-title":"Reinforcement Learning-Based Spectrum Management for Cognitive Radio Networks: A Literature Review and Case Study","volume":"Volume 3","author":"Bedogni","year":"2019","journal-title":"Handbook of Cognitive Radio"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Yu, H., and Zikria, Y.B. (2020). Cognitive Radio Networks for Internet of Things and Wireless Sensor Networks. Sensors, 20.","DOI":"10.3390\/s20185288"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"4108","DOI":"10.1109\/TWC.2012.092712.120201","article-title":"Efficient Beamforming in Cognitive Radio Multicast Transmission","volume":"11","author":"Beko","year":"2012","journal-title":"IEEE Trans. Wirel. Commun."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1109\/SURV.2009.090109","article-title":"A survey of spectrum sensing algorithms for cognitive radio applications","volume":"11","author":"Yucek","year":"2009","journal-title":"IEEE Commun. Surv. Tutorials"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1155\/WCN.2005.275","article-title":"Software-Defined Radio\u2014Basics and Evolution to Cognitive Radio","volume":"2005","author":"Jondral","year":"2005","journal-title":"EURASIP J. Wirel. Commun. Netw."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1717","DOI":"10.1109\/COMST.2016.2539923","article-title":"A Survey on Applications of Model-Free Strategy Learning in Cognitive Wireless Networks","volume":"18","author":"Wang","year":"2016","journal-title":"IEEE Commun. Surv. Tutorials"},{"key":"ref_8","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, The MIT Press. [2nd ed.]."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., Maei, H.R., Precup, D., Bhatnagar, S., Silver, D., Szepesv\u00e1ri, C., and Wiewiora, E. (2009, January 14\u201318). Fast gradient-descent methods for temporal-difference learning with linear function approximation. Proceedings of the 26th International Conference on Machine Learning, Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553501"},{"key":"ref_10","first-page":"289","article-title":"Off-policy Learning With Eligibility Traces: A Survey","volume":"15","author":"Geist","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Stankovi\u0107, M.S., Beko, M., and Stankovi\u0107, S.S. (2020, January 11\u201317). Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence. Proceedings of the IFAC World Congress, Berlin, Germany.","DOI":"10.1016\/j.ifacol.2020.12.2184"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Stankovi\u0107, M.S., Beko, M., and Stankovi\u0107, S.S. (2021). Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning. IEEE Trans. Control Netw. Syst.","DOI":"10.1109\/TCNS.2021.3061909"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1112","DOI":"10.1109\/LWC.2019.2908371","article-title":"Distributed NOMA-Based Multi-Armed Bandit Approach for Channel Access in Cognitive Radio Networks","volume":"8","author":"Tian","year":"2019","journal-title":"IEEE Wirel. Commun. Lett."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1109\/TCCN.2017.2675901","article-title":"QoS Driven Channel Selection Algorithm for Cognitive Radio Network: Multi-User Multi-Armed Bandit Approach","volume":"3","author":"Modi","year":"2017","journal-title":"IEEE Trans. Cogn. Commun. Netw."},{"key":"ref_15","unstructured":"Kuleshov, V., and Precup, D. (2014). Algorithms for multi-armed bandit problems. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","article-title":"A Comprehensive Survey of Multiagent Reinforcement Learning","volume":"38","author":"Busoniu","year":"2008","journal-title":"IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.)"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhang, K., Yang, Z., and Basar, T. (2019). Decentralized Multi-Agent Reinforcement Learning with Networked Agents: Recent Advances. arXiv.","DOI":"10.1109\/CDC.2018.8619581"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, K., Yang, Z., Liu, H., Zhang, T., and Basar, T. (2018). Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents. arXiv.","DOI":"10.1109\/CDC.2018.8619581"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1848","DOI":"10.1109\/TSP.2013.2241057","article-title":"QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus+Innovations","volume":"61","author":"Kar","year":"2013","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1260","DOI":"10.1109\/TAC.2014.2368731","article-title":"Distributed policy evaluation under multiple behavior strategies","volume":"60","author":"Macua","year":"2015","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_21","unstructured":"Da\u0161i\u0107, D., Vu\u010deti\u0107, M., Peri\u0107, M., Beko, M., and Stankovi\u0107, M. (July, January 30). Cooperative Multi-Agent Reinforcement Learning for Spectrum Management in IoT Cognitive Networks. Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics, Biarritz, France."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"101226","DOI":"10.1016\/j.phycom.2020.101226","article-title":"Intelligent spectrum management based on reinforcement learning schemes in cooperative cognitive radio networks","volume":"43","author":"Kaur","year":"2020","journal-title":"Phys. Commun."},{"key":"ref_23","unstructured":"Wu, C., Chowdhury, K., Di Felice, M., and Meleis, W. (2010). Spectrum Management of Cognitive Radio Using Multi-Agent Reinforcement Learning, International Foundation for Autonomous Agents and Multiagent Systems."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1049\/iet-com.2010.0258","article-title":"Efficient exploration in reinforcement learning-based cognitive radio spectrum sharing","volume":"5","author":"Jiang","year":"2011","journal-title":"IET Commun."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1016\/j.pmcj.2016.07.007","article-title":"An energy efficient Reinforcement Learning based Cooperative Channel Sensing for Cognitive Radio Sensor Networks","volume":"35","author":"Mustapha","year":"2017","journal-title":"Pervasive Mob. Comput."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1109\/JCN.2019.000052","article-title":"Reinforcement learning enabled cooperative spectrum sensing in cognitive radio networks","volume":"22","author":"Ning","year":"2020","journal-title":"J. Commun. Netw."},{"key":"ref_27","unstructured":"Kaur, A., and Kumar, K. (2020). Imperfect CSI based Intelligent Dynamic Spectrum Management using Cooperative Reinforcement Learning Framework in Cognitive Radio Networks. IEEE Trans. Mob. Comput."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1186\/s13638-019-1433-1","article-title":"Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks","volume":"2019","author":"Jang","year":"2019","journal-title":"EURASIP J. Wirel. Commun. Netw."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1109\/TWC.2018.2879433","article-title":"Deep Multi-User Reinforcement Learning for Distributed Dynamic Spectrum Access","volume":"18","author":"Naparstek","year":"2019","journal-title":"IEEE Trans. Wirel. Commun."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/TCCN.2018.2809722","article-title":"Deep Reinforcement Learning for Dynamic Multichannel Access in Wireless Networks","volume":"4","author":"Wang","year":"2018","journal-title":"IEEE Trans. Cogn. Commun. Netw."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/JSTSP.2018.2798920","article-title":"Spectrum Access In Cognitive Radio Using a Two-Stage Reinforcement Learning Approach","volume":"12","author":"Raj","year":"2018","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Lin, Y., Wang, C., Wang, J., and Dou, Z. (2016). A Novel Dynamic Spectrum Access Framework Based on Reinforcement Learning for Cognitive Radio Sensor Networks. Sensors, 16.","DOI":"10.3390\/s16101675"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yang, P., Li, L., Yin, J., Zhang, H., Liang, W., Chen, W., and Han, Z. (2018, January 16\u201318). Dynamic Spectrum Access in Cognitive Radio Networks Using Deep Reinforcement Learning and Evolutionary Game. Proceedings of the 2018 IEEE\/CIC International Conference on Communications in China (ICCC), Beijing, China.","DOI":"10.1109\/ICCChina.2018.8641242"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"383","DOI":"10.1109\/TVT.2009.2031181","article-title":"A Distributed Consensus-Based Cooperative Spectrum-Sensing Scheme in Cognitive Radios","volume":"59","author":"Li","year":"2010","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"3845","DOI":"10.1109\/T-WC.2008.070391","article-title":"Optimal spectrum sensing framework for cognitive radio networks","volume":"7","author":"Lee","year":"2008","journal-title":"IEEE Trans. Wirel. Commun."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Di Felice, M., Chowdhury, K.R., Wu, C., Bononi, L., and Meleis, W. (2010, January 1\u20133). Learning-based spectrum selection in Cognitive Radio Ad Hoc Networks. Proceedings of the Wired\/Wireless Internet Communications, Lulea, Sweden.","DOI":"10.1007\/978-3-642-13315-2_11"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"998","DOI":"10.1049\/iet-rsn.2018.5127","article-title":"Distributed target tracking in sensor networks using multi-step consensus","volume":"12","author":"Ali","year":"2018","journal-title":"IET Radar Sonar Navig."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"2508","DOI":"10.1109\/TIT.2006.874516","article-title":"Randomized gossip algorithms","volume":"52","author":"Boyd","year":"2006","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Olshevsky, A., and Tsitsiklis, J.N. (2006, January 13\u201315). Convergence Rates in Distributed Consensus and Averaging. Proceedings of the 45th IEEE Conference on Decision and Control, San Diego, CA, USA.","DOI":"10.1109\/CDC.2006.376899"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1266","DOI":"10.1137\/0325070","article-title":"Asymptotic properties of distributed and communicating stochastic approximation algorithms","volume":"25","author":"Kushner","year":"1987","journal-title":"SIAM J. Control Optim."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"4069","DOI":"10.1109\/TAC.2016.2545098","article-title":"Distributed Stochastic Approximation: Weak Convergence and Network Design","volume":"61","year":"2016","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_42","first-page":"1","article-title":"An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning","volume":"17","author":"Sutton","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_43","unstructured":"Bhandari, J., Russo, D., and Singal, R. (2018, January 6\u20139). A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation. Proceedings of the 31st Conference On Learning Theory, Stockholm, Sweden."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jnca.2014.04.001","article-title":"Primary radio user activity models for cognitive radio networks: A survey","volume":"43","author":"Saleem","year":"2014","journal-title":"J. Netw. Comput. Appl."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Arjoune, Y., and Kaabouch, N. (2019). A Comprehensive Survey on Spectrum Sensing in Cognitive Radio Networks: Recent Advances, New Challenges, and Future Research Directions. Sensors, 19.","DOI":"10.3390\/s19010126"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/9\/2970\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:52:00Z","timestamp":1760161920000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/9\/2970"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,23]]},"references-count":45,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["s21092970"],"URL":"https:\/\/doi.org\/10.3390\/s21092970","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,4,23]]}}}