{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T07:42:59Z","timestamp":1767339779953,"version":"build-2065373602"},"reference-count":63,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,4,16]],"date-time":"2019-04-16T00:00:00Z","timestamp":1555372800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenarios and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or multi-agent systems that either consist of agents with individual goals and decision making capabilities, which are influenced by other agent\u2019s decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most of the studies in this area focus on homogeneous swarms and so far, systems introduced as Heterogeneous Swarms (HetSs) merely include very few, i.e., two or three sub-swarms of homogeneous agents, which either, according to their capabilities, deal with a specific sub-problem of the general problem or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents, which are originally designed to solve different problems and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. 
In fact, the affinity between two agents, which measures the compatibility of agents to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.<\/jats:p>","DOI":"10.3390\/make1020035","type":"journal-article","created":{"date-parts":[[2019,4,17]],"date-time":"2019-04-17T03:02:01Z","timestamp":1555470121000},"page":"590-610","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Novel Heterogeneous Swarm Reinforcement Learning Method for Sequential Decision Making Problems"],"prefix":"10.3390","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8322-2722","authenticated-orcid":false,"given":"Zohreh","family":"Akbari","sequence":"first","affiliation":[{"name":"Institute for Computer Science and Business Information Systems (ICB), University of Duisburg-Essen, 45141 Essen, Germany"}]},{"given":"Rainer","family":"Unland","sequence":"additional","affiliation":[{"name":"Institute for Computer Science and Business Information Systems (ICB), University of Duisburg-Essen, 45141 Essen, Germany"},{"name":"Department of Information Systems, Poznan University of Economics and Business, 61-875 Poznan, Poland"}]}],"member":"1968","published-online":{"date-parts":[[2019,4,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Puterman, M.L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, Inc.","DOI":"10.1002\/9780470316887"},{"key":"ref_2","unstructured":"Littman, M.L. (1996). Algorithms for Sequential Decision Making. [Ph.D. Thesis, Department of Computer Science, Brown University]."},{"key":"ref_3","first-page":"703","article-title":"Swarm Intelligence in Cellular Robotic Systems","volume":"102","author":"Beni","year":"1989","journal-title":"Robot. Biol. Syst. 
A New Bionics"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1109\/MRA.2013.2252996","article-title":"Swarmanoid: A novel concept for the study of heterogeneous robotic swarms","volume":"20","author":"Dorigo","year":"2013","journal-title":"IEEE Robot. Autom. Mag."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Akbari, Z., and Unland, R. (2018, January 20\u201322). A Holonic Multi-Agent Based Diagnostic Decision Support System for Computer-Aided History and Physical Examination. Proceedings of the Advances in Practical Applications of Agents, Multi-Agent Systems, and Complexity: The PAAMS Collection (PAAMS 2018), Lecture Notes in Computer Science, Toledo, Spain.","DOI":"10.1007\/978-3-319-94580-4_3"},{"key":"ref_6","unstructured":"(2019, March 11). UNANIMOUS AI. Available online: https:\/\/unanimous.ai\/."},{"key":"ref_7","unstructured":"Dorigo, M., and Birattari, M. (2019, March 11). Swarm intelligence. Available online: http:\/\/www.scholarpedia.org\/article\/Swarm_intelligence."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Montes de Oca, M.A., Pena, J., St\u00fctzle, T., Pinciroli, C., and Dorigo, M. (2009, January 18\u201321). Heterogeneous Particle Swarm Optimizers. Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Trondheim, Norway.","DOI":"10.1109\/CEC.2009.4983013"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"810","DOI":"10.1109\/TASE.2015.2403253","article-title":"Collective motions of heterogeneous swarms","volume":"12","author":"Szwaykowska","year":"2015","journal-title":"IEEE Trans. Autom. Sci. Eng."},{"key":"ref_10","unstructured":"Ferante, E. (2009). 
A Control Architecture for a Heterogeneous Swarm of Robots, Universit\u00e9 Libre de Bruxelles."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1109\/TAC.2010.2040494","article-title":"Segregation of heterogeneous units in a swarm of robotic agents","volume":"55","author":"Kumar","year":"2010","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Pinciroli, C., O\u2019Grady, R., Christensen, A.L., and Dorigo, M. (2010, January 8\u201310). Coordinating heterogeneous swarms through minimal communication among homogeneous sub-swarms. Proceedings of the International Conference on Swarm Intelligence, Brussels, Belgium.","DOI":"10.1007\/978-3-642-15461-4_59"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Engelbrecht, A.P. (2010, January 8\u201310). Heterogeneous particle swarm optimization. Proceedings of the International Conference on Swarm Intelligence (ANTS 2010), Brussels, Belgium.","DOI":"10.1007\/978-3-642-15461-4_17"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1080\/17445760.2015.1118477","article-title":"Hierarchical heterogeneous particle swarm optimization: algorithms and evaluations","volume":"31","author":"Ma","year":"2015","journal-title":"Intern. J. Parallel Emergent Distrib. Syst."},{"key":"ref_15","unstructured":"van Hasselt, H.P. (2011). Insights in Reinforcement Learning, W\u00f6hrmann Print Service."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Tandon, P., Lam, S., Shih, B., Mehta, T., Mitev, A., and Ong, Z. (2017). Quantum Robotics: A Primer on Current Science and Future Perspectives, Morgan & Claypool Publishers.","DOI":"10.1007\/978-3-031-02520-4"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). 
Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Poole, D., and Mackworth, A. (2010). Artificial Intelligence: Foundations of Computational Agents, Cambridge University Press.","DOI":"10.1017\/CBO9780511794797"},{"key":"ref_19","unstructured":"Mitchell, T.M. (1997). Chapter 13: Reinforcement Learning. Machine Learning, McGraw-Hill Science\/Engineering\/Math."},{"key":"ref_20","unstructured":"Vrancx, P. (2010). Decentralised Reinforcement Learning in Markov Games. [Ph.D. Thesis, Vrije Universiteit Brussel]."},{"key":"ref_21","unstructured":"Coggen, M. (2004). Exploration and Exploitation in Reinforcement Learning, CRA-W DMP Project at McGill University."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TSMC.1983.6313077","article-title":"Neuronlike adaptive elements that can solve difficult learning control problems","volume":"SMC-13","author":"Barto","year":"1983","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1007\/BF00115009","article-title":"Learning to predict by the methods of temporal differences","volume":"3","author":"Sutton","year":"1988","journal-title":"Mach. Learn."},{"key":"ref_24","unstructured":"Watkins, C.J.C.H. (1989). Learning from delayed rewards. [Ph.D. Thesis, Cambridge University]."},{"key":"ref_25","unstructured":"Schwartz, A. (1993, January 27\u201329). A reinforcement learning method for maximizing undiscounted rewards. Proceedings of the 10th International Conference on Machine Learning, Amherst, MA, USA."},{"key":"ref_26","unstructured":"Rummery, G.A., and Niranjan, M. (1994). On-Line Q-Learning Using Connectionist Systems, Department of Engineering, University of Cambridge."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wiering, M.A., and van Hasselt, H. (2007, January 1\u20135). 
Two novel on-policy reinforcement learning algorithms based on TD(\u03bb)-methods. Proceedings of the Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007, Honolulu, HI, USA.","DOI":"10.1109\/ADPRL.2007.368200"},{"key":"ref_28","unstructured":"Hoffman, M., Doucet, A., de Freitas, N., and Jasra, A. (2007, January 3\u20136). Trans-dimensional MCMC for Bayesian Policy Learning. Proceedings of the 20th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_29","first-page":"183","article-title":"Multi-agent reinforcement learning: An overview","volume":"310","author":"Schutter","year":"2010","journal-title":"Innov. Multi-Agent Syst. Appl."},{"key":"ref_30","first-page":"41","article-title":"Multiagent Learning: Basics, Challenges, and Prospects","volume":"33","author":"Tuyls","year":"2012","journal-title":"AI Mag."},{"key":"ref_31","unstructured":"Dorigo, M. (1992). Optimization, Learning and Natural Algorithms. [Ph.D. Thesis, Politecnico di Milano]."},{"key":"ref_32","unstructured":"Gambardella, L.M., and Dorigo, M. (1995, January 9\u201312). Ant-Q: A Reinforcement Learning approach to the traveling salesman problem. Proceedings of the ML-95, 12th International Conference on Machine Learning, Tahoe City, CA, USA."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Monekosso, N., and Remagnino, A.P. (2001, January 10\u201314). Phe-Q: A pheromone based Q-learning. Proceedings of the Australian Joint Conference on Artificial Intelligence: AI 2001, LNAI 2256, Adelaide, SA, Australia.","DOI":"10.1007\/3-540-45656-2_30"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Iima, H., Kuroe, Y., and Matsuda, S. (2010, January 10\u201313). Swarm reinforcement learning method based on ant colony optimization. 
Proceedings of the 2010 IEEE International Conference on Systems Man and Cybernetics (SMC), Istanbul, Turkey.","DOI":"10.1109\/ICSMC.2010.5642307"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Hong, M., Jung, J.J., and Camacho, D. (2017). GRSAT: A novel method on group recommendation by social affinity and trustworthiness. Cybern. Syst., 140\u2013161.","DOI":"10.1080\/01969722.2016.1276770"},{"key":"ref_36","unstructured":"Hong, M., Jung, J.J., and Lee, M. (2015, January 26\u201327). Social Affinity-Based Group Recommender System. Proceedings of the International Conference on Context-Aware Systems and Applications, Vung Tau, Vietnam."},{"key":"ref_37","unstructured":"(2019, March 11). APA Dictionary of Psychology. Available online: https:\/\/dictionary.apa.org."},{"key":"ref_38","first-page":"1717","article-title":"A Cognitive Theory of Trust","volume":"84","author":"Hill","year":"2005","journal-title":"84 Wash. U. L. Rev."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Chatterjee, K., Majumdar, R., and Henzinger, T.A. (2006, January 23\u201325). Markov decision processes with multiple objectives. Proceedings of the Annual Symposium on Theoretical Aspects of Computer Science, Marseille, France.","DOI":"10.1007\/11672142_26"},{"key":"ref_40","first-page":"1","article-title":"Multi-Objective Markov Decision Processes for Data-Driven Decision Support","volume":"17","author":"Lizotte","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1613\/jair.3987","article-title":"A survey of multi-objective sequential decision-making","volume":"48","author":"Roijers","year":"2013","journal-title":"J. Artif. Intell. Res."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hwang, C.-L., and Masud, A.S.M. (1979). 
Multiple Objective Decision Making, Methods and Application: A State-of-the-Art Survey, Springer.","DOI":"10.1007\/978-3-642-45511-7"},{"key":"ref_43","unstructured":"Melo, F. (2001). Convergence of Q-learning: A simple proof, Institute of Systems and Robotics. Institute Of Systems and Robotics, Tech. Rep (2001)."},{"key":"ref_44","unstructured":"Jaakkola, T., Jordan, M.I., and Singh, S. (December, January 29). Convergence of stochastic iterative dynamic programming algorithms. Proceedings of the 6th International Conference on Neural Information Processing Systems, Denver, CO, USA."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1162\/neco.1994.6.6.1185","article-title":"On the convergence of stochastic iterative dynamic programming algorithms","volume":"6","author":"Jaakkola","year":"1994","journal-title":"Neural Comput."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Akbari, Z., and Unland, R. (2017, January 23\u201326). A Holonic Multi-Agent System Approach to Differential Diagnosis. Proceedings of the Multiagent System Technologies. MATES 2017, Leipzig, Germany.","DOI":"10.1007\/978-3-319-64798-2_17"},{"key":"ref_47","unstructured":"(2019, March 11). GAMA Platform. Available online: https:\/\/gama-platform.github.io\/."},{"key":"ref_48","unstructured":"Gerber, C., Siekmann, J.H., and Vierke, G. (1999). Holonic Multi-Agent Systems, DFKI-RR-99-03."},{"key":"ref_49","unstructured":"Merriam-Webster (2019, March 11). Differential Diagnosis. Available online: https:\/\/www.merriam-webster.com\/dictionary\/differential%20diagnosis."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1515\/dx-2013-0009","article-title":"Differential diagnosis: The key to reducing diagnosis error, measuring diagnosis and a mechanism to reduce healthcare costs","volume":"1","author":"Maude","year":"2014","journal-title":"Diagnosis"},{"key":"ref_51","unstructured":"Koestler, A. (1967). 
The Ghost in the Machine, Hutchinson."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Rodriguez, S.A. (2005). From Analysis to Design of Holonic Multi-Agent Systems: A Framework, Methodological Guidelines and Applications. [Ph.D. Thesis, University of Technology of Belfort-Montb\u00e9liard].","DOI":"10.1007\/11428862_98"},{"key":"ref_53","unstructured":"Lavendelis, E., and Grundspenkis, J. (2008, January 22\u201324). Open holonic multi-agent architecture for intelligent tutoring system development. Proceedings of the IADIS International Conference on Intelligent Systems and Agents, Amsterdam, The Netherlands."},{"key":"ref_54","unstructured":"Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996, January 2\u20134). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Akbari, Z., and Unland, R. (2016, January 16\u201318). Automated Determination of the Input Parameter of the DBSCAN Based on Outlier Detection. Proceedings of the Artificial Intelligence Applications and Innovations (AIAI 2016), IFIP Advances in Information and Communication Technology, Thessaloniki, Greece.","DOI":"10.1007\/978-3-319-44944-9_24"},{"key":"ref_56","unstructured":"(2019, March 11). NIST\/SEMATECH e-Handbook of Statistical Methods, Available online: http:\/\/www.itl.nist.gov\/div898\/handbook\/."},{"key":"ref_57","unstructured":"(2019, March 11). Mayo Clinic. Available online: https:\/\/www.mayoclinic.org\/."},{"key":"ref_58","first-page":"351","article-title":"Distal Madelung-Launois-Bensaude disease: An unusual differential diagnosis of acromalic arthritis","volume":"26","author":"Lemaire","year":"2008","journal-title":"Clin. Exp. Rheumatol."},{"key":"ref_59","unstructured":"Polikar, R. (2019, March 11). Ensemble learning. 
Available online: http:\/\/www.scholarpedia.org\/article\/Ensemble_learning."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1162\/neco.1991.3.1.79","article-title":"Adaptive mixtures of local experts","volume":"3","author":"Jacobs","year":"1991","journal-title":"Neural Comput."},{"key":"ref_61","unstructured":"Jordan, M.I., and Jacobs, R.A. (1993, January 25\u201329). Hierarchical mixtures of experts and the EM algorithm. Proceedings of the 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan), Nagoya, Japan."},{"key":"ref_62","unstructured":"Read, J., Pfahringer, B., Holmes, G., and Frank, E. (2009, January 7\u201311). Classifier Chains for Multi-label Classification. Proceedings of the 13th European Conference on Principles and Practice of Knowledge Discovery in Databases and the 20th European Conference on Machine Learning, Bled, Slovenia."},{"key":"ref_63","unstructured":"Liu, W., and Tsang, I.W. (2015, January 7\u201312). On the optimality of classifier chain for multi-label classification. 
Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/2\/35\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:45:55Z","timestamp":1760186755000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/2\/35"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,16]]},"references-count":63,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["make1020035"],"URL":"https:\/\/doi.org\/10.3390\/make1020035","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2019,4,16]]}}}