{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T17:54:40Z","timestamp":1754157280658,"version":"3.41.2"},"reference-count":29,"publisher":"Emerald","issue":"4","license":[{"start":{"date-parts":[[2009,11,20]],"date-time":"2009-11-20T00:00:00Z","timestamp":1258675200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,11,20]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-heading\">Purpose<\/jats:title><jats:p>The purpose of this paper is to present a neuro\u2010fuzzy system with a reinforcement learning algorithm (RL) for adaptive swarm behaviors acquisition. The basic idea is that each individual (agent) has the same internal model and the same learning procedure, and the adaptive behaviors are acquired only by the reward or punishment from the environment. The formation of the swarm is also designed by RL, e.g. temporal difference (TD)\u2010error learning algorithm, and it may bring out a faster exploration procedure comparing with the case of individual learning.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Design\/methodology\/approach<\/jats:title><jats:p>The internal model of each individual composes a part of input states classification by a fuzzy net, and a part of optimal behavior learning network which adopting a kind of RL methodology named actor\u2010critic method. The membership functions and fuzzy rules in the fuzzy net are adaptively formed online by the change of environment states observed in the trials of agent's behaviors. 
The weights of connections between the fuzzy net and the action\u2010value functions of actor which provides a stochastic policy of action selection, and critic which provides an evaluation to state transmission, are modified by TD\u2010error.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Findings<\/jats:title><jats:p>Simulation experiments of the proposed system with several goal\u2010directed navigation problems are accomplished and the results show that swarms are successfully formed and optimized routes are found by swarm learning faster than the case of individual learning.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Originality\/value<\/jats:title><jats:p>Two techniques, i.e. fuzzy identification system and RL algorithm, are fused into an internal model of the individuals for swarm formation and adaptive behavior acquisition. The proposed model may be applied to multi\u2010agent systems, swarm robotics, metaheuristic optimization, and so on.<\/jats:p><\/jats:sec>","DOI":"10.1108\/17563780911005854","type":"journal-article","created":{"date-parts":[[2009,11,14]],"date-time":"2009-11-14T07:05:24Z","timestamp":1258182324000},"page":"724-744","source":"Crossref","is-referenced-by-count":14,"title":["Adaptive swarm behavior acquisition by a neuro\u2010fuzzy system and reinforcement learning algorithm"],"prefix":"10.1108","volume":"2","author":[{"given":"Takashi","family":"Kuremoto","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Masanao","family":"Obayashi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kunikazu","family":"Kobayashi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","reference":[{"key":"key2022031520332369800_b2","doi-asserted-by":"crossref","unstructured":"Dorigo, M. and Caro, G.D. 
(1999), \u201cAnt colony optimization: a new meta\u2010heuristic\u201d, Proceedings of the 1999 Congress on Evolutionary Computation, Washington, DC, pp. 1470\u20107.","DOI":"10.1109\/CEC.1999.782657"},{"key":"key2022031520332369800_b3","doi-asserted-by":"crossref","unstructured":"Dorigo, M., Maniezzo, V. and Colorni, A. (1996), \u201cAnt system: optimization by a colony of cooperating agents\u201d, IEEE Transactions on Systems, Man, and Cybernetics \u2013 Part B, Vol. 26 No. 1, pp. 29\u201041.","DOI":"10.1109\/3477.484436"},{"key":"key2022031520332369800_b4","doi-asserted-by":"crossref","unstructured":"Doya, K. (2002), \u201cMetalearning and neuromodulation\u201d, Neural Networks, Vol. 15 No. 4, pp. 495\u2010506.","DOI":"10.1016\/S0893-6080(02)00044-8"},{"key":"key2022031520332369800_b5","doi-asserted-by":"crossref","unstructured":"Iima, H. and Yasuaki, K. (2006), \u201cSwarm reinforcement learning algorithm based on exchanging information among agents\u201d, Transaction of SICE, Vol. 42 No. 11, pp. 1244\u201051 (in Japanese).","DOI":"10.9746\/sicetr1965.42.1244"},{"key":"key2022031520332369800_b6","doi-asserted-by":"crossref","unstructured":"Jouffe, L. (1998), \u201cFuzzy inference system learning by reinforcement learning\u201d, IEEE Transactions on System, Man and Cybernetics \u2013 Part B, Vol. 28 No. 3, pp. 338\u201055.","DOI":"10.1109\/5326.704563"},{"key":"key2022031520332369800_b8","unstructured":"Kawakami, T., Kinoshita, M., Watanabe, M., Takatori, N. and Furukawa, M. (2005), \u201cAn actor\u2010critic approach for learning cooperative behaviors of multi\u2010agent seesaw balancing problems\u201d, Proceedings of the IEEE International Conference on System, Man and Cybernetics, IEEE Press, Piscataway, NJ, pp. 109\u201014."},{"key":"key2022031520332369800_b9","doi-asserted-by":"crossref","unstructured":"Kennedy, J. and Eberhart, R.C. 
(1995), \u201cParticle swarm optimization\u201d, Proceedings of the IEEE International Conference on Neural Networks, IEEE Press, New York, NY, pp. 1942\u20108.","DOI":"10.1109\/ICNN.1995.488968"},{"key":"key2022031520332369800_b10","unstructured":"Kennedy, J., Eberhart, R.C. and Shi, Y. (2001), Swarm Intelligence, Morgan Kaufmann, San Francisco, CA."},{"key":"key2022031520332369800_b11","unstructured":"Kobayashi, K., Mizuno, S., Kuremoto, T. and Obayashi, M. (2005), \u201cA reinforcement learning system based on state space construction using fuzzy ART\u201d, Proceedings of the International Conference on Instrumentation, Control and Information Technology (SICE Annual Conference 2005), August 8\u201010, Okayama, pp. 3653\u20108."},{"key":"key2022031520332369800_b12","doi-asserted-by":"crossref","unstructured":"Kobayashi, K., Nakano, K., Kuremoto, T. and Obayashi, M. (2008), \u201cA state predictor based reinforcement learning system\u201d, IEEJ Transactions on EIS, Vol. 128 No. 8, pp. 1303\u201011.","DOI":"10.1541\/ieejeiss.128.1303"},{"key":"key2022031520332369800_b13","unstructured":"Kuremoto, T., Obayashi, M. and Kobayashi, K. (2007), \u201cForecasting time series by SOFNN with reinforcement learning\u201d, Proceedings of the 27th Annual International Symposium on Forecasting (ISF2007), June 24\u201027, New York, NY, p. 99."},{"key":"key2022031520332369800_b14","doi-asserted-by":"crossref","unstructured":"Kuremoto, T., Obayashi, M. and Kobayashi, K. (2008a), \u201cNeural forecasting systems\u201d, in Weber, C., Elshaw, M. and Mayer, N.M. (Eds), Reinforcement Learning, Theory and Applications, Advanced Robotic Systems, IN\u2010TECH, Vienna, pp. 1\u201020.","DOI":"10.5772\/5272"},{"key":"key2022031520332369800_b17","unstructured":"Kuremoto, T., Obayashi, M., Yamamoto, A. and Kobayashi, K. 
(2003), \u201cPredicting chaotic time series by reinforcement learning\u201d, Proceedings of the 2nd International Conference on Computational Intelligence, Robotics and Autonomous Systems (CIRAS2003), CD\u2010ROM, December 15\u201018, Singapore."},{"key":"key2022031520332369800_b16","doi-asserted-by":"crossref","unstructured":"Kuremoto, T., Obayashi, M., Kobayashi, K., Adachi, H. and Yoneda, K. (2008b), \u201cA neuro\u2010fuzzy learning system for adaptive swarm behaviors dealing with continuous state space\u201d, Proceedings of the International Conference on Intelligent Computing (ICIC 2008), LNAI 5227, Springer, Berlin, pp. 675\u201083.","DOI":"10.1007\/978-3-540-85984-0_81"},{"key":"key2022031520332369800_b15","doi-asserted-by":"crossref","unstructured":"Kuremoto, T., Obayashi, M., Kobayashi, K., Adachi, H. and Yoneda, K. (2008c), \u201cA reinforcement learning system for swarm behaviors\u201d, Proceedings of the IEEE World Congress on Computational Intelligence (WCCI\/IJCNN 2008), June 1\u20107, Hong Kong, pp. 3710\u20105.","DOI":"10.1109\/IJCNN.2008.4634330"},{"key":"key2022031520332369800_b18","doi-asserted-by":"crossref","unstructured":"Obayashi, M., Kuremoto, T. and Kobayashi, K. (2008), \u201cA self\u2010organized fuzzy\u2010neuro reinforcement learning system for continuous state space for autonomous robots\u201d, Proceedings of the International Conference on Computational Intelligence for Modeling, Control and Automation (CIMCA 2008), December 10\u201012, Vienna, pp. 552\u20109.","DOI":"10.1109\/CIMCA.2008.25"},{"key":"key2022031520332369800_b19","doi-asserted-by":"crossref","unstructured":"P\u00e9rez\u2010Uribe, A. (2001), Using a Time\u2010Delay Actor\u2010Critic Neural Architecture with Dopamine\u2010Like Reinforcement Signal for Learning in Autonomous Robots, LNAI 2036, Springer, Heidelberg, pp. 522\u201033.","DOI":"10.1007\/3-540-44597-8_37"},{"key":"key2022031520332369800_b1","unstructured":"Reynolds, C. 
(1986), \u201cBoids background and update\u201d, available at: www.red3d.com\/cwr\/boids\/."},{"key":"key2022031520332369800_b29","doi-asserted-by":"crossref","unstructured":"Samejima, K. and Omori, T. (1999), \u201cAdaptive internal state space construction method for reinforcement learning of a real\u2010world agent\u201d, Neural Networks, Vol. 12, pp. 1143\u201055.","DOI":"10.1016\/S0893-6080(99)00055-6"},{"key":"key2022031520332369800_b21","doi-asserted-by":"crossref","unstructured":"Schultz, W. (1998), \u201cPredictive reward signal of dopamine neurons\u201d, The Journal of Neurophysiology, Vol. 80, pp. 1\u201027.","DOI":"10.1152\/jn.1998.80.1.1"},{"key":"key2022031520332369800_b23","doi-asserted-by":"crossref","unstructured":"Schultz, W. (2001), \u201cReward signal by dopamine neurons\u201d, Neuroscientist, Vol. 7 No. 4, pp. 293\u2010302.","DOI":"10.1177\/107385840100700406"},{"key":"key2022031520332369800_b22","doi-asserted-by":"crossref","unstructured":"Schultz, W., Dayan, P. and Montague, R.P. (1997), \u201cA neural substrate of prediction and reward\u201d, Science, Vol. 275, pp. 1593\u20109.","DOI":"10.1126\/science.275.5306.1593"},{"key":"key2022031520332369800_b24","doi-asserted-by":"crossref","unstructured":"Sutton, R.S. and Barto, A.G. (1998), Reinforcement Learning: An Introduction, MIT, Cambridge, MA.","DOI":"10.1109\/TNN.1998.712192"},{"key":"key2022031520332369800_b25","unstructured":"Sycara, K.P. (1998), \u201cMultiagent systems\u201d, Artificial Intelligence Magazine, Summer, pp. 79\u201092B."},{"key":"key2022031520332369800_b28","doi-asserted-by":"crossref","unstructured":"Waelti, P., Dickinson, A. and Schultz, W. (2001), \u201cDopamine responses comply with basic assumptions of formal learning theory\u201d, Nature, Vol. 412, pp. 43\u20108.","DOI":"10.1038\/35083500"},{"key":"key2022031520332369800_b26","doi-asserted-by":"crossref","unstructured":"Wang, N., Gao, Y., Chen, Z.Q., Xie, J.Y. and Chen, S.F. 
(2007), \u201cA two\u2010layered multi\u2010agent reinforcement learning model and algorithm\u201d, Journal of Network and Computer Applications, Vol. 30 No. 4, pp. 1366\u201076.","DOI":"10.1016\/j.jnca.2006.09.004"},{"key":"key2022031520332369800_b27","doi-asserted-by":"crossref","unstructured":"Wang, X.S., Cheng, Y.H. and Yi, J.Q. (2007), \u201cA fuzzy actor\u2010critic reinforcement learning network\u201d, Information Sciences, Vol. 177, pp. 3764\u201081.","DOI":"10.1016\/j.ins.2007.03.012"},{"key":"key2022031520332369800_frd1","doi-asserted-by":"crossref","unstructured":"Kaelbling, L.P. and Littman, M.L. (1996), \u201cReinforcement learning: a survey\u201d, Journal of Artificial Intelligence Research, Vol. 4, pp. 237\u201085.","DOI":"10.1613\/jair.301"},{"key":"key2022031520332369800_frd2","unstructured":"Pyeatt, L.D. and Howe, A.E. (2001), \u201cDecision tree function approximation in reinforcement learning\u201d, Proceedings of the 3rd International Symposium on Adaptive Systems: Evolutionary Computation and Probabilistic Graphical Models, The Institute of Cybernetics, Mathematics and Physics, Habana, pp. 
70\u20107."}],"container-title":["International Journal of Intelligent Computing and Cybernetics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.emeraldinsight.com\/doi\/full-xml\/10.1108\/17563780911005854","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/17563780911005854\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/17563780911005854\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T23:44:17Z","timestamp":1753400657000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/ijicc\/article\/2\/4\/724-744\/137260"}},"subtitle":[],"editor":[{"given":"Suranga","family":"Hettiarachchi","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2009,11,20]]},"references-count":29,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2009,11,20]]}},"alternative-id":["10.1108\/17563780911005854"],"URL":"https:\/\/doi.org\/10.1108\/17563780911005854","relation":{},"ISSN":["1756-378X"],"issn-type":[{"type":"print","value":"1756-378X"}],"subject":[],"published":{"date-parts":[[2009,11,20]]}}}