{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,8]],"date-time":"2026-02-08T22:24:50Z","timestamp":1770589490441,"version":"3.49.0"},"reference-count":39,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2017,7,1]],"date-time":"2017-07-01T00:00:00Z","timestamp":1498867200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Intelligent &amp; Fuzzy Systems"],"published-print":{"date-parts":[[2017,7]]},"abstract":"<jats:p>There are a lot of applications of multi-agent systems, such as robot navigation, distributed control, data mining, etc. Reinforcement learning (RL) is a popular method used in multi agent path planning. RL algorithm needs an accurate representation of a small and discrete space. In order to plan multi agents in continuous time, this paper approximate the Q-values with the fuzzy logic, such that the modified RL can work in continuous state space. The fuzzy reinforcement learning proposed in this paper uses fuzzy Q-iteration algorithm and a modified Wolf-PH algorithm. The convergence and existence of the algorithm are proven. The continuous time planning algorithm is applied to a cooperative task of two mobile Khepera robots. The experimental results show the effectiveness of the new path planning method for the multi agents in continuous time.<\/jats:p>","DOI":"10.3233\/jifs-161822","type":"journal-article","created":{"date-parts":[[2017,6,23]],"date-time":"2017-06-23T11:52:30Z","timestamp":1498218750000},"page":"491-501","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":21,"title":["Continuous-time path planning for\u00a0multi-agents with fuzzy reinforcement\u00a0learning"],"prefix":"10.1177","volume":"33","author":[{"given":"David","family":"Luviano","sequence":"first","affiliation":[{"name":"Departamento de Control Autom\u00e1tico, CINVESTAV-IPN (National Polytechnic Institute), Mexico City, Mexico"}]},{"given":"Wen","family":"Yu","sequence":"additional","affiliation":[{"name":"Departamento de Control Autom\u00e1tico, CINVESTAV-IPN (National Polytechnic Institute), Mexico City, Mexico"}]}],"member":"179","published-online":{"date-parts":[[2017,7]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"SenS. and WeissG. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence MIT PressCambridge."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008942012299"},{"key":"e_1_3_1_4_2","unstructured":"Wooldridge An Introduction to MultiAgent Systems John Wiley & Sons 2002."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.301"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1049\/iet-its.2009.0070"},{"key":"e_1_3_1_7_2","unstructured":"CherkasskyV. and MulierF. Learning from data: Concepts Theory and Methods Wiley-IEEE Press Chichester 1998."},{"key":"e_1_3_1_8_2","unstructured":"SejnowskiT.J. and HintonG. 
Unsupervised Learning: Foundations of Neural Computation MIT Press 1999."},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12555-012-0382-9"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/SMC.2014.6974464"},{"key":"e_1_3_1_11_2","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1109\/5326.897075","article-title":"Multi-agent reinforcement learning using function approximation","author":"Abul O.","year":"2000","unstructured":"AbulO., PolatF. and AlhajjR., Multi-agent reinforcement learning using function approximation, IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews (2000), 485\u2013497.","journal-title":"IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews"},{"issue":"4","key":"e_1_3_1_12_2","first-page":"217","article-title":"Learning in large cooperative multirobots systems","volume":"16","author":"Fernandez F.","year":"2001","unstructured":"FernandezF. and ParkerL.E., Learning in large cooperative multirobots systems, International Journal of Robotics and Automatization, Special Issue on Computational Intelligence Techniques in Cooperative Robots 16(4) (2001), 217\u2013226.","journal-title":"International Journal of Robotics and Automatization, Special Issue on Computational Intelligence Techniques in Cooperative Robots"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02481502"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0921-8890(03)00040-X"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(02)00121-2"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2015.11.030"},{"key":"e_1_3_1_17_2","first-page":"195","article-title":"Planning, Learning and Coordination in Multiagent Decision Processes","author":"Boutilier C.","year":"1996","unstructured":"BoutilierC., Planning, Learning and Coordination in Multiagent Decision Processes, In Proceedings of the Sixth Conference on Theoretical Aspects of Rationality and Knowledge (TARK96), 1996, pp. 195\u20132102.","journal-title":"Proceedings of the Sixth Conference on Theoretical Aspects of Rationality and Knowledge (TARK96)"},{"key":"e_1_3_1_18_2","unstructured":"HarsanyiJ.C. and SeltenR. A General Theory of Equilibrium Selection in Games MIT Press Cambridge 1988."},{"key":"e_1_3_1_19_2","doi-asserted-by":"crossref","unstructured":"BusoniuL. BabuskaR. and De SchutterB. Multi-agent Reinforcement Learning: An Overview Innovation in MASs and Applications. SCI 310 Springer Verlag Berlin Heidelberg pp. 183\u2013221.","DOI":"10.1007\/978-3-642-14435-6_7"},{"key":"e_1_3_1_20_2","volume-title":"Society for Industrial and Applied Mathematics","author":"Basar T.","year":"1999","unstructured":"BasarT. and OlsderG.J., Dynamic Noncooperative Game Theory, 2nd edition. Society for Industrial and Applied Mathematics, SIAM, 1999.","edition":"2"},{"key":"e_1_3_1_21_2","article-title":"Decentralized Reinforcement Learning Control of a robotic Manipulator","author":"Busoniu L.","year":"2006","unstructured":"BusoniuL., De SchutterB. and BabuskaR., Decentralized Reinforcement Learning Control of a robotic Manipulator, International Conference on Control, Automation, Robotics and Vision, 2006, ICARCV \u201906. 9th.","journal-title":"International Conference on Control, Automation, Robotics and Vision"},{"key":"e_1_3_1_22_2","unstructured":"BertsekasD.P. 
Dynamic Programming and Optimal Control vol 2 third edition Athena Scientific."},{"key":"e_1_3_1_23_2","unstructured":"IstrateskuV. Fixed Point Theory: An Introduction Springer 2002."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390240"},{"key":"e_1_3_1_25_2","first-page":"791","volume-title":"Proceedings 21st International Conference on Machine Learning (ICML-04)","author":"Szepesvari Cs.","unstructured":"SzepesvariCs. and SmartW.D., Interpolation-based Q-learning, Proceedings 21st International Conference on Machine Learning (ICML-04), Banff, Canada, pp. 791\u2013798."},{"key":"e_1_3_1_26_2","first-page":"1057","volume-title":"Advances in Neural Information Processing Systems 12","author":"Sutton R.S.","year":"2000","unstructured":"SuttonR.S., McAllesterD.A., SinghS.P. and MansourY., Policy gradient methods for reinforcement learning with function approximation, Advances in Neural Information Processing Systems 12, MIT Press, 2000, pp. 1057\u20131063."},{"key":"e_1_3_1_27_2","unstructured":"BertsekasD.P. and TsitsiklisJ.N. Neuro-dynamic programming Athena Scientific 1996."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114724"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1977.1674779"},{"key":"e_1_3_1_30_2","unstructured":"KruseR. GebhardtJ.E. and KlowonF. Foundations of Fuzzy Systems Wiley 1994."},{"key":"e_1_3_1_31_2","first-page":"1040","volume-title":"Advances in Neural Information Processing Systems 13","author":"Gordon G.J.","year":"2001","unstructured":"GordonG.J., Reinforcement learning with function approximation converges to a region. In LeenT.K., DietterichT.G. and TrespV., editors, Advances in Neural Information Processing Systems 13, MIT Press, 2001, pp. 1040\u20131046."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00993306"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/72.159061"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1017992615625"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/9.133184"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2010.02.006"},{"key":"e_1_3_1_37_2","unstructured":"K-team Corporation 2013 http:\/\/www-k-team.com"},{"key":"e_1_3_1_38_2","volume-title":"IEEE Symposium on Industrial Electronics and Applications (ISIEA 2009)","author":"Ganapathy V.","unstructured":"GanapathyV., YunS.C. and LuiW.L.D., Utilization of Webots and Khepera II as a Platform for neural Q-learning controllers, IEEE Symposium on Industrial Electronics and Applications (ISIEA 2009), Kuala Lumpur, Malaysia."},{"key":"e_1_3_1_39_2","doi-asserted-by":"crossref","unstructured":"VlassisN. A concise Introduction to Multi Agent Systems and Distributed Artificial Intelligence. Synthesis Lectures in Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers 2007.","DOI":"10.1007\/978-3-031-01543-4"},{"key":"e_1_3_1_40_2","doi-asserted-by":"crossref","unstructured":"BusoniuL. ErnstD. SchutterB. and BabuskaR. Continuous-State Reinforcement Learning with Fuzzy Approximation Adaptive Agents and MAS III Springer-Verlag Berlin Heidelberg TuylsK. et al. (Eds.) LNAI 4865 2008 pp. 
27\u201343.","DOI":"10.1007\/978-3-540-77949-0_3"}],"container-title":["Journal of Intelligent &amp; Fuzzy Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-161822","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.3233\/JIFS-161822","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-161822","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,8]],"date-time":"2026-02-08T01:22:29Z","timestamp":1770513749000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.3233\/JIFS-161822"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,7]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2017,7]]}},"alternative-id":["10.3233\/JIFS-161822"],"URL":"https:\/\/doi.org\/10.3233\/jifs-161822","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,7]]}}}
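For context on the abstract above, the following is a minimal, hypothetical sketch of the general idea of approximating Q-values with fuzzy logic so that a value-iteration-style backup can run over a continuous state space, in the spirit of fuzzy Q-iteration (compare reference e_1_3_1_40_2). It is not the authors' implementation: the one-dimensional state space, the two-element action set, the toy dynamics and reward in step(), and all numerical constants are assumptions made purely for illustration.

import numpy as np

N_SETS = 11                            # number of fuzzy sets over the state space (assumed)
CORES = np.linspace(0.0, 1.0, N_SETS)  # membership-function cores on an assumed [0, 1] state range
ACTIONS = np.array([-0.1, 0.1])        # discrete action set (assumed)
GAMMA = 0.9                            # discount factor (assumed)


def membership(x):
    """Normalized triangular membership degrees of state x in each fuzzy set."""
    width = CORES[1] - CORES[0]
    mu = np.maximum(0.0, 1.0 - np.abs(x - CORES) / width)
    return mu / mu.sum()


def q_value(theta, x, a_idx):
    """Fuzzy approximation: Q(x, a) = sum_i mu_i(x) * theta[i, a]."""
    return membership(x) @ theta[:, a_idx]


def step(x, a):
    """Assumed toy dynamics and reward: drive the state toward a goal at 0.8."""
    x_next = np.clip(x + a, 0.0, 1.0)
    reward = -abs(x_next - 0.8)
    return x_next, reward


def fuzzy_q_iteration(n_sweeps=200):
    """Sweep the membership cores and back up each (core, action) parameter."""
    theta = np.zeros((N_SETS, len(ACTIONS)))
    for _ in range(n_sweeps):
        new_theta = np.empty_like(theta)
        for i, x in enumerate(CORES):
            for a_idx, a in enumerate(ACTIONS):
                x_next, r = step(x, a)
                best_next = max(q_value(theta, x_next, b) for b in range(len(ACTIONS)))
                new_theta[i, a_idx] = r + GAMMA * best_next
        theta = new_theta
    return theta


if __name__ == "__main__":
    theta = fuzzy_q_iteration()
    for x in (0.1, 0.5, 0.9):
        greedy = ACTIONS[np.argmax([q_value(theta, x, b) for b in range(len(ACTIONS))])]
        print(f"state {x:.1f} -> greedy action {greedy:+.1f}")

The point this sketch illustrates is that the Q-function is stored only at the membership-function cores, while membership-weighted interpolation extends it to every continuous state, which is what allows a tabular-style backup to work without an exact discretization of the state space.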