{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T05:45:13Z","timestamp":1761975913239,"version":"build-2065373602"},"reference-count":46,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2021,6,11]],"date-time":"2021-06-11T00:00:00Z","timestamp":1623369600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in the agricultural plant protection such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in states according to the policy. In an unknown environment, the method of formulating rules for UAVs to help them choose actions is not applicable, and it is a feasible solution to obtain the optimal policy through reinforcement learning. However, experiments show that the existing reinforcement learning algorithms cannot get the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that there has a greater probability for UAV choosing the optimal action according to the policy learned by the algorithm we proposed than the classic Q-learning algorithm in the agricultural plant protection environment. This proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. 
Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.<\/jats:p>","DOI":"10.3390\/e23060737","type":"journal-article","created":{"date-parts":[[2021,6,11]],"date-time":"2021-06-11T02:55:28Z","timestamp":1623380128000},"page":"737","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3778-9127","authenticated-orcid":false,"given":"Fengjie","family":"Sun","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Jilin University, Changchun 130012, China"},{"name":"Key Laboratory of Symbolic Computing and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China"}]},{"given":"Xianchang","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Jilin University, Changchun 130012, China"},{"name":"Key Laboratory of Symbolic Computing and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China"},{"name":"Chengdu Kestrel Artificial Intelligence Institute, Chengdu 610000, China"}]},{"given":"Rui","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Jilin University, Changchun 130012, China"},{"name":"Key Laboratory of Symbolic Computing and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3698","DOI":"10.4249\/scholarpedia.3698","article-title":"Policy gradient methods","volume":"5","author":"Peters","year":"2010","journal-title":"Scholarpedia"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"109301","DOI":"10.1109\/ACCESS.2019.2933454","article-title":"Efficient Training Techniques for Multi-Agent Reinforcement Learning in Combat Tasks","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"4","DOI":"10.5772\/10528","article-title":"Mobile Robot Navigation Based on Q-Learning Technique","volume":"8","author":"Khriji","year":"2011","journal-title":"Int. J. Adv. Robot. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"109544","DOI":"10.1109\/ACCESS.2019.2933492","article-title":"Crowd Navigation in an Unknown and Dynamic Environment Based on Deep Reinforcement Learning","volume":"7","author":"Sun","year":"2019","journal-title":"IEEE Access"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Nguyen, H., and La, H. (2019, January 25\u201327). Review of Deep Reinforcement Learning for Robot Manipulation. Proceedings of the 2019 Third IEEE International Conference on Robotic Computing (IRC), Naples, Italy.","DOI":"10.1109\/IRC.2019.00120"},{"key":"ref_7","first-page":"139","article-title":"Acute pesticide poisoning: A major global health problem","volume":"43","author":"Jeyaratnam","year":"1990","journal-title":"World Health Stat. 
Q."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Sun, F., Wang, X., and Zhang, R. (2020). Fair Task Allocation When Cost of Task Is Multidimensional. Appl. Sci., 10.","DOI":"10.3390\/app10082798"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"139793","DOI":"10.1016\/j.scitotenv.2020.139793","article-title":"Field evaluation of spray drift and environmental impact using an agricultural unmanned aerial vehicle (UAV) sprayer","volume":"737","author":"Wang","year":"2020","journal-title":"Sci. Total Environ."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1546","DOI":"10.1002\/ps.5321","article-title":"Field evaluation of an unmanned aerial vehicle (UAV) sprayer: Effect of spray volume on deposition and the control of pests and disease in wheat","volume":"75","author":"Wang","year":"2019","journal-title":"Pest Manag. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ajmeri, N., Guo, H., Murukannaiah, P.K., and Singh, M.P. (2018, January 13\u201319). Robust Norm Emergence by Revealing and Reasoning about Context: Socially Intelligent Agents for Enhancing Privacy. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden.","DOI":"10.24963\/ijcai.2018\/4"},{"key":"ref_12","unstructured":"Hao, J., and Leung, H.F. (2013, January 3\u20139). The dynamics of reinforcement social learning in cooperative multiagent systems. Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI \u201913), Beijing, China."},{"key":"ref_13","first-page":"36","article-title":"Path planning Based on Minimum Enerny Consumption for plant Protection UAVs in Sorties","volume":"46","author":"Bo","year":"2015","journal-title":"Trans. Chin. Soc. Agric. Mach."},{"key":"ref_14","first-page":"29","article-title":"Path Planning Method Based on Grid-GSA for Plant Protection UAV","volume":"48","author":"Wang","year":"2017","journal-title":"Trans. Chin. Soc. Agric. Mach."},{"key":"ref_15","first-page":"28","article-title":"3D Path Planning Approach Based on Gravitational Search Algorithm for Sprayer UAV","volume":"49","author":"Wang","year":"2018","journal-title":"Trans. Chin. Soc. Agric. Mach."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sun, F., Wang, X., and Zhang, R. (2020). Task scheduling system for UAV operations in agricultural plant protection environment. J. Ambient. Intell. Humaniz. Comput., 1\u201315.","DOI":"10.1007\/s12652-020-01969-1"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/MAES.2019.2914986","article-title":"Conflict Detection and Resolution for Civil Aviation: A Literature Survey","volume":"34","author":"Tang","year":"2019","journal-title":"IEEE Aerosp. Electron. Syst. Mag."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1016\/j.trc.2018.10.006","article-title":"A causal encounter model of traffic collision avoidance system operations for safety assessment and advisory optimization in high-density airspace","volume":"96","author":"Tang","year":"2018","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_19","first-page":"7","article-title":"From Social Monitoring to Normative Influence","volume":"4","author":"Conte","year":"2001","journal-title":"J. Artif. Soc. Soc. Simul."},{"key":"ref_20","unstructured":"Alechina, N., Dastani, M., and Logan, B. (2012, January 4\u20138). Programming norm-aware agents. 
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems\u2014Volume 2, Valencia, Spain."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1007\/s10458-017-9372-x","article-title":"Severity-sensitive norm-governed multi-agent planning","volume":"32","author":"Gasparini","year":"2018","journal-title":"Auton. Agents Multi-Agent Syst."},{"key":"ref_22","unstructured":"Meneguzzi, F., and Luck, M. (2009, January 10\u201315). Norm-based behaviour modification in BDI agents. Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS \u201909)\u2014Volume 1, Budapest, Hungary."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Dignum, F., Morley, D., Sonenberg, E., and Cavedon, L. (2000). Towards socially sophisticated BDI agents. Proceedings of the Fourth International Conference on MultiAgent Systems, IEEE Computer Society.","DOI":"10.1109\/ICMAS.2000.858442"},{"key":"ref_24","unstructured":"Fagundes, M.S., Billhardt, H., and Ossowski, S. (2010, January 1\u20135). Normative reasoning with an adaptive self-interested agent model based on Markov decision processes. Proceedings of the 12th Ibero-American Conference on Advances in Artificial Intelligence (IBERAMIA\u201910), Bah\u00eda Blanca, Argentina."},{"key":"ref_25","unstructured":"Ajmeri, N., Jiang, J., Chirkova, R., Doyle, J., and Singh, M.P. (2016, January 9\u201315). Coco: Runtime reasoning about conflicting commitments. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI\u201916), New York, NY, USA."},{"key":"ref_26","unstructured":"van Riemsdijk, M.B., Dennis, L., Fisher, M., and Hindriks, K.V. (2015, January 4\u20138). A Semantic Framework for Socially Adaptive Agents: Towards strong norm compliance. Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems  (AAMAS \u201915), Istanbul, Turkey."},{"key":"ref_27","unstructured":"Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. [Ph.D. Thesis, King\u2019s College, University of Cambridge]."},{"key":"ref_28","unstructured":"Smart, W., and Kaelbling, L.P. (2002, January 11\u201315). Effective reinforcement learning for mobile robots. Proceedings of the 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292), Washington, DC, USA."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Martinez-Gil, F., Lozano, M., and Fern\u00e1ndez, F. (2011, January 2). Multi-agent reinforcement learning for simulating pedestrian navigation. Proceedings of the 11th International Conference on Adaptive and Learning Agents, Taipei, Taiwan.","DOI":"10.1007\/978-3-642-28499-1_4"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Casadiego, L., and Pelechano, N. (2015, January 26\u201328). From One to Many: Simulating Groups of Agents with Reinforcement Learning Controllers. Proceedings of the Intelligent Virtual Agents: 15th International Conference (IVA 2015), Delft, The Netherlands.","DOI":"10.1007\/978-3-319-21996-7_12"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Li, S., Xu, X., and Zuo, L. (2015, January 8\u201310). Dynamic path planning of a mobile robot with improved Q-learning algorithm. Proceedings of the 2015 IEEE International Conference on Information and Automation, Lijiang, China.","DOI":"10.1109\/ICInfA.2015.7279322"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Bianchi, R.A.C., Ribeiro, C.H.C., and Costa, A.H.R. (2004). 
Heuristically Accelerated Q\u2013Learning: A New Approach to Speed Up Reinforcement Learning. Proceedings of the Brazilian Symposium on Artificial Intelligence, Springer.","DOI":"10.1007\/978-3-540-28645-5_25"},{"key":"ref_33","first-page":"77","article-title":"A study of FMQ heuristic in cooperative multi-agent games","volume":"Volume 1","author":"Matignon","year":"2008","journal-title":"Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems. Workshop 10: Multi-Agent Sequential Decision Making in Uncertain Multi-Agent Domains, AAMAS \u201908"},{"key":"ref_34","first-page":"663","article-title":"Algorithms for Inverse Reinforcement Learning","volume":"Volume 67","author":"Ng","year":"2000","journal-title":"Proceedings of the Seventeenth International Conference on Machine Learning"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Henry, P., Vollmer, C., Ferris, B., and Fox, D. (2010, January 3\u20137). Learning to navigate through crowded environments. Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA.","DOI":"10.1109\/ROBOT.2010.5509772"},{"key":"ref_36","unstructured":"Anschel, O., Baram, N., and Shimkin, N. (2017, January 6\u201311). Averaged-DQN: Variance reduction and stabilization for deep reinforcement learning. Proceedings of the 34th International Conference on Machine Learning\u2014Volume 70, Sydney, Australia."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Wang, P., Li, H., and Chan, C.Y. (2019, January 9\u201312). Continuous Control for Automated Lane Change Behavior Based on Deep Deterministic Policy Gradient Algorithm. Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France.","DOI":"10.1109\/IVS.2019.8813903"},{"key":"ref_38","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short-Term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"118898","DOI":"10.1109\/ACCESS.2019.2937108","article-title":"Multi-Agent Deep Reinforcement Learning-Based Cooperative Spectrum Sensing With Upper Confidence Bound Exploration","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"124147","DOI":"10.1109\/ACCESS.2019.2938390","article-title":"A Reinforcement Learning-Based QAM\/PSK Symbol Synchronizer","volume":"7","author":"Matta","year":"2019","journal-title":"IEEE Access"},{"key":"ref_42","unstructured":"Littman, M.L., and Szepesv\u00e1ri, C. (1996, January 3\u20136). A Generalized Reinforcement-Learning Model: Convergence and Applications. Proceedings of the Machine Learning, Thirteenth International Conference (ICML \u201996), Bari, Italy."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1613\/jair.301","article-title":"Reinforcement learning: A survey","volume":"4","author":"Kaelbling","year":"1996","journal-title":"J. Artif. Intell. Res."},{"key":"ref_44","unstructured":"DJI (2018, July 05). MG-12000P Flight Battery User Guide. 
Available online: https:\/\/dl.djicdn.com\/downloads\/mg_1p\/20180705\/MG-12000P+Flight+Battery+User+Guide_Multi.pdf."},{"key":"ref_45","first-page":"175","article-title":"Research on the Informatization Construction of Agricultural Cooperatives\u2014A Case Study of Rural Areas in Southern Fujian","volume":"24","author":"Cai","year":"2012","journal-title":"J. Jiangxi Agric."},{"key":"ref_46","unstructured":"NetEase News (2018, July 05). Japanese Companies Develop New UAVs to Cope with Aging Farmers. Available online: https:\/\/news.163.com\/air\/18\/0904\/15\/DQSCN107000181O6.html."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/6\/737\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:13:09Z","timestamp":1760163189000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/6\/737"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,11]]},"references-count":46,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2021,6]]}},"alternative-id":["e23060737"],"URL":"https:\/\/doi.org\/10.3390\/e23060737","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2021,6,11]]}}}
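
Note: the abstract in this record names the technique only at a high level (classic Q-learning augmented with similar/approximate state matching) and does not detail it. The following is a minimal illustrative sketch of that idea, assuming states are numeric tuples and that an unseen state falls back on the Q-values of the most similar previously visited state. ApproxMatchQLearner, similarity, and all hyperparameters are hypothetical names for illustration, not the authors' implementation.

import random
from collections import defaultdict

def similarity(s1, s2):
    # Toy similarity between states encoded as numeric tuples
    # (e.g., UAV grid position and remaining battery); higher is more similar.
    return -sum(abs(a - b) for a, b in zip(s1, s2))

class ApproxMatchQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        # Q-table: state -> {action: value}, zero-initialized on first visit.
        self.q = defaultdict(lambda: {a: 0.0 for a in actions})
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def _match(self, state):
        # Return the state itself if already visited (or if the table is
        # empty); otherwise the most similar previously visited state.
        if state in self.q or not self.q:
            return state
        return max(self.q, key=lambda s: similarity(s, state))

    def choose_action(self, state):
        # Epsilon-greedy over the Q-values of the (approximately) matched state.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        qs = self.q[self._match(state)]
        return max(qs, key=qs.get)

    def update(self, state, action, reward, next_state):
        # Classic Q-learning update, with the bootstrap target taken from the
        # matched next state: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)).
        best_next = max(self.q[self._match(next_state)].values())
        self.q[state][action] += self.alpha * (
            reward + self.gamma * best_next - self.q[state][action]
        )

# Hypothetical usage: states as (x, y, battery) tuples, two actions.
# agent = ApproxMatchQLearner(actions=["spray", "return_to_base"])
# a = agent.choose_action((0, 0, 95))
# agent.update((0, 0, 95), a, 1.0, (0, 1, 93))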