{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T00:41:35Z","timestamp":1775090495063,"version":"3.50.1"},"reference-count":30,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2022,6,17]],"date-time":"2022-06-17T00:00:00Z","timestamp":1655424000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>As one of the main elements of reinforcement learning, the design of the reward function is often not given enough attention when reinforcement learning is used in concrete applications, which leads to unsatisfactory performances. In this study, a reward function matrix is proposed for training various decision-making modes with emphasis on decision-making styles and further emphasis on incentives and punishments. Additionally, we model a traffic scene via graph model to better represent the interaction between vehicles, and adopt the graph convolutional network (GCN) to extract the features of the graph structure to help the connected autonomous vehicles perform decision-making directly. Furthermore, we combine GCN with deep Q-learning and multi-step double deep Q-learning to train four decision-making modes, which are named the graph convolutional deep Q-network (GQN) and the multi-step double graph convolutional deep Q-network (MDGQN). In the simulation, the superiority of the reward function matrix is proved by comparing it with the baseline, and evaluation metrics are proposed to verify the performance differences among decision-making modes. Results show that the trained decision-making modes can satisfy various driving requirements, including task completion rate, safety requirements, comfort level, and completion efficiency, by adjusting the weight values in the reward function matrix. 
Finally, the decision-making modes trained by MDGQN had better performance in an uncertain highway exit scene than those trained by GQN.<\/jats:p>","DOI":"10.3390\/s22124586","type":"journal-article","created":{"date-parts":[[2022,6,19]],"date-time":"2022-06-19T21:19:26Z","timestamp":1655673566000},"page":"4586","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Multi-Agent Decision-Making Modes in Uncertain Interactive Traffic Scenarios via Graph Convolution-Based Deep Reinforcement Learning"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7317-8059","authenticated-orcid":false,"given":"Xin","family":"Gao","sequence":"first","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100080, China"}]},{"given":"Xueyuan","family":"Li","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100080, China"}]},{"given":"Qi","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100080, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7056-4264","authenticated-orcid":false,"given":"Zirui","family":"Li","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100080, China"},{"name":"Department of Transport and Planning, Faculty of Civil Engineering and Geosciences, Delft University of Technology, Stevinweg 1, 2628 CN Delft, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5119-0519","authenticated-orcid":false,"given":"Fan","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100080, China"}]},{"given":"Tian","family":"Luan","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, 
Beijing Institute of Technology, Beijing 100080, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Stoma, M., Dudziak, A., Caban, J., and Dro\u017adziel, P. (2021). The Future of Autonomous Vehicles in the Opinion of Automotive Market Users. Energies, 14.","DOI":"10.3390\/en14164777"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Liu, Q., Li, X., Yuan, S., and Li, Z. (2021, January 19\u201322). Decision-Making Technology for Autonomous Vehicles Learning-Based Methods, Applications and Future Outlook. Proceedings of the IEEE International Intelligent Transportation Systems Conference, Indianapolis, IN, USA.","DOI":"10.1109\/ITSC48978.2021.9564580"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"4642","DOI":"10.1109\/TVT.2022.3150793","article-title":"Joint optimization of sensing, decision-making and motion-controlling for autonomous vehicles: A deep reinforcement learning approach","volume":"71","author":"Chen","year":"2022","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1109\/TRO.2016.2624754","article-title":"Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age","volume":"32","author":"Cadena","year":"2016","journal-title":"IEEE Trans. Robot."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Liu, Q., Yuan, S., and Li, Z. (2020, January 27\u201328). A Survey on Sensor Technologies for Unmanned Ground Vehicles. Proceedings of the 2020 3rd International Conference on Unmanned Systems, Harbin, China.","DOI":"10.1109\/ICUS50048.2020.9274845"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"113816","DOI":"10.1016\/j.eswa.2020.113816","article-title":"Self-driving cars: A survey","volume":"165","author":"Badue","year":"2020","journal-title":"Expert Syst. 
Appl."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yu, Y., Lu, C., Yang, L., Li, Z., and Gong, J. (November, January 19). Hierarchical Reinforcement Learning Combined with Motion Primitives for Automated Overtaking. Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA.","DOI":"10.1109\/IV47402.2020.9304815"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1421","DOI":"10.1016\/j.ifacol.2018.08.315","article-title":"Computational Intelligence in Control of AGV Multimodal Systems","volume":"51","author":"Gola","year":"2018","journal-title":"IFAC-PapersOnLine"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Liu, Q., Li, Z., Yuan, S., Zhu, Y., and Li, X. (2021). Review on Vehicle Detection Technology for Unmanned Ground Vehicles. Sensors, 21.","DOI":"10.3390\/s21041354"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Bouton, M., Nakhaei, A., Fujimura, K., and Kochenderfer, M.J. (2019, January 27\u201330). Cooperation-aware reinforcement learning for merging in dense traffic. Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand.","DOI":"10.1109\/ITSC.2019.8916924"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Caban, J., Nieoczym, A., Dudziak, A., Krajka, T., and Stopkov\u00e1, M. (2022). The Planning Process of Transport Tasks for Autonomous Vans\u2013Case Study. Appl. 
Sci., 12.","DOI":"10.3390\/app12062993"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1515\/eng-2020-0006","article-title":"Autonomous vans - the planning process of transport tasks","volume":"10","author":"Nieoczym","year":"2020","journal-title":"Open Eng."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"10704","DOI":"10.1109\/TIE.2022.3146549","article-title":"Personalized Driver Braking Behavior Modelling in the Car-following Scenario: An Importance Weight-based Transfer Learning Approach","volume":"69","author":"Li","year":"2022","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1146\/annurev-control-060117-105157","article-title":"Planning and Decision-Making for Autonomous Vehicles","volume":"1","author":"Schwarting","year":"2018","journal-title":"Annu. Rev. Control Robot. Auton. Syst."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Matignon, L., Laurent, G.J., and Fort-Piat, N.L. (2006). Reward Function and Initial Values: Better Choices for Accelerated Goal-Directed Reinforcement Learning, Springer.","DOI":"10.1007\/11840817_87"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jmsy.2018.11.005","article-title":"Simulation study on reward function of reinforcement learning in gantry work cell scheduling","volume":"50","author":"Ou","year":"2019","journal-title":"J. Manuf. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5876","DOI":"10.1109\/TVT.2020.2986005","article-title":"A Decision-Making Strategy for Vehicle Autonomous Braking in Emergency via Deep Reinforcement Learning","volume":"69","author":"Fu","year":"2020","journal-title":"IEEE Trans. Veh. 
Technol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"8707","DOI":"10.1109\/TVT.2021.3098321","article-title":"Interpretable Decision-Making for Autonomous Vehicles at Highway On-Ramps with Latent Space Reinforcement Learning","volume":"70","author":"Wang","year":"2021","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2472","DOI":"10.1109\/TVT.2022.3143840","article-title":"ES-DQN: A Learning Method for Vehicle Intelligent Speed Control Strategy under Uncertain Cut-in Scenario","volume":"71","author":"Chen","year":"2022","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"102505","DOI":"10.1016\/j.sysarc.2022.102505","article-title":"DRL-GAT-SA: Deep reinforcement learning for autonomous driving planning based on graph attention networks and simplex architecture","volume":"126","author":"Peng","year":"2022","journal-title":"J. Syst. Archit."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Li, Z., Lu, C., Yi, Y., and Gong, J. (2021). A Hierarchical Framework for Interactive Behaviour Prediction of Heterogeneous Traffic Participants Based on Graph Neural Network. IEEE Trans. Intell. Transp. Syst., 1\u201313.","DOI":"10.1109\/TITS.2021.3113995"},{"key":"ref_22","unstructured":"Jiang, J., Dun, C., Huang, T., and Lu, Z. (2018). Graph Convolutional Reinforcement Learning. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1016\/j.ins.2021.07.007","article-title":"Dynamic Graph Convolutional Network for Long-Term Traffic Flow Prediction with Reinforcement Learning","volume":"578","author":"Peng","year":"2021","journal-title":"Inf. Sci."},{"key":"ref_24","unstructured":"Kipf, T.N., and Welling, M. (2016). Semi-Supervised Classification with Graph Convolutional Networks. arXiv."},{"key":"ref_25","unstructured":"Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. [Ph.D. 
Thesis, King's College, University of Cambridge]."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_27","unstructured":"Dong, J., Chen, S., Ha, P., Li, Y., and Labi, S. (2020). A DRL-based Multiagent Cooperative Control Framework for CAV Networks: A Graphic Convolution Q Network. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"103452","DOI":"10.1016\/j.trc.2021.103452","article-title":"Decision making of autonomous vehicles in lane change scenarios: Deep reinforcement learning approaches with risk awareness","volume":"134","author":"Li","year":"2022","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_29","unstructured":"Jazar, R.N., and Dai, L. (2020). Artificial Intelligence and Internet of Things for Autonomous Vehicles. Nonlinear Approaches in Engineering Applications: Automotive Applications of Engineering Problems, Springer International Publishing."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1016\/j.trpro.2020.02.053","article-title":"Distraction of the Driver and Its Impact on Road Safety","volume":"44","author":"Tarkowski","year":"2020","journal-title":"Transp. Res. 
Procedia"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/12\/4586\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:33:56Z","timestamp":1760139236000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/12\/4586"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,17]]},"references-count":30,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["s22124586"],"URL":"https:\/\/doi.org\/10.3390\/s22124586","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,17]]}}}