{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T22:49:12Z","timestamp":1775602152487,"version":"3.50.1"},"reference-count":41,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,3,19]],"date-time":"2024-03-19T00:00:00Z","timestamp":1710806400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],
"abstract":"<jats:p>In real-world scenarios, making navigation decisions for autonomous driving involves a sequential set of steps. These judgments are made from partial observations of the environment, while the underlying model of the environment remains unknown. A prevalent method for solving such problems is reinforcement learning, in which the agent learns through a succession of rewards in addition to fragmentary and noisy observations. This study introduces an algorithm named deep reinforcement learning navigation via decision transformer (DRLNDT) to address the challenge of enhancing the decision-making capabilities of autonomous vehicles operating in partially observable urban environments. The DRLNDT framework is built around the Soft Actor-Critic (SAC) algorithm. DRLNDT uses Transformer neural networks to model the temporal dependencies in observations and actions, which helps mitigate judgment errors that may arise from sensor noise or occlusion within a given state. A variational autoencoder (VAE) extracts latent vectors from high-quality images, reducing the dimensionality of the state space and improving training efficiency. The multimodal state space consists of vector states, including velocity and position, which the vehicle's intrinsic sensors can readily obtain, together with latent vectors derived from high-quality images that help the agent assess the current trajectory. Experiments demonstrate that DRLNDT achieves a superior policy without prior knowledge of the environment, detailed maps, or routing assistance, surpassing the baseline technique and other policy methods that lack historical data.<\/jats:p>",
"DOI":"10.3389\/fnbot.2024.1338189","type":"journal-article","created":{"date-parts":[[2024,3,19]],"date-time":"2024-03-19T04:55:09Z","timestamp":1710824109000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Deep reinforcement learning navigation via decision transformer in autonomous driving"],"prefix":"10.3389","volume":"18","author":[{"given":"Lun","family":"Ge","sequence":"first","affiliation":[]},{"given":"Xiaoguang","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Yongqiang","family":"Li","sequence":"additional","affiliation":[]},{"given":"Yongcong","family":"Wang","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,3,19]]},"reference":[
{"key":"B1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2006.05990","article-title":"What matters in on-policy reinforcement learning? A large-scale empirical study","author":"Andrychowicz","year":"2020","journal-title":"arXiv"},
{"key":"B2","doi-asserted-by":"publisher","first-page":"19817","DOI":"10.1109\/TITS.2022.3160673","article-title":"An end-to-end curriculum learning approach for autonomous driving scenarios","volume":"23","author":"Anzalone","year":"2022","journal-title":"IEEE Trans. Intell. Transp. Syst"},
{"key":"B3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1708.05866","article-title":"A brief survey of deep reinforcement learning","author":"Arulkumaran","year":"2017","journal-title":"arXiv"},
{"key":"B4","doi-asserted-by":"crossref","first-page":"2765","DOI":"10.1109\/ITSC.2019.8917306","article-title":"\u201cModel-free deep reinforcement learning for urban autonomous driving,\u201d","volume-title":"2019 IEEE Intelligent Transportation Systems Conference (ITSC)","author":"Chen","year":"2019"},
{"key":"B5","first-page":"15084","article-title":"Decision transformer: reinforcement learning via sequence modeling","volume":"34","author":"Chen","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst"},
{"key":"B6","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2009.14794","article-title":"Rethinking attention with performers","author":"Choromanski","year":"2020","journal-title":"arXiv"},
{"key":"B7","first-page":"12792","article-title":"Cogltx: applying bert to long texts","volume":"33","author":"Ding","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst"},
{"key":"B8","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2010.11929","article-title":"An image is worth 16x16 words: transformers for image recognition at scale","author":"Dosovitskiy","year":"2020","journal-title":"arXiv"},
{"key":"B9","unstructured":"Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V. (2017). \u201cCARLA: an open urban driving simulator,\u201d in Proceedings of the 1st Annual Conference on Robot Learning (Mountain View, CA: PMLR), 1\u201316."},
{"key":"B10","first-page":"25502","article-title":"Why generalization in rl is difficult: epistemic pomdps and implicit partial observability","volume":"34","author":"Ghosh","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst"},
{"key":"B11","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1109\/TITS.2015.2498841","article-title":"A review of motion planning techniques for automated vehicles","volume":"17","author":"Gonz\u00e1lez","year":"2015","journal-title":"IEEE Trans. Intell. Transp. Syst"},
{"key":"B12","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1812.05905","article-title":"Soft actor-critic algorithms and applications","author":"Haarnoja","year":"2018","journal-title":"arXiv"},
{"key":"B13","article-title":"\u201cDeep recurrent q-learning for partially observable MDPS,\u201d","author":"Hausknecht","year":"2015","journal-title":"2015 AAAI Fall Symposium Series"},
{"key":"B14","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1512.04455","article-title":"Memory-based control with recurrent neural networks","author":"Heess","year":"2015","journal-title":"arXiv"},
{"key":"B15","first-page":"2117","article-title":"Deep variational reinforcement learning for pomdps","volume":"80","author":"Igl","year":"2018","journal-title":"Proc. 35th Intl. Conf. Machine Learn. Proc. Mach. Learn. Res."},
{"key":"B16","first-page":"1273","article-title":"Offline reinforcement learning as one big sequence modeling problem","volume":"34","author":"Janner","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst"},
{"key":"B17","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1016\/S0004-3702(98)00023-X","article-title":"Planning and acting in partially observable stochastic domains","volume":"101","author":"Kaelbling","year":"1998","journal-title":"Artif. Intell"},
{"key":"B18","doi-asserted-by":"crossref","first-page":"8248","DOI":"10.1109\/ICRA.2019.8793742","article-title":"\u201cLearning to drive in a day,\u201d","volume-title":"2019 International Conference on Robotics and Automation (ICRA)","author":"Kendall","year":"2019"},
{"key":"B19","doi-asserted-by":"publisher","first-page":"4909","DOI":"10.1109\/TITS.2021.3054625","article-title":"Deep reinforcement learning for autonomous driving: a survey","volume":"23","author":"Kiran","year":"2021","journal-title":"IEEE Trans. Intell. Transp. Syst"},
{"key":"B20","first-page":"584","article-title":"\u201cCIRL: controllable imitative reinforcement learning for vision-based self-driving,\u201d","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV)","author":"Liang","year":"2018"},
{"key":"B21","first-page":"32","article-title":"The continuous Bernoulli: fixing a pervasive error in variational autoencoders","author":"Loaiza-Ganem","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst"},
{"key":"B22","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1312.5602","article-title":"Playing atari with deep reinforcement learning","author":"Mnih","year":"2013","journal-title":"arXiv"},
{"key":"B23","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},
{"key":"B24","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1007\/s11370-021-00398-z","article-title":"A survey on deep learning and deep reinforcement learning in robotics with a tutorial on deep reinforcement learning","volume":"14","author":"Morales","year":"2021","journal-title":"Intell. Serv. Robot"},
{"key":"B25","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1507.04296","article-title":"Massively parallel methods for deep reinforcement learning","author":"Nair","year":"2015","journal-title":"arXiv"},
{"key":"B26","doi-asserted-by":"crossref","first-page":"358","DOI":"10.1109\/IVWorkshops54471.2021.9669203","article-title":"\u201cInvestigating value of curriculum reinforcement learning in autonomous driving under diverse road and weather conditions,\u201d","volume-title":"2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops)","author":"Ozturk","year":"2021"},
{"key":"B27","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1109\/TIV.2016.2578706","article-title":"A survey of motion planning and control techniques for self-driving urban vehicles","volume":"1","author":"Paden","year":"2016","journal-title":"IEEE Trans. Intell. Veh"},
{"key":"B28","first-page":"7487","article-title":"\u201cStabilizing transformers for reinforcement learning,\u201d","volume-title":"International Conference on Machine Learning PMLR","author":"Parisotto","year":"2020"},
{"key":"B29","first-page":"4055","article-title":"\u201cImage transformer,\u201d","volume-title":"Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research","author":"Parmar","year":"2018"},
{"key":"B30","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"Puterman","year":"2014"},
{"key":"B31","first-page":"387","article-title":"\u201cDeterministic policy gradient algorithms,\u201d","volume-title":"International Conference on Machine Learning","author":"Silver","year":"2014"},
{"key":"B32","doi-asserted-by":"publisher","first-page":"29","DOI":"10.24963\/ijcai.2017\/700","article-title":"Value iteration networks","author":"Tamar","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst"},
{"key":"B33","first-page":"30","article-title":"Attention is all you need","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst"},
{"key":"B34","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2006.04768","article-title":"Linformer: self-attention with linear complexity","author":"Wang","year":"2020","journal-title":"arXiv"},
{"key":"B35","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn"},
{"key":"B36","doi-asserted-by":"publisher","first-page":"153651","DOI":"10.1109\/ACCESS.2020.3018151","article-title":"Variations in variational autoencoders-a comparative evaluation","volume":"8","author":"Wei","year":"2020","journal-title":"IEEE Access"},
{"key":"B37","doi-asserted-by":"crossref","first-page":"1073","DOI":"10.1109\/IV48863.2021.9575880","article-title":"\u201cA survey of deep reinforcement learning algorithms for motion planning and control of autonomous vehicles,\u201d","volume-title":"2021 IEEE Intelligent Vehicles Symposium (IV)","author":"Ye","year":"2021"},
{"key":"B38","doi-asserted-by":"publisher","first-page":"338","DOI":"10.18178\/ijmerr.11.5.338-344","article-title":"Deep reinforcement learning based autonomous driving with collision free for mobile robots","volume":"11","author":"Yeom","year":"2022","journal-title":"Int. J. Mech. Eng. Robot. Res"},
{"key":"B39","first-page":"2659","article-title":"\u201cAttentionnet: aggregating weak directions for accurate object detection,\u201d","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Yoo","year":"2015"},
{"key":"B40","doi-asserted-by":"publisher","first-page":"1235","DOI":"10.1162\/neco_a_01199","article-title":"A review of recurrent neural networks: LSTM cells and network architectures","volume":"31","author":"Yu","year":"2019","journal-title":"Neural Comput"},
{"key":"B41","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1704.07978","article-title":"On improving deep reinforcement learning for pomdps","author":"Zhu","year":"2017","journal-title":"arXiv"}],
"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2024.1338189\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,20]],"date-time":"2024-03-20T12:51:43Z","timestamp":1710939103000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2024.1338189\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,19]]},"references-count":41,"alternative-id":["10.3389\/fnbot.2024.1338189"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2024.1338189","relation":{},"ISSN":["1662-5218"],"issn-type":[{"value":"1662-5218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,19]]},"article-number":"1338189"}}