{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T14:49:24Z","timestamp":1777128564617,"version":"3.51.4"},"reference-count":34,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,9,6]],"date-time":"2023-09-06T00:00:00Z","timestamp":1693958400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>Deep reinforcement learning (DRL) has been demonstrated to be effective for several complex decision-making applications, such as autonomous driving and robotics. However, DRL is notoriously limited by its high sample complexity and its lack of stability. Prior knowledge, e.g., as expert demonstrations, is often available but challenging to leverage to mitigate these issues. In this paper, we propose General Reinforced Imitation (GRI), a novel method which combines benefits from exploration and expert data and is straightforward to implement over any off-policy RL algorithm. We make one simplifying hypothesis: expert demonstrations can be seen as perfect data whose underlying policy gets a constant high reward. Based on this assumption, GRI introduces the notion of offline demonstration agent. This agent sends expert data which are processed both concurrently and indistinguishably with the experiences coming from the online RL exploration agent. We show that our approach enables major improvements on camera-based autonomous driving in urban environments. We further validate the GRI method on Mujoco continuous control tasks with different off-policy RL algorithms. 
Our method ranked first on the CARLA Leaderboard and outperforms World on Rails, the previous state-of-the-art method, by 17%.<\/jats:p>","DOI":"10.3390\/robotics12050127","type":"journal-article","created":{"date-parts":[[2023,9,6]],"date-time":"2023-09-06T10:07:37Z","timestamp":1693994857000},"page":"127","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":44,"title":["GRI: General Reinforced Imitation and Its Application to Vision-Based Autonomous Driving"],"prefix":"10.3390","volume":"12","author":[{"given":"Raphael","family":"Chekroun","sequence":"first","affiliation":[{"name":"Center for Robotics, Mines Paris, PSL University, 75006 Paris, France"},{"name":"Valeo Driving Assistant Research, 75017 Paris, France"}]},{"given":"Marin","family":"Toromanoff","sequence":"additional","affiliation":[{"name":"Valeo Driving Assistant Research, 75017 Paris, France"}]},{"given":"Sascha","family":"Hornauer","sequence":"additional","affiliation":[{"name":"Center for Robotics, Mines Paris, PSL University, 75006 Paris, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4799-7285","authenticated-orcid":false,"given":"Fabien","family":"Moutarde","sequence":"additional","affiliation":[{"name":"Center for Robotics, Mines Paris, PSL University, 75006 Paris, France"}]}],"member":"1968","published-online":{"date-parts":[[2023,9,6]]},"reference":[{"key":"ref_1","unstructured":"Bojarski, M., Testa, D.D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L.D., Monfort, M., Muller, U., and Zhang, J. (2016). End to End Learning for Self-Driving Cars. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2300000053","article-title":"An algorithmic perspective on imitation learning","volume":"7","author":"Osa","year":"2018","journal-title":"Found. Trends\u00ae Robot."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Prakash, A., Chitta, K., and Geiger, A. 
(2021, January 19\u201325). Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPR46437.2021.00700"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Toromanoff, M., Wirbel, E., Wilhelm, F., Vejarano, C., Perrotton, X., and Moutarde, F. (2018, January 1\u20135). End to End Vehicle Lateral Control Using a Single Fisheye Camera. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594090"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_6","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv."},{"key":"ref_7","unstructured":"Fujimoto, S., van Hoof, H., and Meger, D. (2018, January 10\u201315). Addressing Function Approximation Error in Actor-Critic Methods. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_8","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 20\u201322). Asynchronous Methods for Deep Reinforcement Learning. Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_9","unstructured":"Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V. (2017, January 13\u201315). CARLA: An Open Urban Driving Simulator. Proceedings of the 1st Annual Conference on Robot Learning, Mountain View, CA, USA."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Chen, D., Koltun, V., and Kr\u00e4henb\u00fchl, P. (2021, January 11\u201317). 
Learning to drive from a world on rails. Proceedings of the ICCV, Virtual.","DOI":"10.1109\/ICCV48922.2021.01530"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Codevilla, F., Santana, E., Lopez, A., and Gaidon, A. (November, January 27). Exploring the Limitations of Behavior Cloning for Autonomous Driving. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00942"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Todorov, E., Erez, T., and Tassa, Y. (2012, January 7\u201312). MuJoCo: A physics engine for model-based control. Proceedings of the 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal.","DOI":"10.1109\/IROS.2012.6386109"},{"key":"ref_13","unstructured":"Chen, D., Zhou, B., Koltun, V., and Kr\u00e4henb\u00fchl, P. (2019, January 8\u201311). Learning by Cheating. Proceedings of the Conference on Robot Learning (CoRL), London, UK."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Gordon, D., Kadian, A., Parikh, D., Hoffman, J., and Batra, D. (November, January 27). SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00111"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Toromanoff, M., Wirbel, E., and Moutarde, F. (2020, January 13\u201319). End-to-End Model-Free Reinforcement Learning for Urban Driving Using Implicit Affordances. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00718"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Liniger, A., Dai, D., Yu, F., and Van Gool, L. (2021, January 11\u201317). End-to-End Urban Driving by Imitating a Reinforcement Learning Coach. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Virtual.","DOI":"10.1109\/ICCV48922.2021.01494"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Hester, T., Vecer\u00edk, M., Pietquin, O., Lanctot, M., Schaul, T., Piot, B., Sendonaris, A., Dulac-Arnold, G., Osband, I., and Agapiou, J.P. (2017). Learning from Demonstrations for Real World Reinforcement Learning. arXiv.","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"ref_18","unstructured":"Reddy, S., Dragan, A.D., and Levine, S. (2019). SQIL: Imitation Learning via Regularized Behavioral Cloning. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Rajeswaran, A., Kumar, V., Gupta, A., Schulman, J., Todorov, E., and Levine, S. (2017). Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations. arXiv.","DOI":"10.15607\/RSS.2018.XIV.049"},{"key":"ref_20","unstructured":"Martin, J.B., Chekroun, R., and Moutarde, F. (2021). Learning from demonstrations with SACR2: Soft Actor-Critic with Reward Relabeling. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Xu, D., Nair, S., Zhu, Y., Gao, J., Garg, A., Fei-Fei, L., and Savarese, S. (2018, January 21\u201325). Neural Task Programming: Learning to Generalize Across Hierarchical Tasks. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia.","DOI":"10.1109\/ICRA.2018.8460689"},{"key":"ref_22","unstructured":"Gao, Y., Xu, H., Lin, J., Yu, F., Levine, S., and Darrell, T. (2018). Reinforcement Learning from Imperfect Demonstrations. arXiv."},{"key":"ref_23","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018, January 10\u201315). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. 
Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_24","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2016, January 2\u20134). Continuous control with deep reinforcement learning. Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Hessel, M., Modayil, J., van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M.G., and Silver, D. (2017). Rainbow: Combining Improvements in Deep Reinforcement Learning. arXiv.","DOI":"10.1609\/aaai.v32i1.11796"},{"key":"ref_26","unstructured":"Dabney, W., Ostrovski, G., Silver, D., and Munos, R. (2018, January 10\u201315). Implicit Quantile Networks for Distributional Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_27","unstructured":"Tan, M., and Le, Q. (2019, January 9\u201315). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_28","unstructured":"Toromanoff, M., Wirbel, E., and Moutarde, F. (2019, January 8\u201314). Is Deep Reinforcement Learning Really Superhuman on Atari?. Proceedings of the Deep Reinforcement Learning Workshop of 39th Conference on Neural Information Processing Systems (Neurips\u20192019), Vancouver, BC, Canada."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hu, H., Liu, Z., Chitlangia, S., Agnihotri, A., and Zhao, D. (2022, January 19\u201324). Investigating the impact of multi-lidar placement on object detection for autonomous driving. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00258"},{"key":"ref_30","first-page":"6119","article-title":"Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline","volume":"35","author":"Wu","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Shao, H., Wang, L., Chen, R., Waslander, S.L., Li, H., and Liu, Y. (2023, January 18\u201322). ReasonNet: End-to-End Driving with Temporal and Global Reasoning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01319"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Chen, D., and Kr\u00e4henb\u00fchl, P. (2022, January 19\u201324). Learning from all vehicles. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01671"},{"key":"ref_33","unstructured":"Shao, H., Wang, L., Chen, R., Li, H., and Liu, Y. (2023, January 6\u20139). Safety-enhanced autonomous driving using interpretable sensor fusion transformer. Proceedings of the Conference on Robot Learning, Atlanta, GA, USA."},{"key":"ref_34","first-page":"3557","article-title":"ChainerRL: A Deep Reinforcement Learning Library","volume":"22","author":"Fujita","year":"2021","journal-title":"J. Mach. Learn. 
Res."}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/12\/5\/127\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:46:02Z","timestamp":1760129162000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/12\/5\/127"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,6]]},"references-count":34,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,10]]}},"alternative-id":["robotics12050127"],"URL":"https:\/\/doi.org\/10.3390\/robotics12050127","relation":{},"ISSN":["2218-6581"],"issn-type":[{"value":"2218-6581","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,6]]}}}