{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T23:19:09Z","timestamp":1768346349768,"version":"3.49.0"},"reference-count":22,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,2,23]],"date-time":"2019-02-23T00:00:00Z","timestamp":1550880000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2017R1D1A1B04036354"],"award-info":[{"award-number":["NRF-2017R1D1A1B04036354"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>In this paper, we propose a controller for a bicycle using the DDPG (Deep Deterministic Policy Gradient) algorithm, which is a state-of-the-art deep reinforcement learning algorithm. We use a reward function and a deep neural network to build the controller. By using the proposed controller, a bicycle can not only be stably balanced but also travel to any specified location. We confirm that the controller with DDPG shows better performance than the other baselines such as Normalized Advantage Function (NAF) and Proximal Policy Optimization (PPO). For the performance evaluation, we implemented the proposed algorithm in various settings such as fixed and random speed, start location, and destination location.<\/jats:p>","DOI":"10.3390\/sym11020290","type":"journal-article","created":{"date-parts":[[2019,2,25]],"date-time":"2019-02-25T03:06:52Z","timestamp":1551064012000},"page":"290","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Toward Self-Driving Bicycles Using State-of-the-Art Deep Reinforcement Learning Algorithms"],"prefix":"10.3390","volume":"11","author":[{"given":"SeungYoon","family":"Choi","sequence":"first","affiliation":[{"name":"Artificial Intelligence Lab, Computer Science and Engineering, Kyung Hee University, Yongin-si, Gyonggi-do, Gyeonggi 446-701, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1345-2650","authenticated-orcid":false,"given":"Tuyen P.","family":"Le","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Lab, Computer Science and Engineering, Kyung Hee University, Yongin-si, Gyonggi-do, Gyeonggi 446-701, Korea"}]},{"given":"Quang D.","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Lab, Computer Science and Engineering, Kyung Hee University, Yongin-si, Gyonggi-do, Gyeonggi 446-701, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0253-4597","authenticated-orcid":false,"given":"Md Abu","family":"Layek","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Lab, Computer Science and Engineering, Kyung Hee University, Yongin-si, Gyonggi-do, Gyeonggi 446-701, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5510-390X","authenticated-orcid":false,"given":"SeungGwan","family":"Lee","sequence":"additional","affiliation":[{"name":"Humanitas College, Kyung Hee University, Yongin, Gyeonggi 446-701, Korea"}]},{"given":"TaeChoong","family":"Chung","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Lab, Computer Science and Engineering, Kyung Hee University, Yongin-si, Gyonggi-do, Gyeonggi 446-701, Korea"}]}],"member":"1968","published-online":{"date-parts":[[2019,2,23]]},"reference":[{"key":"ref_1","unstructured":"Nederland, G. (2018, December 10). Introducing the self-driving bicycle in The Netherlands. Available online: https:\/\/www.youtube.com\/watch?v=LSZPNwZex9s."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Keo, L., and Yamakita, M. (2009, January 10\u201315). Controlling balancer and steering for bicycle stabilization. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA.","DOI":"10.1109\/IROS.2009.5353966"},{"key":"ref_3","first-page":"1955","article-title":"Linearized dynamics equations for the balance and steer of a bicycle: a benchmark and review","volume":"Volume 463","author":"Meijaard","year":"2007","journal-title":"Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences"},{"key":"ref_4","unstructured":"Schwab, A., Meijaard, J., and Kooijman, J. (2007). Some recent developments in bicycle dynamics. Proceedings of the 12th World Congress in Mechanism and Machine Science, Russian Academy of Sciences."},{"key":"ref_5","first-page":"50","article-title":"Learning bicycle stunts","volume":"33","author":"Tan","year":"2014","journal-title":"ACM Trans. Gr. (TOG)"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lu, M., and Li, X. (2018, January 9\u201311). Deep reinforcement learning policy in Hex game system. Proceedings of the IEEE Chinese Control And Decision Conference (CCDC), Shenyang, China.","DOI":"10.1109\/CCDC.2018.8408296"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Bejar, E., and Moran, A. (2018, January 20\u201323). Deep reinforcement learning based neuro-control for a two-dimensional magnetic positioning system. Proceedings of the IEEE 4th International Conference on Control, Automation and Robotics (ICCAR), Auckland, New Zealand.","DOI":"10.1109\/ICCAR.2018.8384682"},{"key":"ref_8","unstructured":"Yasuda, T., and Ohkura, K. (February, January 31). Collective behavior acquisition of real robotic swarms using deep reinforcement learning. Proceedings of the Second IEEE International Conference on Robotic Computing (IRC), Laguna Hills, CA, USA."},{"key":"ref_9","first-page":"463","article-title":"Learning to drive a bicycle using reinforcement learning and shaping","volume":"98","year":"1998","journal-title":"ICML"},{"key":"ref_10","unstructured":"Le, T.P., and Chung, T.C. (July, January 28). Controlling bicycle using deep deterministic policy gradient algorithm. Proceedings of the 14th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), Jeju, Korea."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Sutton, R., and Barto, A. (1998). Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"682","DOI":"10.1016\/j.neunet.2008.02.003","article-title":"Reinforcement learning of motor skills with policy gradients","volume":"21","author":"Peters","year":"2008","journal-title":"Neural Netw."},{"key":"ref_13","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (arXiv, 2015). Continuous control with deep reinforcement learning, arXiv."},{"key":"ref_14","unstructured":"Le, T.P., Quang, N.D., Choi, S., and Chung, T. (2018, January 17\u201320). Learning a self-driving bicycle using deep deterministic policy Gradient. Proceedings of the 18th International Conference on Control, Automation and Systems (ICCAS), Pyeongchang, Korea."},{"key":"ref_15","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014, January 21\u201326). Deterministic policy gradient algorithms. Proceedings of the 31st International Conference on Machine Learning, Beijing, China."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"658","DOI":"10.1109\/TCST.2008.2004349","article-title":"Fuzzy sliding-mode underactuated control for autonomous dynamic balance of an electrical bicycle","volume":"17","author":"Hwang","year":"2009","journal-title":"IEEE Trans. Control Syst. Technol."},{"key":"ref_18","unstructured":"Kingma, D.P., and Ba, J. (arXiv, 2014). Adam: A method for stochastic optimization, arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1103\/PhysRev.36.823","article-title":"On the theory of the Brownian motion","volume":"36","author":"Uhlenbeck","year":"1930","journal-title":"Phys. Rev."},{"key":"ref_20","unstructured":"Gu, S., Lillicrap, T., Sutskever, I., and Levine, S. (2016, January 19\u201324). Continuous deep q-learning with model-based acceleration. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_21","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (arXiv, 2017). Proximal policy optimization algorithms, arXiv."},{"key":"ref_22","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015, January 6\u201311). Trust region policy optimization. Proceedings of the International Conference on Machine Learning, Lille, France."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/2\/290\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:34:18Z","timestamp":1760186058000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/2\/290"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,2,23]]},"references-count":22,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,2]]}},"alternative-id":["sym11020290"],"URL":"https:\/\/doi.org\/10.3390\/sym11020290","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,2,23]]}}}