{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T08:51:59Z","timestamp":1768985519638,"version":"3.49.0"},"reference-count":21,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2021,7,14]],"date-time":"2021-07-14T00:00:00Z","timestamp":1626220800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004329","name":"Javna Agencija za Raziskovalno Dejavnost RS","doi-asserted-by":"publisher","award":["P2-0270"],"award-info":[{"award-number":["P2-0270"]}],"id":[{"id":"10.13039\/501100004329","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Autonomous mobile robots (AMRs) are increasingly used in modern intralogistics systems as complexity and performance requirements become more stringent. One way to increase performance is to improve the operation and cooperation of multiple robots in their shared environment. The paper addresses these problems with a method for off-line route planning and on-line route execution. In the proposed approach, pre-computation of routes for frequent pick-up and drop-off locations limits the movements of AMRs to avoid conflict situations between them. The paper proposes a reinforcement learning approach where an agent builds the routes on a given layout while being rewarded according to different criteria based on the desired characteristics of the system. The results show that the proposed approach performs better in terms of throughput and reliability than the commonly used shortest-path-based approach for a large number of AMRs operating in the system. 
The use of the proposed approach is recommended when the need for high throughput requires the operation of a relatively large number of AMRs in relation to the size of the space in which the robots operate.<\/jats:p>","DOI":"10.3390\/s21144809","type":"journal-article","created":{"date-parts":[[2021,7,14]],"date-time":"2021-07-14T10:13:42Z","timestamp":1626257622000},"page":"4809","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Reinforcement-Learning-Based Route Generation for Heavy-Traffic Autonomous Mobile Robot Systems"],"prefix":"10.3390","volume":"21","author":[{"given":"Dominik","family":"Kozjek","sequence":"first","affiliation":[{"name":"Faculty of Mechanical Engineering, University of Ljubljana, SI-1000 Ljubljana, Slovenia"}]},{"given":"Andreja","family":"Malus","sequence":"additional","affiliation":[{"name":"Faculty of Mechanical Engineering, University of Ljubljana, SI-1000 Ljubljana, Slovenia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1720-9145","authenticated-orcid":false,"given":"Rok","family":"Vrabi\u010d","sequence":"additional","affiliation":[{"name":"Faculty of Mechanical Engineering, University of Ljubljana, SI-1000 Ljubljana, Slovenia"}]}],"member":"1968","published-online":{"date-parts":[[2021,7,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1109\/TSSC.1968.300136","article-title":"Formal Basis for the Heuristic Determination of Minimum Cost Path","volume":"4","author":"Hart","year":"1968","journal-title":"IEEE Trans. Syst. Sci. Cybern."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1109\/70.508439","article-title":"Probabilistic roadmaps for path planning in high-dimensional configuration spaces","volume":"12","author":"Kavraki","year":"1996","journal-title":"IEEE Trans. Robot. Autom."},{"key":"ref_3","unstructured":"LaValle, S.M. (1998). 
Rapidly-Exploring Random Trees: A New Tool for Path Planning. Technical Report No. 98\u201311, Available online: https:\/\/www.cs.csustan.edu\/~xliang\/Courses\/CS4710-21S\/Papers\/06%20RRT.pdf."},{"key":"ref_4","unstructured":"Khatib, O. (1985, January 25\u201328). Real-time obstacle avoidance for manipulators and mobile robots. Proceedings of the 1985 IEEE International Conference on Robotics and Automation, St. Louis, MO, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1504\/IJVAS.2011.038177","article-title":"Coordination of Industrial AGVs","volume":"9","author":"Olmi","year":"2011","journal-title":"Int. J. Veh. Auton. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Erdmann, M., and Lozano-Perez, T. (1986, January 7\u201310). On Multiple Moving Objects. Proceedings of the 1986 IEEE International Conference on Robotics and Automation, San Francisco, CA, USA.","DOI":"10.1109\/ROBOT.1986.1087401"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Pecora, F., Cirillo, M., and Dimitrov, D. (2012, January 7\u201312). On mission-dependent coordination of multiple vehicles under spatial and temporal constraints. Proceedings of the 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal.","DOI":"10.1109\/IROS.2012.6385862"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Uttendorf, S., Eilert, B., and Overmeyer, L. (2016, January 4\u20137). A fuzzy logic expert system for the automated generation of roadmaps for automated guided vehicle systems. Proceedings of the 2016 IEEE International Conference on Industrial Engineering and Engineering Management, Bali, Indonesia.","DOI":"10.1109\/IEEM.2016.7798023"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Digani, V., Sabattini, L., Secchi, C., and Fantuzzi, C. (2014, January 14\u201318). An automatic approach for the generation of the roadmap for multi-AGV systems in an industrial environment. 
Proceedings of the 2014 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA.","DOI":"10.1109\/IROS.2014.6942789"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Kleiner, K., Sun, D., and Meyer-Delius, D. (2011, January 25\u201330). ARMO: Adaptive road map optimization for large robot teams. Proceedings of the 2011 IEEE\/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA.","DOI":"10.1109\/IROS.2011.6094734"},{"key":"ref_11","unstructured":"Henkel, C., and Toussaint, M. (April, January 30). Optimized Directed Roadmap Graph for Multi-Agent Path Finding Using Stochastic Gradient Descent. Proceedings of the 35th Annual ACM Symposium on Applied Computing, Brno, Czech Republic."},{"key":"ref_12","unstructured":"Xiao, X., Liu, B., Warnell, G., and Stone, P. (2020). Motion Control for Mobile Robot Navigation Using Machine Learning: A Survey. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Faust, A., Oslund, K., Ramirez, O., Francis, A., Tapia, L., Fiser, M., and Davidson, J. (2018, January 21\u201326). PRM-RL: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning. Proceedings of the ICRA 2018\u2014IEEE International Conference on Robotics and Automation, Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8461096"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Gao, J., Ye, W., Guo, J., and Li, Z. (2020). Deep reinforcement learning for indoor mobile robot path planning. Sensors, 20.","DOI":"10.3390\/s20195493"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2378","DOI":"10.1109\/LRA.2019.2903261","article-title":"PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning","volume":"4","author":"Sartoretti","year":"2019","journal-title":"IEEE Robot. Autom. 
Lett."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Liu, Z., Chen, B., Zhou, H., Koushik, G., Hebert, M., and Zhao, D. (2020, January 25\u201329). MAPPER: Multi-agent path planning with evolutionary reinforcement learning in mixed dynamic environments. Proceedings of the IROS 2020\u2014International Conference on Intelligent Robots and Systems, Las Vegas, NV, USA.","DOI":"10.1109\/IROS45743.2020.9340876"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Bae, H., Kim, G., Kim, J., Qian, D., and Lee, S. (2019). Multi-robot path planning method using reinforcement learning. Appl. Sci., 9.","DOI":"10.3390\/app9153057"},{"key":"ref_18","unstructured":"(2021, March 08). MiR100. Available online: https:\/\/www.mobile-industrial-robots.com\/en\/solutions\/robots\/mir100\/."},{"key":"ref_19","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_20","unstructured":"Stable Baselines (2021, April 20). GitHub Repository 2018. Available online: https:\/\/github.com\/hill-a\/stable-baselines."},{"key":"ref_21","unstructured":"(2020, October 30). ROS&Gazebo MiR100 Simulation. 
Available online: https:\/\/github.com\/dfki-ric\/mir_robot."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/14\/4809\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:30:33Z","timestamp":1760164233000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/14\/4809"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,14]]},"references-count":21,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2021,7]]}},"alternative-id":["s21144809"],"URL":"https:\/\/doi.org\/10.3390\/s21144809","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,14]]}}}