{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T16:48:27Z","timestamp":1776358107223,"version":"3.51.2"},"reference-count":59,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2023,8,27]],"date-time":"2023-08-27T00:00:00Z","timestamp":1693094400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Commission","award":["SUN HORIZON-CL4-2022-HUMAN-01-14-101092612"],"award-info":[{"award-number":["SUN HORIZON-CL4-2022-HUMAN-01-14-101092612"]}]},{"name":"European Commission","award":["AIDEAS HORIZON-CL4-2021-TWIN-TRANSITION-01-07-101057294"],"award-info":[{"award-number":["AIDEAS HORIZON-CL4-2021-TWIN-TRANSITION-01-07-101057294"]}]},{"name":"European Commission","award":["2020-DI-116"],"award-info":[{"award-number":["2020-DI-116"]}]},{"name":"European Commission","award":["INVEST\/2022\/342"],"award-info":[{"award-number":["INVEST\/2022\/342"]}]},{"name":"Industrial Doctorate Program of the Catalan Government","award":["SUN HORIZON-CL4-2022-HUMAN-01-14-101092612"],"award-info":[{"award-number":["SUN HORIZON-CL4-2022-HUMAN-01-14-101092612"]}]},{"name":"Industrial Doctorate Program of the Catalan Government","award":["AIDEAS HORIZON-CL4-2021-TWIN-TRANSITION-01-07-101057294"],"award-info":[{"award-number":["AIDEAS HORIZON-CL4-2021-TWIN-TRANSITION-01-07-101057294"]}]},{"name":"Industrial Doctorate Program of the Catalan Government","award":["2020-DI-116"],"award-info":[{"award-number":["2020-DI-116"]}]},{"name":"Industrial Doctorate Program of the Catalan Government","award":["INVEST\/2022\/342"],"award-info":[{"award-number":["INVEST\/2022\/342"]}]},{"name":"Investigo Program of the Generalitat Valenciana","award":["SUN HORIZON-CL4-2022-HUMAN-01-14-101092612"],"award-info":[{"award-number":["SUN HORIZON-CL4-2022-HUMAN-01-14-101092612"]}]},{"name":"Investigo Program of the Generalitat 
Valenciana","award":["AIDEAS HORIZON-CL4-2021-TWIN-TRANSITION-01-07-101057294"],"award-info":[{"award-number":["AIDEAS HORIZON-CL4-2021-TWIN-TRANSITION-01-07-101057294"]}]},{"name":"Investigo Program of the Generalitat Valenciana","award":["2020-DI-116"],"award-info":[{"award-number":["2020-DI-116"]}]},{"name":"Investigo Program of the Generalitat Valenciana","award":["INVEST\/2022\/342"],"award-info":[{"award-number":["INVEST\/2022\/342"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The use of simulation and reinforcement learning can be viewed as a flexible approach to aid managerial decision-making, particularly in the face of growing complexity in manufacturing and logistic systems. Efficient supply chains heavily rely on streamlined warehouse operations, and therefore, having a well-informed storage location assignment policy is crucial for their improvement. The traditional methods found in the literature for tackling the storage location assignment problem have certain drawbacks, including the omission of stochastic process variability or the neglect of interaction between various warehouse workers. In this context, we explore the possibilities of combining simulation with reinforcement learning to develop effective mechanisms that allow for the quick acquisition of information about a complex environment, the processing of that information, and then the decision-making about the best storage location assignment. 
In order to test these concepts, we will make use of the FlexSim commercial simulator.<\/jats:p>","DOI":"10.3390\/a16090408","type":"journal-article","created":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T05:46:47Z","timestamp":1693201607000},"page":"408","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["A Hybrid Simulation and Reinforcement Learning Algorithm for Enhancing Efficiency in Warehouse Operations"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2521-9207","authenticated-orcid":false,"given":"Jonas F.","family":"Leon","sequence":"first","affiliation":[{"name":"Department of Computer Science, Multimedia and Telecommunication, Universitat Oberta de Catalunya, 08018 Barcelona, Spain"},{"name":"Spindox Espa\u00f1a S.L., Calle Muntaner 305, 08021 Barcelona, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8031-6555","authenticated-orcid":false,"given":"Yuda","family":"Li","sequence":"additional","affiliation":[{"name":"Research Center on Production Management and Engineering, Universitat Polit\u00e8cnica de Val\u00e8ncia, Plaza Ferrandiz-Salvador, 03801 Alcoy, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4182-0120","authenticated-orcid":false,"given":"Xabier A.","family":"Martin","sequence":"additional","affiliation":[{"name":"Research Center on Production Management and Engineering, Universitat Polit\u00e8cnica de Val\u00e8ncia, Plaza Ferrandiz-Salvador, 03801 Alcoy, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8425-1381","authenticated-orcid":false,"given":"Laura","family":"Calvet","sequence":"additional","affiliation":[{"name":"Department of Telecommunications & Systems Engineering, Universitat Aut\u00f2noma de Barcelona, 08202 Sabadell, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3793-3328","authenticated-orcid":false,"given":"Javier","family":"Panadero","sequence":"additional","affiliation":[{"name":"Department of 
Computer Architecture & Operating Systems, Universitat Aut\u00f2noma de Barcelona, 08193 Bellaterra, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1392-1776","authenticated-orcid":false,"given":"Angel A.","family":"Juan","sequence":"additional","affiliation":[{"name":"Research Center on Production Management and Engineering, Universitat Polit\u00e8cnica de Val\u00e8ncia, Plaza Ferrandiz-Salvador, 03801 Alcoy, Spain"}]}],"member":"1968","published-online":{"date-parts":[[2023,8,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"103123","DOI":"10.1016\/j.compind.2019.08.004","article-title":"Modeling and Simulation in Intelligent Manufacturing","volume":"112","author":"Zhang","year":"2019","journal-title":"Comput. Ind."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Leon, J.F., Li, Y., Peyman, M., Calvet, L., and Juan, A.A. (2023). A Discrete-Event Simheuristic for Solving a Realistic Storage Location Assignment Problem. Mathematics, 11.","DOI":"10.3390\/math11071577"},{"key":"ref_3","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"102712","DOI":"10.1016\/j.tre.2022.102712","article-title":"Reinforcement Learning for Logistics and Supply Chain Management: Methodologies, State of the Art, and Future Opportunities","volume":"162","author":"Yan","year":"2022","journal-title":"Transp. Res. Part E Logist. Transp. Rev."},{"key":"ref_5","unstructured":"Chick, S., S\u00e1nchez, P.J., Ferrin, D., and Morrice, D.J. (2002, January 8\u201311). FlexSim Simulation Environment. Proceedings of the Winter Simulation Conference, San Diego, CA, USA."},{"key":"ref_6","unstructured":"Van Rossum, G., and Drake, F.L. (2009). Python 3 Reference Manual, CreateSpace."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Leon, J.F., Marone, P., Peyman, M., Li, Y., Calvet, L., Dehghanimohammadabadi, M., and Juan, A.A. 
(2022, January 11\u201314). A Tutorial on Combining Flexsim With Python for Developing Discrete-Event Simheuristics. Proceedings of the 2022 Winter Simulation Conference (WSC), Singapore.","DOI":"10.1109\/WSC57314.2022.10015309"},{"key":"ref_8","first-page":"199","article-title":"The Storage Location Assignment Problem: A Literature Review","volume":"10","author":"Reyes","year":"2019","journal-title":"Int. J. Ind. Eng. Comput."},{"key":"ref_9","first-page":"1129","article-title":"Emergence of field intelligence for autonomous block agents in the automatic warehouse","volume":"9","author":"Kinoshita","year":"1999","journal-title":"Intell. Eng. Syst. Through Artif. Neural Netw."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"717","DOI":"10.1080\/00207720310001640755","article-title":"A simulation-based approach to study stochastic inventory-planning games","volume":"34","author":"Rao","year":"2003","journal-title":"Int. J. Syst. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Yan, W., Lin, C., and Pang, S. (2010, January 1\u20135). The Optimized Reinforcement Learning Approach to Run-Time Scheduling in Data Center. Proceedings of the 2010 Ninth International Conference on Grid and Cloud Computing, Nanjing, China.","DOI":"10.1109\/GCC.2010.22"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1002\/nav.21481","article-title":"A least squares temporal difference actor-critic algorithm with applications to warehouse management","volume":"59","author":"Estanjini","year":"2012","journal-title":"Nav. Res. Logist."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Dou, J., Chen, C., and Yang, P. (2015). Genetic Scheduling and Reinforcement Learning in Multirobot Systems for Intelligent Warehouses. Math. Probl. Eng., 2015.","DOI":"10.1155\/2015\/597956"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Rabe, M., and Dross, F. (2015, January 6\u20139). 
A reinforcement learning approach for a decision support system for logistics networks. Proceedings of the 2015 Winter Simulation Conference (WSC), Huntington Beach, CA, USA.","DOI":"10.1109\/WSC.2015.7408317"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, Z., Chen, C., Li, H.X., Dong, D., and Tarn, T.J. (2016, January 12\u201315). A novel incremental learning scheme for reinforcement learning in dynamic environments. Proceedings of the 2016 12th World Congress on Intelligent Control and Automation (WCICA), Guilin, China.","DOI":"10.1109\/WCICA.2016.7578530"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Drakaki, M., and Tzionas, P. (2017). Manufacturing scheduling using Colored Petri Nets and reinforcement learning. Appl. Sci., 7.","DOI":"10.3390\/app7020136"},{"key":"ref_17","first-page":"7","article-title":"Activation and spreading sequence for spreading activation policy selection method in transfer reinforcement learning","volume":"10","author":"Kono","year":"2019","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Li, M.P., Sankaran, P., Kuhl, M.E., Ganguly, A., Kwasinski, A., and Ptucha, R. (2018, January 9\u201312). Simulation analysis of a deep reinforcement learning approach for task selection by autonomous material handling vehicles. Proceedings of the 2018 Winter Simulation Conference (WSC), Gothenburg, Sweden.","DOI":"10.1109\/WSC.2018.8632448"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2378","DOI":"10.1109\/LRA.2019.2903261","article-title":"PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning","volume":"4","author":"Sartoretti","year":"2019","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Li, M.P., Sankaran, P., Kuhl, M.E., Ptucha, R., Ganguly, A., and Kwasinski, A. (2019, January 8\u201311). 
Task Selection by Autonomous Mobile Robots in a Warehouse Using Deep Reinforcement Learning. Proceedings of the 2019 Winter Simulation Conference, National Harbor, MD, USA.","DOI":"10.1109\/WSC40007.2019.9004792"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1007\/978-3-030-60843-9_3","article-title":"Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation","volume":"Volume 12025 LNAI","author":"Barat","year":"2020","journal-title":"International Workshop on Multi-Agent Systems and Agent-Based Simulation"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sun, Y., and Li, H. (2020, January 1\u201310). An end-to-end reinforcement learning method for automated guided vehicle path planning. Proceedings of the International Symposium on Artificial Intelligence and Robotics 2020, Kitakyushu, Japan.","DOI":"10.1117\/12.2579792"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Xiao, Y., Hoffman, J., Xia, T., and Amato, C. (August, January 31). Learning multi-robot decentralized macro-action-based policies via a centralized q-net. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9196684"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1049\/trit.2020.0024","article-title":"Multi-robot path planning based on a deep reinforcement learning DQN algorithm","volume":"5","author":"Yang","year":"2020","journal-title":"CAAI Trans. Intell. Technol."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Ushida, Y., Razan, H., Sakuma, T., and Kato, S. (2020, January 13\u201316). Policy Transfer from Simulation to Real World for Autonomous Control of an Omni Wheel Robot. 
Proceedings of the 2020 IEEE 9th Global Conference on Consumer Electronics (GCCE), Kobe, Japan.","DOI":"10.1109\/GCCE50665.2020.9291969"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Shen, G., Ma, R., Tang, Z., and Chang, L. (2021, January 26\u201327). A deep reinforcement learning algorithm for warehousing multi-agv path planning. Proceedings of the 2021 International Conference on Networking, Communications and Information Technology (NetCIT), Manchester, UK.","DOI":"10.1109\/NetCIT54147.2021.00090"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Newaz, A.A.R., and Alam, T. (2021, January 5\u201317). Hierarchical Task and Motion Planning through Deep Reinforcement Learning. Proceedings of the 2021 Fifth IEEE International Conference on Robotic Computing (IRC), Taichung, Taiwan.","DOI":"10.1109\/IRC52146.2021.00023"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Peyas, I.S., Hasan, Z., Tushar, M.R.R., Musabbir, A., Azni, R.M., and Siddique, S. (2021, January 7\u201310). Autonomous Warehouse Robot using Deep Q-Learning. Proceedings of the TENCON 2021\u20132021 IEEE Region 10 Conference (TENCON), Auckland, New Zealand.","DOI":"10.1109\/TENCON54134.2021.9707256"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ha, W.Y., Cui, L., and Jiang, Z.P. (2021, January 6\u201310). A Warehouse Scheduling Using Genetic Algorithm and Collision Index. Proceedings of the 2021 20th International Conference on Advanced Robotics (ICAR), Ljubljana, Slovenia.","DOI":"10.1109\/ICAR53236.2021.9659439"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Liu, S., Wen, L., Cui, J., Yang, X., Cao, J., and Liu, Y. (October, January 27). Moving Forward in Formation: A Decentralized Hierarchical Learning Approach to Multi-Agent Moving Together. 
Proceedings of the 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic.","DOI":"10.1109\/IROS51168.2021.9636224"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ushida, Y., Razan, H., Sakuma, T., and Kato, S. (2021, January 12\u201315). Omnidirectional Mobile Robot Path Finding Using Deep Deterministic Policy Gradient for Real Robot Control. Proceedings of the 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE), Kyoto, Japan.","DOI":"10.1109\/GCCE53005.2021.9621943"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Lee, H., and Jeong, J. (2021). Mobile Robot Path Optimization Technique Based on Reinforcement Learning Algorithm in Warehouse Environment. Appl. Sci., 11.","DOI":"10.3390\/app11031209"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"42568","DOI":"10.1109\/ACCESS.2021.3062457","article-title":"A Novel Hierarchical Soft Actor-Critic Algorithm for Multi-Logistics Robots Task Allocation","volume":"9","author":"Tang","year":"2021","journal-title":"IEEE Access"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"579","DOI":"10.23940\/ijpe.21.07.p2.579588","article-title":"Kubernetes virtual warehouse placement based on reinforcement learning","volume":"17","author":"Li","year":"2021","journal-title":"Int. J. Perform. Eng."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Ren, J., and Huang, X. (August, January 30). Potential Fields Guided Deep Reinforcement Learning for Optimal Path Planning in a Warehouse. Proceedings of the 2021 IEEE 7th International Conference on Control Science and Systems Engineering (ICCSSE), Qingdao, China.","DOI":"10.1109\/ICCSSE52761.2021.9545167"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Balachandran, A., Lal, A., and Sreedharan, P. (2022, January 16\u201317). Autonomous Navigation of an AMR using Deep Reinforcement Learning in a Warehouse Environment. 
Proceedings of the 2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon), Mysuru, India.","DOI":"10.1109\/MysuruCon55714.2022.9971804"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Arslan, B., and Ekren, B.Y. (2022). Transaction selection policy in tier-to-tier SBSRS by using Deep Q-Learning. Int. J. Prod. Res.","DOI":"10.1080\/00207543.2022.2148767"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Lewis, T., Ibarra, A., and Jamshidi, M. (2022, January 11\u201315). Object Detection-Based Reinforcement Learning for Autonomous Point-to-Point Navigation. Proceedings of the 2022 World Automation Congress (WAC), San Antonio, TX, USA.","DOI":"10.23919\/WAC55640.2022.9934448"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Ho, T.M., Nguyen, K.K., and Cheriet, M. (2022). Federated Deep Reinforcement Learning for Task Scheduling in Heterogeneous Autonomous Robotic System. IEEE Trans. Autom. Sci. Eng., 1\u201313.","DOI":"10.1109\/TASE.2022.3221352"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Zhou, L., Lin, C., Ma, Q., and Cao, Z. (2022, January 20\u201324). A Learning-based Iterated Local Search Algorithm for Order Batching and Sequencing Problems. Proceedings of the 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), Mexico City, Mexico.","DOI":"10.1109\/CASE49997.2022.9926486"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Cestero, J., Quartulli, M., Metelli, A.M., and Restelli, M. (2022, January 18\u201323). Storehouse: A reinforcement learning environment for optimizing warehouse management. 
Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy.","DOI":"10.1109\/IJCNN55064.2022.9891985"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"100478","DOI":"10.1109\/ACCESS.2022.3206537","article-title":"MARL-Based Cooperative Multi-AGV Control in Warehouse Systems","volume":"10","author":"Choi","year":"2022","journal-title":"IEEE Access"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Elkunchwar, N., Iyer, V., Anderson, M., Balasubramanian, K., Noe, J., Talwekar, Y., and Fuller, S. (2022, January 21\u201324). Bio-inspired source seeking and obstacle avoidance on a palm-sized drone. Proceedings of the 2022 International Conference on Unmanned Aircraft Systems (ICUAS), Dubrovnik, Croatia.","DOI":"10.1109\/ICUAS54217.2022.9836062"},{"key":"ref_44","first-page":"5510749","article-title":"Research on Hybrid Real-Time Picking Routing Optimization Based on Multiple Picking Stations","volume":"2022","author":"Wang","year":"2022","journal-title":"Math. Probl. Eng."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Ekren, B.Y., and Arslan, B. (2022). A reinforcement learning approach for transaction scheduling in a shuttle-based storage and retrieval system. Int. Trans. Oper. Res.","DOI":"10.1111\/itor.13135"},{"key":"ref_46","first-page":"8750580","article-title":"Research on Fresh Product Logistics Transportation Scheduling Based on Deep Reinforcement Learning","volume":"2022","author":"Yu","year":"2022","journal-title":"Sci. Program."},{"key":"ref_47","first-page":"3277","article-title":"A DQN-based cache strategy for mobile edge networks","volume":"71","author":"Sun","year":"2022","journal-title":"Comput. Mater. 
Contin."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1007\/s10015-021-00713-y","article-title":"Using sim-to-real transfer learning to close gaps between simulation and real environments through reinforcement learning","volume":"27","author":"Ushida","year":"2022","journal-title":"Artif. Life Robot."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Lee, H., Hong, J., and Jeong, J. (2022). MARL-Based Dual Reward Model on Segmented Actions for Multiple Mobile Robots in Automated Warehouse Environment. Appl. Sci., 12.","DOI":"10.3390\/app12094703"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"9138","DOI":"10.1109\/JIOT.2021.3093346","article-title":"Toward Deep Q-Network-Based Resource Allocation in Industrial Internet of Things","volume":"9","author":"Liang","year":"2022","journal-title":"IEEE Internet Things J."},{"key":"ref_51","first-page":"4916127","article-title":"Intelligent Path Planning for AGV-UAV Transportation in 6G Smart Warehouse","volume":"2023","author":"Guo","year":"2023","journal-title":"Mob. Inf. Syst."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1109\/TASE.2022.3168621","article-title":"Unified Automatic Control of Vehicular Systems with Reinforcement Learning","volume":"20","author":"Yan","year":"2023","journal-title":"IEEE Trans. Autom. Sci. Eng."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"106280","DOI":"10.1016\/j.asoc.2020.106280","article-title":"A Learnheuristic Approach for the Team Orienteering Problem with Aerial Drone Motion Constraints","volume":"92","author":"Bayliss","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Beysolow, T. (2019). 
Applied Reinforcement Learning with Python: With OpenAI Gym, Tensorflow, and Keras, Apress.","DOI":"10.1007\/978-1-4842-5127-0"},{"key":"ref_55","first-page":"1","article-title":"Stable-Baselines3: Reliable Reinforcement Learning Implementations","volume":"22","author":"Raffin","year":"2021","journal-title":"J. Mach. Learn. Res."},{"key":"ref_56","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 19\u201324). Asynchronous Methods for Deep Reinforcement Learning. Proceedings of the International Conference on Machine Learning. PMLR, New York, NY, USA."},{"key":"ref_57","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv."},{"key":"ref_58","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari With Deep Reinforcement Learning. arXiv."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"102089","DOI":"10.1016\/j.simpat.2020.102089","article-title":"Speeding Up Computational Times in Simheuristics Combining Genetic Algorithms with Discrete-Event Simulation","volume":"103","author":"Rabe","year":"2020","journal-title":"Simul. Model. Pract. 
Theory"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/9\/408\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:40:18Z","timestamp":1760128818000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/9\/408"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,27]]},"references-count":59,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2023,9]]}},"alternative-id":["a16090408"],"URL":"https:\/\/doi.org\/10.3390\/a16090408","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,27]]}}}