{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T14:33:49Z","timestamp":1781102029984,"version":"3.54.1"},"reference-count":34,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2021,8,15]],"date-time":"2021-08-15T00:00:00Z","timestamp":1628985600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Adaptive and highly synchronized supply chains can avoid a cascading rise-and-fall inventory dynamic and mitigate ripple effects caused by operational failures. This paper aims to demonstrate how a deep reinforcement learning agent based on the proximal policy optimization algorithm can synchronize inbound and outbound flows and support business continuity operating in the stochastic and nonstationary environment if end-to-end visibility is provided. The deep reinforcement learning agent is built upon the Proximal Policy Optimization algorithm, which does not require hardcoded action space and exhaustive hyperparameter tuning. These features, complimented with a straightforward supply chain environment, give rise to a general and task unspecific approach to adaptive control in multi-echelon supply chains. The proposed approach is compared with the base-stock policy, a well-known method in classic operations research and inventory control theory. The base-stock policy is prevalent in continuous-review inventory systems. The paper concludes with the statement that the proposed solution can perform adaptive control in complex supply chains. The paper also postulates fully fledged supply chain digital twins as a necessary infrastructural condition for scalable real-world applications.<\/jats:p>","DOI":"10.3390\/a14080240","type":"journal-article","created":{"date-parts":[[2021,8,15]],"date-time":"2021-08-15T21:43:55Z","timestamp":1629063835000},"page":"240","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":50,"title":["Adaptive Supply Chain: Demand\u2013Supply Synchronization Using Deep Reinforcement Learning"],"prefix":"10.3390","volume":"14","author":[{"given":"Zhandos","family":"Kegenbekov","sequence":"first","affiliation":[{"name":"Faculty of Engineering and Information Technology, Kazakh-German University, Pushkin 111, Almaty 050010, Kazakhstan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7457-6040","authenticated-orcid":false,"given":"Ilya","family":"Jackson","sequence":"additional","affiliation":[{"name":"Center for Transportation & Logistics, Massachusetts Institute of Technology, 1 Amherst Street, Cambridge, MA 02142, USA"},{"name":"Faculty of Engineering, Transport and Telecommunication Institute, Lomonosova 1, LV-1019 Riga, Latvia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,8,15]]},"reference":[{"key":"ref_1","unstructured":"Goodfellow, I., Bengio, Y., Courville, A., and Bengio, Y. (2016). Deep Learning, MIT Press. [2nd ed.]."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Mosavi, A., Faghan, Y., Ghamisi, P., Duan, P., Ardabili, S.F., Salwana, E., and Band, S.S. (2020). Comprehensive review of deep reinforcement learning methods and applications in economics. Mathematics, 8.","DOI":"10.31226\/osf.io\/53esy"},{"key":"ref_3","first-page":"363","article-title":"Autonomous inverted helicopter flight via reinforcement learning","volume":"9","author":"Ng","year":"2006","journal-title":"Exp. Robot."},{"key":"ref_4","unstructured":"Fridman, L., Terwilliger, J., and Jenik, B. (2018). Deep traffic: Crowdsourced hyperparameter tuning of deep reinforcement learning systems for multi-agent dense traffic navigation. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Isele, D., Rahimi, R., Cosgun, A., Subramanian, K., and Fujimura, K. (2018, January 21\u201325). Navigating occluded intersections with autonomous vehicles using deep reinforcement learning. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8461233"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Gu, S., Holly, E., Lillicrap, T., and Levine, S. (June, January 29). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2017), Singapore.","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1126\/science.aar6404","article-title":"A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play","volume":"362","author":"Silver","year":"2018","journal-title":"Science"},{"key":"ref_8","unstructured":"Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., and Hesse, C. (2019). Dota 2 with large scale deep reinforcement learning. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"ref_10","unstructured":"Guss, W.H., Codel, C., Hofmann, K., Houghton, B., Kuno, N., Milani, S., Mohanty, S.P., Liebana, D.P., Salakhutdinov, R., and Topin, N. (2019). The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Perez, H.D., Hubbs, C.D., Li, C., and Grossmann, I.E. (2021). Algorithmic Approaches to Inventory Management Optimization. Processes, 9.","DOI":"10.3390\/pr9010102"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"546","DOI":"10.1287\/mnsc.43.4.546","article-title":"Information distortion in a supply chain: The bullwhip effect","volume":"43","author":"Lee","year":"1997","journal-title":"Manag. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1016\/j.ejor.2020.09.053","article-title":"Ripple effect in the supply chain network: Forward and backward disruption propagation, network health and firm vulnerability","volume":"291","author":"Li","year":"2021","journal-title":"Eur. J. Oper. Res."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1080\/09537287.2020.1768450","article-title":"A digital supply chain twin for managing the disruption risks and resilience in the era of Industry 4.0","volume":"32","author":"Ivanov","year":"2020","journal-title":"Prod. Plan. Control"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.engappai.2014.09.004","article-title":"Designing of an intelligent self-adaptive model for supply chain ordering management system","volume":"37","author":"Mortazavi","year":"2015","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_16","unstructured":"Barat, S., Khadilkar, H., Meisheri, H., Kulkarni, V., Baniwal, V., Kumar, P., and Gajrani, M. (2019, January 13\u201317). Actor based simulation for closed loop control of supply chain using reinforcement learning. Proceedings of the 18th International Conference on Autonomous Agents and Multi Agent Systems 2019, Montreal, QC, Canada."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Hemberg, E., Derbinsky, N., Mata, G., and O\u2019Reilly, U.M. (2021, January 10\u201314). Simulating a Logistics Enterprise Using an Asymmetrical Wargame Simulation with Soar Reinforcement Learning and Coevolutionary Algorithms. Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO\u2019 21), Lille, France.","DOI":"10.1145\/3449726.3463172"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"6643131","DOI":"10.1155\/2021\/6643131","article-title":"Solving a Joint Pricing and Inventory Control Problem for Perishables via Deep Reinforcement Learning","volume":"2021","author":"Wang","year":"2021","journal-title":"Complexity"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Oroojlooyjadid, A., Nazari, M., Snyder, L.V., and Takac, M. (2021). A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization. Manuf. Serv. Oper. Manag., 1\u201320.","DOI":"10.1287\/msom.2020.0939"},{"key":"ref_20","unstructured":"Hubbs, C., Perez, H.D., Sarwar, O., Sahinidis, N.V., Grossmann, I.E., and Wassick, J.M. (2020). OR-Gym: A Reinforcement Learning Library for Operations Research Problem. arXiv."},{"key":"ref_21","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_22","first-page":"679","article-title":"A Markovian decision process","volume":"6","author":"Bellman","year":"1957","journal-title":"J. Math. Mech."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1287\/mnsc.41.2.263","article-title":"Sensitivity analysis for base-stock levels in multiechelon production-inventory systems","volume":"41","author":"Glasserman","year":"1995","journal-title":"Manag. Sci."},{"key":"ref_24","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_25","unstructured":"Moritz, P., Nishihara, R., Wang, S., Tumanov, A., Liaw, R., Liang, E., Elibol, M., Yang, Z., Paul, W., and Jordan, M.I. (2018, January 8\u201310). Ray: A distributed framework for emerging AI applications. Proceedings of the 13th Symposium on Operating Systems Design and Implementation (OSDI 18), Carlsbad, CA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Kapuscinski, R., and Tayur, S. (1999). Optimal policies and simulation-based optimization for capacitated production inventory systems. Quantitative Models for Supply Chain Management, Springer.","DOI":"10.1007\/978-1-4615-4949-9_2"},{"key":"ref_27","unstructured":"(2021, August 14). Executable Colab Notebook: Google Colaboratory. Available online: https:\/\/colab.research.google.com\/drive\/1D_E1j10skbohOOA4vbuy9OCw4dqbAW_S?usp=sharing."},{"key":"ref_28","first-page":"191","article-title":"Review of Inventory Control Models: A Classification Based on Methods of Obtaining Optimal Control Parameters","volume":"21","author":"Jackson","year":"2020","journal-title":"Transp. Telecommun."},{"key":"ref_29","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). Openai gym. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Schuderer, A., Bromuri, S., and van Eekelen, M. (2021). Sim-Env: Decoupling OpenAI Gym Environments from Simulation Models. arXiv.","DOI":"10.1007\/978-3-030-85739-4_39"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Saifutdinov, F., Jackson, I., Tolujevs, J., and Zmanovska, T. (2020, January 15\u201316). Digital Twin as a Decision Support Tool for Airport Traffic Control. Proceedings of the 61st International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS), Riga, Latvia.","DOI":"10.1109\/ITMS51158.2020.9259294"},{"key":"ref_32","unstructured":"Shao, K., Tang, Z., Zhu, Y., Li, N., and Zhao, D. (2019). A survey of deep reinforcement learning in video games. arXiv."},{"key":"ref_33","unstructured":"Hessel, M., Soyer, H., Espeholt, L., Czarnecki, W., Schmitt, S., and van Hasselt, H. (February, January 27). Multi-task deep reinforcement learning with popart. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_34","unstructured":"Teh, Y.W., Bapst, V., Czarnecki, W., Quan, J., Kirkpatrick, J., Hadsell, R., Heess, N., and Pascanu, R. (2017). Distral: Robust multitask reinforcement learning. arXiv."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/8\/240\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:46:22Z","timestamp":1760165182000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/8\/240"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,15]]},"references-count":34,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2021,8]]}},"alternative-id":["a14080240"],"URL":"https:\/\/doi.org\/10.3390\/a14080240","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,15]]}}}