{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T18:52:16Z","timestamp":1781203936466,"version":"3.54.1"},"reference-count":54,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2022,5,28]],"date-time":"2022-05-28T00:00:00Z","timestamp":1653696000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Open Access Publication Fund of the University of Duisburg-Essen"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The high number of devices with limited computational resources as well as limited communication resources are two characteristics of the Industrial Internet of Things (IIoT). With Industry 4.0 emerges a strong demand for data processing in the edge, constrained primarily by the limited available resources. In industry, deep reinforcement learning (DRL) is increasingly used in robotics, job shop scheduling and supply chain. In this work, DRL is applied for intelligent resource allocation for industrial edge devices. An optimal usage of available resources of the IIoT devices should be achieved. Due to the structure of IIoT systems as well as security aspects, multi-agent systems (MASs) are preferred for decentralized decision-making. In our study, we build a network from physical and virtualized representative IIoT devices. The proposed approach is capable of dealing with several dynamic changes of the target system. Three aspects are considered when evaluating the performance of the MASs: overhead due to the MASs, improvement of the resource usage of the devices as well as latency and error rate. In summary, the agents\u2019 resource usage with respect to traffic, computing resources and time is very low. It was confirmed that the agents not only achieve the desired results in training but also that the learned behavior is transferable to a real system.<\/jats:p>","DOI":"10.3390\/s22114099","type":"journal-article","created":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T02:30:06Z","timestamp":1653964206000},"page":"4099","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":33,"title":["Deep Reinforcement Learning Multi-Agent System for Resource Allocation in Industrial Internet of Things"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7207-6077","authenticated-orcid":false,"given":"Julia","family":"Rosenberger","sequence":"first","affiliation":[{"name":"Bosch Rexroth AG, Automation and Electrification Solutions, 97816 Lohr am Main, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michael","family":"Urlaub","sequence":"additional","affiliation":[{"name":"Bosch Rexroth AG, Automation and Electrification Solutions, 97816 Lohr am Main, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Felix","family":"Rauterberg","sequence":"additional","affiliation":[{"name":"Bosch Rexroth AG, Automation and Electrification Solutions, 97816 Lohr am Main, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tina","family":"Lutz","sequence":"additional","affiliation":[{"name":"Bosch Rexroth AG, Automation and Electrification Solutions, 97816 Lohr am Main, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andreas","family":"Selig","sequence":"additional","affiliation":[{"name":"Bosch Rexroth AG, Automation and Electrification Solutions, 97816 Lohr am Main, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michael","family":"B\u00fchren","sequence":"additional","affiliation":[{"name":"Westf\u00e4lische Hochschule, 46395 Bocholt, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7945-1853","authenticated-orcid":false,"given":"Dieter","family":"Schramm","sequence":"additional","affiliation":[{"name":"Faculty of Engineering, University of Duisburg-Essen, 47057 Duisburg, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hermann, M., Pentek, T., and Otto, B. (2016, January 5\u20138). Design Principles for Industrie 4.0 Scenarios. Proceedings of the 2016 49th Hawaii International Conference on System Sciences (HICSS), Koloa, HI, USA.","DOI":"10.1109\/HICSS.2016.488"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1109\/JIOT.2016.2579198","article-title":"Edge Computing: Vision and Challenges","volume":"3","author":"Shi","year":"2016","journal-title":"IEEE Internet Things J."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"3133","DOI":"10.1109\/COMST.2019.2916583","article-title":"Applications of Deep Reinforcement Learning in Communications and Networking: A Survey","volume":"21","author":"Luong","year":"2019","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"4925","DOI":"10.1109\/TII.2020.3028963","article-title":"Deep Reinforcement Learning-Based Dynamic Resource Management for Mobile Edge Computing in Industrial Internet of Things","volume":"17","author":"Chen","year":"2021","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Rosenberger, J., Urlaub, M., and Schramm, D. (2021, January 12\u201316). Multi-agent reinforcement learning for intelligent resource allocation in IIoT networks. Proceedings of the 2021 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT), Dubai, United Arab Emirates.","DOI":"10.1109\/GCAIoT53516.2021.9692913"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"107969","DOI":"10.1016\/j.comnet.2021.107969","article-title":"Dynamic job-shop scheduling in smart manufacturing using deep reinforcement learning","volume":"190","author":"Wang","year":"2021","journal-title":"Comput. Netw."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Bakakeu, J., Kisskalt, D., Franke, J., Baer, S., Klos, H.H., and Peschke, J. (September, January 30). Multi-Agent Reinforcement Learning for the Energy Optimization of Cyber-Physical Production Systems. Proceedings of the 2020 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), London, ON, Canada.","DOI":"10.1109\/CCECE47787.2020.9255795"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Roesch, M., Linder, C., Bruckdorfer, C., Hohmann, A., and Reinhart, G. (2019, January 25\u201327). Industrial Load Management using Multi-Agent Reinforcement Learning for Rescheduling. Proceedings of the 2019 Second International Conference on Artificial Intelligence for Industries (AI4I), Laguna Hills, CA, USA.","DOI":"10.1109\/AI4I46381.2019.00033"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Luo, S., Zhang, L., and Fan, Y. (2021). Real-Time Scheduling for Dynamic Partial-No-Wait Multiobjective Flexible Job Shop by Deep Reinforcement Learning. IEEE Trans. Autom. Sci. Eng., 1\u201319.","DOI":"10.1109\/TASE.2021.3104716"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1133","DOI":"10.1109\/JSAC.2020.2986615","article-title":"Resource Allocation Based on Deep Reinforcement Learning in IoT Edge Computing","volume":"38","author":"Xiong","year":"2020","journal-title":"IEEE J. Sel. Areas Commun."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1529","DOI":"10.1109\/TETC.2019.2902661","article-title":"Smart Resource Allocation for Mobile Edge Computing: A Deep Reinforcement Learning Approach","volume":"9","author":"Wang","year":"2021","journal-title":"IEEE Trans. Emerg. Top. Comput."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"220","DOI":"10.23919\/JCC.2020.09.017","article-title":"Multi-agent reinforcement learning for resource allocation in IoT networks with edge computing","volume":"17","author":"Liu","year":"2020","journal-title":"China Commun."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"4978","DOI":"10.1109\/TII.2020.3021024","article-title":"Deep Reinforcement Learning Based Computation Offloading in Fog Enabled Industrial Internet of Things","volume":"17","author":"Ren","year":"2021","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"6201","DOI":"10.1109\/JIOT.2020.2968951","article-title":"Multiagent Deep Reinforcement Learning for Joint Multichannel Access and Task Offloading of Mobile-Edge Computing in Industry 4.0","volume":"7","author":"Cao","year":"2020","journal-title":"IEEE Internet Things J."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"705","DOI":"10.14778\/3184470.3184474","article-title":"Model-Free Control for Distributed Stream Data Processing Using Deep Reinforcement Learning","volume":"11","author":"Li","year":"2018","journal-title":"Proc. VLDB Endow."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Russo, G.R., Nardelli, M., Cardellini, V., and Presti, F.L. (2018). Multi-Level Elasticity for Wide-Area Data Streaming Systems: A Reinforcement Learning Approach. Algorithms, 11.","DOI":"10.3390\/a11090134"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"3163","DOI":"10.1109\/TVT.2019.2897134","article-title":"Deep Reinforcement Learning Based Resource Allocation for V2V Communications","volume":"68","author":"Ye","year":"2019","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1828","DOI":"10.1109\/TVT.2019.2961405","article-title":"Multi-Agent Deep Reinforcement Learning Based Spectrum Allocation for D2D Underlay Communications","volume":"69","author":"Li","year":"2020","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2018","DOI":"10.1109\/TVT.2021.3134467","article-title":"Multi-Agent Driven Resource Allocation and Interference Management for Deep Edge Networks","volume":"71","author":"Gong","year":"2021","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Murudkar, C.V., and Gitlin, R.D. (2019, January 8\u20139). Optimal-Capacity, Shortest Path Routing in Self-Organizing 5G Networks using Machine Learning. Proceedings of the 2019 IEEE 20th Wireless and Microwave Technology Conference (WAMICON), Cocoa Beach, FL, USA.","DOI":"10.1109\/WAMICON.2019.8765434"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"102865","DOI":"10.1016\/j.jnca.2020.102865","article-title":"DRL-R: Deep reinforcement learning approach for intelligent routing in software-defined data-center networks","volume":"177","author":"Liu","year":"2021","journal-title":"J. Netw. Comput. Appl."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhang, W., Liu, T., Xie, M., Zhang, J., and Pan, C. (2021, January 7\u20139). SAC: A Novel Multi-hop Routing Policy in Hybrid Distributed IoT System based on Multi-agent Reinforcement Learning. Proceedings of the 2021 22nd International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, USA.","DOI":"10.1109\/ISQED51717.2021.9424255"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"You, X., Li, X., Xu, Y., Feng, H., and Zhao, J. (2019, January 3\u20137). Toward Packet Routing with Fully-distributed Multi-agent Deep Reinforcement Learning. Proceedings of the 2019 International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT), Avignon, France.","DOI":"10.23919\/WiOPT47501.2019.9144110"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ding, R., Yang, Y., Liu, J., Li, H., and Gao, F. (2020, January 17\u201320). Packet Routing Against Network Congestion: A Deep Multi-agent Reinforcement Learning Approach. Proceedings of the 2020 International Conference on Computing, Networking and Communications (ICNC), Honolulu, HI, USA.","DOI":"10.1109\/ICNC47757.2020.9049759"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3590","DOI":"10.1109\/JSAC.2016.2611964","article-title":"Dynamic Computation Offloading for Mobile-Edge Computing With Energy Harvesting Devices","volume":"34","author":"Mao","year":"2016","journal-title":"IEEE J. Sel. Areas Commun."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Jin, T., Ji, Z., Zhu, S., and Chen, C. (2021, January 21\u201323). Learning-based Co-Design of Distributed Edge Sensing and Transmission for Industrial Cyber-Physical Systems. Proceedings of the 2021 IEEE 19th International Conference on Industrial Informatics (INDIN), Mallorca, Spain.","DOI":"10.1109\/INDIN45523.2021.9557472"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5565","DOI":"10.1109\/TII.2019.2933867","article-title":"Learning-Based Energy-Efficient Resource Management by Heterogeneous RF\/VLC for Ultra-Reliable Low-Latency Industrial IoT Networks","volume":"16","author":"Yang","year":"2020","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"4189","DOI":"10.1109\/TII.2021.3124848","article-title":"QoS and Privacy-Aware Routing for 5G enabled Industrial Internet of Things: A Federated Reinforcement Learning Approach","volume":"18","author":"Wang","year":"2021","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2724","DOI":"10.1109\/TII.2021.3076393","article-title":"A Reinforcement Learning-Empowered Feedback Control System for Industrial Internet of Things","volume":"18","author":"Chen","year":"2022","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"107230","DOI":"10.1016\/j.comnet.2020.107230","article-title":"MARVEL: Enabling controller load balancing in software-defined networks with multi-agent reinforcement learning","volume":"177","author":"Sun","year":"2020","journal-title":"Comput. Netw."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1855","DOI":"10.1007\/s10586-017-0852-1","article-title":"A big data enabled load-balancing control for smart manufacturing of Industry 4.0","volume":"20","author":"Li","year":"2017","journal-title":"Cluster Comput."},{"key":"ref_32","unstructured":"Zhang, K., Yang, Z., and Basar, T. (2019). Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wiering, M., and van Otterlo, M. (2012). Reinforcement Learning, Springer.","DOI":"10.1007\/978-3-642-27645-3"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Frochte, J. (2018). Maschinelles Lernen, Carl Hanser Verlag.","DOI":"10.3139\/9783446457058"},{"key":"ref_35","unstructured":"Bertsekas, D.P. (2005). Dynamic Programming and Optimal Control, Athena Scientific. [3rd ed.]."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1073\/pnas.39.10.1095","article-title":"Stochastic Games","volume":"39","author":"Shapley","year":"1953","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_37","unstructured":"Terry, J.K., Black, B., Hari, A., Santos, L., Dieffendahl, C., Williams, N.L., Lokesh, Y., Horsch, C., and Ravi, P. (2020). PettingZoo: Gym for Multi-Agent Reinforcement Learning. arXiv."},{"key":"ref_38","unstructured":"Yang, Y., and Wang, J. (2020). An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective. arXiv."},{"key":"ref_39","unstructured":"Terry, J.K., Grammel, N., Black, B., Hari, A., Horsch, C., and Santos, L. (2020). Agent Environment Cycle Games. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1726","DOI":"10.1631\/FITEE.1900533","article-title":"Deep reinforcement learning: A survey","volume":"21","author":"Wang","year":"2020","journal-title":"Front. Inf. Technol. Electron. Eng."},{"key":"ref_41","unstructured":"Rosenberger, J., M\u00fcller, K., Selig, A., B\u00fchren, M., and Schramm, D. (2021, January 14\u201316). Extended kernel density estimation for anomaly detection in streaming data. Proceedings of the 2021 15th CIRP Conference on Intelligent Computation in Manufacturing Engineering, Virtual Event."},{"key":"ref_42","unstructured":"Rauterberg, F. (2022). Performance Vergleich von Datenkompressions Algorithmen Auf Industriellen Edge-Devices, Technische Hochschule Mittelhessen. Studienarbeit."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Rosenberger, J., Rauterberg, F., Selig, A., B\u00fchren, M., and Schramm, D. (2021, January 12\u201316). Perspective on Efficiency Enhancements in Processing Streaming Data in Industrial IoT Networks. Proceedings of the 2021 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT) (2021 IEEE GCAIoT), Dubai, United Arab Emirates.","DOI":"10.1109\/GCAIoT53516.2021.9693073"},{"key":"ref_44","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). Deep Sets. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_45","unstructured":"Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and Garnett, R. (2015). Pointer Networks. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_46","unstructured":"Vinyals, O., Bengio, S., and Kudlur, M. (2016). Order Matters: Sequence to sequence for sets. arXiv."},{"key":"ref_47","unstructured":"Mao, H., Gong, Z., and Xiao, Z. (2020). Reward Design in Cooperative Multi-agent Reinforcement Learning for Packet Routing. arXiv."},{"key":"ref_48","unstructured":"(2021). IEC 63278-1 ED1\u2014Asset Administration Shell (AAS) for Industrial Applications\u2014Part 1: Asset Administration Shell Structure (Standard No. IEC 63278-1)."},{"key":"ref_49","unstructured":"Hoffmeister, M., Boss, B., Orzelski, A., and Wagner, J. (Atp Magazin, 2021). Die Verwaltungsschale: Zentrum der digitalen Vernetzung in Fabriken (Teil 1), Atp Magazin."},{"key":"ref_50","unstructured":"Alagha, H.E. (2019). Communicating Intention in Decentralized Multi-Agent Multi-Objective Reinforcement Learning Systems. [Master\u2019s Thesis, University of Groningen]."},{"key":"ref_51","unstructured":"(2022, April 25). Available online: https:\/\/www.gymlibrary.ml\/."},{"key":"ref_52","unstructured":"(2022, April 25). Available online: https:\/\/openai.com\/."},{"key":"ref_53","unstructured":"(2022, April 25). Available online: https:\/\/www.pettingzoo.ml\/."},{"key":"ref_54","unstructured":"(2022, April 25). Available online: https:\/\/stable-baselines3.readthedocs.io\/en\/master\/."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/11\/4099\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:20:23Z","timestamp":1760138423000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/11\/4099"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,28]]},"references-count":54,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["s22114099"],"URL":"https:\/\/doi.org\/10.3390\/s22114099","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,28]]}}}