{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T12:49:03Z","timestamp":1763988543550,"version":"3.37.3"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"17-18","license":[{"start":{"date-parts":[[2024,6,26]],"date-time":"2024-06-26T00:00:00Z","timestamp":1719360000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,26]],"date-time":"2024-06-26T00:00:00Z","timestamp":1719360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100008530","name":"European Regional Development Fund","doi-asserted-by":"publisher","award":["EFRE-0801698"],"award-info":[{"award-number":["EFRE-0801698"]}],"id":[{"id":"10.13039\/501100008530","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100021130","name":"Bundesministerium f\u00fcr Wirtschaft und Klimaschutz","doi-asserted-by":"publisher","award":["21407 N","KK5371001ZG1"],"award-info":[{"award-number":["21407 N","KK5371001ZG1"]}],"id":[{"id":"10.13039\/100021130","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Advances in artificial intelligence (AI) have led to its application in many areas of everyday life. In the context of control engineering, reinforcement learning (RL) represents a particularly promising approach as it is centred around the idea of allowing an agent to freely interact with its environment to find an optimal strategy. One of the challenges professionals face when training and deploying RL agents is that the latter often have to run on dedicated embedded devices. 
This could be to integrate them into an existing toolchain or to satisfy certain performance criteria like real-time constraints. Conventional RL libraries, however, cannot be easily utilised in conjunction with that kind of hardware. In this paper, we present a framework named LExCI, the <jats:italic>Learning and Experiencing Cycle Interface<\/jats:italic>, which bridges this gap and provides end-users with a free and open-source tool for training agents on embedded systems using the open-source library RLlib. Its operability is demonstrated with two state-of-the-art RL algorithms and a rapid control prototyping system.<\/jats:p>","DOI":"10.1007\/s10489-024-05573-0","type":"journal-article","created":{"date-parts":[[2024,6,26]],"date-time":"2024-06-26T06:02:35Z","timestamp":1719381755000},"page":"8384-8398","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["LExCI: A framework for reinforcement learning with embedded 
systems"],"prefix":"10.1007","volume":"54","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5593-0227","authenticated-orcid":false,"given":"Kevin","family":"Badalian","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7368-8833","authenticated-orcid":false,"given":"Lucas","family":"Koch","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2165-0371","authenticated-orcid":false,"given":"Tobias","family":"Brinkmann","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8835-3040","authenticated-orcid":false,"given":"Mario","family":"Picerno","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8715-8609","authenticated-orcid":false,"given":"Marius","family":"Wegener","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9246-8895","authenticated-orcid":false,"given":"Sung-Yong","family":"Lee","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6754-1907","authenticated-orcid":false,"given":"Jakob","family":"Andert","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,6,26]]},"reference":[{"issue":"11","key":"5573_CR1","doi-asserted-by":"publisher","first-page":"917","DOI":"10.1002\/ajim.23037","volume":"62","author":"J Howard","year":"2019","unstructured":"Howard J (2019) Artificial intelligence: Implications for the future of work. American Journal of Industrial Medicine. 62(11):917\u2013926","journal-title":"American Journal of Industrial Medicine."},{"issue":"11","key":"5573_CR2","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1109\/MC.2020.3006177","volume":"53","author":"P Laplante","year":"2020","unstructured":"Laplante P, Milojicic D, Serebryakov S, Bennett D (2020) Artificial Intelligence and Critical Systems: From Hype to Reality. Computer. 53(11):45\u201352. 
https:\/\/doi.org\/10.1109\/MC.2020.3006177","journal-title":"Computer."},{"key":"5573_CR3","unstructured":"Eurostat (2022) Use of artificial intelligence in enterprises. Accessed: 2023-05-15. https:\/\/ec.europa.eu\/eurostat\/statistics-explained\/index.php?title=Use_of_artificial_intelligence_in_enterprises"},{"issue":"3","key":"5573_CR4","doi-asserted-by":"publisher","first-page":"362","DOI":"10.1002\/rob.21918","volume":"37","author":"S Grigorescu","year":"2020","unstructured":"Grigorescu S, Trasnea B, Cocias T, Macesanu G (2020) A Survey of Deep Learning Techniques for Autonomous Driving. Journal of Field Robotics. 37(3):362\u2013386. https:\/\/doi.org\/10.1002\/rob.21918","journal-title":"Journal of Field Robotics."},{"key":"5573_CR5","doi-asserted-by":"publisher","unstructured":"Branco S, Ferreira AG, Cabral J (2019) Machine Learning in Resource-Scarce Embedded Systems, FPGAs, and End-Devices: A Survey. Electronics. 8(11) https:\/\/doi.org\/10.3390\/electronics8111289","DOI":"10.3390\/electronics8111289"},{"key":"5573_CR6","doi-asserted-by":"publisher","unstructured":"Barkalov A, Titarenko L, Mazurkiewicz M (2019) Foundations of Embedded Systems, 1st edn. Springer, Cham, Switzerland. https:\/\/doi.org\/10.1007\/978-3-030-11961-4","DOI":"10.1007\/978-3-030-11961-4"},{"key":"5573_CR7","unstructured":"Moritz P, Nishihara R, Wang S, Tumanov A, Liaw R, Liang E, Elibol M, Yang Z, Paul W, Jordan MI et\u00a0al (2018) Ray: A distributed framework for emerging $$\\{$$AI$$\\}$$ applications. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pp. 561\u2013577"},{"key":"5573_CR8","unstructured":"Liang E, Liaw R, Nishihara R, Moritz P, Fox R, Goldberg K, Gonzalez J, Jordan M, Stoica I (2018) RLlib: Abstractions for distributed reinforcement learning. In: International Conference on Machine Learning, pp. 3053\u20133062. 
PMLR"},{"issue":"268","key":"5573_CR9","first-page":"1","volume":"22","author":"A Raffin","year":"2021","unstructured":"Raffin A, Hill A, Gleave A, Kanervisto A, Ernestus M, Dormann N (2021) Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research. 22(268):1\u20138","journal-title":"Journal of Machine Learning Research."},{"key":"5573_CR10","unstructured":"Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Man\u00e9 D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Vi\u00e9gas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. https:\/\/www.tensorflow.org\/"},{"key":"5573_CR11","unstructured":"Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) PyTorch: An Imperative Style, High-Performance Deep Learning Library. In: Wallach H, Larochelle H, Beygelzimer A, d\u2019Alch\u00e9-Buc F, Fox E, Garnett R (eds.) Advances in Neural Information Processing Systems 32, pp. 8024\u20138035. Curran Associates, Inc., Vancouver, Canada. http:\/\/papers.neurips.cc\/paper\/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf"},{"key":"5573_CR12","doi-asserted-by":"publisher","unstructured":"Chen Y, Zheng B, Zhang Z, Wang Q, Shen C, Zhang Q (2020) Deep Learning on Mobile and Embedded Devices: State-of-the-Art, Challenges, and Future Directions. ACM Comput Surv 53(4). 
https:\/\/doi.org\/10.1145\/3398209","DOI":"10.1145\/3398209"},{"key":"5573_CR13","unstructured":"David R, Duke J, Jain A, Reddi VJ, Jeffries N, Li J, Kreeger N, Nappier I, Natraj M, Regev S, Rhodes R, Wang T, Warden P (2020) Tensorflow lite micro: Embedded machine learning on tinyml systems. CoRR. arXiv:2010.08678"},{"key":"5573_CR14","unstructured":"cONNXr (software) (2019) GitHub. Accessed: 2023-06-30. https:\/\/github.com\/alrevuelta\/cONNXr"},{"key":"5573_CR15","unstructured":"Genann v1.0.0 (software) (2016) GitHub. Accessed: 2023-06-30. https:\/\/github.com\/codeplea\/genann"},{"key":"5573_CR16","unstructured":"KANN (software) (2016) GitHub. Accessed: 2023-06-30 . https:\/\/github.com\/attractivechaos\/kann"},{"key":"5573_CR17","unstructured":"tiny-dnn v1.0.0 (software) (2012) GitHub. Accessed: 2023-06-30. https:\/\/github.com\/tiny-dnn\/tiny-dnn\/"},{"key":"5573_CR18","unstructured":"MiniDNN (software) (2017) GitHub. Accessed: 2023-06-30. https:\/\/github.com\/yixuan\/MiniDNN"},{"key":"5573_CR19","unstructured":"frugally-deep v0.15.20-p0 (software) (2016) GitHub. Accessed: 2023-06-30. https:\/\/github.com\/Dobiasd\/frugally-deep"},{"key":"5573_CR20","unstructured":"keras2cpp (software) (2016) GitHub. Accessed: 2023-06-30. https:\/\/github.com\/pplonski\/keras2cpp"},{"key":"5573_CR21","unstructured":"onnx2c (software) (2020) GitHub. Accessed: 2023-06-30. https:\/\/github.com\/kraiskil\/onnx2c"},{"key":"5573_CR22","unstructured":"MathWorks (2021) Reinforcement Learning Toolbox (software). Accessed: 2023-06-30. https:\/\/www.mathworks.com\/products\/reinforcement-learning.html"},{"key":"5573_CR23","doi-asserted-by":"crossref","unstructured":"Han H, Siebert J (2022) TinyML: A Systematic Review and Synthesis of Existing Research. In: 2022 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 269\u2013274. 
IEEE","DOI":"10.1109\/ICAIIC54071.2022.9722636"},{"key":"5573_CR24","unstructured":"Hausknecht M, Stone P (2015) Deep Recurrent Q-Learning for Partially Observable MDPs. In: 2015 AAAI Fall Symposium Series"},{"key":"5573_CR25","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2022.105477","volume":"117","author":"L Koch","year":"2023","unstructured":"Koch L, Picerno M, Badalian K, Lee S-Y, Andert J (2023) Automated function development for emission control with deep reinforcement learning. Eng Appl Artif Intell 117:105477. https:\/\/doi.org\/10.1016\/j.engappai.2022.105477","journal-title":"Eng Appl Artif Intell"},{"key":"5573_CR26","unstructured":"Picerno M, Koch L, Badalian K, Wegener M, Schaub J, Koch CR, Andert J (2023) Transfer of Reinforcement Learning-Based Controllers from Model-to Hardware-in-the-Loop. arXiv:2310.17671"},{"key":"5573_CR27","doi-asserted-by":"publisher","unstructured":"Picerno M, Koch L, Badalian K, Lee S-Y, Andert J (2023) Turbocharger control for emission reduction based on deep reinforcement learning. IFAC-PapersOnLine. 56(2):8266\u20138271. https:\/\/doi.org\/10.1016\/j.ifacol.2023.10.1012. 22nd IFAC World Congress","DOI":"10.1016\/j.ifacol.2023.10.1012"},{"issue":"3","key":"5573_CR28","doi-asserted-by":"publisher","first-page":"914","DOI":"10.3390\/vehicles5030050","volume":"5","author":"L Koch","year":"2023","unstructured":"Koch L, Roeser D, Badalian K, Lieb A, Andert J (2023) Cloud-Based Reinforcement Learning in Automotive Control Function Development. Vehicles. 5(3):914\u2013930. 
https:\/\/doi.org\/10.3390\/vehicles5030050","journal-title":"Vehicles."},{"key":"5573_CR29","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1109\/OJPEL.2021.3065877","volume":"2","author":"G Book","year":"2021","unstructured":"Book G, Traue A, Balakrishna P, Brosch A, Schenke M, Hanke S, Kirchg\u00e4ssner W, Wallscheid O (2021) Transferring Online Reinforcement Learning for Electric Motor Control From Simulation to Real-World Experiments. IEEE Open Journal of Power Electronics. 2:187\u2013201. https:\/\/doi.org\/10.1109\/OJPEL.2021.3065877","journal-title":"IEEE Open Journal of Power Electronics."},{"key":"5573_CR30","unstructured":"Plappert M (2016) keras-rl (software). GitHub. https:\/\/github.com\/keras-rl\/keras-rl"},{"key":"5573_CR31","doi-asserted-by":"crossref","unstructured":"Szydlo T, Jayaraman PP, Li Y, Morgan G, Ranjan R (2022) TinyRL: Towards Reinforcement Learning on Tiny Embedded Devices. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 4985\u20134988","DOI":"10.1145\/3511808.3557206"},{"key":"5573_CR32","unstructured":"Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction, 2nd edn. The MIT Press, Cambridge, Massachusetts, USA. http:\/\/incompleteideas.net\/book\/RLbook2020.pdf"},{"key":"5573_CR33","unstructured":"OpenAI (2018) Spinning Up: Introduction to RL - Part 2: Kinds of RL Algorithms. Accessed: 2023-05-03. https:\/\/spinningup.openai.com\/en\/latest\/spinningup\/rl_intro2.html"},{"key":"5573_CR34","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal Policy Optimization Algorithms. CoRR. https:\/\/arxiv.org\/abs\/1707.06347"},{"key":"5573_CR35","unstructured":"OpenAI (2018) Spinning Up: Algorithms Docs: Proximal Policy Optimization. Accessed: 2023-05-03. 
https:\/\/spinningup.openai.com\/en\/latest\/algorithms\/ppo.html"},{"key":"5573_CR36","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: Bengio Y, LeCun Y (eds.) 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. http:\/\/arxiv.org\/abs\/1509.02971"},{"key":"5573_CR37","unstructured":"OpenAI (2018) Spinning Up: Algorithms Docs: Deep Deterministic Policy Gradient. Accessed: 2023-05-03. https:\/\/spinningup.openai.com\/en\/latest\/algorithms\/ddpg.html"},{"key":"5573_CR38","unstructured":"OpenAI (2016) implementation of the inverted pendulum swing-up problem (code). Accessed: 2023-10-12 . https:\/\/github.com\/openai\/gym\/blob\/v0.21.0\/gym\/envs\/classic_control\/pendulum.py"},{"key":"5573_CR39","unstructured":"The Farama Foundation (2022) Pendulum. Accessed: 2023-07-13. https:\/\/gymnasium.farama.org\/environments\/classic_control\/pendulum\/"},{"key":"5573_CR40","unstructured":"Bi Y, Chen X, Xiao C (2021) A Deep Reinforcement Learning Approach towards Pendulum Swing-up Problem based on TF-Agents. arXiv preprint arXiv:2106.09556"},{"key":"5573_CR41","unstructured":"Kumar S (2021) Controlling an Inverted Pendulum with Policy Gradient Methods - A Tutorial. 
arXiv preprint arXiv:2105.07998"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-024-05573-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-024-05573-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-024-05573-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T12:35:23Z","timestamp":1723034123000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-024-05573-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,26]]},"references-count":41,"journal-issue":{"issue":"17-18","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["5573"],"URL":"https:\/\/doi.org\/10.1007\/s10489-024-05573-0","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"type":"print","value":"0924-669X"},{"type":"electronic","value":"1573-7497"}],"subject":[],"published":{"date-parts":[[2024,6,26]]},"assertion":[{"value":"27 May 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 June 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The following could be considered a potential financial interest: Lucas Koch, Mario Picerno, Kevin Badalian, Sung-Yong Lee, and Jakob Andert have a patent pending for <i>Automatisierte Funktionskalibrierung<\/i>\/<i>Automated Feature Calibration<\/i> (patent no. 
DE102022104648A1\/EP4235319A1).","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics Approval"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to Participate"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for Publication"}}]}}