{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T15:30:19Z","timestamp":1780759819901,"version":"3.54.1"},"reference-count":54,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T00:00:00Z","timestamp":1768176000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon Europe project Mine.io \u201cA Holistic Digital Mine 4.0 Ecosystem\u201d","award":["101091885"],"award-info":[{"award-number":["101091885"]}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:sec>\n                    <jats:title>Introduction<\/jats:title>\n                    <jats:p>Predictive maintenance has emerged as a critical strategy in modern manufacturing, in the frame of Industry 4.0, enabling proactive intervention before equipment failure. However, traditional machine learning approaches require extensive labeled data and lack adaptability to evolving operational conditions. On the other hand, Reinforcement Learning (RL) enables agents to learn optimal policies through interaction with the environment, eliminating the need for labeled datasets and naturally capturing the sequential, uncertain dynamics of equipment degradation.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>In this paper, we propose an approach that incorporates four model-free RL algorithms, namely Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Soft Actor-Critic (SAC). We formulate the problem as a Markov Decision Process (MDP), which is solved with the aforementioned RL algorithms.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The proposed approach is validated in the context of CNC machine tool wear prediction, using sensor data from the 2010 PHM Society Data Challenge. We examine algorithmic performance across four custom made environments, corrective and non-corrective environments both with and without delay correction mechanisms in order to compare learning dynamics, convergence behavior, and generalization aspects. Our results reveal that PPO and SAC achieve the most stable and efficient performance, with SAC excelling in structured environments and PPO demonstrating robust generalization. A2C shows consistent long-term learning, while DDPG underperforms due to insufficient exploration.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Discussion<\/jats:title>\n                    <jats:p>The findings highlight the potential of RL for predictive maintenance applications and underscore the importance of aligning algorithm design with environment characteristics and reward structures.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.3389\/frai.2025.1720140","type":"journal-article","created":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T08:19:19Z","timestamp":1768205959000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Health state prediction with reinforcement learning for predictive maintenance"],"prefix":"10.3389","volume":"8","author":[{"given":"Anastasis","family":"Aglogallos","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alexandros","family":"Bousdekis","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stefanos","family":"Kontos","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gregoris","family":"Mentzas","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2026,1,12]]},"reference":[{"key":"ref1","doi-asserted-by":"publisher","first-page":"102240","DOI":"10.1016\/j.datak.2023.102240","article-title":"Hierarchical framework for interpretable and specialized deep reinforcement learning-based predictive maintenance","volume":"149","author":"Abbas","year":"2024","journal-title":"Data Knowl. Eng."},{"key":"ref3","doi-asserted-by":"publisher","first-page":"1067","DOI":"10.1080\/0951192X.2019.1686173","article-title":"The use of digital twin for predictive maintenance in manufacturing","volume":"32","author":"Aivaliotis","year":"2019","journal-title":"Int. J. Comput. Integr. Manuf."},{"key":"ref4","first-page":"9335","article-title":"Steady state analysis of episodic reinforcement learning","volume":"33","author":"Bojun","year":"2020","journal-title":"Adv. Neural Inf. Proces. Syst."},{"key":"ref6","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1109\/EMR.2019.2958037","article-title":"Predictive maintenance in the 4th industrial revolution: benefits, business opportunities, and managerial implications","volume":"48","author":"Bousdekis","year":"2019","journal-title":"IEEE Eng. Manag. Rev."},{"key":"ref7","doi-asserted-by":"publisher","first-page":"828","DOI":"10.3390\/electronics10070828","article-title":"A review of data-driven decision-making methods for industry 4.0 maintenance applications","volume":"10","author":"Bousdekis","year":"2021","journal-title":"Electronics"},{"key":"ref8","doi-asserted-by":"publisher","first-page":"1303","DOI":"10.1007\/s10845-015-1179-5","article-title":"Review, analysis and synthesis of prognostic-based decision support methods for condition based maintenance","volume":"29","author":"Bousdekis","year":"2018","journal-title":"J. Intell. Manuf."},{"key":"ref9","doi-asserted-by":"publisher","first-page":"106024","DOI":"10.1016\/j.cie.2019.106024","article-title":"A systematic literature review of machine learning methods applied to predictive maintenance","volume":"137","author":"Carvalho","year":"2019","journal-title":"Comput. Ind. Eng."},{"key":"ref10","doi-asserted-by":"publisher","first-page":"129","DOI":"10.36001\/ijphm.2011.v2i2.1353","article-title":"A multiple model prediction algorithm for CNC machine wear PHM","volume":"2","author":"Chen","year":"2011","journal-title":"Int. J. Progn. Health Manag."},{"key":"ref11","doi-asserted-by":"publisher","first-page":"8211","DOI":"10.3390\/su12198211","article-title":"Machine learning in predictive maintenance towards sustainable smart manufacturing in industry 4.0","volume":"12","author":"\u00c7\u0131nar","year":"2020","journal-title":"Sustainability"},{"key":"ref12","doi-asserted-by":"publisher","first-page":"103298","DOI":"10.1016\/j.compind.2020.103298","article-title":"Machine learning and reasoning for predictive maintenance in industry 4.0: current status and challenges","volume":"123","author":"Dalzochio","year":"2020","journal-title":"Comput. Ind."},{"key":"ref13","author":"Ding","year":"2008"},{"key":"ref14","author":"Eke","year":"2017"},{"key":"ref15","doi-asserted-by":"publisher","first-page":"18910","DOI":"10.1109\/ACCESS.2022.3151170","article-title":"Predictive maintenance decision making based on reinforcement learning in multistage production systems","volume":"10","author":"Feng","year":"2022","journal-title":"IEEE Access"},{"key":"ref16","doi-asserted-by":"publisher","first-page":"248","DOI":"10.1007\/s40436-022-00433-x","article-title":"Application of sensor data based predictive maintenance and artificial neural networks to enable industry 4.0","volume":"11","author":"Fordal","year":"2023","journal-title":"Adv. Manuf."},{"key":"ref17","doi-asserted-by":"publisher","first-page":"1390","DOI":"10.1061\/(ASCE)0733-9445(1997)123:10(1390)","article-title":"Life-cycle cost design of deteriorating structures","volume":"123","author":"Frangopol","year":"1997","journal-title":"J. Struct. Eng."},{"key":"ref18","author":"Garcia","year":"2013"},{"key":"ref19","first-page":"3846","article-title":"Interpolated policy gradient: merging on-policy and off-policy gradient estimation for deep reinforcement learning","volume":"30","author":"Gu","year":"2017","journal-title":"Adv. Neural Inf. Proces. Syst."},{"key":"ref20","author":"Haarnoja","year":""},{"key":"ref21","doi-asserted-by":"publisher","first-page":"arXiv:1812.05905","DOI":"10.48550\/arXiv.1812.05905","article-title":"Soft actor-critic algorithms and applications","author":"Haarnoja","year":"","journal-title":"arXiv"},{"key":"ref22","first-page":"1","article-title":"Review of data-driven prognostics and health management techniques: lessions learned from PHM data challenge competitions","volume":"1","author":"Huang","year":"2017","journal-title":"Mach. Failure Prevent. Technol."},{"key":"ref23","author":"Kabir","year":"2018"},{"key":"ref24","doi-asserted-by":"publisher","first-page":"42","DOI":"10.3390\/jsan13040042","article-title":"Tool condition monitoring in the milling process using deep learning and reinforcement learning","volume":"13","author":"Kaliyannan","year":"2024","journal-title":"J. Sens. Actuator Netw."},{"key":"ref25","first-page":"1188","article-title":"Evolution-guided policy gradient in reinforcement learning","volume":"31","author":"Khadka","year":"2018","journal-title":"Adv. Neural Inf. Proces. Syst."},{"key":"ref26","doi-asserted-by":"publisher","first-page":"6336","DOI":"10.1016\/j.matpr.2022.02.550","article-title":"Time domain vibration analysis techniques for condition monitoring of rolling element bearing: a review","volume":"62","author":"Kumar","year":"2022","journal-title":"Mater Today Proc"},{"key":"ref27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.inffus.2022.03.003","article-title":"Exploration in deep reinforcement learning: a survey","volume":"85","author":"Ladosz","year":"2022","journal-title":"Inf. Fusion"},{"key":"ref28","doi-asserted-by":"publisher","first-page":"arXiv:2112.12589","DOI":"10.48550\/arXiv.2112.12589","article-title":"A deep reinforcement learning model for predictive maintenance planning of road assets: integrating LCA and LCCA","author":"Latifi","year":"2021","journal-title":"arXiv"},{"key":"ref29","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1186\/s40537-020-00340-7","article-title":"Big data architecture for intelligent maintenance: a focus on query processing and machine learning algorithms","volume":"7","author":"Lehmann","year":"2020","journal-title":"J. Big Data"},{"key":"ref30","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/j.ijinfomgt.2019.04.003","article-title":"Prescriptive analytics: literature review and research challenges","volume":"50","author":"Lepenioti","year":"2020","journal-title":"Int. J. Inf. Manag."},{"key":"ref31","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1109\/MCS.2012.2214134","article-title":"Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers","volume":"32","author":"Lewis","year":"2012","journal-title":"IEEE Control. Syst. Mag."},{"key":"ref32","author":"Li","year":"2021"},{"key":"ref33","doi-asserted-by":"publisher","first-page":"ICLR 2016","DOI":"10.48550\/arXiv.1509.02971","article-title":"Continuous control with deep reinforcement learning","author":"Lillicrap","year":"2016","journal-title":"arXiv"},{"key":"ref34","author":"Ma","year":"2020"},{"key":"ref35","author":"Mnih","year":"2016"},{"key":"ref36","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1016\/j.jmsy.2023.07.014","article-title":"Reinforcement and deep reinforcement learning-based solutions for machine maintenance planning, scheduling policies, and optimization","volume":"70","author":"Ogunfowora","year":"2023","journal-title":"J. Manuf. Syst."},{"key":"ref37","doi-asserted-by":"publisher","first-page":"15725","DOI":"10.1109\/JIOT.2022.3151862","article-title":"Predictive maintenance model for IIoT-based manufacturing: a transferable deep reinforcement learning approach","volume":"9","author":"Ong","year":"2022","journal-title":"IEEE Internet Things J."},{"key":"ref38","doi-asserted-by":"publisher","first-page":"1470","DOI":"10.3390\/s21041470","article-title":"Predictive maintenance and intelligent sensors in smart factory","volume":"21","author":"Pech","year":"2021","journal-title":"Sensors"},{"key":"ref39","author":"Sateesh Babu","year":"2016"},{"key":"ref40","doi-asserted-by":"publisher","first-page":"6611","DOI":"10.1007\/s00170-022-09784-y","article-title":"Tool wear prediction using long short-term memory variants and hybrid feature selection techniques","volume":"121","author":"Sayyad","year":"2022","journal-title":"Int. J. Adv. Manuf. Technol."},{"key":"ref41","doi-asserted-by":"publisher","first-page":"arXiv:1707.06347","DOI":"10.48550\/arXiv.1707.06347","article-title":"Proximal policy optimization algorithms","author":"Schulman","year":"2017","journal-title":"arXiv"},{"key":"ref42","doi-asserted-by":"publisher","first-page":"10934","DOI":"10.1007\/s10489-021-03004-y","article-title":"Deep learning models for predictive maintenance: a survey, comparison, challenges and prospects","volume":"52","author":"Serradilla","year":"2022","journal-title":"Appl. Intell."},{"key":"ref43","doi-asserted-by":"publisher","first-page":"120495","DOI":"10.1016\/j.eswa.2023.120495","article-title":"Reinforcement learning algorithms: a brief survey","volume":"231","author":"Shakya","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref44","author":"Silver","year":"2014"},{"key":"ref45","doi-asserted-by":"publisher","first-page":"12885","DOI":"10.1007\/s10462-023-10468-6","article-title":"Reinforcement learning for predictive maintenance: a systematic technical review","volume":"56","author":"Siraskar","year":"2023","journal-title":"Artif. Intell. Rev."},{"key":"ref46","doi-asserted-by":"publisher","first-page":"1300","DOI":"10.1016\/j.ymssp.2006.06.010","article-title":"Decision tree and PCA-based fault diagnosis of rotating machinery","volume":"21","author":"Sun","year":"2007","journal-title":"Mech. Syst. Signal Process."},{"key":"ref47","author":"Susto","year":"2013"},{"key":"ref48","author":"Susto","year":"2014"},{"key":"ref49","first-page":"9","volume-title":"Reinforcement learning: An introduction","author":"Sutton","year":"1998"},{"key":"ref50","doi-asserted-by":"publisher","first-page":"936","DOI":"10.1080\/0951192X.2019.1667033","article-title":"Intelligent decision support for maintenance: an overview and future trends","volume":"32","author":"Turner","year":"2019","journal-title":"Int. J. Comput. Integr. Manuf."},{"key":"ref51","doi-asserted-by":"publisher","first-page":"5277","DOI":"10.1063\/5.0225277","article-title":"Method for remaining useful life prediction of rolling bearings based on deep reinforcement learning","volume":"95","author":"Wang","year":"2024","journal-title":"Rev. Sci. Instrum."},{"key":"ref52","doi-asserted-by":"publisher","first-page":"e1392","DOI":"10.1002\/wics.1392","article-title":"Data representation for time series data mining: time domain approaches","volume":"9","author":"Wilson","year":"2017","journal-title":"WIREs Comput. Stat."},{"key":"ref53","doi-asserted-by":"publisher","first-page":"100469","DOI":"10.1016\/j.jii.2023.100469","article-title":"Digital twin-driven fault diagnosis method for composite faults by combining virtual and real data","volume":"33","author":"Yang","year":"","journal-title":"J. Ind. Inf. Integr."},{"key":"ref54","doi-asserted-by":"publisher","first-page":"110813","DOI":"10.1016\/j.ymssp.2023.110813","article-title":"Cross-validation enhanced digital twin driven fault diagnosis methodology for minor faults of subsea production control system","volume":"204","author":"Yang","year":"","journal-title":"Mech. Syst. Signal Process."},{"key":"ref55","doi-asserted-by":"publisher","first-page":"35432","DOI":"10.1109\/JIOT.2024.3436110","article-title":"TranDRL: a transformer-driven deep reinforcement learning enabled prescriptive maintenance framework","volume":"11","author":"Zhao","year":"2024","journal-title":"IEEE Internet Things J."},{"key":"ref56","author":"Zheng","year":"2017"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1720140\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T08:19:22Z","timestamp":1768205962000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1720140\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,12]]},"references-count":54,"alternative-id":["10.3389\/frai.2025.1720140"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1720140","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,12]]},"article-number":"1720140"}}