{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T08:20:51Z","timestamp":1772785251175,"version":"3.50.1"},"reference-count":64,"publisher":"ASME International","issue":"1","license":[{"start":{"date-parts":[[2024,10,24]],"date-time":"2024-10-24T00:00:00Z","timestamp":1729728000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.asme.org\/publications-submissions\/publishing-information\/legal-policies"}],"funder":[{"DOI":"10.13039\/501100003030","name":"Ag\u00e8ncia de Gesti\u00f3 d'Ajuts Universitaris i de Recerca","doi-asserted-by":"publisher","award":["AGAUR 2020 DI 87"],"award-info":[{"award-number":["AGAUR 2020 DI 87"]}],"id":[{"id":"10.13039\/501100003030","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["asmedigitalcollection.asme.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>This paper explores the application of offline reinforcement learning in batch manufacturing, with a specific focus on press hardening processes. Offline reinforcement learning presents a viable alternative to traditional control and reinforcement learning methods, which often rely on impractical real-world interactions or complex simulations and iterative adjustments to bridge the gap between simulated and real-world environments. We demonstrate how offline reinforcement learning can improve control policies by leveraging existing data, thereby streamlining the training pipeline and reducing reliance on high-fidelity simulators. Our study evaluates the impact of varying data exploration rates by creating five datasets with exploration rates ranging from \u03b5=0 to \u03b5=0.8. Using the conservative Q-learning algorithm, we train and assess policies against both a dynamic baseline and a static industry-standard policy. 
The results indicate that while offline reinforcement learning effectively refines behavior policies and enhances supervised learning methods, its effectiveness is heavily dependent on the quality and exploratory nature of the initial behavior policy.<\/jats:p>","DOI":"10.1115\/1.4066999","type":"journal-article","created":{"date-parts":[[2024,10,24]],"date-time":"2024-10-24T14:17:51Z","timestamp":1729779471000},"update-policy":"https:\/\/doi.org\/10.1115\/crossmarkpolicy-asme","source":"Crossref","is-referenced-by-count":5,"title":["Offline Reinforcement Learning for Adaptive Control in Manufacturing Processes: A Press Hardening Case Study"],"prefix":"10.1115","volume":"25","author":[{"given":"Nuria","family":"Nievas","sequence":"first","affiliation":[{"name":"Centre Tecnol\u00f2gic de Catalunya Eurecat, , Unit of Applied Artificial Intelligence, Lleida 25003 ,","place":["Spain"]}]},{"given":"Leonardo","family":"Espinosa-Leal","sequence":"additional","affiliation":[{"id":[{"id":"https:\/\/ror.org\/02s466x84","id-type":"ROR","asserted-by":"publisher"}],"name":"Arcada University of Applied Sciences Graduate School and Research, , Helsinki 00560 ,","place":["Finland"]},{"name":"Arcada University of Applied Sciences Graduate School and Research, , Helsinki 00560 ,","place":["Finland"]}]},{"given":"Adela","family":"Pag\u00e8s-Bernaus","sequence":"additional","affiliation":[{"id":[{"id":"https:\/\/ror.org\/050c3cw24","id-type":"ROR","asserted-by":"publisher"}],"name":"Universitat de Lleida Department of Economy and Business, , Lleida 25001 , ;","place":["Spain"]},{"name":"University of Lleida Department of Economy and Business, , Lleida 25001 , ;","place":["Spain"]},{"name":"AGROTECNIO-CERCA Center Department of Animal Sciences, , Lleida 25198 ,","place":["Spain"]}]},{"given":"Albert","family":"Abio","sequence":"additional","affiliation":[{"name":"Centre Tecnol\u00f2gic de Catalunya Eurecat, , Unit of Applied Artificial Intelligence, Barcelona 08005 
,","place":["Spain"]}]},{"given":"Llu\u00eds","family":"Echeverria","sequence":"additional","affiliation":[{"name":"Centre Tecnol\u00f2gic de Catalunya Eurecat, , Unit of Applied Artificial Intelligence, Lleida 25003 ,","place":["Spain"]}]},{"given":"Francesc","family":"Bonada","sequence":"additional","affiliation":[{"name":"Centre Tecnol\u00f2gic de Catalunya Eurecat, , Unit of Applied Artificial Intelligence, Cerdanyola del Vall\u00e8s 08290 ,","place":["Spain"]}]}],"member":"33","published-online":{"date-parts":[[2024,11,12]]},"reference":[{"issue":"11","key":"2024111216091539000_CIT0001","doi-asserted-by":"publisher","first-page":"12868","DOI":"10.1109\/JSEN.2020.3033153","article-title":"A Review on Soft Sensors for Monitoring, Control, and Optimization of Industrial Processes","volume":"21","author":"Jiang","year":"2020","journal-title":"IEEE Sens. J."},{"issue":"4","key":"2024111216091539000_CIT0002","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1109\/TCST.2005.847331","article-title":"PID Control System Analysis, Design, and Technology","volume":"13","author":"Ang","year":"2005","journal-title":"IEEE Trans. Control Syst. Technol."},{"issue":"3","key":"2024111216091539000_CIT0003","doi-asserted-by":"publisher","first-page":"1445","DOI":"10.1007\/s10845-021-01869-x","article-title":"Reconfigurability Improvement in Industry 4.0: A Hybrid Genetic Algorithm-Based Heuristic Approach for a Co-Generation of Setup and Process Plans in a Reconfigurable Environment","volume":"34","author":"Ameer","year":"2023","journal-title":"J. Intell. Manuf."},{"issue":"5","key":"2024111216091539000_CIT0004","doi-asserted-by":"publisher","first-page":"1327","DOI":"10.1007\/s00170-021-07682-3","article-title":"Review on Model Predictive Control: An Engineering Perspective","volume":"117","author":"Schwenzer","year":"2021","journal-title":"Int. J. Adv. Manuf. 
Technol."},{"issue":"6","key":"2024111216091539000_CIT0005","doi-asserted-by":"publisher","first-page":"8427","DOI":"10.3233\/JIFS-189161","article-title":"Autonomous Industrial Management Via Reinforcement Learning","volume":"39","author":"Espinosa-Leal","year":"2020","journal-title":"J. Intell. Fuzzy Syst."},{"key":"2024111216091539000_CIT0006","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton","year":"2018"},{"issue":"12","key":"2024111216091539000_CIT0007","doi-asserted-by":"publisher","first-page":"10844","DOI":"10.1109\/TIE.2019.2962465","article-title":"A Reinforced K-Nearest Neighbors Method With Application to Chatter Identification in High-Speed Milling","volume":"67","author":"Shi","year":"2020","journal-title":"IEEE Trans. Ind. Electron."},{"key":"2024111216091539000_CIT0008","doi-asserted-by":"publisher","first-page":"103803","DOI":"10.1109\/ACCESS.2020.2998052","article-title":"Adaptive Laser Welding Control: A Reinforcement Learning Approach","volume":"8","author":"Masinelli","year":"2020","journal-title":"IEEE Access"},{"key":"2024111216091539000_CIT0009","doi-asserted-by":"publisher","first-page":"479","DOI":"10.1016\/j.procir.2022.08.074","article-title":"Smart Closed-Loop Control of Laser Welding Using Reinforcement Learning","volume":"111","author":"Quang","year":"2022","journal-title":"Proc. CIRP"},{"key":"2024111216091539000_CIT0010","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1016\/j.cirpj.2022.11.003","article-title":"Deep Reinforcement Learning in Smart Manufacturing: A Review and Prospects","volume":"40","author":"Li","year":"2023","journal-title":"CIRP J. Manuf. Sci. Technol."},{"key":"2024111216091539000_CIT0011","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1016\/j.jprocont.2022.05.006","article-title":"Off-Policy Reinforcement Learning-Based Novel Model-Free Minmax Fault-Tolerant Tracking Control for Industrial Processes","volume":"115","author":"Li","year":"2022","journal-title":"J. 
Process Control"},{"key":"2024111216091539000_CIT0012","doi-asserted-by":"publisher","first-page":"108686","DOI":"10.1016\/j.ijepes.2022.108686","article-title":"An Autonomous Control Technology Based on Deep Reinforcement Learning for Optimal Active Power Dispatch","volume":"145","author":"Han","year":"2023","journal-title":"Int. J. Electr. Power Energy Syst."},{"issue":"3","key":"2024111216091539000_CIT0013","doi-asserted-by":"publisher","first-page":"1429","DOI":"10.1007\/s00521-023-09112-9","article-title":"DRL-DEWMA: A Composite Framework for Run-to-Run Control in the Semiconductor Manufacturing Process","volume":"36","author":"Ma","year":"2024","journal-title":"Neural Comput. Appl."},{"issue":"9","key":"2024111216091539000_CIT0014","doi-asserted-by":"publisher","first-page":"2419","DOI":"10.1007\/s10994-021-05961-4","article-title":"Challenges of Real-World Reinforcement Learning: Definitions, Benchmarks and Analysis","volume":"110","author":"Dulac-Arnold","year":"2021","journal-title":"Mach. Learn."},{"issue":"19","key":"2024111216091539000_CIT0015","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3390\/app10196923","article-title":"Variable Compliance Control for Robotic Peg-in-Hole Assembly: A Deep-Reinforcement-Learning Approach","volume":"10","author":"Beltran-Hernandez","year":"2020","journal-title":"Appl. Sci."},{"key":"2024111216091539000_CIT0016","doi-asserted-by":"publisher","first-page":"104399","DOI":"10.1016\/j.robot.2023.104399","article-title":"Robotic Assembly Strategy Via Reinforcement Learning Based on Force and Visual Information","volume":"164","author":"Ahn","year":"2023","journal-title":"Rob. Auton. 
Syst."},{"key":"2024111216091539000_CIT0017","author":"Nguyen","year":"2024"},{"issue":"2255","key":"2024111216091539000_CIT0018","doi-asserted-by":"publisher","first-page":"20210618","DOI":"10.1098\/rspa.2021.0618","article-title":"Physics-Informed DYNA-Style Model-Based Deep Reinforcement Learning for Dynamic Control","volume":"477","author":"Liu","year":"2021","journal-title":"Proc. R. Soc. A"},{"key":"2024111216091539000_CIT0019","first-page":"26","article-title":"Physics-Informed Model-Based Reinforcement Learning","author":"Ramesh","year":"2023"},{"issue":"1","key":"2024111216091539000_CIT0020","doi-asserted-by":"publisher","first-page":"0","DOI":"10.1080\/08839514.2024.2383101","article-title":"Reinforcement Learning for Autonomous Process Control in Industry 4.0: Advantages and Challenges","volume":"38","author":"Nievas","year":"2024","journal-title":"Appl. Artif. Intell."},{"key":"2024111216091539000_CIT0021","first-page":"1","article-title":"Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems","author":"Levine","year":"2020"},{"key":"2024111216091539000_CIT0022","doi-asserted-by":"crossref","unstructured":"Zhan, X., Xu, H., Zhang, Y., Zhu, X., Yin, H., and Zheng, Y., 2022, \u201cDeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning,\u201d Proceedings of the 36th AAAI Conference on Artificial Intelligence, Virtual, Feb. 22\u2013Mar. 1, AAAI 2022, Vol. 36, pp. 4680\u20134688.","DOI":"10.1609\/aaai.v36i4.20393"},{"key":"2024111216091539000_CIT0023","doi-asserted-by":"publisher","first-page":"100471","DOI":"10.1016\/j.jii.2023.100471","article-title":"Logistics-Involved Task Scheduling in Cloud Manufacturing With Offline Deep Reinforcement Learning","volume":"34","author":"Wang","year":"2023","journal-title":"J. Ind. Inf. 
Integr."},{"key":"2024111216091539000_CIT0024","doi-asserted-by":"publisher","first-page":"108517","DOI":"10.1016\/j.est.2023.108517","article-title":"Energy Management Optimization for Connected Hybrid Electric Vehicle Using Offline Reinforcement Learning","volume":"72","author":"He","year":"2023","journal-title":"J. Energy Storage"},{"key":"2024111216091539000_CIT0025","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1016\/j.ins.2023.03.019","article-title":"Offline Reinforcement Learning for Industrial Process Control: A Case Study From Steel Industry","volume":"632","author":"Deng","year":"2023","journal-title":"Inf. Sci."},{"key":"2024111216091539000_CIT0026","first-page":"11761","article-title":"Stabilizing Off-Policy Q-Learning Via Bootstrapping Error Reduction","author":"Kumar","year":"2019"},{"key":"2024111216091539000_CIT0027","first-page":"1","article-title":"Characterization of Tribological and Thermal Properties of Metallic Coatings for Hot Stamping Boron-Manganese Steels","author":"Merklein","year":"2008"},{"key":"2024111216091539000_CIT0028","unstructured":"\u00c5kerstr\u00f6m, P.\n          , 2006, \u201cModelling and Simulation of Hot Stamping,\u201d Doctoral dissertation, Lule\u00e5 Tekniska Universitet, Lule\u00e5, Sweden."},{"issue":"1","key":"2024111216091539000_CIT0029","doi-asserted-by":"publisher","first-page":"012010","DOI":"10.1088\/1757-899X\/1157\/1\/012010","article-title":"A Thermography-Based Online Control Method for Press Hardening","volume":"1157","author":"Garcia-Llamas","year":"2021","journal-title":"IOP Conf. Ser.: Mater. Sci. Eng."},{"issue":"6","key":"2024111216091539000_CIT0030","doi-asserted-by":"publisher","first-page":"A378","DOI":"10.2355\/tetsutohagane.96.378","article-title":"Effect of Quenching Rate on Hardness and Microstructure of Hot-Stamped Steel","volume":"96","author":"Nishibata","year":"2010","journal-title":"TETSU to Hagane-J. Iron Steel Inst. 
Jpn"},{"issue":"10","key":"2024111216091539000_CIT0031","doi-asserted-by":"publisher","first-page":"3647","DOI":"10.3390\/ma15103647","article-title":"Machine Learning-Based Surrogate Model for Press Hardening Process of 22MnB5 Sheet Steel Simulation in Industry 4.0","volume":"15","author":"Abio","year":"2022","journal-title":"Materials"},{"key":"2024111216091539000_CIT0032","unstructured":"Pujante, J., Garc\u00eda-Llamas, E., and Casellas, D., 2019, \u201cStudy of Wear in Press Hardening Using a Pilot Facility,\u201d Proceedings of the 7th International Conference Hot Sheet Metal Forming of High-Performance Steel, Lulea, Sweden, June 2\u20135, pp. 151\u2013158."},{"issue":"7540","key":"2024111216091539000_CIT0033","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-Level Control Through Deep Reinforcement Learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"issue":"3\u20134","key":"2024111216091539000_CIT0034","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-Learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"issue":"268","key":"2024111216091539000_CIT0035","first-page":"1","article-title":"Stable-Baselines3: Reliable Reinforcement Learning Implementations","volume":"22","author":"Raffin","year":"2021","journal-title":"J. Mach. Learn. 
Res."},{"key":"2024111216091539000_CIT0036","first-page":"1","article-title":"Conservative Q-Learning for Offline Reinforcement Learning","author":"Kumar","year":"2020"},{"key":"2024111216091539000_CIT0037","first-page":"2008","article-title":"Value Penalized Q-Learning for Recommender Systems","author":"Gao","year":"2022"},{"key":"2024111216091539000_CIT0038","first-page":"613","article-title":"Engagement Rewarded Actor-Critic With Conservative Q-Learning for Speech-Driven Laughter Backchannel Generation","author":"Bayramo\u011flu","year":"2021"},{"key":"2024111216091539000_CIT0039","first-page":"2094","article-title":"Deep Reinforcement Learning With Double Q-Learning","author":"Van Hasselt","year":"2016"},{"key":"2024111216091539000_CIT0040","first-page":"1","article-title":"Behavior Regularized Offline Reinforcement Learning","author":"Wu","year":"2019"},{"key":"2024111216091539000_CIT0041","first-page":"3599","article-title":"Off-Policy Deep Reinforcement Learning Without Exploration","author":"Fujimoto","year":"2019"},{"key":"2024111216091539000_CIT0042","author":"Peng","year":"2019"},{"key":"2024111216091539000_CIT0043","author":"Nair","year":"2020"},{"key":"2024111216091539000_CIT0044","first-page":"6487","article-title":"Safe Policy Improvement With Baseline Bootstrapping","author":"Laroche","year":"2019"},{"key":"2024111216091539000_CIT0045","first-page":"20132","article-title":"A Minimalist Approach to Offline Reinforcement Learning","author":"Fujimoto","year":"2021"},{"key":"2024111216091539000_CIT0046","first-page":"92","article-title":"An Optimistic Perspective on Offline Reinforcement Learning","author":"Agarwal","year":"2020"},{"key":"2024111216091539000_CIT0047","author":"Kidambi","year":"2020"},{"key":"2024111216091539000_CIT0048","first-page":"1","article-title":"MOPO: Model-Based Offline Policy Optimization","author":"Yu","year":"2020"},{"key":"2024111216091539000_CIT0049","article-title":"When to Trust Your Model: Model-Based Policy 
Optimization","author":"Janner","year":"2019"},{"key":"2024111216091539000_CIT0050","first-page":"28954","article-title":"COMBO: Conservative Offline Model-Based Policy Optimization","author":"Yu","year":"2021"},{"key":"2024111216091539000_CIT0051","first-page":"1","article-title":"Offline Reinforcement Learning With Implicit Q-Learning","author":"Kostrikov","year":"2022"},{"key":"2024111216091539000_CIT0052","first-page":"15084","article-title":"Decision Transformer: Reinforcement Learning Via Sequence Modeling","author":"Chen","year":"2021"},{"key":"2024111216091539000_CIT0053","author":"Kuhnle","year":"2017"},{"key":"2024111216091539000_CIT0054","author":"Gauci","year":"2018"},{"issue":"131","key":"2024111216091539000_CIT0055","first-page":"1","article-title":"MushroomRL: Simplifying Reinforcement Learning Research","volume":"22","author":"D\u2019Eramo","year":"2021","journal-title":"J. Mach. Learn. Res."},{"key":"2024111216091539000_CIT0056","author":"Caspi","year":"2017"},{"issue":"274","key":"2024111216091539000_CIT0057","first-page":"1","article-title":"CleanRL: High-Quality Single-File Implementations of Deep Reinforcement Learning Algorithms","volume":"23","author":"Huang","year":"2022","journal-title":"J. Mach. Learn. Res."},{"issue":"267","key":"2024111216091539000_CIT0058","first-page":"1","article-title":"Tianshou: A Highly Modularized Deep Reinforcement Learning Library","volume":"23","author":"Weng","year":"2022","journal-title":"J. Mach. Learn. 
Res."},{"key":"2024111216091539000_CIT0059","author":"Bou","year":"2023"},{"key":"2024111216091539000_CIT0060","first-page":"3053","article-title":"RLlib: Abstractions for Distributed Reinforcement Learning","author":"Liang","year":"2018"},{"key":"2024111216091539000_CIT0061","author":"Zhu","year":"2023"},{"issue":"315","key":"2024111216091539000_CIT0062","first-page":"1","article-title":"d3rlpy: An Offline Deep Reinforcement Learning Library","volume":"23","author":"Seno","year":"2022","journal-title":"J. Mach. Learn. Res."},{"key":"2024111216091539000_CIT0063","first-page":"30997","article-title":"CORL: Research-Oriented Deep Offline Reinforcement Learning Library","author":"Tarasov","year":"2023"},{"key":"2024111216091539000_CIT0064","author":"Sun","year":"2023"}],"container-title":["Journal of Computing and Information Science in Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/asmedigitalcollection.asme.org\/computingengineering\/article-pdf\/25\/1\/011004\/7401986\/jcise_25_1_011004.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/asmedigitalcollection.asme.org\/computingengineering\/article-pdf\/25\/1\/011004\/7401986\/jcise_25_1_011004.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,12]],"date-time":"2024-11-12T16:09:27Z","timestamp":1731427767000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmedigitalcollection.asme.org\/computingengineering\/article\/25\/1\/011004\/1207612\/Offline-Reinforcement-Learning-for-Adaptive"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,12]]},"references-count":64,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1,1]]}},"URL":"https:\/\/doi.org\/10.1115\/1.4066999","relation":{},"ISSN":["1530-9827","1944-7078"],"issn-type":[{"value":"1530-9827","type":"print"},{"value":"1944-7078","type":"electroni
c"}],"subject":[],"published":{"date-parts":[[2024,11,12]]},"article-number":"011004"}}