{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,24]],"date-time":"2026-06-24T15:03:22Z","timestamp":1782313402450,"version":"3.54.5"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,4,30]],"date-time":"2024-04-30T00:00:00Z","timestamp":1714435200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Cyber-Phys. Syst."],"published-print":{"date-parts":[[2024,4,30]]},"abstract":"<jats:p>Deep reinforcement learning (DRL) has shown good performance in tackling Markov decision process (MDP) problems. As DRL optimizes a long-term reward, it is a promising approach to improving the energy efficiency of data-center cooling. However, enforcement of thermal safety constraints during DRL\u2019s state exploration is a main challenge. The widely adopted reward-shaping approach adds negative reward when the exploratory action results in unsafety. Thus, it needs to experience sufficient unsafe states before it learns how to prevent unsafety. In this article, we propose a safety-aware DRL framework for data-center cooling control. It applies offline imitation learning and online post-hoc rectification to holistically prevent thermal unsafety during online DRL. In particular, the post-hoc rectification searches for the minimum modification to the DRL-recommended action such that the rectified action will not result in unsafety. The rectification is designed based on a thermal state transition model that is fitted using historical safe operation traces and able to extrapolate the transitions to unsafe states explored by DRL. 
Extensive evaluation for chilled water and direct expansion-cooled data centers in two climate conditions shows that our approach saves 18% to 26.6% of total data-center power compared with conventional control and reduces safety violations by 94.5% to 99% compared with reward shaping. We also extend the proposed framework to address data centers with non-uniform temperature distributions for detailed safety considerations. The evaluation shows that our approach saves 14% power usage compared with the PID control while addressing safety compliance during the training.<\/jats:p>","DOI":"10.1145\/3582577","type":"journal-article","created":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T07:23:16Z","timestamp":1675322596000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Green Data Center Cooling Control via Physics-guided Safe Reinforcement Learning"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8827-9373","authenticated-orcid":false,"given":"Ruihang","family":"Wang","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0341-8213","authenticated-orcid":false,"given":"Zhiwei","family":"Cao","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5405-9890","authenticated-orcid":false,"given":"Xin","family":"Zhou","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2751-5114","authenticated-orcid":false,"given":"Yonggang","family":"Wen","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, 
Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8441-9973","authenticated-orcid":false,"given":"Rui","family":"Tan","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,5,14]]},"reference":[{"key":"e_1_3_3_2_2","unstructured":"Enviromon.net. 2017. How to monitor server room temperature and environmental conditions. https:\/\/www.enviromon.net\/how-to-monitor-server-room-temperature\/."},{"key":"e_1_3_3_3_2","unstructured":"Alibaba Inc. 2021. Alibaba cluster trace program. https:\/\/github.com\/alibaba\/clusterdata."},{"key":"e_1_3_3_4_2","unstructured":"Bigladder. 2021. EnergyPlus Setpoint Managers. https:\/\/bit.ly\/3EtLmZp."},{"key":"e_1_3_3_5_2","unstructured":"Business Wire. 2021. Global Internet data centers market report. https:\/\/www.businesswire.com\/news\/home\/20210903005160\/en\/."},{"key":"e_1_3_3_6_2","first-page":"136","volume-title":"ICML","author":"Amos B.","year":"2017","unstructured":"B. Amos and J. Z. Kolter. 2017. OptNet: Differentiable optimization as a layer in neural networks. In ICML. 136\u2013145."},{"key":"e_1_3_3_7_2","volume-title":"Computational Fluid Dynamics","author":"Anderson J. D.","year":"1995","unstructured":"J. D. Anderson and J. Wendt. 1995. Computational Fluid Dynamics. Vol. 206. Springer."},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/87.736752"},{"key":"e_1_3_3_9_2","unstructured":"TC ASHRAE. 2011. 2011 Thermal Guidelines for Data Processing Environments-Expended Data Center Classes and Usage Guidance. White paper prepared by ASHRAE Technical Committee (TC) Vol. 9."},{"key":"e_1_3_3_10_2","article-title":"OpenAI gym","author":"Brockman G.","year":"2016","unstructured":"G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. 2016. OpenAI gym. 
arXiv:1606.01540 (2016).","journal-title":"arXiv:1606.01540"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2022.3161275"},{"key":"e_1_3_3_12_2","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1145\/3360322.3360849","volume-title":"ACM BuildSys","author":"Chen B.","year":"2019","unstructured":"B. Chen, Z. Cai, and M. Berg\u00e9s. 2019. Gnu-RL: A precocial reinforcement learning solution for building HVAC control using a differentiable MPC policy. In ACM BuildSys. 316\u2013325."},{"key":"e_1_3_3_13_2","first-page":"199","volume-title":"ACM e-Energy","author":"Chen B.","year":"2021","unstructured":"B. Chen, P. Donti, K. Baker, Z. Kolter, and M. Berges. 2021. Enforcing policy feasibility constraints through differentiable projection for energy optimization. In ACM e-Energy. 199\u2013210."},{"key":"e_1_3_3_14_2","volume-title":"ACM e-Energy","author":"Chi C.","year":"2020","unstructured":"C. Chi, K. Ji, A. Marahatta, P. Song, F. Zhang, and Z. Liu. 2020. Jointly optimizing the IT and cooling systems for data center energy efficiency based on multi-agent deep reinforcement learning. In ACM e-Energy."},{"issue":"4","key":"e_1_3_3_15_2","article-title":"EnergyPlus: Energy simulation program","volume":"42","author":"Crawley D.","year":"2000","unstructured":"D. Crawley, L. Lawrie, C. Pedersen, and F. Winkelmann. 2000. EnergyPlus: Energy simulation program. ASHRAE J. 42, 4 (2000).","journal-title":"ASHRAE J."},{"key":"e_1_3_3_16_2","article-title":"Safe exploration in continuous action spaces","author":"Dalal G.","year":"2018","unstructured":"G. Dalal, K. Dvijotham, M. Vecerik, T. Hester, C. Paduraru, and Y. Tassa. 2018. Safe exploration in continuous action spaces. arXiv:1801.08757 (2018).","journal-title":"arXiv:1801.08757"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3408308.3427986"},{"key":"e_1_3_3_18_2","first-page":"163","volume-title":"ACM SIGMETRICS","author":"El-Sayed N.","year":"2012","unstructured":"N. 
El-Sayed, I. Stefanovici, G. Amvrosiadis, A. Hwang, and B. Schroeder. 2012. Temperature management in data centers: Why some (might) like it hot. In ACM SIGMETRICS. 163\u2013174."},{"key":"e_1_3_3_19_2","doi-asserted-by":"crossref","DOI":"10.1613\/jair.1666","article-title":"Risk-sensitive reinforcement learning applied to control under constraints","volume":"24","author":"Geibel P.","year":"2005","unstructured":"P. Geibel and F. Wysotzki. 2005. Risk-sensitive reinforcement learning applied to control under constraints. J. Artif. Intell. Res. 24 (2005).","journal-title":"J. Artif. Intell. Res."},{"key":"e_1_3_3_20_2","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511919701","volume-title":"Turbulence, Coherent Structures, Dynamical Systems and Symmetry","author":"Holmes P.","year":"2012","unstructured":"P. Holmes, J. L. Lumley, G. Berkooz, and C. Rowley. 2012. Turbulence, Coherent Structures, Dynamical Systems and Symmetry. Cambridge University Press."},{"key":"e_1_3_3_21_2","first-page":"140","volume-title":"ACM\/IEEE ICCPS","author":"Jain A.","year":"2018","unstructured":"A. Jain, T. Nghiem, M. Morari, and R. Mangharam. 2018. Learning and control using Gaussian processes. In ACM\/IEEE ICCPS. 140\u2013149."},{"key":"e_1_3_3_22_2","first-page":"1","volume-title":"International Workshop on Coupled Methods in Numerical Dynamics","author":"Jasak H.","year":"2007","unstructured":"H. Jasak, A. Jemcov, and Z. Tukovic. 2007. OpenFOAM: A C++ library for complex physics simulations. In International Workshop on Coupled Methods in Numerical Dynamics, Vol. 1000. IUC Dubrovnik Croatia, 1\u201320."},{"key":"e_1_3_3_23_2","first-page":"3818","volume-title":"NeurIPS","author":"Lazic N.","year":"2018","unstructured":"N. Lazic, T. Lu, C. Boutilier, M. Ryu, E. Wong, B. Roy, and G. Imwalle. 2018. Data center cooling using model-predictive control. In NeurIPS. 
3818\u20133827."},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2021.3051400"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2019.2927410"},{"key":"e_1_3_3_26_2","article-title":"Continuous control with deep reinforcement learning","author":"Lillicrap T.","year":"2015","unstructured":"T. Lillicrap, J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv:1509.02971 (2015).","journal-title":"arXiv:1509.02971"},{"key":"e_1_3_3_27_2","first-page":"181","volume-title":"ICCPS","author":"Liu Hsin-Yu","year":"2022","unstructured":"Hsin-Yu Liu, Bharathan Balaji, Sicun Gao, Rajesh Gupta, and Dezhi Hong. 2022. Safe HVAC control via batch reinforcement learning. In ICCPS. IEEE, 181\u2013192."},{"key":"e_1_3_3_28_2","volume-title":"NeurIPS","author":"Mao H.","year":"2019","unstructured":"H. Mao, M. Schwarzkopf, H. He, and M. Alizadeh. 2019. Towards safe online reinforcement learning in computer systems. In NeurIPS."},{"key":"e_1_3_3_29_2","first-page":"1","volume-title":"IEEE CLUSTER","author":"Menon H.","year":"2013","unstructured":"H. Menon, B. Acun, S. G. De Gonzalo, O. Sarood, and L. Kal\u00e9. 2013. Thermal aware automated load balancing for HPC applications. In IEEE CLUSTER. 1\u20138."},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_3_31_2","unstructured":"B. Mohammadi and O. Pironneau. 1993. Analysis of the k-epsilon turbulence model. France: Editions MASSON."},{"key":"e_1_3_3_32_2","first-page":"45","volume-title":"AsiaSim","author":"Moriyama T.","year":"2018","unstructured":"T. Moriyama, G. De Magistris, M. Tatsubori, T.-H. Pham, A. Munawar, and R. Tachibana. 2018. Reinforcement learning testbed for power-consumption optimization. In AsiaSim. 
45\u201359."},{"key":"e_1_3_3_33_2","first-page":"57","volume-title":"ACM e-Energy","author":"Nagarathinam S.","year":"2020","unstructured":"S. Nagarathinam, V. Menon, A. Vasan, and A. Sivasubramaniam. 2020. Marco-multi-agent reinforcement learning based control of building HVAC systems. In ACM e-Energy. 57\u201367."},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.5555\/201033"},{"key":"e_1_3_3_35_2","article-title":"PyTorch: An imperative style, high-performance deep learning library","volume":"32","author":"Paszke A.","year":"2019","unstructured":"A. Paszke, F. Gross, S. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, and A. Desmaison. 2019. PyTorch: An imperative style, high-performance deep learning library. NeurIPS 32 (2019).","journal-title":"NeurIPS"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-55754-6_6"},{"key":"e_1_3_3_37_2","first-page":"39","volume-title":"IEEE SEMI-THERM","author":"Radmehr A.","year":"2013","unstructured":"A. Radmehr, B. Noll, J. Fitzpatrick, and K. Karki. 2013. CFD modeling of an existing raised-floor data center. In IEEE SEMI-THERM. 39\u201344."},{"key":"e_1_3_3_38_2","first-page":"645","volume-title":"IEEE ICDCS","author":"Ran Y.","year":"2019","unstructured":"Y. Ran, H. Hu, X. Zhou, and Y. Wen. 2019. DeepEE: Joint optimization of job scheduling and cooling control for data center energy efficiency using deep reinforcement learning. In IEEE ICDCS. 645\u2013655."},{"issue":"7","key":"e_1_3_3_39_2","doi-asserted-by":"crossref","DOI":"10.1115\/1.4000978","article-title":"Proper orthogonal decomposition for reduced order thermal modeling of air cooled data centers","volume":"132","author":"Samadiani E.","year":"2010","unstructured":"E. Samadiani and Y. Joshi. 2010. Proper orthogonal decomposition for reduced order thermal modeling of air cooled data centers. J. Heat Transf. 132, 7 (2010).","journal-title":"J. 
Heat Transf."},{"issue":"4","key":"e_1_3_3_40_2","doi-asserted-by":"crossref","DOI":"10.1115\/1.4004011","article-title":"Reduced order thermal modeling of data centers via distributed sensor data","volume":"134","author":"Samadiani E.","year":"2012","unstructured":"E. Samadiani, Y. Joshi, H. Hamann, M. K. Iyengar, S. Kamalsy, and J. Lacey. 2012. Reduced order thermal modeling of data centers via distributed sensor data. J. Heat Transf. 134, 4 (2012).","journal-title":"J. Heat Transf."},{"key":"e_1_3_3_41_2","doi-asserted-by":"crossref","unstructured":"A. Shehabi S. Smith D. Sartor R. Brown M. Herrlin J. Koomey E. Masanet N. Horner I. Azevedo and W. Lintner. 2016. US data center energy usage report. Tech. Rep.","DOI":"10.2172\/1372902"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature24270"},{"key":"e_1_3_3_43_2","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1145\/3360322.3360845","volume-title":"ACM BuildSys","author":"Le D. Van","year":"2019","unstructured":"D. Van Le, Y. Liu, R. Wang, R. Tan, Y. W. Wong, and Y. Wen. 2019. Control of air free-cooled data centers in tropics via deep reinforcement learning. In ACM BuildSys. 306\u2013315."},{"key":"e_1_3_3_44_2","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1145\/3408308.3427607","volume-title":"ACM BuildSys","author":"Wang R.","year":"2020","unstructured":"R. Wang, D. Van Le, R. Tan, Y. W. Wong, and Y. Wen. 2020. Real-time cooling power attribution for co-located data center rooms with distinct temperatures. In ACM BuildSys. 190\u2013199."},{"key":"e_1_3_3_45_2","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1145\/3408308.3427982","volume-title":"ACM BuildSys","author":"Wang R.","year":"2020","unstructured":"R. Wang, X. Zhou, L. Dong, Y. Wen, R. Tan, L. Chen, G. Wang, and F. Zeng. 2020. Kalibre: Knowledge-based neural surrogate model calibration for data center digital twins. In ACM BuildSys. 
200\u2013209."},{"key":"e_1_3_3_46_2","first-page":"2192","volume-title":"ACC","author":"Wang Y. G.","year":"2001","unstructured":"Y. G. Wang, Z. G. Shi, and W. J. Cai. 2001. PID autotuner and its application in HVAC systems. In ACC. IEEE, 2192\u20132196."},{"key":"e_1_3_3_47_2","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1145\/3360322.3360861","volume-title":"ACM BuildSys","author":"Zhang C.","year":"2019","unstructured":"C. Zhang, S. Kuppannagari, R. Kannan, and V. Prasanna. 2019. Building HVAC scheduling using reinforcement learning via neural network based model approximation. In ACM BuildSys. 287\u2013296."}],"container-title":["ACM Transactions on Cyber-Physical Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582577","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3582577","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:13Z","timestamp":1750183753000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582577"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,30]]},"references-count":46,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,4,30]]}},"alternative-id":["10.1145\/3582577"],"URL":"https:\/\/doi.org\/10.1145\/3582577","relation":{},"ISSN":["2378-962X","2378-9638"],"issn-type":[{"value":"2378-962X","type":"print"},{"value":"2378-9638","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,30]]},"assertion":[{"value":"2022-07-28","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-07","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-05-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}