{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T01:42:57Z","timestamp":1780969377754,"version":"3.54.1"},"reference-count":97,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2024,6,13]],"date-time":"2024-06-13T00:00:00Z","timestamp":1718236800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,13]],"date-time":"2024-06-13T00:00:00Z","timestamp":1718236800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100008530","name":"European Regional Development Fund","doi-asserted-by":"publisher","award":["SE21_UGR_IFMIF-DONES"],"award-info":[{"award-number":["SE21_UGR_IFMIF-DONES"]}],"id":[{"id":"10.13039\/501100008530","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008530","name":"European Regional Development Fund","doi-asserted-by":"publisher","award":["A-TIC-244-UGR20"],"award-info":[{"award-number":["A-TIC-244-UGR20"]}],"id":[{"id":"10.13039\/501100008530","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011011","name":"Junta de Andaluc\u00eda","doi-asserted-by":"publisher","award":["SE21_UGR_IFMIF-DONES"],"award-info":[{"award-number":["SE21_UGR_IFMIF-DONES"]}],"id":[{"id":"10.13039\/501100011011","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011011","name":"Junta de Andaluc\u00eda","doi-asserted-by":"publisher","award":["A-TIC-244-UGR20"],"award-info":[{"award-number":["A-TIC-244-UGR20"]}],"id":[{"id":"10.13039\/501100011011","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014440","name":"Ministerio de Ciencia, Innovaci\u00f3n y 
Universidades","doi-asserted-by":"publisher","award":["MIA.2021.M04.0008"],"award-info":[{"award-number":["MIA.2021.M04.0008"]}],"id":[{"id":"10.13039\/100014440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006393","name":"Universidad de Granada","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006393","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Heating, ventilation, and air conditioning (HVAC) systems are a major driver of energy consumption in commercial and residential buildings. Recent studies have shown that Deep Reinforcement Learning (DRL) algorithms can outperform traditional reactive controllers. However, DRL-based solutions are generally designed for ad hoc setups and lack standardization for comparison. To fill this gap, this paper provides a critical and reproducible evaluation, in terms of comfort and energy consumption, of several state-of-the-art DRL algorithms for HVAC control. The study examines the controllers\u2019 robustness, adaptability, and trade-off between optimization goals by using the S<jats:sc>inergym<\/jats:sc> framework. 
The results obtained confirm the potential of DRL algorithms, such as SAC and TD3, in complex scenarios and reveal several challenges related to generalization and incremental learning.<\/jats:p>","DOI":"10.1007\/s10462-024-10819-x","type":"journal-article","created":{"date-parts":[[2024,6,13]],"date-time":"2024-06-13T08:01:51Z","timestamp":1718265711000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":43,"title":["An experimental evaluation of deep reinforcement learning algorithms for HVAC control"],"prefix":"10.1007","volume":"57","author":[{"given":"Antonio","family":"Manjavacas","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alejandro","family":"Campoy-Nieves","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Javier","family":"Jim\u00e9nez-Raboso","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Miguel","family":"Molina-Solana","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Juan","family":"G\u00f3mez-Romero","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,6,13]]},"reference":[{"key":"10819_CR1","doi-asserted-by":"publisher","unstructured":"Agarwal R, Schwarzer M, Castro PS, Courville A, Bellemare MG (2021) Deep reinforcement learning at the edge of the statistical precipice. In: Advances in neural information processing systems. https:\/\/doi.org\/10.48550\/arXiv.2108.13264","DOI":"10.48550\/arXiv.2108.13264"},{"key":"10819_CR2","unstructured":"ASHRAE (2004) ASHRAE: ASHRAE 55-2004: thermal environmental conditions for human occupancy. 
ASHRAE"},{"key":"10819_CR3","unstructured":"ASHRAE (2016) ASHRAE: ASHRAE TC9.9: data center power equipment thermal guidelines and best practices systems. ASHRAE"},{"key":"10819_CR4","unstructured":"ASHRAE (2021) ASHRAE: guideline 36-2021: high performance sequences of operation for HVAC systems. ASHRAE"},{"key":"10819_CR5","doi-asserted-by":"publisher","DOI":"10.1016\/j.egyai.2020.100020","author":"D Azuatalam","year":"2020","unstructured":"Azuatalam D, Lee W-L, de Nijs F, Liebman A (2020) Reinforcement learning for whole-building HVAC control and demand response. Energy AI. https:\/\/doi.org\/10.1016\/j.egyai.2020.100020","journal-title":"Energy AI"},{"key":"10819_CR6","doi-asserted-by":"publisher","unstructured":"Barrett E, Linder S (2015) Autonomous HVAC control, a reinforcement learning approach. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 3\u201319. https:\/\/doi.org\/10.1007\/978-3-319-23461-8_1","DOI":"10.1007\/978-3-319-23461-8_1"},{"issue":"1","key":"10819_CR7","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1080\/1350486042000271638","volume":"12","author":"FE Benth","year":"2005","unstructured":"Benth FE, \u0160altyt\u0117-Benth J (2005) Stochastic modelling of temperature variations with a view towards weather derivatives. Appl Math Financ 12(1):53\u201385. https:\/\/doi.org\/10.1080\/1350486042000271638","journal-title":"Appl Math Financ"},{"key":"10819_CR8","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2021.117164","author":"M Biemann","year":"2021","unstructured":"Biemann M, Scheller F, Liu X, Huang L (2021) Experimental evaluation of model-free reinforcement learning algorithms for continuous HVAC control. Appl Energy. 
https:\/\/doi.org\/10.1016\/j.apenergy.2021.117164","journal-title":"Appl Energy"},{"issue":"5","key":"10819_CR9","doi-asserted-by":"publisher","first-page":"586","DOI":"10.1080\/19401493.2021.1986574","volume":"14","author":"D Blum","year":"2021","unstructured":"Blum D, Arroyo J, Huang S, Drgo\u0148a J, Jorissen F, Walnum HT, Chen Y, Benne K, Vrabie D, Wetter M et al (2021) Building optimization testing framework (BOPTEST) for simulation-based benchmarking of control strategies in buildings. J Build Perform Simul 14(5):586\u2013610. https:\/\/doi.org\/10.1080\/19401493.2021.1986574","journal-title":"J Build Perform Simul"},{"issue":"2","key":"10819_CR10","doi-asserted-by":"publisher","first-page":"818","DOI":"10.1007\/s40435-020-00665-4","volume":"9","author":"RP Borase","year":"2021","unstructured":"Borase RP, Maghade D, Sondkar S, Pawar S (2021) A review of PID control, tuning methods and applications. Int J Dyn Control 9(2):818\u2013827. https:\/\/doi.org\/10.1007\/s40435-020-00665-4","journal-title":"Int J Dyn Control"},{"key":"10819_CR11","doi-asserted-by":"publisher","first-page":"110225","DOI":"10.1016\/j.enbuild.2020.110225","volume":"224","author":"S Brandi","year":"2020","unstructured":"Brandi S, Piscitelli MS, Martellacci M, Capozzoli A (2020) Deep reinforcement learning to optimise indoor temperature control and heating energy consumption in buildings. Energy Build 224:110225. https:\/\/doi.org\/10.1016\/j.enbuild.2020.110225","journal-title":"Energy Build"},{"key":"10819_CR12","doi-asserted-by":"publisher","DOI":"10.1016\/j.autcon.2022.104128","author":"S Brandi","year":"2022","unstructured":"Brandi S, Fiorentini M, Capozzoli A (2022) Comparison of online and offline deep reinforcement learning with model predictive control for thermal energy management. Autom Constr. 
https:\/\/doi.org\/10.1016\/j.autcon.2022.104128","journal-title":"Autom Constr"},{"key":"10819_CR13","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/j.enbuild.2018.03.051","volume":"169","author":"Y Chen","year":"2018","unstructured":"Chen Y, Norford LK, Samuelson HW, Malkawi A (2018) Optimal control of HVAC and window systems for natural ventilation through reinforcement learning. Energy Build 169:195\u2013205. https:\/\/doi.org\/10.1016\/j.enbuild.2018.03.051","journal-title":"Energy Build"},{"issue":"4","key":"10819_CR14","doi-asserted-by":"publisher","first-page":"997","DOI":"10.3390\/en14040997","volume":"14","author":"D Coraci","year":"2021","unstructured":"Coraci D, Brandi S, Piscitelli MS, Capozzoli A (2021) Online implementation of a soft actor-critic agent to enhance indoor temperature control and energy efficiency in buildings. Energies 14(4):997. https:\/\/doi.org\/10.3390\/en14040997","journal-title":"Energies"},{"key":"10819_CR15","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/j.segan.2016.02.002","volume":"6","author":"GT Costanzo","year":"2016","unstructured":"Costanzo GT, Iacovella S, Ruelens F, Leurs T, Claessens BJ (2016) Experimental analysis of data-driven control for a building heating system. Sustain Energy Grids Netw 6:81\u201390. https:\/\/doi.org\/10.1016\/j.segan.2016.02.002","journal-title":"Sustain Energy Grids Netw"},{"key":"10819_CR16","doi-asserted-by":"publisher","DOI":"10.1016\/j.buildenv.2021.108680","author":"X Deng","year":"2022","unstructured":"Deng X, Zhang Y, Qi H (2022) Towards optimal HVAC control in non-stationary building environments combining active change detection and deep reinforcement learning. Build Environ. https:\/\/doi.org\/10.1016\/j.buildenv.2021.108680","journal-title":"Build Environ"},{"key":"10819_CR17","doi-asserted-by":"publisher","unstructured":"Ding X, Du W, Cerpa AE (2020) MB2C: model-based deep reinforcement learning for multi-zone building control. 
In: Proceedings of the 7th ACM international conference on systems for energy-efficient buildings, cities, and transportation. pp 50\u201359. https:\/\/doi.org\/10.1145\/3408308.3427986","DOI":"10.1145\/3408308.3427986"},{"key":"10819_CR18","doi-asserted-by":"publisher","first-page":"106959","DOI":"10.1016\/j.epsr.2020.106959","volume":"192","author":"Y Du","year":"2021","unstructured":"Du Y, Li F, Munk J, Kurte K, Kotevska O, Amasyali K, Zandi H (2021) Multi-task deep reinforcement learning for intelligent multi-zone residential HVAC control. Electr Power Syst Res 192:106959","journal-title":"Electr Power Syst Res"},{"key":"10819_CR19","doi-asserted-by":"publisher","unstructured":"Duan Y, Chen X, Houthooft R, Schulman J, Abbeel P (2016) Benchmarking deep reinforcement learning for continuous control. https:\/\/doi.org\/10.48550\/arXiv.1604.06778","DOI":"10.48550\/arXiv.1604.06778"},{"key":"10819_CR20","doi-asserted-by":"publisher","unstructured":"Efheij H, Albagul A, Albraiki NA (2019) Comparison of model predictive control and PID controller in real time process control system. In: 2019 19th international conference on sciences and techniques of automatic control and computer engineering (STA). IEEE, pp 64\u201369. https:\/\/doi.org\/10.1109\/STA.2019.8717271","DOI":"10.1109\/STA.2019.8717271"},{"issue":"6","key":"10819_CR21","doi-asserted-by":"publisher","first-page":"675","DOI":"10.3233\/AIS-140288","volume":"6","author":"P Fazenda","year":"2014","unstructured":"Fazenda P, Veeramachaneni K, Lima P, O\u2019Reilly U-M (2014) Using reinforcement learning to optimize occupant comfort and energy usage in HVAC systems. J Ambient Intell Smart Environ 6(6):675\u2013690. https:\/\/doi.org\/10.3233\/AIS-140288","journal-title":"J Ambient Intell Smart Environ"},{"key":"10819_CR22","doi-asserted-by":"publisher","unstructured":"Findeis A, Kazhamiaka F, Jeen S, Keshav S (2022) Beobench: a toolkit for unified access to building simulations for reinforcement learning. 
In: Proceedings of the thirteenth ACM international conference on future energy systems. pp 374\u2013382. https:\/\/doi.org\/10.1145\/3538637.3538866","DOI":"10.1145\/3538637.3538866"},{"issue":"4","key":"10819_CR23","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1016\/S1364-6613(99)01294-2","volume":"3","author":"RM French","year":"1999","unstructured":"French RM (1999) Catastrophic forgetting in connectionist networks. Trends Cogn Sci 3(4):128\u2013135. https:\/\/doi.org\/10.1016\/S1364-6613(99)01294-2","journal-title":"Trends Cogn Sci"},{"key":"10819_CR24","doi-asserted-by":"publisher","first-page":"130845","DOI":"10.1109\/ACCESS.2021.3114161","volume":"9","author":"C Fu","year":"2021","unstructured":"Fu C, Zhang Y (2021) Research and application of predictive control method based on deep reinforcement learning for HVAC systems. IEEE Access 9:130845\u2013130852. https:\/\/doi.org\/10.1109\/ACCESS.2021.3114161","journal-title":"IEEE Access"},{"key":"10819_CR25","doi-asserted-by":"publisher","DOI":"10.1016\/j.jobe.2022.104165","author":"Q Fu","year":"2022","unstructured":"Fu Q, Han Z, Chen J, Lu Y, Wu H, Wang Y (2022a) Applications of reinforcement learning for building energy efficiency control: a review. J Build Eng. https:\/\/doi.org\/10.1016\/j.jobe.2022.104165","journal-title":"J Build Eng"},{"key":"10819_CR26","doi-asserted-by":"publisher","first-page":"112284","DOI":"10.1016\/j.enbuild.2022.112284","volume":"270","author":"Q Fu","year":"2022","unstructured":"Fu Q, Chen X, Ma S, Fang N, Xing B, Chen J (2022b) Optimal control method of HVAC based on multi-agent deep reinforcement learning. Energy Build 270:112284","journal-title":"Energy Build"},{"key":"10819_CR27","doi-asserted-by":"publisher","unstructured":"Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: International conference on machine learning. pp 1582\u20131591. 
https:\/\/doi.org\/10.48550\/arXiv.1802.09477","DOI":"10.48550\/arXiv.1802.09477"},{"issue":"9","key":"10819_CR28","doi-asserted-by":"publisher","first-page":"8472","DOI":"10.1109\/JIOT.2020.2992117","volume":"7","author":"G Gao","year":"2020","unstructured":"Gao G, Li J, Wen Y (2020) Deepcomfort: energy-efficient thermal comfort control in buildings via reinforcement learning. IEEE Internet Things J 7(9):8472\u20138484. https:\/\/doi.org\/10.1109\/JIOT.2020.2992117","journal-title":"IEEE Internet Things J"},{"key":"10819_CR29","doi-asserted-by":"publisher","unstructured":"Geng G, Geary G (1993) On performance and tuning of PID controllers in HVAC systems. In: Proceedings of IEEE international conference on control and applications. IEEE, pp 819\u2013824. https:\/\/doi.org\/10.1109\/CCA.1993.348229","DOI":"10.1109\/CCA.1993.348229"},{"key":"10819_CR30","doi-asserted-by":"publisher","first-page":"930","DOI":"10.1016\/j.apenergy.2015.12.115","volume":"165","author":"A Ghahramani","year":"2016","unstructured":"Ghahramani A, Zhang K, Dutta K, Yang Z, Becerik-Gerber B (2016) Energy savings from temperature setpoints and deadband: quantifying the influence of building and system properties on savings. Appl Energy 165:930\u2013942. https:\/\/doi.org\/10.1016\/j.apenergy.2015.12.115","journal-title":"Appl Energy"},{"key":"10819_CR31","doi-asserted-by":"publisher","first-page":"102480","DOI":"10.1016\/j.scs.2020.102480","volume":"63","author":"M Gholamzadehmir","year":"2020","unstructured":"Gholamzadehmir M, Del Pero C, Buffa S, Fedrizzi R et al (2020) Adaptive-predictive control strategy for HVAC systems in smart buildings: a review. Sustain Cities Soc 63:102480. 
https:\/\/doi.org\/10.1016\/j.scs.2020.102480","journal-title":"Sustain Cities Soc"},{"issue":"7587","key":"10819_CR32","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1038\/529445a","volume":"529","author":"E Gibney","year":"2016","unstructured":"Gibney E et al (2016) Google AI algorithm masters ancient game of Go. Nature 529(7587):445\u2013446. https:\/\/doi.org\/10.1038\/529445a","journal-title":"Nature"},{"key":"10819_CR33","unstructured":"GlobalABC (2021) Global alliance for buildings and construction: global status report for buildings and construction: towards a zero-emission, efficient and resilient buildings and construction sector. GlobalABC. https:\/\/www.unep.org\/resources\/report\/2021-global-status-report-buildings-and-construction"},{"key":"10819_CR34","doi-asserted-by":"publisher","first-page":"38748","DOI":"10.1109\/ACCESS.2019.2906311","volume":"7","author":"J Gomez-Romero","year":"2019","unstructured":"Gomez-Romero J, Fernandez-Basso CJ, Cambronero MV, Molina-Solana M, Campana JR, Ruiz MD, Martin-Bautista MJ (2019) A probabilistic algorithm for predictive control with full-complexity models in non-residential buildings. IEEE Access 7:38748\u201338765","journal-title":"IEEE Access"},{"issue":"7","key":"10819_CR35","doi-asserted-by":"publisher","first-page":"4715","DOI":"10.1007\/s11831-021-09552-3","volume":"28","author":"S Gupta","year":"2021","unstructured":"Gupta S, Singal G, Garg D (2021) Deep reinforcement learning techniques in diversified domains: a survey. Arch Comput Methods Eng 28(7):4715\u20134754. https:\/\/doi.org\/10.1007\/s11831-021-09552-3","journal-title":"Arch Comput Methods Eng"},{"key":"10819_CR36","doi-asserted-by":"publisher","unstructured":"Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International conference on machine learning. PMLR, pp 1861\u20131870. 
https:\/\/doi.org\/10.48550\/arXiv.1801.01290","DOI":"10.48550\/arXiv.1801.01290"},{"key":"10819_CR37","doi-asserted-by":"publisher","DOI":"10.1016\/j.scs.2019.101748","author":"M Han","year":"2019","unstructured":"Han M, May R, Zhang X, Wang X, Pan S, Yan D, Jin Y, Xu L (2019) A review of reinforcement learning methodologies for controlling occupant comfort in buildings. Sustain Cities Soc. https:\/\/doi.org\/10.1016\/j.scs.2019.101748","journal-title":"Sustain Cities Soc"},{"key":"10819_CR38","unstructured":"IEA (2021) International Energy Agency: tracking buildings. IEA. https:\/\/www.iea.org\/reports\/tracking-buildings-2021"},{"key":"10819_CR39","doi-asserted-by":"publisher","unstructured":"Islam R, Henderson P, Gomrokchi M, Precup D (2017) Reproducibility of benchmarked deep reinforcement learning tasks for continuous control. https:\/\/doi.org\/10.48550\/arXiv.1708.04133","DOI":"10.48550\/arXiv.1708.04133"},{"key":"10819_CR40","doi-asserted-by":"publisher","unstructured":"Jim\u00e9nez-Raboso J, Campoy-Nieves A, Manjavacas-Lucas A, G\u00f3mez-Romero J, Molina-Solana M (2021) Sinergym: a building simulation and control framework for training reinforcement learning agents. In: Proceedings of the 8th ACM international conference on systems for energy-efficient buildings, cities, and transportation. Association for Computing Machinery, New York, USA, pp 319\u2013323. https:\/\/doi.org\/10.1145\/3486611.3488729","DOI":"10.1145\/3486611.3488729"},{"key":"10819_CR41","doi-asserted-by":"publisher","first-page":"100131","DOI":"10.1016\/j.segy.2024.100131","volume":"13","author":"K Kadamala","year":"2024","unstructured":"Kadamala K, Chambers D, Barrett E (2024) Enhancing HVAC control systems through transfer learning with deep reinforcement learning agents. 
Smart Energy 13:100131","journal-title":"Smart Energy"},{"key":"10819_CR42","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/2042\/1\/012037","author":"A K\u00fcmpel","year":"2021","unstructured":"K\u00fcmpel A, Stoffel P, M\u00fcller D (2021) Self-adjusting model predictive control for modular subsystems in HVAC systems. J Phys Conf Ser. https:\/\/doi.org\/10.1088\/1742-6596\/2042\/1\/012037","journal-title":"J Phys Conf Ser"},{"key":"10819_CR43","doi-asserted-by":"publisher","first-page":"5699","DOI":"10.1109\/ACCESS.2019.2963502","volume":"8","author":"J Leitao","year":"2020","unstructured":"Leitao J, Gil P, Ribeiro B, Cardoso A (2020) A survey on home energy management. IEEE Access 8:5699\u20135722. https:\/\/doi.org\/10.1109\/ACCESS.2019.2963502","journal-title":"IEEE Access"},{"issue":"5","key":"10819_CR44","doi-asserted-by":"publisher","first-page":"2002","DOI":"10.48550\/arXiv.1709.05077","volume":"50","author":"Y Li","year":"2019","unstructured":"Li Y, Wen Y, Tao D, Guan K (2019) Transforming cooling optimization for green data center via deep reinforcement learning. IEEE Trans Cybern 50(5):2002\u20132013. https:\/\/doi.org\/10.48550\/arXiv.1709.05077","journal-title":"IEEE Trans Cybern"},{"key":"10819_CR45","doi-asserted-by":"publisher","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv Preprint. https:\/\/doi.org\/10.48550\/arXiv.1509.02971","DOI":"10.48550\/arXiv.1509.02971"},{"key":"10819_CR46","doi-asserted-by":"publisher","DOI":"10.1016\/j.egyai.2020.100043","author":"P Lissa","year":"2021","unstructured":"Lissa P, Deane C, Schukat M, Seri F, Keane M, Barrett E (2021) Deep reinforcement learning for home energy management system control. Energy AI. 
https:\/\/doi.org\/10.1016\/j.egyai.2020.100043","journal-title":"Energy AI"},{"key":"10819_CR47","doi-asserted-by":"publisher","first-page":"300","DOI":"10.1016\/j.compeleceng.2019.07.019","volume":"78","author":"K Mason","year":"2019","unstructured":"Mason K, Grijalva S (2019) A review of reinforcement learning for autonomous building energy management. Comput Electr Eng 78:300\u2013312. https:\/\/doi.org\/10.1016\/j.compeleceng.2019.07.019","journal-title":"Comput Electr Eng"},{"key":"10819_CR48","doi-asserted-by":"publisher","DOI":"10.1016\/j.energy.2021.120436","author":"VJ Mawson","year":"2021","unstructured":"Mawson VJ, Hughes BR (2021) Optimisation of HVAC control and manufacturing schedules for the reduction of peak energy demand in the manufacturing sector. Energy. https:\/\/doi.org\/10.1016\/j.energy.2021.120436","journal-title":"Energy"},{"key":"10819_CR49","doi-asserted-by":"publisher","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. arXiv Preprint. https:\/\/doi.org\/10.48550\/arXiv.1312.5602","DOI":"10.48550\/arXiv.1312.5602"},{"issue":"7540","key":"10819_CR50","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529\u2013533. https:\/\/doi.org\/10.1038\/nature14236","journal-title":"Nature"},{"key":"10819_CR51","doi-asserted-by":"publisher","unstructured":"Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: International conference on machine learning. PMLR, pp 1928\u20131937. 
https:\/\/doi.org\/10.48550\/arXiv.1602.01783","DOI":"10.48550\/arXiv.1602.01783"},{"issue":"4","key":"10819_CR52","doi-asserted-by":"publisher","first-page":"3698","DOI":"10.1109\/TSG.2018.2834219","volume":"10","author":"E Mocanu","year":"2018","unstructured":"Mocanu E, Mocanu DC, Nguyen PH, Liotta A, Webber ME, Gibescu M, Slootweg JG (2018) On-line building energy optimization using deep reinforcement learning. IEEE Trans Smart Grid 10(4):3698\u20133708. https:\/\/doi.org\/10.1109\/TSG.2018.2834219","journal-title":"IEEE Trans Smart Grid"},{"issue":"4\u20135","key":"10819_CR53","doi-asserted-by":"publisher","first-page":"667","DOI":"10.1016\/S0098-1354(98)00301-9","volume":"23","author":"M Morari","year":"1999","unstructured":"Morari M, Lee JH (1999) Model predictive control: past, present and future. Comput Chem Eng 23(4\u20135):667\u2013682. https:\/\/doi.org\/10.1016\/S0098-1354(98)00301-9","journal-title":"Comput Chem Eng"},{"key":"10819_CR54","doi-asserted-by":"publisher","unstructured":"Morinibu T, Noda T, Shota T (2019) Application of deep reinforcement learning in residential preconditioning for radiation temperature. In: 2019 8th international congress on advanced applied informatics (IIAI-AAI). IEEE, pp 561\u2013566. https:\/\/doi.org\/10.1109\/IIAI-AAI.2019.00120","DOI":"10.1109\/IIAI-AAI.2019.00120"},{"key":"10819_CR55","doi-asserted-by":"publisher","unstructured":"Moriyama T, Magistris GD, Tatsubori M, Pham T-H, Munawar A, Tachibana R (2018) Reinforcement learning testbed for power-consumption optimization. In: Asian simulation conference. Springer, pp 45\u201359. https:\/\/doi.org\/10.1007\/978-981-13-2853-4_4","DOI":"10.1007\/978-981-13-2853-4_4"},{"key":"10819_CR56","unstructured":"Mozer MC (1998) The neural network house: an environment that adapts to its inhabitants. In: Proceedings of the AAAI spring symposium on intelligent environments, vol 58. 
pp 110\u2013114"},{"key":"10819_CR57","doi-asserted-by":"crossref","unstructured":"Nagarathinam S, Menon V, Vasan A, Sivasubramaniam A (2020) Marco-multi-agent reinforcement learning based control of building HVAC systems. In: Proceedings of the eleventh ACM international conference on future energy systems. pp 57\u201367","DOI":"10.1145\/3396851.3397694"},{"key":"10819_CR58","doi-asserted-by":"publisher","DOI":"10.1016\/j.rser.2020.110618","author":"A Perera","year":"2021","unstructured":"Perera A, Kamalaruban P (2021) Applications of reinforcement learning in energy systems. Renew Sustain Energy Rev. https:\/\/doi.org\/10.1016\/j.rser.2020.110618","journal-title":"Renew Sustain Energy Rev"},{"issue":"3","key":"10819_CR59","doi-asserted-by":"publisher","first-page":"394","DOI":"10.1016\/j.enbuild.2007.03.007","volume":"40","author":"L P\u00e9rez-Lombard","year":"2008","unstructured":"P\u00e9rez-Lombard L, Ortiz J, Pout C (2008) A review on buildings energy consumption information. Energy Build 40(3):394\u2013398. https:\/\/doi.org\/10.1016\/j.enbuild.2007.03.007","journal-title":"Energy Build"},{"key":"10819_CR60","doi-asserted-by":"publisher","first-page":"120725","DOI":"10.1016\/j.energy.2021.120725","volume":"229","author":"G Pinto","year":"2021","unstructured":"Pinto G, Piscitelli MS, V\u00e1zquez-Canteli JR, Nagy Z, Capozzoli A (2021) Coordinated energy management for a cluster of buildings through deep reinforcement learning. Energy 229:120725","journal-title":"Energy"},{"key":"10819_CR61","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1016\/j.enbuild.2012.10.024","volume":"56","author":"S Privara","year":"2013","unstructured":"Privara S, Cigler J, V\u00e1\u0148a Z, Oldewurtel F, Sagerschnig C, \u017d\u00e1\u010dekov\u00e1 E (2013) Building modeling as a crucial part for building predictive control. 
Energy Build 56:8\u201322","journal-title":"Energy Build"},{"issue":"4","key":"10819_CR62","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1109\/TETCI.2020.2991728","volume":"4","author":"B Rajasekhar","year":"2020","unstructured":"Rajasekhar B, Tushar W, Lork C, Zhou Y, Yuen C, Pindoriya NM, Wood KL (2020) A survey of computational intelligence techniques for air-conditioners energy management. IEEE Trans Emerg Top Comput Intell 4(4):555\u2013570. https:\/\/doi.org\/10.1109\/TETCI.2020.2991728","journal-title":"IEEE Trans Emerg Top Comput Intell"},{"key":"10819_CR63","doi-asserted-by":"publisher","unstructured":"Raman NS, Devraj AM, Barooah P, Meyn SP (2020) Reinforcement learning for control of building HVAC systems. In: 2020 American control conference (ACC). IEEE, pp 2326\u20132332. https:\/\/doi.org\/10.23919\/ACC45564.2020.9147629","DOI":"10.23919\/ACC45564.2020.9147629"},{"key":"10819_CR64","doi-asserted-by":"publisher","unstructured":"Sakuma Y, Nishi H (2020) Airflow direction control of air conditioners using deep reinforcement learning. In: 2020 SICE international symposium on control systems (SICE ISCS). IEEE, pp 61\u201368. https:\/\/doi.org\/10.23919\/SICEISCS48470.2020.9083565","DOI":"10.23919\/SICEISCS48470.2020.9083565"},{"key":"10819_CR65","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1016\/j.enbuild.2016.09.044","volume":"133","author":"S Salakij","year":"2016","unstructured":"Salakij S, Yu N, Paolucci S, Antsaklis P (2016) Model-based predictive control for building energy management. I: energy modeling and optimal control. Energy Build 133:345\u2013358. https:\/\/doi.org\/10.1016\/j.enbuild.2016.09.044","journal-title":"Energy Build"},{"issue":"1","key":"10819_CR66","doi-asserted-by":"publisher","first-page":"90","DOI":"10.3182\/20050703-6-CZ-1902.01397","volume":"38","author":"TI Salsbury","year":"2005","unstructured":"Salsbury TI (2005) A survey of control technologies in the building automation industry. 
IFAC Proc Vol 38(1):90\u2013100. https:\/\/doi.org\/10.3182\/20050703-6-CZ-1902.01397","journal-title":"IFAC Proc Vol"},{"key":"10819_CR67","doi-asserted-by":"publisher","DOI":"10.3390\/app11083518","author":"P Scharnhorst","year":"2021","unstructured":"Scharnhorst P, Schubnel B, Fern\u00e1ndez Bandera C, Salom J, Taddeo P, Boegli M, Gorecki T, Stauffer Y, Peppas A, Politi C (2021) Energym: a building model library for controller benchmarking. Appl Sci. https:\/\/doi.org\/10.3390\/app11083518","journal-title":"Appl Sci"},{"key":"10819_CR68","doi-asserted-by":"publisher","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017a) Proximal policy optimization algorithms. arXiv Preprint. https:\/\/doi.org\/10.48550\/arXiv.1707.06347","DOI":"10.48550\/arXiv.1707.06347"},{"key":"10819_CR69","doi-asserted-by":"publisher","unstructured":"Schulman J, Levine S, Moritz P, Jordan MI, Abbeel P (2017b) Trust region policy optimization. https:\/\/doi.org\/10.48550\/arXiv.1502.05477","DOI":"10.48550\/arXiv.1502.05477"},{"issue":"3","key":"10819_CR70","doi-asserted-by":"publisher","first-page":"631","DOI":"10.3390\/en11030631","volume":"11","author":"G Serale","year":"2018","unstructured":"Serale G, Fiorentini M, Capozzoli A, Bernardini D, Bemporad A (2018) Model predictive control (MPC) for enhancing building and HVAC system energy efficiency: problem formulation, applications and opportunities. Energies 11(3):631. https:\/\/doi.org\/10.3390\/en11030631","journal-title":"Energies"},{"key":"10819_CR71","doi-asserted-by":"publisher","DOI":"10.1109\/tnn.1998.712192","volume-title":"Reinforcement learning: an introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge. 
https:\/\/doi.org\/10.1109\/tnn.1998.712192"},{"key":"10819_CR72","doi-asserted-by":"publisher","DOI":"10.5555\/1577069.1755839","author":"ME Taylor","year":"2009","unstructured":"Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res. https:\/\/doi.org\/10.5555\/1577069.1755839","journal-title":"J Mach Learn Res"},{"key":"10819_CR73","first-page":"242","volume-title":"Transfer learning. Handbook of research on machine learning applications and trends: algorithms. Methods, and techniques","author":"L Torrey","year":"2010","unstructured":"Torrey L, Shavlik J (2010) Transfer learning. Handbook of research on machine learning applications and trends: algorithms. Methods, and techniques. IGI Global, Hershey, pp 242\u2013264"},{"key":"10819_CR74","unstructured":"U.S. Department of Energy: Prototype Building Models | Building Energy Codes Program (2021). https:\/\/www.energycodes.gov\/prototype-building-models#Weather Accessed 29 Mar 2022"},{"key":"10819_CR75","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/j.buildenv.2019.03.038","volume":"155","author":"W Valladares","year":"2019","unstructured":"Valladares W, Galindo M, Guti\u00e9rrez J, Wu W-C, Liao K-K, Liao J-C, Lu K-C, Wang C-C (2019) Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning algorithm. Build Environ 155:105\u2013117. https:\/\/doi.org\/10.1016\/j.buildenv.2019.03.038","journal-title":"Build Environ"},{"key":"10819_CR76","doi-asserted-by":"publisher","first-page":"1072","DOI":"10.1016\/j.apenergy.2018.11.002","volume":"235","author":"JR V\u00e1zquez-Canteli","year":"2019","unstructured":"V\u00e1zquez-Canteli JR, Nagy Z (2019) Reinforcement learning for demand response: a review of algorithms and modeling techniques. Appl Energy 235:1072\u20131089. 
https:\/\/doi.org\/10.1016\/j.apenergy.2018.11.002","journal-title":"Appl Energy"},{"key":"10819_CR77","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1016\/j.scs.2018.11.021","volume":"45","author":"JR V\u00e1zquez-Canteli","year":"2019","unstructured":"V\u00e1zquez-Canteli JR, Ulyanin S, K\u00e4mpf J, Nagy Z (2019) Fusing TensorFlow with building energy simulation for intelligent energy management in smart cities. Sustain Cities Soc 45:243\u2013257. https:\/\/doi.org\/10.1016\/j.scs.2018.11.021","journal-title":"Sustain Cities Soc"},{"key":"10819_CR78","doi-asserted-by":"crossref","unstructured":"Vazquez-Canteli JR, Henze G, Nagy Z (2020) MARLISA: multi-agent reinforcement learning with iterative sequential action selection for load shaping of grid-interactive connected buildings. In: Proceedings of the 7th ACM international conference on systems for energy-efficient buildings, cities, and transportation. pp 170\u2013179","DOI":"10.1145\/3408308.3427604"},{"key":"10819_CR79","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2020.115036","author":"Z Wang","year":"2020","unstructured":"Wang Z, Hong T (2020) Reinforcement learning for building controls: the opportunities and challenges. Appl Energy. https:\/\/doi.org\/10.1016\/j.apenergy.2020.115036","journal-title":"Appl Energy"},{"key":"10819_CR80","doi-asserted-by":"publisher","DOI":"10.3390\/pr5030046","author":"Y Wang","year":"2017","unstructured":"Wang Y, Velswamy K, Huang B (2017) A long-short term memory recurrent neural network based reinforcement learning controller for office heating ventilation and air conditioning systems. Processes. https:\/\/doi.org\/10.3390\/pr5030046","journal-title":"Processes"},{"key":"10819_CR81","doi-asserted-by":"publisher","DOI":"10.1016\/j.enbuild.2020.109791","author":"C Wang","year":"2020","unstructured":"Wang C, Pattawi K, Lee H (2020) Energy saving impact of occupancy-driven thermostat for residential buildings. Energy Build. 
https:\/\/doi.org\/10.1016\/j.enbuild.2020.109791","journal-title":"Energy Build"},{"issue":"3\u20134","key":"10819_CR82","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"CJ Watkins","year":"1992","unstructured":"Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3\u20134):279\u2013292","journal-title":"Mach Learn"},{"key":"10819_CR83","doi-asserted-by":"publisher","unstructured":"Wei T, Wang Y, Zhu Q (2017) Deep reinforcement learning for building HVAC control. In: Proceedings of the 54th annual design automation conference 2017. pp 1\u20136. https:\/\/doi.org\/10.1145\/3061639.3062224","DOI":"10.1145\/3061639.3062224"},{"key":"10819_CR84","doi-asserted-by":"publisher","unstructured":"W\u00f6lfle D, Vishwanath A, Schmeck H (2020) A guide for the design of benchmark environments for building energy optimization. In: Proceedings of the 7th ACM international conference on systems for energy-efficient buildings, cities, and transportation. pp 220\u2013229. https:\/\/doi.org\/10.1145\/3408308.3427614","DOI":"10.1145\/3408308.3427614"},{"key":"10819_CR85","doi-asserted-by":"publisher","unstructured":"Xu S, Wang Y, Wang Y, O\u2019Neill Z, Zhu Q (2020) One for many: transfer learning for building HVAC control. In: Proceedings of the 7th ACM international conference on systems for energy-efficient buildings, cities, and transportation. pp 230\u2013239. https:\/\/doi.org\/10.1145\/3408308.3427617","DOI":"10.1145\/3408308.3427617"},{"key":"10819_CR86","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1016\/j.arcontrol.2020.03.001","volume":"49","author":"T Yang","year":"2020","unstructured":"Yang T, Zhao L, Li W, Zomaya AY (2020) Reinforcement learning in sustainable energy and electric systems: a survey. Annu Rev Control 49:145\u2013163. 
https:\/\/doi.org\/10.1016\/j.arcontrol.2020.03.001","journal-title":"Annu Rev Control"},{"issue":"6","key":"10819_CR87","doi-asserted-by":"publisher","first-page":"2586","DOI":"10.1109\/TCST.2020.3047407","volume":"29","author":"Y Yang","year":"2021","unstructured":"Yang Y, Srinivasan S, Hu G, Spanos CJ (2021) Distributed control of multizone HVAC systems considering indoor air quality. IEEE Trans Control Syst Technol 29(6):2586\u20132597. https:\/\/doi.org\/10.1109\/TCST.2020.3047407","journal-title":"IEEE Trans Control Syst Technol"},{"key":"10819_CR88","doi-asserted-by":"publisher","DOI":"10.1016\/j.buildenv.2021.107952","author":"Y Yao","year":"2021","unstructured":"Yao Y, Shekhar DK (2021) State of the art review on model predictive control (MPC) in heating ventilation and air-conditioning (HVAC) field. Build Environ. https:\/\/doi.org\/10.1016\/j.buildenv.2021.107952","journal-title":"Build Environ"},{"key":"10819_CR89","doi-asserted-by":"publisher","DOI":"10.1016\/j.enbuild.2019.109420","author":"YR Yoon","year":"2019","unstructured":"Yoon YR, Moon HJ (2019) Performance based thermal comfort control (PTCC) using deep reinforcement learning for space cooling. Energy Build. https:\/\/doi.org\/10.1016\/j.enbuild.2019.109420","journal-title":"Energy Build"},{"issue":"1","key":"10819_CR90","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1109\/TSG.2020.3011739","volume":"12","author":"L Yu","year":"2020","unstructured":"Yu L, Sun Y, Xu Z, Shen C, Yue D, Jiang T, Guan X (2020) Multi-agent deep reinforcement learning for HVAC control in commercial buildings. IEEE Trans Smart Grid 12(1):407\u2013419. https:\/\/doi.org\/10.1109\/TSG.2020.3011739","journal-title":"IEEE Trans Smart Grid"},{"key":"10819_CR92","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2021.3078462","author":"L Yu","year":"2021","unstructured":"Yu L, Qin S, Zhang M, Shen C, Jiang T, Guan X (2021) A review of deep reinforcement learning for smart building energy management. 
IEEE Internet Things J. https:\/\/doi.org\/10.1109\/JIOT.2021.3078462","journal-title":"IEEE Internet Things J"},{"key":"10819_CR93","doi-asserted-by":"publisher","DOI":"10.3139\/9783446466081","volume-title":"Deep reinforcement learning in action","author":"A Zai","year":"2020","unstructured":"Zai A, Brown B (2020) Deep reinforcement learning in action. Manning Publications, Shelter Island"},{"key":"10819_CR94","doi-asserted-by":"publisher","unstructured":"Zhang H, Yu T (2020) Taxonomy of reinforcement learning algorithms. In: Deep reinforcement learning: fundamentals, research and applications. Springer, Singapore, pp 125\u2013133. https:\/\/doi.org\/10.1007\/978-981-15-4095-0_3","DOI":"10.1007\/978-981-15-4095-0_3"},{"issue":"3","key":"10819_CR95","doi-asserted-by":"publisher","first-page":"362","DOI":"10.17775\/CSEEJPES.2018.00520","volume":"4","author":"D Zhang","year":"2018","unstructured":"Zhang D, Han X, Deng C (2018) Review on the research and practice of deep learning and reinforcement learning in smart grids. CSEE J Power Energy Syst 4(3):362\u2013370. https:\/\/doi.org\/10.17775\/CSEEJPES.2018.00520","journal-title":"CSEE J Power Energy Syst"},{"key":"10819_CR96","doi-asserted-by":"publisher","first-page":"472","DOI":"10.1016\/j.enbuild.2019.07.029","volume":"199","author":"Z Zhang","year":"2019","unstructured":"Zhang Z, Chong A, Pan Y, Zhang C, Lam KP (2019a) Whole building energy model for HVAC optimal control: a practical framework based on deep reinforcement learning. Energy Build 199:472\u2013490. https:\/\/doi.org\/10.1016\/j.enbuild.2019.07.029","journal-title":"Energy Build"},{"key":"10819_CR97","doi-asserted-by":"publisher","unstructured":"Zhang C, Kuppannagari SR, Kannan R, Prasanna VK (2019b) Building HVAC scheduling using reinforcement learning via neural network based model approximation. In: Proceedings of the 6th ACM international conference on systems for energy-efficient buildings, cities, and transportation. pp 287\u2013296. 
https:\/\/doi.org\/10.1145\/3360322.3360861","DOI":"10.1145\/3360322.3360861"},{"key":"10819_CR98","doi-asserted-by":"publisher","DOI":"10.1016\/j.buildenv.2019.106535","author":"Z Zou","year":"2020","unstructured":"Zou Z, Yu X, Ergan S (2020) Towards optimal control of air handling units using deep reinforcement learning and recurrent neural network. Build Environ. https:\/\/doi.org\/10.1016\/j.buildenv.2019.106535","journal-title":"Build Environ"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-024-10819-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-024-10819-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-024-10819-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T10:17:37Z","timestamp":1721038657000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-024-10819-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,13]]},"references-count":97,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2024,7]]}},"alternative-id":["10819"],"URL":"https:\/\/doi.org\/10.1007\/s10462-024-10819-x","relation":{},"ISSN":["1573-7462"],"issn-type":[{"value":"1573-7462","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,13]]},"assertion":[{"value":"28 May 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 June 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial or non-financial interests that could have appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"173"}}