{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,22]],"date-time":"2026-05-22T08:07:49Z","timestamp":1779437269673,"version":"3.53.1"},"reference-count":20,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,3,1]],"date-time":"2026-03-01T00:00:00Z","timestamp":1772323200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T00:00:00Z","timestamp":1776384000000},"content-version":"vor","delay-in-days":47,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Karlsruher Institut f\u00fcr Technologie (KIT)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["K\u00fcnstl Intell"],"published-print":{"date-parts":[[2026,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Heating systems account for a significant share of residential energy consumption, and rising energy prices call for intelligent, cost-aware control strategies. Traditional methods, such as rule-based or model predictive control (MPC), often require detailed system modeling or lack adaptability to dynamic price signals. This work explores the use of deep reinforcement learning (DRL) to control heat pumps in a way that balances occupant comfort with energy-cost minimization. We evaluate deep Q-network (DQN) and proximal policy optimization (PPO) methods across discrete and continuous action spaces. The agents are trained in simulation using real weather and electricity price data, with a model representing the thermal dynamics of the building. Short-term electricity price forecasts are included to enable anticipatory heating strategies. Reward functions combine price penalties with piecewise-linear or quadratic comfort penalties. Among the DRL variants, a DQN agent with discrete actions and a piecewise-linear comfort reward achieves the best overall trade-off between comfort and cost. MPC still performs best in absolute cost terms because it uses an exact model, while the DQN policy approaches MPC performance and retains the model-free, adaptive advantages of RL. The findings highlight the potential of DRL for adaptive and price-aware heating control without the need for detailed physical modeling.<\/jats:p>","DOI":"10.1007\/s13218-026-00908-0","type":"journal-article","created":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T06:23:32Z","timestamp":1776407012000},"page":"17-25","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Deep Reinforcement Learning for Price-Aware Building Heating Control"],"prefix":"10.1007","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1958-6094","authenticated-orcid":false,"given":"Qiong","family":"Huang","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-7402-0770","authenticated-orcid":false,"given":"Adrian Till","family":"Assmuth","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3473-5545","authenticated-orcid":false,"given":"Felix","family":"Langner","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1607-9748","authenticated-orcid":false,"given":"Benjamin","family":"Sch\u00e4fer","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Veit","family":"Hagenmeyer","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2026,4,17]]},"reference":[{"issue":"4","key":"908_CR1","doi-asserted-by":"publisher","first-page":"472","DOI":"10.1080\/19401493.2020.1770861","volume":"13","author":"J Arroyo","year":"2020","unstructured":"Arroyo J, Spiessens F, Helsen L (2020) Identification of multi-zone grey-box building models for use in model predictive control. J Build Perform Simul 13(4):472\u2013486","journal-title":"J Build Perform Simul"},{"key":"908_CR2","doi-asserted-by":"publisher","DOI":"10.1016\/j.enbuild.2020.110225","volume":"224","author":"S Brandi","year":"2020","unstructured":"Brandi S, Piscitelli MS, Martellacci M, Capozzoli A (2020) Deep reinforcement learning to optimise indoor temperature control and heating energy consumption in buildings. Energy Build 224:110225. https:\/\/doi.org\/10.1016\/j.enbuild.2020.110225","journal-title":"Energy and Buildings"},{"key":"908_CR3","unstructured":"Bundesnetzagentur (2025) Smard. https:\/\/www.smard.de\/home\/downloadcenter\/download-marktdaten\/"},{"key":"908_CR4","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2022.120598","volume":"333","author":"D Coraci","year":"2023","unstructured":"Coraci D, Brandi S, Hong T, Capozzoli A (2023) Online transfer learning strategy for enhancing the scalability and deployment of deep reinforcement learning control in smart buildings. Appl Energy 333:120598. https:\/\/doi.org\/10.1016\/j.apenergy.2022.120598","journal-title":"Appl Energy"},{"key":"908_CR5","unstructured":"European\u00a0Commission JRC (2025) Photovoltaic geographical information system. https:\/\/re.jrc.ec.europa.eu\/pvg_tools\/de\/tools.html"},{"key":"908_CR6","doi-asserted-by":"publisher","DOI":"10.1016\/j.energy.2023.126913","volume":"270","author":"G Han","year":"2023","unstructured":"Han G, Joo HJ, Lim HW, An YS, Lee WJ, Lee KH (2023) Data-driven heat pump operation strategy using rainbow deep reinforcement learning for significant reduction of electricity cost. Energy 270:126913. https:\/\/doi.org\/10.1016\/j.energy.2023.126913","journal-title":"Energy"},{"key":"908_CR7","doi-asserted-by":"publisher","DOI":"10.1016\/j.segy.2024.100131","volume":"13","author":"K Kadamala","year":"2024","unstructured":"Kadamala K, Chambers D, Barrett E (2024) Enhancing hvac control systems through transfer learning with deep reinforcement learning agents. Smart Energy 13:100131. https:\/\/doi.org\/10.1016\/j.segy.2024.100131","journal-title":"Smart Energy"},{"key":"908_CR8","doi-asserted-by":"publisher","first-page":"527","DOI":"10.1016\/j.rser.2018.09.045","volume":"101","author":"P Kohlhepp","year":"2019","unstructured":"Kohlhepp P, Harb H, Wolisz H, Waczowicz S, M\u00fcller D, Hagenmeyer V (2019) Large-scale grid integration of residential thermal energy storages as demand-side flexibility resource: a review of international field studies. Renew Sustain Energy Rev 101:527\u2013547","journal-title":"Renew Sustain Energy Rev"},{"key":"908_CR9","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2022.120020","volume":"327","author":"L Langer","year":"2022","unstructured":"Langer L, Volling T (2022) A reinforcement learning approach to home energy management for modulating heat pumps and photovoltaic systems. Appl Energy 327:120020","journal-title":"Appl Energy"},{"key":"908_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.segy.2021.100044","volume":"3","author":"P Lissa","year":"2021","unstructured":"Lissa P, Schukat M, Keane M, Barrett E (2021) Transfer learning applied to drl-based heat pump control to leverage microgrid energy efficiency. Smart Energy 3:100044. https:\/\/doi.org\/10.1016\/j.segy.2021.100044","journal-title":"Smart Energy"},{"issue":"7540","key":"908_CR11","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529\u2013533","journal-title":"Nature"},{"key":"908_CR12","doi-asserted-by":"publisher","DOI":"10.1016\/j.buildenv.2023.110435","volume":"241","author":"Z Nagy","year":"2023","unstructured":"Nagy Z, Henze G, Dey S, Arroyo J, Helsen L, Zhang X, Chen B, Amasyali K, Kurte K, Zamzam A et al (2023) Ten questions concerning reinforcement learning for building energy management. Build Environ 241:110435","journal-title":"Build Environ"},{"key":"908_CR13","doi-asserted-by":"crossref","unstructured":"Rohrer T, Frison L, Kaupenjohann L, Scharf K, Hergenr\u00f6ther E (2023) Deep reinforcement learning for heat pump control. In: science and information conference, Springer, pp 459\u2013471","DOI":"10.1007\/978-3-031-37717-4_29"},{"issue":"8","key":"908_CR14","doi-asserted-by":"publisher","first-page":"8300","DOI":"10.3390\/en8088300","volume":"8","author":"F Ruelens","year":"2015","unstructured":"Ruelens F, Iacovella S, Claessens BJ, Belmans R (2015) Learning agent for a heat-pump thermostat with a set-back strategy using model-free reinforcement learning. Energies 8(8):8300\u20138318. https:\/\/doi.org\/10.3390\/en8088300","journal-title":"Energies"},{"issue":"5","key":"908_CR15","doi-asserted-by":"publisher","first-page":"2149","DOI":"10.1109\/TSG.2016.2517211","volume":"8","author":"F Ruelens","year":"2017","unstructured":"Ruelens F, Claessens BJ, Vandael S, De Schutter B, Babu\u0161ka R, Belmans R (2017) Residential demand response of thermostatically controlled loads using batch reinforcement learning. IEEE Trans Smart Grid 8(5):2149\u20132159. https:\/\/doi.org\/10.1109\/TSG.2016.2517211","journal-title":"IEEE Transactions on Smart Grid"},{"key":"908_CR16","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2024.123688","volume":"371","author":"S Schmitz","year":"2024","unstructured":"Schmitz S, Brucke K, Kasturi P, Ansari E, Klement P (2024) Forecast-based and data-driven reinforcement learning for residential heat pump operation. Appl Energy 371:123688. https:\/\/doi.org\/10.1016\/j.apenergy.2024.123688","journal-title":"Appl Energy"},{"key":"908_CR17","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347"},{"key":"908_CR18","doi-asserted-by":"crossref","unstructured":"Urieli D, Stone P (2013) A learning agent for heat-pump thermostat control. In: proceedings of the 2013 international conference on autonomous agents and multi-agent systems, pp 1093\u20131100","DOI":"10.65109\/VFQO4752"},{"key":"908_CR19","doi-asserted-by":"publisher","DOI":"10.1016\/j.enbuild.2023.113811","volume":"303","author":"C Vallianos","year":"2024","unstructured":"Vallianos C, Candanedo J, Athienitis A (2024) Thermal modeling for control applications of 60,000 homes in north america using smart thermostat data. Energy Build 303:113811","journal-title":"Energy and Buildings"},{"key":"908_CR20","doi-asserted-by":"crossref","unstructured":"Wei T, Wang Y, Zhu Q (2017) Deep reinforcement learning for building hvac control. In: proceedings of the 54th annual design automation conference 2017, pp 1\u20136","DOI":"10.1145\/3061639.3062224"}],"container-title":["KI - K\u00fcnstliche Intelligenz"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13218-026-00908-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13218-026-00908-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13218-026-00908-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,22]],"date-time":"2026-05-22T07:50:41Z","timestamp":1779436241000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13218-026-00908-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3]]},"references-count":20,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,3]]}},"alternative-id":["908"],"URL":"https:\/\/doi.org\/10.1007\/s13218-026-00908-0","relation":{},"ISSN":["0933-1875","1610-1987"],"issn-type":[{"value":"0933-1875","type":"print"},{"value":"1610-1987","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3]]},"assertion":[{"value":"1 August 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 March 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 April 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}