{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T14:34:01Z","timestamp":1764686041303,"version":"3.46.0"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T00:00:00Z","timestamp":1760486400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T00:00:00Z","timestamp":1760486400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003130","name":"Fonds Wetenschappelijk Onderzoek","doi-asserted-by":"publisher","award":["1SD9523N"],"award-info":[{"award-number":["1SD9523N"]}],"id":[{"id":"10.13039\/501100003130","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100018693","name":"HORIZON EUROPE Framework Programme","doi-asserted-by":"publisher","award":["101093046","101093046"],"award-info":[{"award-number":["101093046","101093046"]}],"id":[{"id":"10.13039\/100018693","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Deploying deep reinforcement learning on resource-constrained devices remains a significant challenge due to the energy-intensive nature of the sequential decision-making process. Model compression can reduce the spatial (e.g. storage, memory) requirements of a policy network, but this does not always translate to a proportional increase in inference speed and computational efficiency. We introduce a novel temporal compression paradigm that improves the efficiency more directly, by reducing the number of predictions needed to complete a task. This method, based on policy distillation, allows a student model to learn when a change of action will be required by observing sequences of identical actions in the trajectories of an existing teacher model. At each decision, the student can then predict both an action and how many times to perform this action consecutively. This approach allows any existing policy for discrete action spaces to be optimized for energy efficiency through both spatial and temporal compression simultaneously. Experiments on devices ranging from a microcontroller and smartphone processor to a data centre GPU show how this method can decrease the average time it takes to predict an action by up to 13.5 times, compared to 4 times through spatial compression alone, while maintaining a similar average return as the original teacher. In practice, this allows complex models to be deployed on ultra-low-power devices, enabling them to conserve energy by remaining in sleep mode for longer periods, and still achieve high runtime and task performance.<\/jats:p>",
"DOI":"10.1007\/s10994-025-06889-9","type":"journal-article","created":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T19:45:10Z","timestamp":1760557510000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Temporal distillation: compressing a policy in space and time"],"prefix":"10.1007","volume":"114","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9849-5547","authenticated-orcid":false,"given":"Thomas","family":"Av\u00e9","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6091-294X","authenticated-orcid":false,"given":"Matthias","family":"Hutsebaut-Buysse","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4812-4841","authenticated-orcid":false,"given":"Kevin","family":"Mets","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,10,15]]},"reference":[{"key":"6889_CR1","unstructured":"Av\u00e9, T., Mets, K., De\u00a0Schepper, T., & Latr\u00e9, S. (2022). Quantization-aware policy distillation (QPD). In Deep reinforcement learning workshop, NeurIPS 2022 (pp. 1\u201315), 9 December 2022."},{"key":"6889_CR2","doi-asserted-by":"publisher","unstructured":"Av\u00e9, T., Soto, P., Camelo, M., Schepper, T. D., & Mets, K. (2024). Policy compression for low-power intelligent scaling in software-based network architectures. In NOMS 2024 IEEE network operations and management symposium (pp. 1\u20137), Seoul, Republic of Korea, 6\u201310 May 2024. IEEE. https:\/\/doi.org\/10.1109\/NOMS59830.2024.10575377","DOI":"10.1109\/NOMS59830.2024.10575377"},{"key":"6889_CR3","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1613\/jair.3912","volume":"47","author":"MG Bellemare","year":"2013","unstructured":"Bellemare, M. G., Naddaf, Y., Veness, J., & Bowling, M. (2013). The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47, 253\u2013279.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"6889_CR4","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2022.116995","volume":"201","author":"A Bertolini","year":"2022","unstructured":"Bertolini, A., Martins, M. S., Vieira, S. M., & Sousa, J. M. (2022). Power output optimization of electric vehicles smart charging hubs using deep reinforcement learning. Expert Systems with Applications, 201, Article 116995.","journal-title":"Expert Systems with Applications"},{"key":"6889_CR5","unstructured":"Biedenkapp, A., Rajan, R., Hutter, F., & Lindauer, M. (2021). TempoRL: Learning when to act. In M.\u00a0Meila & T.\u00a0Zhang (Eds.), Proceedings of the 38th international conference on machine learning, ICML 2021, 18\u201324 July 2021, virtual event (Vol.\u00a0139, pp. 914\u2013924). PMLR. http:\/\/proceedings.mlr.press\/v139\/biedenkapp21a.html"},{"key":"6889_CR6","doi-asserted-by":"publisher","first-page":"500","DOI":"10.1016\/j.future.2019.04.041","volume":"99","author":"F Bu","year":"2019","unstructured":"Bu, F., & Wang, X. (2019). A smart agriculture IoT system based on deep reinforcement learning. Future Generation Computer Systems, 99, 500\u2013507.","journal-title":"Future Generation Computer Systems"},
{"key":"6889_CR7","unstructured":"Chevalier-Boisvert, M., Dai, B., Towers, M., Perez-Vicente, R., Willems, L., Lahlou, S., & Terry, J. (2024). Minigrid & Miniworld: Modular & customizable reinforcement learning environments for goal-oriented tasks. In Advances in neural information processing systems (Vol. 36, pp. 73383\u201373394)."},{"key":"6889_CR8","unstructured":"Czarnecki, W. M., Pascanu, R., Osindero, S., Jayakumar, S. M., Swirszcz, G., & Jaderberg, M. (2019). Distilling policy distillation. In K.\u00a0Chaudhuri & M.\u00a0Sugiyama (Eds.), The 22nd international conference on artificial intelligence and statistics, AISTATS 2019 (Vol.\u00a089, pp. 1331\u20131340), 16\u201318 April 2019, Naha, Okinawa, Japan. PMLR. http:\/\/proceedings.mlr.press\/v89\/czarnecki19a.html"},{"key":"6889_CR9","doi-asserted-by":"publisher","unstructured":"Desai, S. S., & Lee, S. (2021). Auxiliary tasks for efficient learning of point-goal navigation. In IEEE winter conference on applications of computer vision, WACV 2021 (pp. 717\u2013725), Waikoloa, HI, USA, 3\u20138 January 2021. IEEE. https:\/\/doi.org\/10.1109\/WACV48630.2021.00076","DOI":"10.1109\/WACV48630.2021.00076"},{"key":"6889_CR10","doi-asserted-by":"crossref","unstructured":"Duisterhof, B. P., Krishnan, S., Cruz, J. J., Banbury, C. R., Fu, W., Faust, A., & Janapa\u00a0Reddi, V. (2021). Tiny robot learning (TinyRL) for source seeking on a nano quadcopter. In 2021 IEEE international conference on robotics and automation (ICRA) (pp. 7242\u20137248).","DOI":"10.1109\/ICRA48506.2021.9561590"},{"key":"6889_CR11","unstructured":"Green, S., Vineyard, C. M., & Ko\u00e7, \u00c7. K. (2019). Distillation strategies for proximal policy optimization. CoRR. arXiv:1901.08128"},{"key":"6889_CR12","doi-asserted-by":"publisher","unstructured":"Hansen, N., & Wang, X. (2021). Generalization in reinforcement learning by soft data augmentation. In IEEE international conference on robotics and automation, ICRA 2021 (pp. 13611\u201313617), Xi\u2019an, China, 30 May\u20135 June 2021. IEEE. https:\/\/doi.org\/10.1109\/ICRA48506.2021.9561103","DOI":"10.1109\/ICRA48506.2021.9561103"},{"key":"6889_CR13","unstructured":"Hinton, G. E., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. CoRR. arXiv:1503.02531"},{"key":"6889_CR14","unstructured":"Hugging Face Inc. (2023a). Retrieved August 1, 2024, from https:\/\/huggingface.co\/sb3\/ppo-MiniGrid-FourRooms-v0"},{"key":"6889_CR15","unstructured":"Hugging Face Inc. (2023b). Retrieved August 1, 2024, from https:\/\/huggingface.co\/sb3\/ppo-MiniGrid-Unlock-v0"},{"key":"6889_CR16","doi-asserted-by":"publisher","unstructured":"Hutsebaut-Buysse, M., Guinjoan, F. G., Rademakers, E., Latr\u00e9, S., Bey-Temsamani, A., Mets, K., & Schepper, T. D. (2023). Directed real-world learned exploration. In IROS (pp. 5227\u20135234). https:\/\/doi.org\/10.1109\/IROS55552.2023.10341504","DOI":"10.1109\/IROS55552.2023.10341504"},{"key":"6889_CR17","unstructured":"Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., & Kavukcuoglu, K. (2016). Reinforcement learning with unsupervised auxiliary tasks. arXiv:1611.05397"},{"key":"6889_CR18","unstructured":"Kalyanakrishnan, S., Aravindan, S., Bagdawat, V., Bhatt, V., Goka, H., Gupta, A., & Piratla, V. (2021). An analysis of frame-skipping in reinforcement learning. arXiv:2102.03718"},
{"key":"6889_CR19","doi-asserted-by":"publisher","unstructured":"Lakshminarayanan, A., Sharma, S., & Ravindran, B. (2017). Dynamic action repetition for deep reinforcement learning. Proceedings of the AAAI conference on artificial intelligence (Vol. 31(1)). https:\/\/doi.org\/10.1609\/aaai.v31i1.10918","DOI":"10.1609\/aaai.v31i1.10918"},{"key":"6889_CR20","unstructured":"Microsoft. (2019). ONNX Runtime. Retrieved August 1, 2024, from https:\/\/github.com\/microsoft\/onnxruntime"},{"key":"6889_CR21","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. A. (2013). Playing Atari with deep reinforcement learning. CoRR. arXiv:1312.5602"},{"key":"6889_CR22","doi-asserted-by":"publisher","unstructured":"Mu\u00f1oz, G., Barrado, C., \u00c7etin, E., & Salami, E. (2019). Deep reinforcement learning for drone delivery. Drones. https:\/\/doi.org\/10.3390\/drones3030072","DOI":"10.3390\/drones3030072"},{"key":"6889_CR23","unstructured":"ONNX. (2017). Open Neural Network Exchange. Retrieved August 1, 2024, from https:\/\/github.com\/onnx\/onnx"},{"key":"6889_CR24","unstructured":"Onnx2c. (2020). Onnx2c. Retrieved August 1, 2024, from https:\/\/github.com\/kraiskil\/onnx2c"},{"issue":"268","key":"6889_CR25","first-page":"1","volume":"22","author":"A Raffin","year":"2021","unstructured":"Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., & Dormann, N. (2021). Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268), 1\u20138. http:\/\/jmlr.org\/papers\/v22\/20-1364.html","journal-title":"Journal of Machine Learning Research"},{"key":"6889_CR26","unstructured":"Rusu, A. A., Colmenarejo, S. G., G\u00fcl\u00e7ehre, \u00c7., Desjardins, G., Kirkpatrick, J., Pascanu, R., & Hadsell, R. (2016). Policy distillation. In Y.\u00a0Bengio & Y.\u00a0LeCun (Eds.), 4th international conference on learning representations, ICLR 2016, conference track proceedings, San Juan, Puerto Rico, May 2\u20134, 2016. arXiv:1511.06295"},{"key":"6889_CR27","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347"},{"key":"6889_CR28","doi-asserted-by":"crossref","unstructured":"Sharma, S., Lakshminarayanan, A. S., & Ravindran, B. (2017). Learning to repeat: Fine grained action repetition for deep reinforcement learning. In 5th international conference on learning representations, ICLR 2017, conference track proceedings, Toulon, France, 24\u201326 April 2017. OpenReview.net. https:\/\/openreview.net\/forum?id=B1GOWV5eg","DOI":"10.1609\/aaai.v31i1.10918"},{"issue":"6","key":"6889_CR29","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1109\/MCOM.001.2200529","volume":"61","author":"P Soto","year":"2023","unstructured":"Soto, P., Camelo, M., De Vleeschauwer, D., De Bock, Y., Chang, C.-Y., Botero, J. F., & Latr\u00e9, S. (2023). Network intelligence for NFV scaling in closed-loop architectures. IEEE Communications Magazine, 61(6), 66\u201372. https:\/\/doi.org\/10.1109\/MCOM.001.2200529","journal-title":"IEEE Communications Magazine"},{"key":"6889_CR30","unstructured":"Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press. Retrieved from http:\/\/incompleteideas.net\/book\/the-book-2nd.html"}],
"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06889-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-025-06889-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06889-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T14:30:03Z","timestamp":1764685803000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-025-06889-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,15]]},"references-count":30,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["6889"],"URL":"https:\/\/doi.org\/10.1007\/s10994-025-06889-9","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"type":"print","value":"0885-6125"},{"type":"electronic","value":"1573-0565"}],"subject":[],"published":{"date-parts":[[2025,10,15]]},"assertion":[{"value":"22 October 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 August 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 September 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 October 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declared no potential conflict of interest with respect to the research, authorship and\/or publication of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"The authors agree with the content and give appropriate consent to participate.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"All authors of this manuscript give consent for publication.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}],"article-number":"248"}}
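
The record above is a single "work" message from the Crossref REST API, as returned by https://api.crossref.org/works/10.1007/s10994-025-06889-9. Below is a minimal sketch of how such a record can be fetched and the main fields extracted, assuming Python with the third-party requests package; the field names are taken from the message itself, and the mailto address is a placeholder (Crossref's "polite pool" asks callers to identify themselves with a contact address).

```python
# Minimal sketch: fetch and parse a Crossref "work" record like the one above.
# Assumes the public Crossref REST API (api.crossref.org) and the `requests`
# package; "you@example.org" is a placeholder contact for the polite pool.
import requests

DOI = "10.1007/s10994-025-06889-9"

resp = requests.get(
    f"https://api.crossref.org/works/{DOI}",
    params={"mailto": "you@example.org"},
    timeout=30,
)
resp.raise_for_status()
msg = resp.json()["message"]  # the payload sits under the "message" key

title = msg["title"][0]  # "title" is a list of strings
authors = [
    f'{a.get("given", "")} {a.get("family", "")}'.strip()
    for a in msg.get("author", [])
]
year = msg["issued"]["date-parts"][0][0]  # date-parts is [[YYYY, M, D]]

print(f"{', '.join(authors)} ({year}). {title}.")
print(f"DOI: {msg['DOI']}, references: {msg.get('references-count', 0)}")

# Reference entries may carry a matched DOI or only an "unstructured" string.
for ref in msg.get("reference", [])[:5]:
    print(ref.get("DOI") or ref.get("unstructured", "<no data>"))
```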
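The abstract also outlines the training signal behind temporal distillation: runs of identical actions in the teacher's trajectories tell the student how many times each predicted action should be repeated. The sketch below illustrates only that idea and is not the authors' implementation; the names MAX_REPEAT, run_length_targets, and TemporalStudent, the network sizes, and the loss weighting are all illustrative assumptions.

```python
# Illustrative sketch of temporal distillation (not the paper's code): derive
# repeat-count targets from a teacher trajectory, then train a student with
# two heads, one for the action and one for its consecutive repeat count.
import torch
import torch.nn as nn

MAX_REPEAT = 8  # assumed cap on how many consecutive repeats may be predicted

def run_length_targets(actions):
    """For each step, count how many identical actions follow (incl. itself)."""
    targets, i = [], 0
    while i < len(actions):
        j = i
        while j < len(actions) and actions[j] == actions[i]:
            j += 1  # end of the current run of identical actions
        for k in range(i, j):
            targets.append(min(j - k, MAX_REPEAT))
        i = j
    return targets

class TemporalStudent(nn.Module):
    """Student predicting an action and how often to repeat it consecutively."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.action_head = nn.Linear(hidden, n_actions)
        self.repeat_head = nn.Linear(hidden, MAX_REPEAT)  # classes 1..MAX_REPEAT

    def forward(self, obs):
        h = self.body(obs)
        return self.action_head(h), self.repeat_head(h)

# One distillation step on a toy teacher trajectory:
obs = torch.randn(16, 4)  # 16 observations of dimension 4
teacher_actions = torch.tensor([0] * 5 + [1] * 3 + [2] * 8)
repeats = torch.tensor(run_length_targets(teacher_actions.tolist())) - 1

student = TemporalStudent(obs_dim=4, n_actions=3)
act_logits, rep_logits = student(obs)
loss = nn.functional.cross_entropy(act_logits, teacher_actions) \
     + nn.functional.cross_entropy(rep_logits, repeats)
loss.backward()
```

At deployment, the agent would execute each predicted action for the predicted number of steps before querying the network again, which is what reduces the number of predictions per episode and lets the device idle between decisions.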