{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,22]],"date-time":"2025-12-22T21:39:59Z","timestamp":1766439599798,"version":"3.48.0"},"reference-count":16,"publisher":"Springer Science and Business Media LLC","issue":"12","license":[{"start":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T00:00:00Z","timestamp":1762473600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T00:00:00Z","timestamp":1762473600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Netherlands Organisation for Applied Scientific Research TNO and the Dutch Ministry of Economic Affairs and Climate","award":["AI211006"],"award-info":[{"award-number":["AI211006"]}]},{"name":"German Federal Ministry of Education and Research","award":["01IS21022G"],"award-info":[{"award-number":["01IS21022G"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Temporal Difference (TD) and Least-Squares Temporal Difference (LSTD) are related methods to estimate the value function of a Markov Decision Process (MDP). While TD is a direct method using local data to update the value function estimate, LSTD is a Bellman projected equation method using full data to compute a one-time estimate. 
TD(\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\lambda $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    ) and LSTD(\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\lambda $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    ) extend TD and LSTD with eligibility traces. While estimating the value function, TD(\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\lambda $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    ) and LSTD(\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\lambda $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    ) use actual histories of features as traces. Recently, expected eligibility traces have been proposed for TD(\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\lambda $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    ) to include not only actual histories but also all potential histories of features that could have occurred based on the model or the available data. While this idea can account for non-linear feature architectures, here we limit ourselves to linear feature architectures with full data updates in the context of LSTD. We show that, in striking contrast with the direct versions, an extension of LSTD to include the theoretical expected eligibility traces is equivalent to LSTD without eligibility traces (LSTD(0)). We obtain a similar result if we consider mixed eligibility traces: a combination of expected eligibility traces and ordinary eligibility traces. 
In fact, we show that LSTD with theoretical mixed eligibility traces is equivalent to LSTD(\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\lambda ^\\prime $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    ) for a given\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\lambda ^\\prime $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    that captures both the decay of the eligibility trace and the balance between the expected eligibility trace and the ordinary trace. Furthermore, we consider alternative methods LSET(\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\lambda $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    ) and LSET(\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\eta $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    ,\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\lambda $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    ), which rely on the empirical means of the eligibility traces rather than the theoretical expected eligibility traces, and show that their value estimates converge to those of LSTD(0) and LSTD(\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\lambda ^\\prime $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    ).\n                  <\/jats:p>","DOI":"10.1007\/s10994-025-06912-z","type":"journal-article","created":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T16:44:23Z","timestamp":1762533863000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Least-squares temporal difference with expected eligibility 
traces"],"prefix":"10.1007","volume":"114","author":[{"given":"Roy","family":"van Zuijlen","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Duarte","family":"Antunes","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,11,7]]},"reference":[{"unstructured":"Bertsekas, D. P. (2018). Dynamic programming and optimal control (4th ed., Vol.\u00a02). Athena Scientific.","key":"6912_CR1"},{"issue":"1","key":"6912_CR2","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/j.cam.2008.07.037","volume":"227","author":"DP Bertsekas","year":"2009","unstructured":"Bertsekas, D. P., & Yu, H. (2009). Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics, 227(1), 27\u201350. https:\/\/doi.org\/10.1016\/j.cam.2008.07.037","journal-title":"Journal of Computational and Applied Mathematics"},{"key":"6912_CR3","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1023\/A:1017936530646","volume":"49","author":"JA Boyan","year":"2002","unstructured":"Boyan, J. A. (2002). Technical update: least-squares temporal difference learning. Machine Learning, 49, 233\u2013246. https:\/\/doi.org\/10.1023\/A:1017936530646","journal-title":"Machine Learning"},{"key":"6912_CR4","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1007\/BF00114723","volume":"22","author":"SJ Bradtke","year":"1996","unstructured":"Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22, 33\u201357. https:\/\/doi.org\/10.1007\/BF00114723","journal-title":"Machine Learning"},{"key":"6912_CR5","first-page":"809","volume":"15","author":"C Dann","year":"2014","unstructured":"Dann, C., Neumann, G., & Peters, J. (2014). Policy evaluation with temporal differences: a survey and comparison. 
Journal of Machine Learning Research, 15, 809\u2013883.","journal-title":"Journal of Machine Learning Research"},{"issue":"4","key":"6912_CR6","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1162\/neco.1993.5.4.613","volume":"5","author":"P Dayan","year":"1993","unstructured":"Dayan, P. (1993). Improving generalization for temporal difference learning: the successor representation. Neural Computation, 5(4), 613\u2013624. https:\/\/doi.org\/10.1162\/neco.1993.5.4.613","journal-title":"Neural Computation"},{"unstructured":"Deisenroth, M. P. (2010). Efficient reinforcement learning using Gaussian processes. KIT Scientific Publishing.","key":"6912_CR7"},{"key":"6912_CR8","first-page":"441","volume":"19","author":"A Geramifard","year":"2006","unstructured":"Geramifard, A., Bowling, M., Zinkevich, M., & Sutton, R. S. (2006). iLSTD: eligibility traces and convergence analysis. Advances in Neural Information Processing Systems, 19, 441\u2013448.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"6912_CR9","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1023\/A:1022192903948","volume":"13","author":"A Nedi\u0107","year":"2003","unstructured":"Nedi\u0107, A., & Bertsekas, D. P. (2003). Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, 13(1), 79\u2013110. https:\/\/doi.org\/10.1023\/A:1022192903948","journal-title":"Discrete Event Dynamic Systems"},{"doi-asserted-by":"crossref","unstructured":"Parr, R., Li, L., Taylor, G., Painter-Wakefield, C., & Littman, M. L. (2008). An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In Proceedings of the 25th International Conference on Machine Learning (pp. 
752\u2013759).","key":"6912_CR10","DOI":"10.1145\/1390156.1390251"},{"key":"6912_CR11","doi-asserted-by":"publisher","first-page":"3952","DOI":"10.1609\/aaai.v32i1.11813","volume":"32","author":"S Pitis","year":"2018","unstructured":"Pitis, S. (2018). Source traces for temporal difference learning. Proceedings of the AAAI Conference on Artificial Intelligence, 32, 3952\u20133959.","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"doi-asserted-by":"crossref","unstructured":"Ross, S. M. (2010). Introduction to probability models (10th ed.). Academic Press.","key":"6912_CR12","DOI":"10.1016\/B978-0-12-375686-2.00007-8"},{"key":"6912_CR13","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1007\/BF00115009","volume":"3","author":"RS Sutton","year":"1988","unstructured":"Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9\u201344. https:\/\/doi.org\/10.1007\/BF00115009","journal-title":"Machine Learning"},{"unstructured":"Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). The MIT Press.","key":"6912_CR14"},{"key":"6912_CR15","doi-asserted-by":"publisher","first-page":"9997","DOI":"10.1609\/aaai.v35i11.17200","volume":"35","author":"H Hasselt","year":"2021","unstructured":"van Hasselt, H., Madjiheurem, S., Hessel, M., Silver, D., Barreto, A., & Borsa, D. (2021). Expected eligibility traces. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 9997\u201310005.","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"issue":"6","key":"6912_CR16","doi-asserted-by":"publisher","first-page":"3310","DOI":"10.1137\/100807879","volume":"50","author":"H Yu","year":"2012","unstructured":"Yu, H. (2012). Least squares temporal difference methods: an analysis under general conditions. SIAM Journal on Control and Optimization, 50(6), 3310\u20133343. 
https:\/\/doi.org\/10.1137\/100807879","journal-title":"SIAM Journal on Control and Optimization"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06912-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-025-06912-z","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06912-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,22]],"date-time":"2025-12-22T21:29:16Z","timestamp":1766438956000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-025-06912-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,7]]},"references-count":16,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["6912"],"URL":"https:\/\/doi.org\/10.1007\/s10994-025-06912-z","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"type":"print","value":"0885-6125"},{"type":"electronic","value":"1573-0565"}],"subject":[],"published":{"date-parts":[[2025,11,7]]},"assertion":[{"value":"10 July 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 July 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 October 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 November 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"269"}}