{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T17:14:46Z","timestamp":1768324486563,"version":"3.49.0"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2005,2,1]],"date-time":"2005-02-01T00:00:00Z","timestamp":1107216000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Ann Oper Res"],"published-print":{"date-parts":[[2005,2]]},"DOI":"10.1007\/s10479-005-5732-z","type":"journal-article","created":{"date-parts":[[2005,3,31]],"date-time":"2005-03-31T13:24:00Z","timestamp":1112275440000},"page":"215-238","source":"Crossref","is-referenced-by-count":122,"title":["Basis Function Adaptation in Temporal Difference Reinforcement Learning"],"prefix":"10.1007","volume":"134","author":[{"given":"Ishai","family":"Menache","sequence":"first","affiliation":[]},{"given":"Shie","family":"Mannor","sequence":"additional","affiliation":[]},{"given":"Nahum","family":"Shimkin","sequence":"additional","affiliation":[]}],"member":"297","reference":[{"key":"5732_CR1","doi-asserted-by":"crossref","unstructured":"Alon, G., D.P. Kroese, T. Raviv, and R.Y. Rubinstein. (2005).\u201cApplication of the Cross-Entropy Method to the Buffer Allocation Problem in a Simulation-Based Environment. \u201dAnnals of Operation Research 134, 137\u2013151, a preliminary version appeared in the third Aegean International Conference on Design and Analysis of Manufacturing Systems.","DOI":"10.1007\/s10479-005-5728-8"},{"key":"5732_CR2","unstructured":"Auer, P., M. Herbster, and M. Warmuth. (1996).\u201cExponentially Many Local Minima for Single Neurons.\u201d In D. Touretzky, M. Mozer, and M. Hasselmo (eds.), Advances in Neural Information Processing Systems, Vol. 8, MIT Press, pp. 316\u2013322."},{"key":"5732_CR3","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TSMC.1983.6313077","volume":"13","author":"A. Barto","year":"1983","unstructured":"Barto, A., R. Sutton, and C. Anderson. (1983).\u201cNeuron-Like Adaptive Elements that can Solve Difficult Learning Control Problems.\u201d IEEE Transactions on Systems, Man, and Cybernetics 13, 834\u2013846.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"5732_CR4","unstructured":"Bertsekas, D. (1995).Dynamic Programming and Optimal Control. Athena Scientific."},{"key":"5732_CR5","unstructured":"Bertsekas, D. (1999).Nonlinear Programming, 2nd edition. Athena Scientific."},{"key":"5732_CR6","unstructured":"Bertsekas, D. and J. Tsitsiklis (1996). Neuro-Dynamic Programming. Athena Scientific."},{"key":"5732_CR7","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1023\/A:1017936530646","volume":"49","author":"J.A. Boyan","year":"2002","unstructured":"Boyan, J.A. (2002).\u201cTechnical Update: Least-Squares Temporal Difference Learning.\u201dMachine Learning 49, 233\u2013246.","journal-title":"Machine Learning"},{"key":"5732_CR8","unstructured":"Bradtke, S. (1993).\u201cReinforcement Learning Applied to Linear Quadratic Regulation.\u201dIn S. Hanson and J. Cowan (eds.), Advances in Neural Information Processing Systems, Vol. 5, Morgan Kaufmann, pp. 295\u2013302."},{"issue":"1\/2\/3","key":"5732_CR9","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1023\/A:1018056104778","volume":"22","author":"S. Bradtke","year":"1996","unstructured":"Bradtke, S. and A. Barto. (1996).\u201cLinear Least-Squares Algorithms for Temporal Difference Learning.\u201dMachine Learning 22(1\/2\/3), 33\u201357.","journal-title":"Machine Learning"},{"key":"5732_CR10","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1007\/s10479-005-5724-z","volume":"134","author":"P. de-Boer","year":"2005","unstructured":"de-Boer, P., D.Y. Kroese, S. Mannor, and R.Y. Rubinstein. (2005).\u201cA Tutorial on the Cross-Entropy Method.\u201dAvailable from http:\/\/www.cemethod.org . Annals of Operation Research 134, 19\u201367.","journal-title":"Annals of Operation Research"},{"key":"5732_CR11","unstructured":"Dubin, U. (2002). \u201cApplication of the Cross-Entropy Method to Neural Computation.\u201d Unpublished Master\u2019s thesis, Technion."},{"key":"5732_CR12","unstructured":"Ghosh, J. and A. Nag. (2000). \u201cAn Overview on Radial Basis Function Networks.\u201d In R.J. Howlett and L.C. Jain (eds.), Radial Basis Function Neural Networks Theory and Applications., Physica-Verlag."},{"key":"5732_CR13","unstructured":"Haykin, S.S. (1998). Neural Networks : A Comprehensive Foundation. Prentice Hall."},{"key":"5732_CR14","doi-asserted-by":"crossref","unstructured":"Helvik, B.E. and O. Wittner. (2001). \u201cUsing the Cross-Entropy Method to Guide\/Govern Mobile Agent\u2019s Path Finding in Networks.\u201d In Proceedings of the 3rd International Workshop on Mobile Agents for Telecommunication Applications \u2013 MATA 01. Morgan Kaufmann.","DOI":"10.1007\/3-540-44651-6_24"},{"key":"5732_CR15","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1613\/jair.301","volume":"4","author":"L.P. Kaelbling","year":"1996","unstructured":"Kaelbling, L.P., M. Littman, and A.W. Moore. (1996). \u201cReinforcement Learning \u2013 A Survey.\u201d Journal of Artificial Intelligence Research 4, 237\u2013285.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"5732_CR16","unstructured":"Lagoudakis, M.G. and R. Parr (2001). \u201cModel-Free Least-Squares Policy Iteration.\u201d In Advances in Neural Information Processing Systems, Vol. 14, Morgan Kaufmann, pp. 1547\u20131554."},{"key":"5732_CR17","unstructured":"Mannor, S., R.Y. Rubinstein, and Y. Gat (2003). \u201cThe Cross-Entropy Method for Fast Policy Search.\u201d In T. Fawcett and N. Mishra (eds.), Machine Learning, Proceedings of the Twentieth International Conference, AAAI press, pp. 512\u2013519."},{"key":"5732_CR18","unstructured":"McGovern, A. and A.G. Barto. (2001). \u201cAutomatic Discovery of Subgoals in Reinforcement Learning Using Diverse Density.\u201d In Proceedings of the 18th International Conference on Machine Learning, Morgan Kaufmann, pp. 361\u2013368."},{"key":"5732_CR19","unstructured":"McLachlan, G. and T. Krishnan. (1997). The EM Algorithm and Extensions. John Wiley & Sons."},{"key":"5732_CR20","doi-asserted-by":"crossref","unstructured":"Menache, I., S. Mannor, and N. Shimkin. (2002). \u201cQ-cut-Dynamic Discovery of Sub-Goals in Reinforcement Learning.\u201d In Proceedings of the 13th European Conference on Machine Learning, Vol 2430, Springer, pp. 295\u2013306.","DOI":"10.1007\/3-540-36755-1_25"},{"key":"5732_CR21","unstructured":"Munos, R. (2003). \u201cError Bounds for Approximate Policy Iteration.\u201d In T. Fawcett and N. Mishra (eds.), Machine Learning, Proceedings of the Twentieth International Conference, AAAI press, pp. 560\u2013567."},{"key":"5732_CR22","unstructured":"Nedic, A. and D. Bertsekas. (2001). Least-Squares Policy Evaluation Algorithms with Linear Function Approximation. LIDS Report LIDS-P-2537, to appear in J. of Discrete Event Systems."},{"key":"5732_CR23","doi-asserted-by":"crossref","unstructured":"Puterman, M. (1994). Markov Decision Processes. Wiley-Interscience.","DOI":"10.1002\/9780470316887"},{"key":"5732_CR24","doi-asserted-by":"crossref","unstructured":"Ratitch, B. and D. Precup. (2002). \u201cCharacterizing Markov Decision Processes.\u201d In Proceedings of the 13th European Conference on Machine Learning, Vol. 2430, Springer, pp. 391\u2013404.","DOI":"10.1007\/3-540-36755-1_33"},{"key":"5732_CR25","doi-asserted-by":"crossref","unstructured":"Rubinstein, R.Y. and D.P. Kroese. (2004). The Cross-Entropy Method. A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Neural Computation. Springer.","DOI":"10.1007\/978-1-4757-4321-0"},{"key":"5732_CR26","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1023\/A:1010091220143","volume":"1","author":"R.Y. Rubinstein","year":"1999","unstructured":"Rubinstein, R.Y. (1999). \u201cThe Cross-Entropy Method for Combinatorial and Continuous Optimization.\u201d Methodology and Computing in Applied Probability 1, 127\u2013190.","journal-title":"Methodology and Computing in Applied Probability"},{"key":"5732_CR27","unstructured":"Singh, S.P., T. Jaakkola, and M.I. Jordan. (1995). \u201cReinforcement Learning with Soft State Aggregation.\u201d In Advances in Neural Information Processing Systems, Vol. 7, MIT Press, pp. 361\u2013368."},{"key":"5732_CR28","first-page":"9","volume":"3","author":"R.S. Sutton","year":"1988","unstructured":"Sutton, R.S. (1988). \u201cLearning to Predict by the Method of Temporal Differences.\u201d Machine Learning 3, 9\u201344.","journal-title":"Machine Learning"},{"key":"5732_CR29","doi-asserted-by":"crossref","unstructured":"Sutton, R.S. and A.G. Barto. (1998). Reinforcement Learning: An Introduction. MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"5732_CR30","first-page":"50","volume":"22","author":"J. Tsitsiklis","year":"1996","unstructured":"Tsitsiklis, J. and B. Van-Roy. (1996). \u201cFeature-Based Methods for Large Scale Dynamic Programming.\u201d Machine Learning 22, 50\u201394.","journal-title":"Machine Learning"},{"key":"5732_CR31","doi-asserted-by":"crossref","first-page":"674","DOI":"10.1109\/9.580874","volume":"42","author":"J. Tsitsiklis","year":"1997","unstructured":"Tsitsiklis, J. and B. Van Roy. (1997). \u201cAn Analysis of Temporal-Difference Learning with Function Approximation.\u201d IEEE Transactions on Automatic Control 42, 674\u2013690.","journal-title":"IEEE Transactions on Automatic Control"},{"key":"5732_CR32","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1016\/0893-6080(90)90088-3","volume":"3","author":"P. Werbos","year":"1990","unstructured":"Werbos, P. (1990). \u201cConsistency of HDP Applied to Simple Reinforcement Learning Problem.\u201d Neural Networks 3, 170\u2013189.","journal-title":"Neural Networks"},{"key":"5732_CR33","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1016\/S0019-9958(77)90354-0","volume":"34","author":"I.H. Witten","year":"1977","unstructured":"Witten, I.H. (1977). \u201cAn Adaptive Optimal Controller for Discrete-Time Markov Environments.\u201d Information and Control 34, 286\u2013295.","journal-title":"Information and Control"}],"container-title":["Annals of Operations Research"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10479-005-5732-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10479-005-5732-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10479-005-5732-z","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,4,6]],"date-time":"2020-04-06T15:44:15Z","timestamp":1586187855000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10479-005-5732-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,2]]},"references-count":33,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2005,2]]}},"alternative-id":["5732"],"URL":"https:\/\/doi.org\/10.1007\/s10479-005-5732-z","relation":{},"ISSN":["0254-5330","1572-9338"],"issn-type":[{"value":"0254-5330","type":"print"},{"value":"1572-9338","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,2]]}}}