{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T04:51:15Z","timestamp":1774587075511,"version":"3.50.1"},"reference-count":26,"publisher":"Elsevier","isbn-type":[{"value":"9781558603356","type":"print"}],"license":[{"start":{"date-parts":[[1994,1,1]],"date-time":"1994-01-01T00:00:00Z","timestamp":757382400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.elsevier.com\/tdm\/userlicense\/1.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[1994]]},"DOI":"10.1016\/b978-1-55860-335-6.50042-8","type":"book-chapter","created":{"date-parts":[[2014,7,1]],"date-time":"2014-07-01T02:58:29Z","timestamp":1404183509000},"page":"284-292","source":"Crossref","is-referenced-by-count":137,"title":["Learning Without State-Estimation in Partially Observable Markovian Decision Processes"],"prefix":"10.1016","author":[{"given":"Satinder P.","family":"Singh","sequence":"first","affiliation":[]},{"given":"Tommi","family":"Jaakkola","sequence":"additional","affiliation":[]},{"given":"Michael I.","family":"Jordan","sequence":"additional","affiliation":[]}],"member":"78","reference":[{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib1","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1109\/TSMC.1985.6313371","article-title":"Pattern recognizing stochastic learning automata","volume":"15","author":"Barto","year":"1985","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib2","unstructured":"Barto, A. G., Bradtke, S. J., & Singh, S. P. (to appear). Learning to act using real-time dynamic programming. Artificial Intelligence. 
also, University of Massachusetts, Amherst, CMPSCI Technical Report 93\u201302."},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib3","article-title":"Monte Carlo matrix inversion and reinforcement learning","volume":"6","author":"Barto","year":"1994"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib4","first-page":"835","article-title":"Neuronlike elements that can solve difficult learning control problems","volume":"13","author":"Barto","year":"1983","journal-title":"IEEE SMC"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib5","first-page":"686","article-title":"Sequential decision problems and neural networks","volume":"2","author":"Barto","year":"1990"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib6","series-title":"Dynamic Programming: Deterministic and Stochastic Models","author":"Bertsekas","year":"1987"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib7","unstructured":"Chrisman, L. (1992a) Planning for closed-loop execution using partially observable markovian decision processes. Submitted to AAAI, 1992."},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib8","unstructured":"Chrisman, L. (1992b). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In AAAI-92."},{"issue":"3\/4","key":"10.1016\/B978-1-55860-335-6.50042-8_bib9","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1023\/A:1022632907294","article-title":"The convergence of TD(\u03bb) for general \u03bb","volume":"8","author":"Dayan","year":"1992","journal-title":"Machine Learning"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib10","unstructured":"Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). Stochastic convergence of iterative DP algorithms. In Advances in Neural Information Processing Systems 6. also, to appear in Neural Computation."},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib11","unstructured":"Jaakkola, T., Singh, S. P., & Jordan, M. I. (1994). Monte Carlo reinforcement learning in non-Markovian decision problems. 
Submitted to NIPS'94."},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib12","unstructured":"Lin, L. J. & Mitchell, T. M. (1992). Reinforcement learning with hidden states. In Proceedings of the Second International Conference on Simulation of Adaptive Behavior: From Animals to Animats."},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib13","unstructured":"McCallum, R. A. (1993). Overcoming incomplete perception with utile distinction memory. In Utgoff, P. (Ed.), Machine Learning: Proceedings of the Tenth International Conference, pages 190\u2013196. Morgan Kaufmann."},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib14","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1109\/TSMC.1974.5408453","article-title":"Learning automata\u2014A survey","volume":"4","author":"Narendra","year":"1974","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib15","series-title":"Introduction to Stochastic Dynamic Programming","author":"Ross","year":"1983"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib16","doi-asserted-by":"crossref","unstructured":"Schwartz, A. (1993). A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of the Tenth Machine Learning Conference.","DOI":"10.1016\/B978-1-55860-307-3.50045-9"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib17","unstructured":"Singh, S. P. (1994). Reinforcement learning algorithms for average-payoff markovian decision processes. 
In Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA."},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib18","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1287\/opre.26.2.282","article-title":"The optimal control of partially observable markov processes over the infinite horizon: discounted case","volume":"26","author":"Sondik","year":"1978","journal-title":"Operations Research"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib19","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1007\/BF00115009","article-title":"Learning to predict by the methods of temporal differences","volume":"3","author":"Sutton","year":"1988","journal-title":"Machine Learning"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib20","unstructured":"Sutton, R. S. (1990). Integrating architectures for learning, planning, and reacting based on approximating dynamic programming. In Proc. of the Seventh International Conference on Machine Learning, pages 216\u2013224, San Mateo, CA. Morgan Kaufmann."},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib21","unstructured":"Sutton, R. S. (1994). personal communication."},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib22","unstructured":"Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge Univ., Cambridge, England."},{"issue":"3\/4","key":"10.1016\/B978-1-55860-335-6.50042-8_bib23","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1023\/A:1022676722315","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Machine Learning"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib24","unstructured":"Whitehead, S. D. (1992). Reinforcement Learning for the Adaptive Control of Perception and Action. PhD thesis, University of Rochester."},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib25","doi-asserted-by":"crossref","unstructured":"Whitehead, S. D. & Ballard, D. H. (1990). Active perception and reinforcement learning. In Proc. 
of the Seventh International Conference on Machine Learning, Austin, TX. M.","DOI":"10.1016\/B978-1-55860-141-3.50025-0"},{"key":"10.1016\/B978-1-55860-335-6.50042-8_bib26","unstructured":"Whitehead, S. D. & Lin, L. J. (1993). Reinforcement learning in non-markov environments. working paper."}],"container-title":["Machine Learning Proceedings 1994"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.elsevier.com\/content\/article\/PII:B9781558603356500428?httpAccept=text\/xml","content-type":"text\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/api.elsevier.com\/content\/article\/PII:B9781558603356500428?httpAccept=text\/plain","content-type":"text\/plain","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2019,8,12]],"date-time":"2019-08-12T05:45:13Z","timestamp":1565588713000},"score":1,"resource":{"primary":{"URL":"https:\/\/linkinghub.elsevier.com\/retrieve\/pii\/B9781558603356500428"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1994]]},"ISBN":["9781558603356"],"references-count":26,"URL":"https:\/\/doi.org\/10.1016\/b978-1-55860-335-6.50042-8","relation":{},"subject":[],"published":{"date-parts":[[1994]]}}}