{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,18]],"date-time":"2026-05-18T16:45:01Z","timestamp":1779122701922,"version":"3.51.4"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"3-4","license":[{"start":{"date-parts":[[1992,5,1]],"date-time":"1992-05-01T00:00:00Z","timestamp":704678400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[1992,5]]},"DOI":"10.1007\/bf00992696","type":"journal-article","created":{"date-parts":[[2005,1,9]],"date-time":"2005-01-09T16:35:16Z","timestamp":1105288516000},"page":"229-256","source":"Crossref","is-referenced-by-count":3249,"title":["Simple statistical gradient-following algorithms for connectionist reinforcement learning"],"prefix":"10.1007","volume":"8","author":[{"given":"Ronald J.","family":"Williams","sequence":"first","affiliation":[]}],"member":"297","reference":[{"key":"CR1","first-page":"229","volume":"4","author":"A.G. Barto","year":"1985","unstructured":"Barto, A.G. (1985). Learning by statistical cooperation of self-interested neuron-like computing elements. Human Neurobiology, 4, 229-256.","journal-title":"Human Neurobiology"},{"key":"CR2","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1109\/TSMC.1985.6313371","volume":"15","author":"A.G. Barto","year":"1985","unstructured":"Barto, A.G. & Anandan, P. (1985). Pattern recognizing stochastic learning automata. IEEE Transactions on Systems, Man, and Cybernetics, 15, 360-374.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"CR3","unstructured":"Barto, A.G. & Anderson, C.W. (1985). Structural learning in connectionist systems. Proceedings of the Seventh Annual Conference of the Cognitive Science Society (pp. 43-53). 
Irvine, CA."},{"key":"CR4","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1109\/TSMC.1983.6313077","volume":"13","author":"A.G. Barto","year":"1983","unstructured":"Barto, A.G., Sutton, R.S., & Anderson, C.W. (1983). Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13, 835-846.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"CR5","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1007\/BF00453370","volume":"40","author":"A.G. Barto","year":"1981","unstructured":"Barto, A.G., Sutton, R.S., & Brouwer, P.S. (1981). Associative search network: A reinforcement learning associative memory. Biological Cybernetics, 40, 201-211.","journal-title":"Biological Cybernetics"},{"key":"CR6","unstructured":"Barto, A.G., & Jordan, M.I. (1987). Gradient following without back-propagation in layered networks. Proceedings of the First Annual International Conference on Neural Networks, Vol. II (pp. 629-636). San Diego, CA."},{"key":"CR7","volume-title":"Learning and computational neuroscience: Foundations of adaptive networks","author":"A.G. Barto","year":"1990","unstructured":"Barto, A.G., Sutton, R.S., & Watkins, C.J.C.H. (1990). Learning and sequential decision making. In: M. Gabriel & J.W. Moore (Eds.), Learning and computational neuroscience: Foundations of adaptive networks. Cambridge, MA: MIT Press."},{"key":"CR8","first-page":"45","volume-title":"Proceedings of the 1990 Connectionist Models Summer School","author":"P. Dayan","year":"1990","unstructured":"Dayan, P. (1990). Reinforcement comparison. In D.S. Touretzky, J.L. Elman, T.J. Sejnowski, & G.E. Hinton (Eds.), Proceedings of the 1990 Connectionist Models Summer School (pp. 45-51). San Mateo, CA: Morgan Kaufmann."},{"key":"CR9","volume-title":"Adaptive filtering prediction and control","author":"G.C. Goodwin","year":"1984","unstructured":"Goodwin, G.C. & Sin, K.S. 
(1984). Adaptive filtering prediction and control. Englewood Cliffs, NJ: Prentice-Hall."},{"key":"CR10","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1016\/0893-6080(90)90056-Q","volume":"3","author":"V. Gullapalli","year":"1990","unstructured":"Gullapalli, V. (1990). A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks, 3, 671-692.","journal-title":"Neural Networks"},{"key":"CR11","volume-title":"Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations","author":"G.E. Hinton","year":"1986","unstructured":"Hinton, G.E. & Sejnowski, T.J. (1986). Learning and relearning in Boltzmann machines. In: D.E. Rumelhart & J.L. McClelland, (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations. Cambridge, MA: MIT Press."},{"key":"CR12","series-title":"Occasional Paper","volume-title":"Forward models: supervised learning with a distal teacher","author":"M.I. Jordan","year":"1990","unstructured":"Jordan, M.I. & Rumelhart, D.E. (1990). Forward models: supervised learning with a distal teacher. (Occasional Paper ? 40). Cambridge, MA: Massachusetts Institute of Technology, Center for Cognitive Science."},{"key":"CR13","first-page":"599","volume":"85","author":"Y. leCun","year":"1985","unstructured":"leCun, Y. (1985). Une proc\u00e9dure d'apprentissage pour r\u00e9seau \u00e0 seuil asym\u00e9trique [A learning procedure for asymmetric threshold networks]. Proceedings of Cognitiva, 85, 599-604.","journal-title":"Proceedings of Cognitiva"},{"key":"CR14","unstructured":"Munro, P. (1987). A dual back-propagation scheme for scalar reward learning. Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 165-176). Seattle, WA."},{"key":"CR15","volume-title":"Learning Automata: An introduction","author":"K.S. Narendra","year":"1989","unstructured":"Narendra, K.S. & Thathachar, M.A.L. (1989). Learning Automata: An introduction. 
Englewood Cliffs, NJ: Prentice Hall."},{"key":"CR16","doi-asserted-by":"crossref","first-page":"1154","DOI":"10.1109\/TSMC.1983.6313193","volume":"13","author":"K.S. Narendra","year":"1983","unstructured":"Narendra, K.S. & Wheeler, R.M., Jr. (1983). An N-player sequential stochastic game with identical payoffs. IEEE Transactions on Systems, Man, and Cybernetics, 13, 1154-1158.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"CR17","volume-title":"Principles of artificial intelligence","author":"N.J. Nilsson","year":"1980","unstructured":"Nilsson, N.J. (1980). Principles of artificial intelligence. Palo Alto, CA: Tioga."},{"key":"CR18","series-title":"Technical Report","volume-title":"Learning-logic","author":"D.B. Parker","year":"1985","unstructured":"Parker, D.B. (1985). Learning-logic. (Technical Report TR-47). Cambridge, MA: Massachusetts Institute of Technology, Center for Computational Research in Economics and Management Science."},{"key":"CR19","volume-title":"An introduction to probability theory and mathematical statistics","author":"V.K. Rohatgi","year":"1976","unstructured":"Rohatgi, V.K. (1976). An introduction to probability theory and mathematical statistics. New York: Wiley."},{"key":"CR20","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/5236.001.0001","volume-title":"Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations","author":"D.E. Rumelhart","year":"1986","unstructured":"Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In: D.E. Rumelhart & J.L. McClelland, (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations. Cambridge: MIT Press."},{"key":"CR21","unstructured":"Schmidhuber, J.H. & Huber, R. (1990). Learning to generate focus trajectories for attentive vision. (Technical Report FKI-128-90). 
Technische Universit\u00e4t M\u00fcnchen, Institut f\u00fcr Informatik."},{"key":"CR22","volume-title":"Temporal credit assignment in reinforcement learning","author":"R.S. Sutton","year":"1984","unstructured":"Sutton, R.S. (1984). Temporal credit assignment in reinforcement learning. Ph.D. Dissertation, Dept. of Computer and Information Science, University of Massachusetts, Amherst, MA."},{"key":"CR23","first-page":"9","volume":"3","author":"R.S. Sutton","year":"1988","unstructured":"Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.","journal-title":"Machine Learning"},{"key":"CR24","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1109\/TSMC.1985.6313407","volume":"15","author":"M.A.L. Thathachar","year":"1985","unstructured":"Thathachar, M.A.L. & Sastry, P.S. (1985). A new approach to the design of reinforcement schemes for learning automata. IEEE Transactions on Systems, Man, and Cybernetics, 15, 168-175.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"CR25","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1109\/TAC.1986.1104342","volume":"31","author":"R.M. Wheeler Jr.","year":"1986","unstructured":"Wheeler, R.M., Jr. & Narendra, K.S. (1986). Decentralized learning in finite Markov chains. IEEE Transactions on Automatic Control, 31, 519-526.","journal-title":"IEEE Transactions on Automatic Control"},{"key":"CR26","volume-title":"Learning from delayed rewards","author":"C.J.C.H. Watkins","year":"1989","unstructured":"Watkins, C.J.C.H. (1989). Learning from delayed rewards. Ph.D. Dissertation, Cambridge University, Cambridge, England."},{"key":"CR27","volume-title":"Beyond regression: new tools for prediction and analysis in the behavioral sciences","author":"P.J. Werbos","year":"1974","unstructured":"Werbos, P.J. (1974). Beyond regression: new tools for prediction and analysis in the behavioral sciences. Ph.D. 
Dissertation, Harvard University, Cambridge, MA."},{"key":"CR28","series-title":"Technical Report","volume-title":"Reinforcement learning in connectionist networks: A mathematical analysis","author":"R.J. Williams","year":"1986","unstructured":"Williams, R.J. (1986). Reinforcement learning in connectionist networks: A mathematical analysis. (Technical Report 8605). San Diego: University of California, Institute for Cognitive Science."},{"key":"CR29","series-title":"Technical Report","volume-title":"Reinforcement-learning connectionist systems","author":"R.J. Williams","year":"1987","unstructured":"Williams, R.J. (1987a). Reinforcement-learning connectionist systems. (Technical Report NU-CCS-87-3). Boston, MA: Northeastern University, College of Computer Science."},{"key":"CR30","unstructured":"Williams, R.J. (1987b). A class of gradient-estimating algorithms for reinforcement learning in neural networks. Proceedings of the First Annual International Conference on Neural Networks, Vol. II (pp. 601-608). San Diego, CA."},{"key":"CR31","doi-asserted-by":"crossref","unstructured":"Williams, R.J. (1988a). On the use of backpropagation in associative reinforcement learning. Proceedings of the Second Annual International Conference on Neural Networks, Vol. I (pp. 263-270). San Diego, CA.","DOI":"10.1109\/ICNN.1988.23856"},{"key":"CR32","series-title":"Technical Report","volume-title":"Toward a theory of reinforcement-learning connectionist systems","author":"R.J. Williams","year":"1988","unstructured":"Williams, R.J. (1988b). Toward a theory of reinforcement-learning connectionist systems. (Technical Report NU-CCS-88-3). Boston, MA: Northeastern University, College of Computer Science."},{"key":"CR33","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1080\/09540099108946587","volume":"3","author":"R.J. Williams","year":"1991","unstructured":"Williams, R.J. & Peng, J. (1991). 
Function optimization using connectionist reinforcement learning algorithms. Connection Science, 3, 241-268.","journal-title":"Connection Science"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00992696.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/BF00992696\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00992696","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,4,5]],"date-time":"2020-04-05T08:10:57Z","timestamp":1586074257000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/BF00992696"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1992,5]]},"references-count":33,"journal-issue":{"issue":"3-4","published-print":{"date-parts":[[1992,5]]}},"alternative-id":["BF00992696"],"URL":"https:\/\/doi.org\/10.1007\/bf00992696","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[1992,5]]}}}