{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T05:52:47Z","timestamp":1757310767349,"version":"3.38.0"},"reference-count":18,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[1995,9,1]],"date-time":"1995-09-01T00:00:00Z","timestamp":809913600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Adaptive Behavior"],"published-print":{"date-parts":[[1995,9]]},"abstract":"<jats:p> An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning are compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating also is demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small. <\/jats:p>","DOI":"10.1177\/105971239500400102","type":"journal-article","created":{"date-parts":[[2007,3,18]],"date-time":"2007-03-18T01:21:19Z","timestamp":1174180879000},"page":"3-28","source":"Crossref","is-referenced-by-count":26,"title":["Reinforcement Learning Applied to a Differential Game"],"prefix":"10.1177","volume":"4","author":[{"given":"Mance E.","family":"Harmon","sequence":"first","affiliation":[{"name":"Wright Laboratory"}]},{"suffix":"III","given":"Leemon C.","family":"Baird","sequence":"additional","affiliation":[{"name":"United States Air Force Academy"}]},{"given":"A. Harry","family":"Klopf","sequence":"additional","affiliation":[{"name":"Wright Laboratory"}]}],"member":"179","published-online":{"date-parts":[[1995,9,1]]},"reference":[{"volume-title":"Proceedings of the IEEE Conference on Systems, Man, and Cybernetics","author":"Baird, L.C.","key":"atypb1"},{"volume-title":"Advantage updating (DTIC Rep. AD WL-TR-93-1146, available from the Defense Technical Information Center, Cameron Station, Alexandria, VA 22304-6145)","year":"1993","author":"Baird, L.C.","key":"atypb2"},{"volume-title":"Machine learning: Proceedings of the Twelfth International Conference","author":"Baird, L.C.","key":"atypb3"},{"volume-title":"ANN emulations of arbitrary PDFs (Department of Computer Science Tech. Rep","author":"Baird, L.C.","key":"atypb4"},{"volume-title":"Reinforcement learning with high-dimensional, continuous actions (DTIC Rep. AD WL-TR-93-1147, available from the Defense Technical Information Center, Cameron Station, Alexandria, VA 22304-6145)","year":"1993","author":"Baird, L.C.","key":"atypb5"},{"key":"atypb6","doi-asserted-by":"publisher","DOI":"10.1177\/105971239300100303"},{"key":"atypb7","doi-asserted-by":"publisher","DOI":"10.1063\/1.36249"},{"volume-title":"Dynamic programming: Deterministic and stochastic models","year":"1987","author":"Bertsekas, D.P.","key":"atypb8"},{"volume-title":"Reinforcement learning applied to linear quadratic regulation. Proceedings of the Fifth Conference on Neural Information Processing Systems","year":"1993","author":"Bradtke, S.J.","key":"atypb9"},{"volume-title":"Differential games","year":"1965","author":"Isaacs, R.","key":"atypb10"},{"volume-title":"Associative reinforcement learning for optimal control. Unpublished master's thesis","year":"1991","author":"Millington, P.J.","key":"atypb11"},{"key":"atypb12","doi-asserted-by":"publisher","DOI":"10.2514\/3.55982"},{"key":"atypb13","doi-asserted-by":"publisher","DOI":"10.1038\/323533a0"},{"volume-title":"Proceedings of the International Joint Conference on Neural Networks","author":"Tesauro, G.","key":"atypb14"},{"issue":"3","key":"atypb15","first-page":"279","volume":"8","author":"Tesauro, G.","year":"1992","journal-title":"Machine Learning"},{"volume-title":"Learning from delayed rewards","year":"1989","author":"Watkins, C.J.C.H.","key":"atypb16"},{"key":"atypb17","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"volume-title":"Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems","year":"1993","author":"Williams, R.J.","key":"atypb18"}],"container-title":["Adaptive Behavior"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/105971239500400102","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/105971239500400102","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T11:40:12Z","timestamp":1740915612000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/105971239500400102"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1995,9]]},"references-count":18,"journal-issue":{"issue":"1","published-print":{"date-parts":[[1995,9]]}},"alternative-id":["10.1177\/105971239500400102"],"URL":"https:\/\/doi.org\/10.1177\/105971239500400102","relation":{},"ISSN":["1059-7123","1741-2633"],"issn-type":[{"type":"print","value":"1059-7123"},{"type":"electronic","value":"1741-2633"}],"subject":[],"published":{"date-parts":[[1995,9]]}}}