{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T15:25:14Z","timestamp":1767713114983,"version":"3.41.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2016,2,9]],"date-time":"2016-02-09T00:00:00Z","timestamp":1454976000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Department of Science and Technology"},{"name":"Robert Bosch Centre"},{"name":"Department of Science and Technology for a project titled \u201cDistributed Computation over Large Networks and High-Dimensional Data Analysis.\u201d"},{"name":"Xerox Corporation, USA"},{"DOI":"10.13039\/100012913","name":"Tata Consultancy Services","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100012913","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Model. Comput. Simul."],"published-print":{"date-parts":[[2016,5,2]]},"abstract":"<jats:p>We develop two new online actor-critic control algorithms with adaptive feature tuning for Markov Decision Processes (MDPs). One of our algorithms is proposed for the long-run average cost objective, while the other works for discounted cost MDPs. Our actor-critic architecture incorporates parameterization both in the policy and the value function. A gradient search in the policy parameters is performed to improve the performance of the actor. The computation of the aforementioned gradient, however, requires an estimate of the value function of the policy corresponding to the current actor parameter. The value function, on the other hand, is approximated using linear function approximation and obtained from the critic. The error in approximation of the value function, however, results in suboptimal policies. 
In our article, we also update the features by performing a gradient descent on the Grassmannian of features to minimize a mean square Bellman error objective in order to find the best features. The aim is to obtain a good approximation of the value function and thereby ensure convergence of the actor to locally optimal policies. In order to estimate the gradient of the objective in the case of the average cost criterion, we utilize the policy gradient theorem, while in the case of the discounted cost objective, we utilize the simultaneous perturbation stochastic approximation (SPSA) scheme. We prove that our actor-critic algorithms converge to locally optimal policies. Experiments on two different settings show performance improvements resulting from our feature adaptation scheme.<\/jats:p>","DOI":"10.1145\/2868723","type":"journal-article","created":{"date-parts":[[2016,2,22]],"date-time":"2016-02-22T13:07:16Z","timestamp":1456146436000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Actor-Critic Algorithms with Online Feature Adaptation"],"prefix":"10.1145","volume":"26","author":[{"given":"K. J.","family":"Prabuchandran","sequence":"first","affiliation":[{"name":"Indian Institute of Science, Bangalore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shalabh","family":"Bhatnagar","sequence":"additional","affiliation":[{"name":"Indian Institute of Science, Bangalore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vivek S.","family":"Borkar","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology, Powai, Mumbai"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2016,2,9]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"crossref","unstructured":"P. A. Absil R. Mahony and R. Sepulchre. 2009. Optimization Algorithms on Matrix Manifolds. Princeton University Press.   
","DOI":"10.1515\/9781400830244"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-377-6.50013-X"},{"key":"e_1_2_2_3_1","volume-title":"Proceedings of the 39th IEEE Conference on Decision and Control","volume":"4","author":"Baras J. S.","unstructured":"J. S. Baras and V. S. Borkar. 2000. A learning algorithm for Markov decision processes with adaptive state aggregation. In Proceedings of the 39th IEEE Conference on Decision and Control, Vol. 4. 3351--3356."},{"key":"e_1_2_2_4_1","volume-title":"Reinforcement Learning: An Introduction","author":"Barto A. G.","year":"1998","unstructured":"A. G. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.1983.6313077"},{"key":"e_1_2_2_6_1","volume-title":"Dynamic Programming and Optimal Control","author":"Bertsekas D. P.","unstructured":"D. P. Bertsekas. 2011. Dynamic Programming and Optimal Control. Vol. 2, 4th ed. Athena Scientific, Belmont, MA.","edition":"4"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2013.2255022"},{"key":"e_1_2_2_8_1","doi-asserted-by":"crossref","unstructured":"S. Bhatnagar V. S. Borkar and L. A. Prashanth. 2012. Adaptive feature pursuit: Online adaptation of features in reinforcement learning. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. 
IEEE Press Computational Intelligence Science IEEE Press and Wiley 517--534.","DOI":"10.1002\/9781118453988.ch23"},{"key":"e_1_2_2_9_1","doi-asserted-by":"crossref","unstructured":"S. Bhatnagar H. L. Prasad and L. A. Prashanth. 2013b. Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods. Springer.","DOI":"10.1007\/978-1-4471-4285-0"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2009.07.008"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6911(97)90015-3"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-93-86279-38-5"},{"key":"e_1_2_2_13_1","doi-asserted-by":"crossref","unstructured":"D. D. Castro and S. Mannor. 2010. Adaptive bases for reinforcement learning. Machine Learning and Knowledge Discovery in Databases (2010) 312--327.","DOI":"10.1007\/978-3-642-15880-3_26"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479895290954"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390204"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143901"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0363012901385691"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1214\/105051604000000116"},{"key":"e_1_2_2_19_1","doi-asserted-by":"crossref","unstructured":"H. 
J. Kushner and D. S. Clark. 1978. Stochastic Approximation Methods for Constrained and Unconstrained Systems. Vol. 6. Springer-Verlag New York.","DOI":"10.1007\/978-1-4684-9352-8"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/945365.964290"},{"key":"e_1_2_2_21_1","unstructured":"S. Mahadevan and B. Liu. 2010. Basis construction from power series expansions of value functions. In Advances in Neural Information Processing Systems. 1540--1548."},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/1314498.1314570"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/9.905687"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10479-005-5732-z"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.1953066"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273589"},{"volume-title":"Proceedings of the 53rd IEEE Conference on Decision and Control. 3597--3602","author":"Prabuchandran K. J.","key":"e_1_2_2_27_1","unstructured":"K. J. Prabuchandran, S. Bhatnagar, and V. S. Borkar. 2014. An actor critic algorithm based on Grassmanian search. In Proceedings of the 53rd IEEE Conference on Decision and Control. 3597--3602."},{"volume-title":"Workshop on Abstraction in Reinforcement Learning. 42--48","author":"Rohanimanesh K.","key":"e_1_2_2_28_1","unstructured":"K. Rohanimanesh, N. Roy, and R. Tedrake. 2009. 
Towards feature selection in actor-critic algorithms. In Workshop on Abstraction in Reinforcement Learning. 42--48."},{"volume-title":"Geometric Optimization Methods for Adaptive Filtering","author":"Smith S. T.","key":"e_1_2_2_29_1","unstructured":"S. T. Smith. 1993. Geometric Optimization Methods for Adaptive Filtering. Harvard University, Cambridge, MA."},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/9.119632"},{"volume-title":"Proceedings of the 28th International Conference on Machine Learning. 481--488","author":"Sun Y.","key":"e_1_2_2_31_1","unstructured":"Y. Sun, M. Ring, J. Schmidhuber, and F. J. Gomez. 2011. Incremental basis construction from temporal difference error. In Proceedings of the 28th International Conference on Machine Learning. 481--488."},{"key":"e_1_2_2_32_1","first-page":"1057","article-title":"Policy gradient methods for reinforcement learning with function approximation","volume":"12","author":"Sutton R. S.","year":"2000","unstructured":"R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, Vol. 12. 
1057--1063.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_33_1","unstructured":"P. S. Thomas W. C. Dabney S. Giguere and S. Mahadevan. 2013. Projected natural actor-critic. In Advances in Neural Information Processing Systems. 2337--2345."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/9.580874"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0005-1098(99)00099-0"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/945365.964284"},{"key":"e_1_2_2_37_1","doi-asserted-by":"crossref","unstructured":"H. Yu and D. P. Bertsekas. 2009. Basis function adaptation methods for cost approximation in MDP. In Adaptive Dynamic Programming and Reinforcement Learning. 
IEEE 74--81.","DOI":"10.1109\/ADPRL.2009.4927528"}],"container-title":["ACM Transactions on Modeling and Computer Simulation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2868723","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2868723","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:04:23Z","timestamp":1750273463000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2868723"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,2,9]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,5,2]]}},"alternative-id":["10.1145\/2868723"],"URL":"https:\/\/doi.org\/10.1145\/2868723","relation":{},"ISSN":["1049-3301","1558-1195"],"issn-type":[{"type":"print","value":"1049-3301"},{"type":"electronic","value":"1558-1195"}],"subject":[],"published":{"date-parts":[[2016,2,9]]},"assertion":[{"value":"2014-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-02-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}