{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T08:48:31Z","timestamp":1774946911775,"version":"3.50.1"},"reference-count":40,"publisher":"SAGE Publications","issue":"12","license":[{"start":{"date-parts":[[2008,12,1]],"date-time":"2008-12-01T00:00:00Z","timestamp":1228089600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["SIMULATION"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:p> We develop four simulation-based algorithms for finite-horizon Markov decision processes. Two of these algorithms are developed for finite state and compact action spaces while the other two are for finite state and finite action spaces. Of the former two, one algorithm uses a linear parameterization for the policy, resulting in reduced memory complexity. Convergence analysis is briefly sketched and illustrative numerical experiments with the four algorithms are shown for a problem of flow control in communication networks. <\/jats:p>","DOI":"10.1177\/0037549708098120","type":"journal-article","created":{"date-parts":[[2008,11,24]],"date-time":"2008-11-24T15:33:40Z","timestamp":1227540820000},"page":"577-600","source":"Crossref","is-referenced-by-count":14,"title":["Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes"],"prefix":"10.1177","volume":"84","author":[{"given":"Shalabh","family":"Bhatnagar","sequence":"first","affiliation":[{"name":"Department of Computer Science and Automation Indian Institute of Science Bangalore 560 012, India"}]},{"given":"Mohammed Shahid","family":"Abdulla","sequence":"additional","affiliation":[{"name":"General Motors India Science Lab Bangalore"}]}],"member":"179","published-online":{"date-parts":[[2008,12,1]]},"reference":[{"key":"atypb1","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887"},{"key":"atypb2","volume-title":"Dynamic Programming and Optimal Control, Volume I","author":"Bertsekas, D.","year":"1995"},{"key":"atypb3","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton, R.","year":"1998"},{"key":"atypb4","volume-title":"Neuro-Dynamic Programming","author":"Bertsekas, D.","year":"1996"},{"key":"atypb5","volume-title":"Handbook of Markov Decision Processes: Methods and Applications","author":"Van Roy, B.","year":"2001"},{"key":"atypb6","doi-asserted-by":"publisher","DOI":"10.1109\/9.580874"},{"key":"atypb7","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"atypb8","first-page":"835","volume":"13","author":"Barto, A.","year":"1983","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"atypb9","doi-asserted-by":"publisher","DOI":"10.1137\/S036301299731669X"},{"key":"atypb10","doi-asserted-by":"publisher","DOI":"10.1137\/S0363012901385691"},{"key":"atypb11","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2004.825622"},{"key":"atypb12","doi-asserted-by":"publisher","DOI":"10.1109\/9.119632"},{"key":"atypb13","doi-asserted-by":"publisher","DOI":"10.1016\/S0005-1098(96)00149-5"},{"key":"atypb14","doi-asserted-by":"publisher","DOI":"10.1145\/858481.858486"},{"key":"atypb15","first-page":"1380","volume":"2","author":"Bhatnagar, S.","year":"1999","journal-title":"Proceedings of 38th IEEE Conference on Decision and Control"},{"key":"atypb16","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2003.812782"},{"key":"atypb17","doi-asserted-by":"publisher","DOI":"10.1016\/0377-2217(94)00091-P"},{"key":"atypb18","doi-asserted-by":"publisher","DOI":"10.1287\/ijoc.1020.0024"},{"key":"atypb19","volume-title":"A hierarchical structure for finite horizon dynamic programming problems","author":"Zhang, C.","year":"2000"},{"key":"atypb20","volume-title":"Proceedings of the Fifteenth International Conference on Machine Learning","author":"Garcia, F."},{"key":"atypb21","doi-asserted-by":"publisher","DOI":"10.1016\/S0005-1098(99)00099-0"},{"key":"atypb22","volume-title":"Proceedings of 17th Annual Conference on Neural Information Processing Systems (NIPS'03)","author":"Parkes, D.C."},{"key":"atypb23","volume-title":"Proceedings of 18th Annual Conference on Neural Information Processing Systems (NIPS'04)","author":"Parkes, D.C."},{"key":"atypb24","volume-title":"Cisco Frame Relay Solutions Guide","author":"Chin, J.","year":"2004"},{"key":"atypb25","doi-asserted-by":"publisher","DOI":"10.1109\/90.944345"},{"key":"atypb26","unstructured":"Sutton, R., D. McAllester, S. Singh, and Y. Mansour. 1999. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In S. A. Solla and T. K. Leen and K.R. Muller, (Ed.) Advances in Neural Information Processing Systems-12, pp. 1057-1063. Cambridge, MA: MIT Press."},{"key":"atypb27","doi-asserted-by":"publisher","DOI":"10.1109\/9.905687"},{"key":"atypb28","volume-title":"Proceedings of Sixteenth ICML","author":"Boyan, J."},{"key":"atypb29","doi-asserted-by":"publisher","DOI":"10.1007\/s10626-006-0003-y"},{"key":"atypb30","doi-asserted-by":"publisher","DOI":"10.1214\/105051604000000116"},{"key":"atypb31","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4684-9352-8"},{"key":"atypb32","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-2696-8"},{"key":"atypb33","volume-title":"Discrete Parameter Martingales","author":"Neveu, J.","year":"1975"},{"key":"atypb34","first-page":"1143","volume":"2","author":"M.W. Hirsch .","year":"2003","journal-title":"Neural Networks"},{"key":"atypb35","volume-title":"Proceedings of the 38th IEEE Conference on Decision and Control-CDC99","author":"Gerencser, L."},{"key":"atypb36","volume-title":"Introduction to Probability Models, 7\/e","author":"Ross, S.M.","year":"2000"},{"key":"atypb37","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2006.887917"},{"key":"atypb38","doi-asserted-by":"publisher","DOI":"10.1145\/1044322.1044326"},{"key":"atypb39","doi-asserted-by":"publisher","DOI":"10.1145\/1315575.1315577"},{"key":"atypb40","volume-title":"Proceedings of 15th International Conference on Machine Learning (ICML)","author":"Cesa-Bianchi, N."}],"container-title":["SIMULATION"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0037549708098120","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0037549708098120","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T02:27:20Z","timestamp":1740968840000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0037549708098120"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,12]]},"references-count":40,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["10.1177\/0037549708098120"],"URL":"https:\/\/doi.org\/10.1177\/0037549708098120","relation":{},"ISSN":["0037-5497","1741-3133"],"issn-type":[{"value":"0037-5497","type":"print"},{"value":"1741-3133","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,12]]}}}