{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T10:36:29Z","timestamp":1763807789559,"version":"3.45.0"},"reference-count":63,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:00:00Z","timestamp":1750291200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:00:00Z","timestamp":1750291200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100032090","name":"Universit\u00e4t der Bundeswehr M\u00fcnchen","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100032090","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["OR Spectrum"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Due to exponentially growing state and action spaces, network dynamic pricing problems are analytically intractable such that state-of-the-art approaches rely on heuristics. Reinforcement learning has successfully been applied in various complex domains, but its successful applicability to pricing may be limited by two factors. First, the need for extensive state and action space exploration causes lost revenues when directly training within the real world. Secondly, alternatively replicating the real world in an accurate simulation to perform the training therein comes with limitations as well, because calibrating the simulation would require precise domain knowledge, which in general does not exist. To overcome the above issues, with this work, we propose a new dynamic pricing approach based on offline reinforcement learning. 
In contrast to online reinforcement learning, training solely requires a static data set containing information on historic sales, which stems from applying some arbitrary behavior policy in the past. In particular, we develop a low-dimensional state and action space reformulation of the considered generic dynamic pricing problem, which allows the critic-regularized regression algorithm to be incorporated within a scalable approach. We also adapt the standard algorithm\u2019s actor loss function, such that it can deal with the pricing problem\u2019s state-dependent action space. Our studies show that the trained policy dominates and in some cases substantially outperforms the respective behavior policy. Hence, although there are some limitations that have to be discussed, offline reinforcement learning seems to be a promising approach for dynamic pricing in case online reinforcement learning is not an option.<\/jats:p>","DOI":"10.1007\/s00291-025-00821-2","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T10:59:21Z","timestamp":1750330761000},"page":"1217-1266","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Improving network dynamic pricing policies through offline reinforcement 
learning"],"prefix":"10.1007","volume":"47","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-3884-7648","authenticated-orcid":false,"given":"Philipp","family":"Hausenblas","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0710-750X","authenticated-orcid":false,"given":"Dominik","family":"Eichhorn","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6727-1591","authenticated-orcid":false,"given":"Andreas","family":"Brieden","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3399-0525","authenticated-orcid":false,"given":"Matthias","family":"Soppert","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4263-6608","authenticated-orcid":false,"given":"Claudius","family":"Steinhardt","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"issue":"4","key":"821_CR1","doi-asserted-by":"publisher","first-page":"647","DOI":"10.1287\/opre.1060.0368","volume":"55","author":"D Adelman","year":"2007","unstructured":"Adelman D (2007) Dynamic bid prices in revenue management. Op Res 55(4):647\u2013661","journal-title":"Op Res"},{"key":"821_CR2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jairtraman.2020.101979","volume":"91","author":"J An","year":"2021","unstructured":"An J, Mikhaylov A, Jung S-U (2021) A linear programming approach for robust network revenue management in the airline industry. 
J Air Transp Manag 91:101979","journal-title":"J Air Transp Manag"},{"issue":"1","key":"821_CR3","first-page":"16","volume":"22","author":"C Barz","year":"2023","unstructured":"Barz C, Laumer S, Freyschmidt M, Mart\u00ednez-Blanco J (2023) Discrete dynamic pricing and application of network revenue management for flixbus. J Rev Pric Manag 22(1):16\u201333","journal-title":"J Rev Pric Manag"},{"issue":"1","key":"821_CR4","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1287\/opre.1040.0164","volume":"53","author":"D Bertsimas","year":"2005","unstructured":"Bertsimas D, De Boer S (2005) Simulation-based booking limits for airline revenue management. Op Res 53(1):90\u2013106","journal-title":"Op Res"},{"issue":"6","key":"821_CR5","doi-asserted-by":"publisher","first-page":"1537","DOI":"10.1287\/opre.1120.1103","volume":"60","author":"O Besbes","year":"2012","unstructured":"Besbes O, Zeevi A (2012) Blind network revenue management. Op Res 60(6):1537\u20131550","journal-title":"Op Res"},{"issue":"3","key":"821_CR6","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1287\/trsc.2013.0469","volume":"48","author":"\u015e\u0130 Birbil","year":"2014","unstructured":"Birbil \u015e\u0130, Frenk J, Gromicho JA, Zhang S (2014) A network airline revenue management framework based on decomposition by origins and destinations. Transp Sci 48(3):313\u2013333","journal-title":"Transp Sci"},{"issue":"1","key":"821_CR7","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1287\/mnsc.43.1.64","volume":"43","author":"G Bitran","year":"1997","unstructured":"Bitran G, Mondschein S (1997) Periodic pricing of seasonal products in retailing. Manag Sci 43(1):64\u201379","journal-title":"Manag Sci"},{"issue":"5","key":"821_CR8","first-page":"332","volume":"19","author":"N Bondoux","year":"2020","unstructured":"Bondoux N, Quan A, Fiig T, Acuna-Agost R (2020) Reinforcement learning applied to airline revenue management. 
J Rev Pric Manag 19(5):332\u2013348","journal-title":"J Rev Pric Manag"},{"issue":"3","key":"821_CR9","doi-asserted-by":"publisher","first-page":"769","DOI":"10.1287\/opre.1080.0567","volume":"57","author":"JJM Bront","year":"2009","unstructured":"Bront JJM, M\u00e9ndez-D\u00edaz I, Vulcano G (2009) A column generation algorithm for choice-based network revenue management. Op Res 57(3):769\u2013784","journal-title":"Op Res"},{"doi-asserted-by":"crossref","unstructured":"Bu J, Simchi-Levi D, Xu Y (2020) Online pricing with offline data: Phase transition and inverse square law. In international conference on machine learning, pp 1202\u20131210. PMLR","key":"821_CR10","DOI":"10.2139\/ssrn.3471501"},{"issue":"1","key":"821_CR11","first-page":"1","volume":"20","author":"AV den Boer","year":"2015","unstructured":"den Boer AV (2015) Dynamic pricing and learning: historical origins, current research, and new directions. Surv Op Res Manag Sci 20(1):1\u201318","journal-title":"Surv Op Res Manag Sci"},{"issue":"4","key":"821_CR12","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1016\/0191-2615(88)90001-X","volume":"22","author":"M Dror","year":"1988","unstructured":"Dror M, Trudeau P, Ladany SP (1988) Network models for seat allocation on flights. Transp Res Part B: Methodol 22(4):239\u2013250","journal-title":"Transp Res Part B: Methodol"},{"issue":"10","key":"821_CR13","doi-asserted-by":"publisher","first-page":"1287","DOI":"10.1287\/mnsc.49.10.1287.17315","volume":"49","author":"W Elmaghraby","year":"2003","unstructured":"Elmaghraby W, Keskinocak P (2003) Dynamic pricing in the presence of inventory considerations: research overview, current practices, and future directions. Manag Sci 49(10):1287\u20131309","journal-title":"Manag Sci"},{"key":"821_CR14","first-page":"325","volume":"10","author":"A Erdelyi","year":"2011","unstructured":"Erdelyi A, Topaloglu H (2011) Using decomposition methods to solve pricing problems in network revenue management. 
J Rev Pric Manag 10:325\u2013343","journal-title":"J Rev Pric Manag"},{"issue":"6","key":"821_CR15","doi-asserted-by":"publisher","first-page":"1586","DOI":"10.1287\/opre.2018.1755","volume":"66","author":"KJ Ferreira","year":"2018","unstructured":"Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using thompson sampling. Op Res 66(6):1586\u20131602","journal-title":"Op Res"},{"key":"821_CR16","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4939-9606-3","volume-title":"Revenue management and pricing analytics","author":"G Gallego","year":"2019","unstructured":"Gallego G, Topaloglu H (2019) Revenue management and pricing analytics. Springer, New York, NY"},{"issue":"1","key":"821_CR17","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1287\/opre.45.1.24","volume":"45","author":"G Gallego","year":"1997","unstructured":"Gallego G, van Ryzin GJ (1997) A multiproduct dynamic pricing problem and its applications to network yield management. Op Res 45(1):24\u201341","journal-title":"Op Res"},{"doi-asserted-by":"crossref","unstructured":"Gallego G, Iyengar G, Phillips R, Dubey A (2004) Managing flexible products on a network. Technical report, CORC Technical Report Tr-2004-01","key":"821_CR18","DOI":"10.2139\/ssrn.3567371"},{"issue":"3","key":"821_CR19","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1287\/inte.12.3.73","volume":"12","author":"F Glover","year":"1982","unstructured":"Glover F, Glover R, Lorenzo J, McMillan C (1982) The passenger-mix problem in the scheduled airlines. Interfaces 12(3):73\u201380","journal-title":"Interfaces"},{"key":"821_CR20","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/B:MACH.0000019802.64038.6c","volume":"55","author":"A Gosavi","year":"2004","unstructured":"Gosavi A (2004) A reinforcement learning algorithm based on policy iteration for average reward: empirical results with yield management and convergence analysis. 
Mach Learn 55:5\u201329","journal-title":"Mach Learn"},{"key":"821_CR21","doi-asserted-by":"publisher","first-page":"729","DOI":"10.1080\/07408170208928908","volume":"34","author":"A Gosavi","year":"2002","unstructured":"Gosavi A, Bandla N, Das TK (2002) A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking. IIE Trans 34:729\u2013742","journal-title":"IIE Trans"},{"unstructured":"Greg B, Vicki C, Ludwig P, Jonas S, John S, Jie T, Wojciech Z (2016) Openai gym. https:\/\/gym.openai.com","key":"821_CR22"},{"unstructured":"Haarnoja T, Zhou A, Hartikainen K, Tucker G, Ha S, Tan J, Kumar V, Zhu H, Gupta A, Abbeel P, Levine S (2019) Soft actor-critic algorithms and applications","key":"821_CR23"},{"issue":"2","key":"821_CR24","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1287\/moor.1120.0537","volume":"37","author":"S Jasin","year":"2012","unstructured":"Jasin S, Kumar S (2012) A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Math Op Res 37(2):313\u2013345","journal-title":"Math Op Res"},{"issue":"1","key":"821_CR25","first-page":"50","volume":"21","author":"A Kastius","year":"2022","unstructured":"Kastius A, Schlosser R (2022) Dynamic pricing under competition using reinforcement learning. J Rev Pric Manag 21(1):50\u201363","journal-title":"J Rev Pric Manag"},{"issue":"2","key":"821_CR26","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1016\/j.ejor.2019.06.034","volume":"284","author":"R Klein","year":"2020","unstructured":"Klein R, Koch S, Steinhardt C, Strauss AK (2020) A review of revenue management: recent generalizations and advances in industry applications. 
Eur J Op Res 284(2):397\u2013412","journal-title":"Eur J Op Res"},{"issue":"2","key":"821_CR27","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1016\/j.ejor.2019.06.034","volume":"284","author":"R Klein","year":"2020","unstructured":"Klein R, Koch S, Steinhardt C, Strauss AK (2020) A review of revenue management: recent generalizations and advances in industry applications. Eur J Op Res 284(2):397\u2013412","journal-title":"Eur J Op Res"},{"unstructured":"Kumar A, Fu J, Tucker G, Levine S (2019) Stabilizing off-policy q-learning via bootstrapping error reduction. In Advances in Neural Information Processing Systems","key":"821_CR28"},{"key":"821_CR29","first-page":"1179","volume":"33","author":"A Kumar","year":"2020","unstructured":"Kumar A, Zhou A, Tucker G, Levine S (2020) Conservative q-learning for offline reinforcement learning. Adv Neural Inf Process Syst 33:1179\u20131191","journal-title":"Adv Neural Inf Process Syst"},{"key":"821_CR30","doi-asserted-by":"publisher","first-page":"252","DOI":"10.1016\/j.engappai.2019.04.008","volume":"82","author":"RJ Lawhead","year":"2019","unstructured":"Lawhead RJ, Gosavi A (2019) A bounded actor-critic reinforcement learning algorithm applied to airline revenue management. Eng Appl Artif Intell 82:252\u2013262","journal-title":"Eng Appl Artif Intell"},{"key":"821_CR31","first-page":"1","volume":"17","author":"S Levine","year":"2016","unstructured":"Levine S, Finn C, Darrell T, Abbeel P (2016) End-to-end training of deep visuomotor policies. J Mach Learn Res 17:1\u201340","journal-title":"J Mach Learn Res"},{"unstructured":"Levine S, Kumar A, Tucker G, Fu J (2020) Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arxiv","key":"821_CR32"},{"unstructured":"Liang E, Liaw R, Nishihara R, Moritz P, Fox R, Goldberg K, Gonzalez J\u00a0E, Jordan M\u00a0I, Stoica I (2018) Abstractions for distributed reinforcement learning. 
In International Conference on Machine Learning (ICML)","key":"821_CR33"},{"issue":"2","key":"821_CR34","doi-asserted-by":"publisher","first-page":"288","DOI":"10.1287\/msom.1070.0169","volume":"10","author":"Q Liu","year":"2008","unstructured":"Liu Q, Van Ryzin G (2008) On the choice-based linear programming model for network revenue management. Manuf Serv Op Manag 10(2):288\u2013310","journal-title":"Manuf Serv Op Manag"},{"issue":"2","key":"821_CR35","doi-asserted-by":"publisher","first-page":"459","DOI":"10.1016\/j.ejor.2011.06.033","volume":"216","author":"J Meissner","year":"2012","unstructured":"Meissner J, Strauss A (2012) Network revenue management with inventory-sensitive bid prices and customer choice. Eur J Op Res 216(2):459\u2013468","journal-title":"Eur J Op Res"},{"issue":"2","key":"821_CR36","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1287\/mksc.2018.1129","volume":"38","author":"K Misra","year":"2019","unstructured":"Misra K, Schwartz EM, Abernethy J (2019) Dynamic online pricing with incomplete information using multiarmed bandit experiments. Mark Sci 38(2):226\u2013252","journal-title":"Mark Sci"},{"issue":"7540","key":"821_CR37","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529\u2013533","journal-title":"Nature"},{"unstructured":"Mnih V, Badia AP, Mirza L, Graves A, Harley T, Lillicrap TP, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In 33rd international conference on machine learning, pp 2850\u20132869. PMLR","key":"821_CR38"},{"unstructured":"Monier L, Kmec J, Laterre A, Pierrot T, Courgeau V, Sigaud O, Beguir K (2020) Offline reinforcement learning hands-on. 
arxiv","key":"821_CR39"},{"key":"821_CR40","doi-asserted-by":"publisher","DOI":"10.1093\/oxfordhb\/9780199543175.001.0001","volume-title":"The Oxford handbook of pricing management","author":"\u00d6 \u00d6zer","year":"2012","unstructured":"\u00d6zer \u00d6, Phillips R (2012) The Oxford handbook of pricing management. Oxford University Press, Oxford, United Kingdom"},{"issue":"1","key":"821_CR41","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1287\/msom.1080.0252","volume":"12","author":"G Perakis","year":"2010","unstructured":"Perakis G, Roels G (2010) Robust controls for network revenue management. Manuf Serv Op Manag 12(1):56\u201376","journal-title":"Manuf Serv Op Manag"},{"key":"821_CR42","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1007\/s10479-012-1077-6","volume":"241","author":"W Powell","year":"2016","unstructured":"Powell W (2016) Perspectives of approximate dynamic programming. Ann Op Res 241:319\u2013356","journal-title":"Ann Op Res"},{"key":"821_CR43","first-page":"759","volume":"2000","author":"D Precup","year":"2000","unstructured":"Precup D, Sutton RS, Singh S (2000) Eligibility traces for off-policy policy evaluation. Computer Sci Dep Fac Publ Series. 2000:759\u2013766","journal-title":"Computer Sci Dep Fac Publ Series."},{"key":"821_CR44","doi-asserted-by":"publisher","first-page":"10237","DOI":"10.1109\/TNNLS.2023.3250269","volume":"35","author":"RF Prudencio","year":"2023","unstructured":"Prudencio RF, Maximo MROA, Colombini EL (2023) A survey on offline reinforcement learning: taxonomy, review, and open problems. IEEE Trans Neural Netw Learn Syst 35:10237\u201310257","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"7","key":"821_CR45","first-page":"2801","volume":"66","author":"P Pumpensanti","year":"2020","unstructured":"Pumpensanti P, Wang H (2020) A re-solving heuristic with uniformly bounded loss for network revenue management. 
Manag Sci 66(7):2801\u20133294","journal-title":"Manag Sci"},{"key":"821_CR46","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1007\/s10479-006-7372-3","volume":"143","author":"CVL Raju","year":"2006","unstructured":"Raju CVL, Narahari Y, Ravikumar K (2006) Learning dynamic prices in electronic retail markets with customer segmentation. Ann Op Res 143:59\u201375","journal-title":"Ann Op Res"},{"key":"821_CR47","doi-asserted-by":"publisher","first-page":"116","DOI":"10.1016\/j.omega.2013.10.004","volume":"47","author":"R Rana","year":"2014","unstructured":"Rana R, Oliveira FS (2014) Real-time dynamic pricing in a non-stationary environment using model-free reinforcement learning. Omega 47:116\u2013126","journal-title":"Omega"},{"unstructured":"Rummery GA, Niranjan M (1994) On-line q-learning using connectionist systems. Technical report, University of Cambridge, Department of Engineering, Cambridge, England","key":"821_CR48"},{"issue":"3","key":"821_CR49","doi-asserted-by":"publisher","first-page":"650","DOI":"10.1016\/j.ejor.2010.01.008","volume":"205","author":"C Sch\u00f6n","year":"2010","unstructured":"Sch\u00f6n C (2010) Optimal dynamic price selection under attraction choice models. Eur J Op Res 205(3):650\u2013660","journal-title":"Eur J Op Res"},{"issue":"7587","key":"821_CR50","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. 
Nature 529(7587):484\u2013489","journal-title":"Nature"},{"issue":"2","key":"821_CR51","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1016\/j.ejor.2018.01.011","volume":"271","author":"AK Strauss","year":"2018","unstructured":"Strauss AK, Klein R, Steinhardt C (2018) A review of choice-based revenue management: theory and methods. Eur J Op Res 271(2):375\u2013387","journal-title":"Eur J Op Res"},{"key":"821_CR52","volume-title":"Reinforcement Learning: An Introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA"},{"key":"821_CR53","doi-asserted-by":"publisher","DOI":"10.1007\/b139000","volume-title":"The theory and practice of revenue management","author":"KT Talluri","year":"2004","unstructured":"Talluri KT, van Ryzin GJ (2004) The theory and practice of revenue management. Springer, Boston, MA"},{"issue":"1","key":"821_CR54","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1287\/mnsc.1030.0147","volume":"50","author":"KT Talluri","year":"2004","unstructured":"Talluri KT, van Ryzin GJ (2004) Revenue management under a general discrete choice model of consumer behavior. Manag Sci 50(1):15\u201333","journal-title":"Manag Sci"},{"doi-asserted-by":"crossref","unstructured":"Thomas P, Theocharous G, Ghavamzadeh M (2015) High-confidence off-policy evaluation. In Twenty-Ninth AAAI conference on artificial intelligence","key":"821_CR55","DOI":"10.1609\/aaai.v29i1.9541"},{"issue":"3","key":"821_CR56","doi-asserted-by":"publisher","first-page":"637","DOI":"10.1287\/opre.1080.0597","volume":"57","author":"H Topaloglu","year":"2009","unstructured":"Topaloglu H (2009) Using lagrangian relaxation to compute capacity-dependent bid prices in network revenue management. 
Op Res 57(3):637\u2013649","journal-title":"Op Res"},{"issue":"4","key":"821_CR57","doi-asserted-by":"publisher","first-page":"865","DOI":"10.1287\/opre.1080.0550","volume":"56","author":"G Van Ryzin","year":"2008","unstructured":"Van Ryzin G, Vulcano G (2008) Simulation-based optimization of virtual nesting controls for network revenue management. Op Res 56(4):865\u2013880","journal-title":"Op Res"},{"issue":"6","key":"821_CR58","doi-asserted-by":"publisher","first-page":"1352","DOI":"10.1287\/opre.2015.1442","volume":"63","author":"TW Vossen","year":"2015","unstructured":"Vossen TW, Zhang D (2015) Reductions of approximate linear programs for network revenue management. Op Res 63(6):1352\u20131371","journal-title":"Op Res"},{"key":"821_CR59","first-page":"7768","volume":"33","author":"Z Wang","year":"2020","unstructured":"Wang Z, Novikov A, Zolna K, Merel JS, Springenberg JT, Reed SE, Shahriari B, Siegel N, Merel J, Gulcehre C et al (2020) Critic regularized regression. Neural Inf Process Syst 33:7768\u20137778","journal-title":"Neural Inf Process Syst"},{"key":"821_CR60","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1023\/A:1022672621406","volume":"8","author":"RJ Williams","year":"1992","unstructured":"Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8:229\u2013256","journal-title":"Mach Learn"},{"issue":"1","key":"821_CR61","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1287\/msom.1100.0302","volume":"13","author":"D Zhang","year":"2011","unstructured":"Zhang D (2011) An improved dynamic programming decomposition approach for network revenue management. 
Manuf Serv Op Manag 13(1):35\u201352","journal-title":"Manuf Serv Op Manag"},{"issue":"3","key":"821_CR62","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1287\/trsc.1090.0262","volume":"43","author":"D Zhang","year":"2009","unstructured":"Zhang D, Adelman D (2009) An approximate dynamic programming approach to network revenue management with customer choice. Transp Sci 43(3):381\u2013394","journal-title":"Transp Sci"},{"issue":"1","key":"821_CR63","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1287\/ijoc.1110.0488","volume":"25","author":"D Zhang","year":"2013","unstructured":"Zhang D, Lu Z (2013) Assessing the value of dynamic pricing in network revenue management. INFORMS J Computing 25(1):102\u2013115","journal-title":"INFORMS J Computing"}],"container-title":["OR Spectrum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00291-025-00821-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00291-025-00821-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00291-025-00821-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T01:02:23Z","timestamp":1763773343000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00291-025-00821-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":63,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["821"],"URL":"https:\/\/doi.org\/10.1007\/s00291-025-00821-2","relation":{},"ISSN":["0171-6468","1436-6304"],"issn-type":[{"type":"print","value":"0171-6468"},{"type":"electronic","value":"1436-6304"}],"subj
ect":[],"published":{"date-parts":[[2025,6,19]]},"assertion":[{"value":"21 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 May 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 June 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}