{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T16:12:55Z","timestamp":1764000775295,"version":"build-2065373602"},"reference-count":59,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,4,22]],"date-time":"2022-04-22T00:00:00Z","timestamp":1650585600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The accurate estimation of how future demand will react to prices is central to the optimization of pricing decisions. The systems responsible for demand prediction and pricing optimization are called revenue management (RM) systems, and, in the airline industry, they play an important role in the company\u2019s profitability. As airlines\u2019 current pricing decisions impact future knowledge of the demand behavior, the RM systems may have to compromise immediate revenue by efficiently performing price experiments with the expectation that the information gained about the demand behavior will lead to better future pricing decisions. This earning while learning (EWL) problem has captured the attention of both the industry and academia in recent years, resulting in many proposed solutions based on heuristic optimization. We take a different approach that does not depend on human-designed heuristics. We present the EWL problem to a reinforcement learning agent, and the agent\u2019s goal is to maximize long-term revenue without explicitly considering the optimal way to perform price experimentation. The agent discovers through experience that \u201cmyopic\u201d revenue-maximizing policies may lead to a decrease in the demand model quality (which it relies on to take decisions). 
We show that the agent finds novel pricing policies that balance revenue maximization and demand model quality in a surprisingly effective way, generating more revenue over the long run than current practices.<\/jats:p>","DOI":"10.3390\/a15050142","type":"journal-article","created":{"date-parts":[[2022,4,23]],"date-time":"2022-04-23T08:14:06Z","timestamp":1650701646000},"page":"142","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Outsmarting Human Design in Airline Revenue Management"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2401-4768","authenticated-orcid":false,"given":"Giovanni","family":"Gatti Pinheiro","sequence":"first","affiliation":[{"name":"Amadeus SAS, 821 Avenue Jack Kilby, 06270 Villeneuve-Loubet, France"},{"name":"Department of Computer Science, Universite de la Cote d\u2019Azur, CNRS, I3S, 06100 Nice, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7227-1414","authenticated-orcid":false,"given":"Michael","family":"Defoin-Platel","sequence":"additional","affiliation":[{"name":"Amadeus SAS, 821 Avenue Jack Kilby, 06270 Villeneuve-Loubet, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6204-5894","authenticated-orcid":false,"given":"Jean-Charles","family":"Regin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Universite de la Cote d\u2019Azur, CNRS, I3S, 06100 Nice, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1057\/s41272-018-00174-2","article-title":"Can demand forecast accuracy be linked to airline revenue?","volume":"18","author":"Fiig","year":"2019","journal-title":"J. 
Revenue Pricing Manag."},{"key":"ref_2","first-page":"770","article-title":"Simultaneously learning and optimizing using controlled variance pricing","volume":"60","author":"Zwart","year":"2014","journal-title":"Manag. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"11711","DOI":"10.1007\/s00500-021-06047-y","article-title":"Novel pricing strategies for revenue maximization and demand learning using an exploration\u2014Exploitation framework","volume":"25","author":"Elreedy","year":"2021","journal-title":"Soft Comput."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1287\/moor.2016.0807","article-title":"Chasing demand: Learning and earning in a changing environment","volume":"42","author":"Keskin","year":"2017","journal-title":"Math. Oper. Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1057\/s41272-017-0120-2","article-title":"Learning and optimizing through dynamic pricing","volume":"17","author":"Kumar","year":"2018","journal-title":"J. Revenue Pricing Manag."},{"key":"ref_6","unstructured":"Olsson, F. (2009). A Literature Survey of Active Machine Learning in the Context of Natural Language Processing, Swedish Institute of Computer Science."},{"key":"ref_7","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1586","DOI":"10.1287\/opre.2018.1755","article-title":"Online network revenue management using thompson sampling","volume":"66","author":"Ferreira","year":"2018","journal-title":"Oper. Res."},{"key":"ref_9","unstructured":"Trovo, F., Paladino, S., Restelli, M., and Gatti, N. (2015, January 10\u201311). Multi-armed bandit for pricing. 
Proceedings of the 12th European Workshop on Reinforcement Learning, Lille, France."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/s41586-021-04301-9","article-title":"Magnetic control of tokamak plasmas through deep reinforcement learning","volume":"602","author":"Degrave","year":"2022","journal-title":"Nature"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"103535","DOI":"10.1016\/j.artint.2021.103535","article-title":"Reward is enough","volume":"299","author":"Silver","year":"2021","journal-title":"Artif. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"332","DOI":"10.1057\/s41272-020-00228-4","article-title":"Reinforcement learning applied to airline revenue management","volume":"19","author":"Bondoux","year":"2020","journal-title":"J. Revenue Pricing Manag."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1057\/s41272-021-00285-3","article-title":"Dynamic pricing under competition using reinforcement learning","volume":"21","author":"Kastius","year":"2021","journal-title":"J. Revenue Pricing Manag."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1057\/s41272-021-00281-7","article-title":"A deep reinforcement learning approach to seat inventory control for airline revenue management","volume":"21","author":"Shihab","year":"2021","journal-title":"J. 
Revenue Pricing Manag."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the game of Go without human knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nature"},{"key":"ref_18","first-page":"198","article-title":"Report of the Uppsala Meeting, August 2\u20134, 1954","volume":"23","author":"Hansen","year":"1955","journal-title":"Econometrica"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"428","DOI":"10.1177\/002224295702100406","article-title":"Methods of estimating demand","volume":"21","author":"Hawkins","year":"1957","journal-title":"J. Mark."},{"key":"ref_20","unstructured":"Lobo, M.S., and Boyd, S. (2003, January 2\u20135). Pricing and learning with uncertain demand. Proceedings of the INFORMS Revenue Management Conference, Honolulu, HI, USA."},{"key":"ref_21","unstructured":"Chhabra, M., and Das, S. (2011, January 2\u20136). Learning the demand curve in posted-price digital goods auctions. Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1017\/S0269964811000246","article-title":"Optimal markdown pricing strategy with demand learning","volume":"26","author":"Kwon","year":"2012","journal-title":"Probab. Eng. Inf. Sci."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1287\/opre.1100.0867","article-title":"On the minimax complexity of pricing in a changing environment","volume":"59","author":"Besbes","year":"2011","journal-title":"Oper. Res."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1142","DOI":"10.1287\/opre.2014.1294","article-title":"Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies","volume":"62","author":"Keskin","year":"2014","journal-title":"Oper. 
Res."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"974","DOI":"10.1287\/opre.2020.2016","article-title":"Nonparametric pricing analytics with customer covariates","volume":"69","author":"Chen","year":"2021","journal-title":"Oper. Res."},{"key":"ref_26","unstructured":"Chen, N., and Gallego, G. (2022, February 10). A Primal\u2014Dual Learning Algorithm for Personalized Dynamic Pricing with an Inventory Constraint. Available online: https:\/\/pubsonline.informs.org\/doi\/abs\/10.1287\/moor.2021.1220."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Aoki, M. (1973, January 7\u201311). On a dual control approach to the pricing policies of a trading specialist. Proceedings of the IFIP Technical Conference on Optimization Techniques, Rome, Italy.","DOI":"10.1007\/3-540-06600-4_24"},{"key":"ref_28","first-page":"311","article-title":"Multistage pricing under uncertain demand","volume":"Volume 4","author":"Chong","year":"1975","journal-title":"Annals of Economic and Social Measurement"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1016\/0165-1889(84)90023-X","article-title":"Price dispersion and incomplete learning in the long run","volume":"7","author":"McLennan","year":"1984","journal-title":"J. Econ. Dyn. Control"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/0022-0531(74)90066-0","article-title":"A two-armed bandit theory of market pricing","volume":"9","author":"Rothschild","year":"1974","journal-title":"J. Econ. Theory"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1407","DOI":"10.1287\/opre.1080.0640","article-title":"Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms","volume":"57","author":"Besbes","year":"2009","journal-title":"Oper. 
Res."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1287\/opre.2015.1397","article-title":"Dynamic pricing and learning with finite inventories","volume":"63","author":"Zwart","year":"2015","journal-title":"Oper. Res."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Gatti Pinheiro, G., Defoin-Platel, M., and Regin, J.C. (2022). Optimizing revenue maximization and demand learning in airline revenue management. arXiv.","DOI":"10.3390\/a15050142"},{"key":"ref_34","unstructured":"Aviv, Y., and Pazgal, A. (2005). Dynamic Pricing of Short Life-Cycle Products through Active Learning, Olin School Business, Washington University."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1002\/nav.20204","article-title":"Bayesian strategies for dynamic pricing in e-commerce","volume":"54","author":"Cope","year":"2007","journal-title":"Nav. Res. Logist. (NRL)"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1111\/j.1937-5956.2007.tb00290.x","article-title":"Dynamic pricing in e-services under demand uncertainty","volume":"16","author":"Xia","year":"2007","journal-title":"Prod. Oper. Manag."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1093\/biomet\/25.3-4.285","article-title":"On the likelihood that one unknown probability exceeds another in view of the evidence of two samples","volume":"25","author":"Thompson","year":"1933","journal-title":"Biometrika"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1023\/A:1013689704352","article-title":"Finite-time analysis of the multiarmed bandit problem","volume":"47","author":"Auer","year":"2002","journal-title":"Mach. Learn."},{"key":"ref_39","first-page":"213","article-title":"R-max-a general polynomial time algorithm for near-optimal reinforcement learning","volume":"3","author":"Brafman","year":"2002","journal-title":"J. Mach. Learn. 
Res."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1023\/A:1017984413808","article-title":"Near-optimal reinforcement learning in polynomial time","volume":"49","author":"Kearns","year":"2002","journal-title":"Mach. Learn."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Pathak, D., Agrawal, P., Efros, A.A., and Darrell, T. (2017, January 6\u201311). Curiosity-driven exploration by self-supervised prediction. Proceedings of the International Conference on Machine Learning, Sydney, Australia.","DOI":"10.1109\/CVPRW.2017.70"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"999","DOI":"10.1287\/mnsc.40.8.999","article-title":"Optimal dynamic pricing of inventories with stochastic demand over finite horizons","volume":"40","author":"Gallego","year":"1994","journal-title":"Manag. Sci."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1287\/msom.2014.0475","article-title":"Estimation of choice-based models using sales data from a single firm","volume":"16","author":"Newman","year":"2014","journal-title":"Manuf. Serv. Oper. Manag."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Talluri, K.T., and Van Ryzin, G. (2004). The Theory and Practice of Revenue Management, Springer.","DOI":"10.1007\/b139000"},{"key":"ref_45","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014, January 21\u201326). Deterministic policy gradient algorithms. Proceedings of the International Conference on Machine Learning, Beijing, China."},{"key":"ref_46","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 20\u201322). Asynchronous methods for deep reinforcement learning. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_47","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. 
(2015, January 6\u201311). Trust region policy optimization. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_48","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_49","unstructured":"Wan, Y., Naik, A., and Sutton, R.S. (2021, January 18\u201324). Learning and planning in average-reward markov decision processes. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_50","unstructured":"Zhang, S., Wan, Y., Sutton, R.S., and Whiteson, S. (2021). Average-Reward Off-Policy Policy Evaluation with Function Approximation. arXiv."},{"key":"ref_51","unstructured":"Degris, T., White, M., and Sutton, R.S. (2012). Off-policy actor\u2013critic. arXiv."},{"key":"ref_52","unstructured":"Schaul, T., Horgan, D., Gregor, K., and Silver, D. (2015, January 6\u201311). Universal value function approximators. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/S0004-3702(98)00023-X","article-title":"Planning and acting in partially observable stochastic domains","volume":"101","author":"Kaelbling","year":"1998","journal-title":"Artif. Intell."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Luong, M.T., Pham, H., and Manning, C.D. (2015). Effective approaches to attention-based neural machine translation. 
arXiv.","DOI":"10.18653\/v1\/D15-1166"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_57","unstructured":"Belobaba, P.P., and Hopperstad, C. (2004). Algorithms for revenue management in unrestricted fare markets. Proceedings of the Meeting of the INFORMS Section on Revenue Management, Massachusetts Institute of Technology."},{"key":"ref_58","unstructured":"Liang, E., Liaw, R., Nishihara, R., Moritz, P., Fox, R., Goldberg, K., Gonzalez, J., Jordan, M., and Stoica, I. (2018, January 10\u201315). RLlib: Abstractions for distributed reinforcement learning. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_59","unstructured":"Schulman, J., Moritz, P., Levine, S., Jordan, M., and Abbeel, P. (2015). High-dimensional continuous control using generalized advantage estimation. arXiv."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/5\/142\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:58:47Z","timestamp":1760137127000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/5\/142"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,22]]},"references-count":59,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["a15050142"],"URL":"https:\/\/doi.org\/10.3390\/a15050142","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2022,4,22]]}}}