{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T15:57:20Z","timestamp":1773244640875,"version":"3.50.1"},"reference-count":22,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,2,7]],"date-time":"2025-02-07T00:00:00Z","timestamp":1738886400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nd\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Econ. Comput."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>This article uses reinforcement learning (RL) to approximate the policy rules of banks participating in a high-value payment system (HVPS). The objective of the RL agents is to learn a policy function for the choice of amount of liquidity provided to the system at the beginning of the day and the rate at which to pay intraday payments. Individual choices have complex strategic effects precluding a closed form solution of the optimal policy, except in simple cases. We show that, in a stylized two-agent setting, RL agents learn the optimal policy that minimizes the cost of processing their individual payments\u2014without complete knowledge of the environment. We further demonstrate that, in more complex settings, both agents learn to reduce the cost of processing their payments and effectively respond to liquidity-delay tradeoff. 
Our results show the potential of RL to solve liquidity management problems in HVPS and provide new tools to assist policymakers in their mandates of ensuring safety and improving the efficiency of payment systems.<\/jats:p>","DOI":"10.1145\/3691326","type":"journal-article","created":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T05:48:06Z","timestamp":1737697686000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Estimating Policy Functions in Payment Systems Using Reinforcement Learning"],"prefix":"10.1145","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3206-336X","authenticated-orcid":false,"given":"Pablo","family":"Castro","sequence":"first","affiliation":[{"name":"Brain Team, Google Research, Montreal, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7098-3186","authenticated-orcid":false,"given":"Ajit","family":"Desai","sequence":"additional","affiliation":[{"name":"Banking and Payments Research, Bank of Canada, Ottawa, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0256-180X","authenticated-orcid":false,"given":"Han","family":"Du","sequence":"additional","affiliation":[{"name":"Bank of Canada, Ottawa, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2667-2556","authenticated-orcid":false,"given":"Rodney","family":"Garratt","sequence":"additional","affiliation":[{"name":"University of California, Santa Barbara, Santa Barbara, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1423-0467","authenticated-orcid":false,"given":"Francisco","family":"Rivadeneyra","sequence":"additional","affiliation":[{"name":"Bank of Canada, Ottawa, Canada"}]}],"member":"320","published-online":{"date-parts":[[2025,2,7]]},"reference":[{"issue":"2","key":"e_1_3_9_2_2","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1016\/S0022-0531(03)00016-4","article-title":"The intraday liquidity management game","volume":"109","author":"Bech Morten 
L.","year":"2003","unstructured":"Morten L. Bech and Rod Garratt. 2003. The intraday liquidity management game. J. Econ. Theor. 109, 2 (2003), 198\u2013219.","journal-title":"J. Econ. Theor."},{"key":"e_1_3_9_3_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press."},{"key":"e_1_3_9_4_2","doi-asserted-by":"publisher","DOI":"10.1257\/aer.20190623"},{"key":"e_1_3_9_5_2","volume-title":"Liquidity Usage and Payment Delay Estimates of the New Canadian High Value Payments System","author":"Rivadeneyra Francisco","year":"2020","unstructured":"Francisco Rivadeneyra and Nellie Zhang. 2020. Liquidity Usage and Payment Delay Estimates of the New Canadian High Value Payments System. Technical Report. Bank of Canada Discussion Paper No 2020-9."},{"issue":"3","key":"e_1_3_9_6_2","doi-asserted-by":"crossref","first-page":"291","DOI":"10.69554\/CGHQ3530","article-title":"From LVTS to Lynx: Quantitative assessment of payment system transition in Canada","volume":"17","author":"Desai Ajit","year":"2023","unstructured":"Ajit Desai, Zhentong Lu, Hiru Rodrigo, Jacob Sharples, Phoebe Tian, and Nellie Zhang. 2023. From LVTS to Lynx: Quantitative assessment of payment system transition in Canada. J. Paym. Strat. Syst. 17, 3 (2023), 291\u2013314.","journal-title":"J. Paym. Strat. Syst."},{"key":"e_1_3_9_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jedc.2010.11.001"},{"key":"e_1_3_9_8_2","volume-title":"Bank of Italy Temi di Discussione, Working Paper No","author":"Arciero Luca","year":"2008","unstructured":"Luca Arciero, Claudia Biancotti, Leandro d\u2019Aurizio, and Claudio Impenna. 2008. Exploring Agent-based Methods for the Analysis of Payment Systems: A Crisis Model for StarLogo TNG. 
Bank of Italy Temi di Discussione, Working Paper No 686."},{"key":"e_1_3_9_9_2","doi-asserted-by":"crossref","unstructured":"Mitsuru Igami. 2020. Artificial intelligence as structural estimation: Deep Blue Bonanza and AlphaGo. The Econometrics Journal 23 3 (2020) S1\u2013S24.","DOI":"10.1093\/ectj\/utaa005"},{"key":"e_1_3_9_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0165-1889(02)00122-7"},{"key":"e_1_3_9_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0899-8256(05)80020-X"},{"issue":"4","key":"e_1_3_9_12_2","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1257\/jep.30.4.151","article-title":"Whither game theory? Towards a theory of learning in games","volume":"30","author":"Fudenberg Drew","year":"2016","unstructured":"Drew Fudenberg and David K. Levine. 2016. Whither game theory? Towards a theory of learning in games. J. Econ. Perspect. 30, 4 (2016), 151\u2013170.","journal-title":"J. Econ. Perspect."},{"key":"e_1_3_9_13_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-021-00365-4"},{"key":"e_1_3_9_14_2","article-title":"Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks","author":"Martin Carlos","year":"2022","unstructured":"Carlos Martin and Tuomas Sandholm. 2022. Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks. arXiv preprint arXiv:2211.15936 (2022).","journal-title":"arXiv preprint arXiv:2211.15936"},{"issue":"4","key":"e_1_3_9_15_2","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1007\/s00199-018-1150-8","article-title":"Convergence results on stochastic adaptive learning","volume":"68","author":"Funai Naoki","year":"2019","unstructured":"Naoki Funai. 2019. Convergence results on stochastic adaptive learning. Econ. Theor. 68, 4 (2019), 907\u2013934.","journal-title":"Econ. 
Theor."},{"issue":"18","key":"e_1_3_9_16_2","doi-asserted-by":"crossref","first-page":"eabk2607","DOI":"10.1126\/sciadv.abk2607","article-title":"The AI Economist: Taxation policy design via two-level deep multiagent reinforcement learning","volume":"8","author":"Zheng Stephan","year":"2022","unstructured":"Stephan Zheng, Alexander Trott, Sunil Srinivasa, David C. Parkes, and Richard Socher. 2022. The AI Economist: Taxation policy design via two-level deep multiagent reinforcement learning. Sci. Advan. 8, 18 (2022), eabk2607.","journal-title":"Sci. Advan."},{"key":"e_1_3_9_17_2","article-title":"Improving the efficiency of payments systems using quantum computing","author":"McMahon Christopher","year":"2022","unstructured":"Christopher McMahon, Donald McGillivray, Ajit Desai, Francisco Rivadeneyra, Jean-Paul Lam, Thomas Lo, Danica Marsden, and Vladimir Skavysh. 2022. Improving the efficiency of payments systems using quantum computing. arXiv preprint arXiv:2209.15392 (2022).","journal-title":"arXiv preprint arXiv:2209.15392"},{"key":"e_1_3_9_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbankfin.2019.07.016"},{"key":"e_1_3_9_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_9_20_2","first-page":"265","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard et\u00a0al. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916). 
265\u2013283."},{"key":"e_1_3_9_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/1023205"},{"key":"e_1_3_9_22_2","first-page":"2137","volume-title":"Advances in Neural Information Processing Systems 29","author":"Foerster Jakob","year":"2016","unstructured":"Jakob Foerster, Ioannis Alexandros Assael, Nando de Freitas, and Shimon Whiteson. 2016. Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 2137\u20132145. Retrieved from http:\/\/papers.nips.cc\/paper\/6042-learning-to-communicate-with-deep-multi-agent-reinforcement-learning.pdf"},{"key":"e_1_3_9_23_2","doi-asserted-by":"crossref","unstructured":"Rodney J. Garratt. 2022. An application of Shapley value cost allocation to liquidity savings mechanisms. Journal of Money Credit and Banking 54 6 (2022) 1875\u20131888.","DOI":"10.1111\/jmcb.12853"}],"container-title":["ACM Transactions on Economics and 
Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3691326","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3691326","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:06:07Z","timestamp":1750291567000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3691326"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,7]]},"references-count":22,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3691326"],"URL":"https:\/\/doi.org\/10.1145\/3691326","relation":{},"ISSN":["2167-8375","2167-8383"],"issn-type":[{"value":"2167-8375","type":"print"},{"value":"2167-8383","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,7]]},"assertion":[{"value":"2023-01-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}