{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T08:00:35Z","timestamp":1776931235085,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":33,"publisher":"ACM","funder":[{"name":"HKUST (Guangzhou) Start-up Fund","award":["G0101000197"],"award-info":[{"award-number":["G0101000197"]}]},{"name":"Guangzhou-HKUST(GZ) Joint Funding Program","award":["No. 2024A03J0630"],"award-info":[{"award-number":["No. 2024A03J0630"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,11,15]]},"DOI":"10.1145\/3768292.3770357","type":"proceedings-article","created":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T07:24:26Z","timestamp":1763105066000},"page":"80-87","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Algorithmic pricing with independent learners and relative experience replay"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0680-7967","authenticated-orcid":false,"given":"Bingyan","family":"Han","sequence":"first","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,11,14]]},"reference":[{"key":"e_1_3_3_1_2_2","doi-asserted-by":"crossref","unstructured":"Ibrahim Abada and Xavier Lambin. 2023. Artificial intelligence: Can seemingly collusive outcomes be avoided? 
Management Science 69 9 (2023) 5042\u20135065.","DOI":"10.1287\/mnsc.2022.4623"},{"key":"e_1_3_3_1_3_2","first-page":"5048","volume-title":"Advances in Neural Information Processing Systems 30","author":"Andrychowicz Marcin","year":"2017","unstructured":"Marcin Andrychowicz, Dwight Crow, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. 2017. Hindsight Experience Replay. In Advances in Neural Information Processing Systems 30, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna\u00a0M. Wallach, Rob Fergus, S.\u00a0V.\u00a0N. Vishwanathan, and Roman Garnett (Eds.). 5048\u20135058."},{"key":"e_1_3_3_1_4_2","doi-asserted-by":"crossref","unstructured":"Stephanie Assad Robert Clark Daniel Ershov and Lei Xu. 2024. Algorithmic pricing and competition: Empirical evidence from the German retail gasoline market. Journal of Political Economy 132 3 (2024) 723\u2013771.","DOI":"10.1086\/726906"},{"key":"e_1_3_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Giovanni Bellitto Federica Proietto\u00a0Salanitri Matteo Pennisi Matteo Boschini Lorenzo Bonicelli Angelo Porrello Simone Calderara Simone Palazzo and Concetto Spampinato. 2024. Saliency-driven experience replay for continual learning. Advances in Neural Information Processing Systems 37 (2024) 103356\u2013103383.","DOI":"10.52202\/079017-3284"},{"key":"e_1_3_3_1_6_2","volume-title":"International Conference on Machine Learning","author":"Bertrand Quentin","year":"2025","unstructured":"Quentin Bertrand, Juan\u00a0Agustin Duque, Emilio Calvano, and Gauthier Gidel. 2025. Self-play Q-learners can provably collude in the iterated prisoner\u2019s dilemma. In International Conference on Machine Learning."},{"key":"e_1_3_3_1_7_2","doi-asserted-by":"crossref","unstructured":"John Bizjak Swaminathan Kalpathy Zhichuan\u00a0Frank Li and Brian Young. 2022. The choice of peers for relative performance evaluation in executive compensation. 
Review of Finance 26 5 (2022) 1217\u20131239.","DOI":"10.1093\/rof\/rfac016"},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Emilio Calvano Giacomo Calzolari Vincenzo Denicol\u00f2 Joseph\u00a0E Harrington and Sergio Pastorello. 2020. Protecting consumers from collusive prices due to AI. Science 370 6520 (2020) 1040\u20131042.","DOI":"10.1126\/science.abe3796"},{"key":"e_1_3_3_1_9_2","doi-asserted-by":"crossref","unstructured":"Emilio Calvano Giacomo Calzolari Vincenzo Denicol\u00f2 and Sergio Pastorello. 2020. Artificial intelligence algorithmic pricing and collusion. American Economic Review 110 10 (2020) 3267\u201397.","DOI":"10.1257\/aer.20190623"},{"key":"e_1_3_3_1_10_2","doi-asserted-by":"crossref","unstructured":"Alvaro Cartea Patrick Chang Jos\u00e9 Penalva and Harrison Waldon. 2022. The algorithmic learning equations: Evolving strategies in dynamic games. Available at SSRN 4175239 (2022).","DOI":"10.2139\/ssrn.4175239"},{"key":"e_1_3_3_1_11_2","doi-asserted-by":"crossref","unstructured":"Rama Cont and Wei Xiong. 2024. Dynamics of market making algorithms in dealer markets: Learning and tacit collusion. Mathematical Finance 34 2 (2024) 467\u2013521.","DOI":"10.1111\/mafi.12401"},{"key":"e_1_3_3_1_12_2","unstructured":"Winston\u00a0Wei Dou Itay Goldstein and Yan Ji. 2025. AI-powered trading algorithmic collusion and price efficiency. Working Paper (2025). https:\/\/ssrn.com\/abstract=4452704"},{"key":"e_1_3_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.4159\/9780674973336"},{"key":"e_1_3_3_1_14_2","first-page":"3061","volume-title":"International Conference on Machine Learning","author":"Fedus William","year":"2020","unstructured":"William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, and Will Dabney. 2020. Revisiting fundamentals of experience replay. In International Conference on Machine Learning. 
PMLR, 3061\u20133071."},{"key":"e_1_3_3_1_15_2","first-page":"1146","volume-title":"Proceedings of the 34th International Conference on Machine Learning","volume":"70","author":"Foerster Jakob","year":"2017","unstructured":"Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H.\u00a0S. Torr, Pushmeet Kohli, and Shimon Whiteson. 2017. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning. In Proceedings of the 34th International Conference on Machine Learning , Doina Precup and Yee\u00a0Whye Teh (Eds.), Vol.\u00a070. 1146\u20131155."},{"key":"e_1_3_3_1_16_2","series-title":"Proceedings of Machine Learning Research","first-page":"4414","volume-title":"International Conference on Artificial Intelligence and Statistics, AISTATS 2022, 28-30 March 2022, Virtual Event","volume":"151","author":"Fox Roy","year":"2022","unstructured":"Roy Fox, Stephen\u00a0M. McAleer, Will Overman, and Ioannis Panageas. 2022. Independent Natural Policy Gradient always converges in Markov Potential Games. In International Conference on Artificial Intelligence and Statistics, AISTATS 2022, 28-30 March 2022, Virtual Event(Proceedings of Machine Learning Research, Vol.\u00a0151), Gustau Camps-Valls, Francisco J.\u00a0R. Ruiz, and Isabel Valera (Eds.). PMLR, 4414\u20134425. https:\/\/proceedings.mlr.press\/v151\/fox22a.html"},{"key":"e_1_3_3_1_17_2","doi-asserted-by":"crossref","unstructured":"Karsten\u00a0T Hansen Kanishka Misra and Mallesh\u00a0M Pai. 2021. Frontiers: Algorithmic collusion: Supra-competitive prices via independent algorithms. Marketing Science 40 1 (2021) 1\u201312.","DOI":"10.1287\/mksc.2020.1276"},{"key":"e_1_3_3_1_18_2","first-page":"3330","volume-title":"Advances in Neural Information Processing Systems 31","author":"Hughes Edward","year":"2018","unstructured":"Edward Hughes, Joel\u00a0Z. Leibo, Matthew Phillips, Karl Tuyls, Edgar\u00a0A. 
Du\u00e9\u00f1ez-Guzm\u00e1n, Antonio\u00a0Garc\u00eda Casta\u00f1eda, Iain Dunning, Tina Zhu, Kevin\u00a0R. McKee, Raphael Koster, Heather Roff, and Thore Graepel. 2018. Inequity aversion improves cooperation in intertemporal social dilemmas. In Advances in Neural Information Processing Systems 31, Samy Bengio, Hanna\u00a0M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.). 3330\u20133340."},{"key":"e_1_3_3_1_19_2","doi-asserted-by":"crossref","unstructured":"Timo Klein. 2021. Autonomous algorithmic collusion: Q-learning under sequential pricing. The RAND Journal of Economics 52 3 (2021) 538\u2013558.","DOI":"10.1111\/1756-2171.12383"},{"key":"e_1_3_3_1_20_2","first-page":"4190","volume-title":"Advances in Neural Information Processing Systems 30","author":"Lanctot Marc","year":"2017","unstructured":"Marc Lanctot, Vin\u00edcius\u00a0Flores Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien P\u00e9rolat, David Silver, and Thore Graepel. 2017. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. In Advances in Neural Information Processing Systems 30, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna\u00a0M. Wallach, Rob Fergus, S.\u00a0V.\u00a0N. Vishwanathan, and Roman Garnett (Eds.). 4190\u20134203."},{"key":"e_1_3_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/3091125.3091194"},{"key":"e_1_3_3_1_22_2","volume-title":"The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022","author":"Leonardos Stefanos","year":"2022","unstructured":"Stefanos Leonardos, Will Overman, Ioannis Panageas, and Georgios Piliouras. 2022. Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022."},{"key":"e_1_3_3_1_23_2","doi-asserted-by":"crossref","unstructured":"Long-Ji Lin. 1992. 
Self-improving reactive agents based on reinforcement learning planning and teaching. Machine Learning 8 3-4 (1992) 293\u2013321.","DOI":"10.1023\/A:1022628806385"},{"key":"e_1_3_3_1_24_2","unstructured":"Cong Lu Philip Ball Yee\u00a0Whye Teh and Jack Parker-Holder. 2023. Synthetic experience replay. Advances in Neural Information Processing Systems 36 (2023) 46323\u201346344."},{"key":"e_1_3_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.5555\/3463952.3464059"},{"key":"e_1_3_3_1_26_2","doi-asserted-by":"crossref","unstructured":"Jeanine Mikl\u00f3s-Thal and Catherine Tucker. 2019. Collusion by algorithm: Does better demand prediction facilitate coordination between sellers? Management Science 65 4 (2019) 1552\u20131561.","DOI":"10.1287\/mnsc.2019.3287"},{"key":"e_1_3_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Andrei\u00a0A Rusu Joel Veness Marc\u00a0G Bellemare Alex Graves Martin Riedmiller Andreas\u00a0K Fidjeland Georg Ostrovski et\u00a0al. 2015. Human-level control through deep reinforcement learning. Nature 518 7540 (2015) 529\u2013533.","DOI":"10.1038\/nature14236"},{"key":"e_1_3_3_1_28_2","doi-asserted-by":"crossref","unstructured":"Jan Potters and Sigrid Suetens. 2013. Oligopoly experiments in the current millennium. Journal of Economic Surveys 27 3 (2013) 439\u2013460.","DOI":"10.1111\/joes.12025"},{"key":"e_1_3_3_1_29_2","volume-title":"4th International Conference on Learning Representations","author":"Schaul Tom","year":"2016","unstructured":"Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. 2016. Prioritized Experience Replay. 
In 4th International Conference on Learning Representations, Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_3_1_30_2","volume-title":"The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023","author":"Sokota Samuel","year":"2023","unstructured":"Samuel Sokota, Ryan D\u2019Orazio, J.\u00a0Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, and Christian Kroer. 2023. A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023."},{"key":"e_1_3_3_1_31_2","doi-asserted-by":"crossref","unstructured":"Ludo Waltman and Uzay Kaymak. 2008. Q-learning agents in a Cournot oligopoly model. Journal of Economic Dynamics and Control 32 10 (2008) 3275\u20133293.","DOI":"10.1016\/j.jedc.2008.01.003"},{"key":"e_1_3_3_1_32_2","doi-asserted-by":"crossref","unstructured":"Christopher\u00a0JCH Watkins and Peter Dayan. 1992. Q-learning. Machine Learning 8 3-4 (1992) 279\u2013292.","DOI":"10.1023\/A:1022676722315"},{"key":"e_1_3_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.5555\/3104322.3104470"},{"key":"e_1_3_3_1_34_2","unstructured":"Shangtong Zhang and Richard\u00a0S. Sutton. 2017. A deeper look at experience replay. NIPS Deep Reinforcement Learning Symposium (2017). 
arXiv:1712.01275. https:\/\/arxiv.org\/abs\/1712.01275"}],"event":{"name":"ICAIF '25: 6th ACM International Conference on AI in Finance","location":"Singapore Singapore","acronym":"ICAIF '25"},"container-title":["Proceedings of the 6th ACM International Conference on AI in Finance"],"original-title":[],"deposited":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T07:30:44Z","timestamp":1763105444000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3768292.3770357"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,14]]},"references-count":33,"alternative-id":["10.1145\/3768292.3770357","10.1145\/3768292"],"URL":"https:\/\/doi.org\/10.1145\/3768292.3770357","relation":{},"subject":[],"published":{"date-parts":[[2025,11,14]]},"assertion":[{"value":"2025-11-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}