{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,10]],"date-time":"2026-05-10T10:15:03Z","timestamp":1778408103426,"version":"3.51.4"},"reference-count":353,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>Reinforcement Learning (RL) has experienced significant advancement over the past decade, prompting a growing interest in applications within finance. This survey critically evaluates 167 publications, exploring diverse RL applications and frameworks in finance. Financial markets, marked by their complexity, multi-agent nature, information asymmetry, and inherent randomness, serve as an intriguing test-bed for RL. Traditional finance offers certain solutions, and RL advances these with a more dynamic approach, incorporating machine learning methods, including transfer learning, meta-learning, and multi-agent solutions. This survey dissects key RL components through the lens of Quantitative Finance. 
We uncover emerging themes, propose areas for future research, and critique the strengths and weaknesses of existing methods.<\/jats:p>","DOI":"10.1145\/3733714","type":"journal-article","created":{"date-parts":[[2025,5,2]],"date-time":"2025-05-02T11:51:36Z","timestamp":1746186696000},"page":"1-51","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["The Evolution of Reinforcement Learning in Quantitative Finance: A Survey"],"prefix":"10.1145","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-1869-8603","authenticated-orcid":false,"given":"Nikolaos","family":"Pippas","sequence":"first","affiliation":[{"name":"Centre for Interdisciplinary Methodologies, University of Warwick, Coventry, United Kingdom of Great Britain and Northern Ireland and Asset Management, HSBC Holdings plc, London, United Kingdom of Great Britain and Northern Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0031-6713","authenticated-orcid":false,"given":"Elliot A.","family":"Ludvig","sequence":"additional","affiliation":[{"name":"Department of Psychology, University of Warwick, Coventry, United Kingdom of Great Britain and Northern Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6788-251X","authenticated-orcid":false,"given":"Cagatay","family":"Turkay","sequence":"additional","affiliation":[{"name":"Centre for Interdisciplinary Methodologies, University of Warwick, Coventry, United Kingdom of Great Britain and Northern Ireland"}]}],"member":"320","published-online":{"date-parts":[[2025,6,11]]},"reference":[{"key":"e_1_3_4_2_2","volume-title":"Proceedings of the 21st International Conference on Machine Learning. 1.","author":"Abbeel P.","unstructured":"P. Abbeel and A. Y. Ng. 2004. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the 21st International Conference on Machine Learning. 
1."},{"key":"e_1_3_4_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/s13748-020-00225-z"},{"key":"e_1_3_4_4_2","volume-title":"Proceedings of the 32nd International Conference on Neural Information Processing Systems.","author":"Adebayo J.","unstructured":"J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim. 2018. Sanity checks for saliency maps. Proceedings of the 32nd International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_5_2","first-page":"5","article-title":"Performance metrics for financial time series forecasting","volume":"7","author":"Abecasis S. M.","year":"1999","unstructured":"S. M. Abecasis, E. S. Lapenta, and C. E. Pedreira. 1999. Performance metrics for financial time series forecasting. Journal of Computational Intelligence in Finance 7, 4 (1999), 5\u201322.","journal-title":"Journal of Computational Intelligence in Finance"},{"key":"e_1_3_4_6_2","doi-asserted-by":"publisher","unstructured":"A. M. Aboussalah and C. G. Lee. 2020. Continuous control with stacked deep dynamic recurrent reinforcement learning for portfolio optimization. Expert Systems with Applications 140 C (2020) 112891. 10.1016\/j.eswa.2019.112891","DOI":"10.1016\/j.eswa.2019.112891"},{"key":"e_1_3_4_7_2","doi-asserted-by":"publisher","DOI":"10.1080\/14697688.2021.2001032"},{"key":"e_1_3_4_8_2","volume-title":"Proceedings of the 2022 IEEE Conference on Control Technology and Applications (CCTA). IEEE, 1208\u20131213","author":"Alameer A.","unstructured":"A. Alameer and K. Al Shehri. 2022. Conditional value-at-risk for quantitative trading: A direct reinforcement learning approach. In Proceedings of the 2022 IEEE Conference on Control Technology and Applications (CCTA). IEEE, 1208\u20131213."},{"key":"e_1_3_4_9_2","doi-asserted-by":"publisher","unstructured":"S. Almahdi and S. Y. Yang. 2017. An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. 
Expert Systems with Applications 87 C (2017) 267\u2013279. 10.1016\/j.eswa.2017.06.023","DOI":"10.1016\/j.eswa.2017.06.023"},{"key":"e_1_3_4_10_2","doi-asserted-by":"publisher","unstructured":"S. Almahdi and S. Y. Yang. 2019. A constrained portfolio trading system using particle swarm algorithm and recurrent reinforcement learning. Expert Systems with Applications 130 C (2019) 145\u2013156. 10.1016\/j.eswa.2019.04.013","DOI":"10.1016\/j.eswa.2019.04.013"},{"key":"e_1_3_4_11_2","doi-asserted-by":"publisher","DOI":"10.21314\/JOR.2001.041"},{"key":"e_1_3_4_12_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540-6261.2004.00662.x"},{"key":"e_1_3_4_13_2","doi-asserted-by":"publisher","DOI":"10.1093\/jjfinec\/nbz040"},{"key":"e_1_3_4_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/0304-405X(87)90066-3"},{"key":"e_1_3_4_15_2","doi-asserted-by":"publisher","DOI":"10.2307\/1907353"},{"key":"e_1_3_4_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-018-3225-7"},{"key":"e_1_3_4_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/0022-247X(65)90154-X"},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2008.07.006"},{"key":"e_1_3_4_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2009.01.016"},{"key":"e_1_3_4_20_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013689704352"},{"key":"e_1_3_4_21_2","doi-asserted-by":"publisher","DOI":"10.1080\/14697680701381228"},{"key":"e_1_3_4_22_2","doi-asserted-by":"publisher","DOI":"10.22004\/ag.econ.30810"},{"key":"e_1_3_4_23_2","first-page":"458","article-title":"Pseudomathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance","volume":"61","author":"Bailey D. H.","year":"2014","unstructured":"D. H. Bailey, J. M. Borwein, M. L. de Prado, and Q. J. Zhu. 2014. Pseudomathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance. 
Notices of the AMS 61, 5 (2014), 458\u2013471.","journal-title":"Notices of the AMS"},{"key":"e_1_3_4_24_2","volume-title":"Proceedings of the 12th International Conference on Neural Information Processing Systems.","author":"Baird L.","unstructured":"L. Baird and A. Moore. 1998. Gradient descent for general reinforcement learning. In Proceedings of the 12th International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/0304-405X(81)90018-0"},{"key":"e_1_3_4_26_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1906.11046"},{"key":"e_1_3_4_27_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2001.00918"},{"key":"e_1_3_4_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/S1574-0102(03)01027-6"},{"key":"e_1_3_4_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.1985.6313371"},{"key":"e_1_3_4_30_2","volume-title":"Proceedings of the 2003 IEEE International Conference on Computational Intelligence for Financial Engineering. IEEE, 355\u2013362","author":"Bates R. G.","unstructured":"R. G. Bates, M. A. Dempster, and Y. S. Romahi. 2003. Evolutionary reinforcement learning in FX order book and order flow analysis. In Proceedings of the 2003 IEEE International Conference on Computational Intelligence for Financial Engineering. IEEE, 355\u2013362."},{"key":"e_1_3_4_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jedc.2010.01.015"},{"key":"e_1_3_4_32_2","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR, 449\u2013458","author":"Bellemare M. G.","unstructured":"M. G. Bellemare, W. Dabney, and R. Munos. 2017. A distributional perspective on reinforcement learning. In Proceedings of the International Conference on Machine Learning. PMLR, 449\u2013458."},{"key":"e_1_3_4_33_2","first-page":"356","article-title":"Dynamic Programming","volume":"1957","author":"Bellman R.","year":"2013","unstructured":"R. Bellman. 2013. Dynamic Programming. 
Courier Corporation, 1957, 356","journal-title":"Courier Corporation"},{"key":"e_1_3_4_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/72.279181"},{"key":"e_1_3_4_35_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2010.09108"},{"key":"e_1_3_4_36_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2010.08497"},{"key":"e_1_3_4_37_2","volume-title":"Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 10050\u201310057","author":"Benhamou E.","unstructured":"E. Benhamou, D. Saltiel, J. J. Ohana, and J. Atif. 2021. Detecting and adapting to crisis pattern with context based deep reinforcement learning. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 10050\u201310057."},{"key":"e_1_3_4_38_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2104.10483"},{"key":"e_1_3_4_39_2","volume-title":"Proceedings of the International Conference on Knowledge-Based and Intelligent Information and Engineering Systems. Springer","author":"Bertoluzzo F.","unstructured":"F. Bertoluzzo and M. Corazza. 2007. Making financial trading by recurrent reinforcement learning. In Proceedings of the International Conference on Knowledge-Based and Intelligent Information and Engineering Systems. Springer, Berlin, 619\u2013626."},{"key":"e_1_3_4_40_2","doi-asserted-by":"publisher","unstructured":"F. Bertoluzzo and M. Corazza. 2012. Testing different reinforcement learning configurations for financial trading: Introduction and applications. Procedia Economics and Finance 3 C (2012) 68\u201377. 10.1016\/S2212-5671(12)00122-0","DOI":"10.1016\/S2212-5671(12)00122-0"},{"key":"e_1_3_4_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/S1386-4181(97)00012-8"},{"key":"e_1_3_4_43_2","volume-title":"Proceedings of the 2020 5th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE). IEEE, 1\u20136.","author":"Bisht K.","unstructured":"K. Bisht and A. Kumar. 2020. 
Deep reinforcement learning based multi-objective systems for financial trading. In Proceedings of the 2020 5th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE). IEEE, 1\u20136."},{"key":"e_1_3_4_44_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/632"},{"key":"e_1_3_4_45_2","doi-asserted-by":"publisher","DOI":"10.1086\/260062"},{"key":"e_1_3_4_46_2","doi-asserted-by":"publisher","DOI":"10.1016\/0304-4076(86)90063-1"},{"key":"e_1_3_4_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3139510"},{"key":"e_1_3_4_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3166599"},{"key":"e_1_3_4_49_2","doi-asserted-by":"publisher","unstructured":"A. Briola J. Turiel R. Marcaccioli A. Cauderan and T. Aste. 2021. Deep reinforcement learning for active high-frequency trading. arXiv e-print 2101.07107. 10.48550\/arXiv.2101.07107","DOI":"10.48550\/arXiv.2101.07107"},{"key":"e_1_3_4_50_2","unstructured":"D. S. Broomhead and D. Lowe. 1988. Radial basis functions multi-variable functional interpolation and adaptive networks. Royal Signals and Radar Establishment Malvern (United Kingdom)."},{"key":"e_1_3_4_51_2","doi-asserted-by":"crossref","unstructured":"H. Buehler L. Gonon J. Teichmann B. Wood B. Mohan and J. Kochems. 2019. Deep hedging: Hedging derivatives under generic market frictions using reinforcement learning. Swiss Finance Institute Research Paper 19\u201380.","DOI":"10.2139\/ssrn.3355706"},{"key":"e_1_3_4_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2007.913919"},{"key":"e_1_3_4_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-14435-6_7"},{"key":"e_1_3_4_54_2","doi-asserted-by":"publisher","DOI":"10.3905\/jfds.2020.1.052"},{"key":"e_1_3_4_55_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jfds.2023.100101"},{"key":"e_1_3_4_56_2","doi-asserted-by":"publisher","unstructured":"J. Carapu\u00e7o R. Neves and N. Horta. 2018. Reinforcement learning applied to forex trading. 
Applied Soft Computing 73 C (2018) 783\u2013794. 10.1016\/j.asoc.2018.09.017","DOI":"10.1016\/j.asoc.2018.09.017"},{"key":"e_1_3_4_57_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-020-01839-5"},{"key":"e_1_3_4_58_2","unstructured":"\u00c1. Cartea S. Jaimungal and J. Penalva. 2015. Algorithmic and High-Frequency Trading. Cambridge University Press."},{"key":"e_1_3_4_59_2","doi-asserted-by":"crossref","unstructured":"\u00c1. Cartea S. Jaimungal and L. S\u00e1nchez-Betancourt. 2021. Deep reinforcement learning for algorithmic trading. Available at SSRN 3812473.","DOI":"10.2139\/ssrn.3812473"},{"key":"e_1_3_4_60_2","volume-title":"Multitask Learning","author":"Caruana R.","unstructured":"R. Caruana. 1998. Multitask Learning. Springer US, 95\u2013133."},{"key":"e_1_3_4_61_2","doi-asserted-by":"publisher","DOI":"10.1080\/1350486X.2022.2136727"},{"key":"e_1_3_4_62_2","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785\u2013794","author":"Chen T.","unstructured":"T. Chen and C. Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785\u2013794."},{"key":"e_1_3_4_63_2","doi-asserted-by":"publisher","unstructured":"J. B. Chakole M. S. Kolhe G. D. Mahapurush A. Yadav and M. P. Kurhekar. 2021. A q-learning agent for automated trading in equity stock markets. Expert Systems with Applications 163 C (2021) 113761. 10.1016\/j.eswa.2020.113761","DOI":"10.1016\/j.eswa.2020.113761"},{"key":"e_1_3_4_64_2","volume-title":"Tech. Rep. AIM-2001-005, MIT Artificial Intelligence Laboratory","author":"Chan N. T.","year":"2001","unstructured":"N. T. Chan and C. R. Shelton. 2001. An electronic market-maker. Tech. Rep. AIM-2001-005, MIT Artificial Intelligence Laboratory. Cambridge, MA, USA. 
Available at https:\/\/dspace.mit.edu\/bitstream\/handle\/1721.1\/7220\/AIM-2001-005.pdf"},{"key":"e_1_3_4_65_2","doi-asserted-by":"publisher","unstructured":"G. Charness U. Gneezy and A. Imas. 2013. Experimental methods: Eliciting risk preferences. Journal of Economic Behavior and Organization 87 C (2013) 43\u201351. 10.1016\/j.jebo.2012.12.023","DOI":"10.1016\/j.jebo.2012.12.023"},{"key":"e_1_3_4_66_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10614-021-10119-4"},{"key":"e_1_3_4_67_2","volume-title":"Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. 1503\u20131503","author":"Chen Y.","unstructured":"Y. Chen, S. Mabu, K. Hirasawa, and J. Hu. 2007. Trading rules on stock markets using genetic network programming with sarsa learning. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. 1503\u20131503."},{"key":"e_1_3_4_68_2","volume-title":"Proceedings of the 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS). IEEE, 29\u201333","author":"Chen L.","unstructured":"L. Chen and Q. Gao. 2019. Application of deep reinforcement learning on automated stock trading. In Proceedings of the 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS). IEEE, 29\u201333."},{"key":"e_1_3_4_69_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-4012"},{"key":"e_1_3_4_70_2","volume-title":"Asset pricing: Revised edition","author":"Cochrane J. H.","unstructured":"J. H. Cochrane. 2005. Asset pricing: Revised edition. Princeton University Press."},{"key":"e_1_3_4_71_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540-6261.2008.01379.x"},{"key":"e_1_3_4_72_2","doi-asserted-by":"publisher","unstructured":"L. W. Cong K. Tang J. Wang and Y. Zhang. 2020. AlphaPortfolio: Direct construction through deep reinforcement learning and interpretable AI. SSRN Electronic Journal. 
10.2139\/ssrn.3554486","DOI":"10.2139\/ssrn.3554486"},{"key":"e_1_3_4_73_2","doi-asserted-by":"publisher","DOI":"10.1080\/713665670"},{"key":"e_1_3_4_74_2","doi-asserted-by":"crossref","unstructured":"M. Corazza and F. Bertoluzzo. 2014. Q-learning-based financial trading systems with applications. University Ca\u2019Foscari of Venice Dept. of Economics Working Paper Series No 15.","DOI":"10.2139\/ssrn.2507826"},{"key":"e_1_3_4_75_2","doi-asserted-by":"crossref","unstructured":"M. Corazza and A. Sangalli. 2015. Q-Learning and SARSA: A comparison between two intelligent stochastic control approaches for financial trading. University Ca\u2019Foscari of Venice Dept. of Economics Research Paper Series No 15.","DOI":"10.2139\/ssrn.2617630"},{"key":"e_1_3_4_76_2","doi-asserted-by":"crossref","unstructured":"G. Coqueret and E. Andr\u00e9. 2022. Factor investing with reinforcement learning. Available at SSRN 4103045.","DOI":"10.2139\/ssrn.4103046"},{"key":"e_1_3_4_77_2","unstructured":"K. Dab\u00e9rius E. Granat and P. Karlsson. 2019. Deep execution-value and policy based reinforcement learning for trading and beating market benchmarks. Available at SSRN 3374766."},{"key":"e_1_3_4_78_2","volume-title":"Proceedings of AAAI-86","author":"Dechter R.","year":"1986","unstructured":"R. Dechter. 1986. Learning while searching in constraint-satisfaction problems. In Proceedings of AAAI-86. 178\u2013185."},{"key":"e_1_3_4_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/72.935088"},{"key":"e_1_3_4_80_2","volume-title":"Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning. Springer","author":"Dempster M. A. H.","unstructured":"M. A. H. Dempster and Y. S. Romahi. 2002. Intraday FX trading: An evolutionary reinforcement learning approach. In Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning. 
Springer, Berlin, 347\u2013358."},{"key":"e_1_3_4_81_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2005.10.012"},{"key":"e_1_3_4_82_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2015.2404299"},{"key":"e_1_3_4_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2016.2522401"},{"key":"e_1_3_4_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383455.3422559"},{"key":"e_1_3_4_85_2","volume-title":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1310\u20131319","author":"Ding Y.","unstructured":"Y. Ding, W. Liu, J. Bian, D. Zhang, and T. Y. Liu. 2018. Investor-imitator: A framework for trading knowledge extraction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1310\u20131319."},{"key":"e_1_3_4_86_2","doi-asserted-by":"publisher","unstructured":"M. Dixon and I. Halperin. 2020. G-Learner and GIRL: Goal-based wealth management with reinforcement learning. arXiv e-print 2002.10990. 10.48550\/arXiv.2002.10990","DOI":"10.48550\/arXiv.2002.10990"},{"key":"e_1_3_4_87_2","doi-asserted-by":"publisher","unstructured":"F. Doshi-Velez and B. Kim. 2017. Towards a rigorous science of interpretable machine learning. arXiv e-print 1702.08608. 10.48550\/arXiv.1702.08608","DOI":"10.48550\/arXiv.1702.08608"},{"key":"e_1_3_4_88_2","first-page":"1","article-title":"Algorithm trading using q-learning and recurrent reinforcement learning","volume":"1","author":"Du X.","year":"2016","unstructured":"X. Du, J. Zhai, and K. Lv. 2016. Algorithm trading using q-learning and recurrent reinforcement learning. Positions, 1 , 1.","journal-title":"Positions"},{"key":"e_1_3_4_89_2","doi-asserted-by":"publisher","DOI":"10.3905\/jfds.2020.1.045"},{"key":"e_1_3_4_90_2","doi-asserted-by":"crossref","unstructured":"B. Dubrov. 2015. Monte Carlo simulation with machine learning for pricing American options and convertible bonds. 
Available at SSRN 2684523.","DOI":"10.2139\/ssrn.2684523"},{"key":"e_1_3_4_91_2","doi-asserted-by":"publisher","unstructured":"D. Eilers C. L. Dunis H. J. von Mettenheim and M. H. Breitner. 2014. Intelligent trading of seasonal effects: A decision support algorithm based on reinforcement learning. Decision Support Systems 64 C (2014) 100\u2013108. 10.1016\/j.dss.2014.04.011","DOI":"10.1016\/j.dss.2014.04.011"},{"key":"e_1_3_4_92_2","doi-asserted-by":"publisher","DOI":"10.2307\/1913236"},{"key":"e_1_3_4_93_2","doi-asserted-by":"publisher","unstructured":"C. Esteban S. L. Hyland and G. R\u00e4tsch. 2017. Real-valued medical time-series generation with recurrent conditional GANs. In ML4H Workshop @ NeurIPS. arXiv e-print 1706.02633. 10.48550\/arXiv.1706.02633","DOI":"10.48550\/arXiv.1706.02633"},{"key":"e_1_3_4_94_2","unstructured":"F. J. Fabozzi and S. V. Mann. 2012. The Handbook of Fixed Income Securities. McGraw-Hill Education."},{"key":"e_1_3_4_95_2","doi-asserted-by":"publisher","DOI":"10.2307\/2325486"},{"key":"e_1_3_4_96_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540-6261.1992.tb04398.x"},{"key":"e_1_3_4_97_2","doi-asserted-by":"publisher","DOI":"10.1016\/0304-405X(93)90023-5"},{"key":"e_1_3_4_98_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540-6261.1996.tb05202.x"},{"key":"e_1_3_4_99_2","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence 35","author":"Fang Y.","year":"2021","unstructured":"Y. Fang, K. Ren, W. Liu, D. Zhou, W. Zhang, J. Bian, Y. Yu, and T. Y. Liu. 2021. Universal trading for order execution with oracle policy distillation. Proceedings of the AAAI Conference on Artificial Intelligence 35, 1 (2021), 107\u2013115."},{"key":"e_1_3_4_100_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2982662"},{"key":"e_1_3_4_101_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2011.04391"},{"key":"e_1_3_4_102_2","doi-asserted-by":"publisher","unstructured":"S. Feuerriegel and H. Prendinger. 2016. 
News-based trading strategies. Decision Support Systems 90 C (2016) 65\u201374. 10.1016\/j.dss.2016.06.020","DOI":"10.1016\/j.dss.2016.06.020"},{"key":"e_1_3_4_103_2","unstructured":"T. G. Fischer. 2018. Reinforcement learning in the financial markets-a survey (No. 12\/2018). FAU Discussion Papers in Economics."},{"key":"e_1_3_4_104_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2017.11.054"},{"key":"e_1_3_4_105_2","volume-title":"Proceedings of the 30th International Conference on Neural Information Processing Systems.","author":"Foerster J.","unstructured":"J. Foerster, I. A. Assael, N. De Freitas, and S. Whiteson. 2016. Learning to communicate with deep multi-agent reinforcement learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_106_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1709.04326"},{"key":"e_1_3_4_107_2","doi-asserted-by":"publisher","DOI":"10.1111\/jofi.12514"},{"key":"e_1_3_4_108_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1706.10295"},{"key":"e_1_3_4_109_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-024-09659-4"},{"key":"e_1_3_4_110_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jfineco.2013.10.005"},{"key":"e_1_3_4_111_2","doi-asserted-by":"publisher","DOI":"10.1016\/0304-405X(80)90021-5"},{"key":"e_1_3_4_112_2","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR, 1587\u20131596","author":"Fujimoto S.","unstructured":"S. Fujimoto, H. Hoof, and D. Meger. 2018. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning. PMLR, 1587\u20131596."},{"key":"e_1_3_4_113_2","doi-asserted-by":"crossref","unstructured":"P. Gabrielsson and U. Johansson. 2015. High-frequency equity index futures trading using recurrent reinforcement learning with candlesticks. In Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence. 
IEEE 734\u2013741.","DOI":"10.1109\/SSCI.2015.111"},{"key":"e_1_3_4_114_2","doi-asserted-by":"publisher","unstructured":"P. Ganesh and P. Rakheja. 2018. Deep reinforcement learning in high-frequency trading. arXiv e-print 1809.01506. 10.48550\/arXiv.1809.01506","DOI":"10.48550\/arXiv.1809.01506"},{"key":"e_1_3_4_115_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1911.05892"},{"key":"e_1_3_4_116_2","volume-title":"Proceedings of the International Conference on Neural Information Processing. 832\u2013837","author":"Gao X.","unstructured":"X. Gao and L. Chan. 2000. An algorithm for trading and portfolio management using q-learning and sharpe ratio maximization. In Proceedings of the International Conference on Neural Information Processing. 832\u2013837."},{"key":"e_1_3_4_117_2","doi-asserted-by":"publisher","unstructured":"M. Garc\u00eda-Galicia A. A. Carsteanu and J. B. Clempner. 2019. Continuous-time reinforcement learning approach for portfolio management with time penalization. Expert Systems with Applications 129 C (2019) 27\u201336. 10.1016\/j.eswa.2019.03.055","DOI":"10.1016\/j.eswa.2019.03.055"},{"key":"e_1_3_4_118_2","doi-asserted-by":"publisher","DOI":"10.1111\/jofi.12080"},{"key":"e_1_3_4_119_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3074782"},{"key":"e_1_3_4_120_2","doi-asserted-by":"publisher","DOI":"10.1162\/089976600300015015"},{"key":"e_1_3_4_121_2","doi-asserted-by":"publisher","DOI":"10.1016\/0304-405X(85)90044-3"},{"key":"e_1_3_4_122_2","doi-asserted-by":"publisher","DOI":"10.1109\/CIFER.2003.1196283"},{"key":"e_1_3_4_123_2","doi-asserted-by":"publisher","DOI":"10.1162\/00335530360535162"},{"key":"e_1_3_4_124_2","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems. 2672\u20132680","author":"Goodfellow I.","unstructured":"I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative adversarial networks. 
In Proceedings of the 28th International Conference on Neural Information Processing Systems. 2672\u20132680"},{"key":"e_1_3_4_125_2","unstructured":"I. Goodfellow Y. Bengio and A. Courville. 2016. Deep Learning. MIT Press."},{"key":"e_1_3_4_126_2","doi-asserted-by":"publisher","DOI":"10.2307\/1927792"},{"key":"e_1_3_4_127_2","volume-title":"Proceedings of the 9th International Conference on Neural Information Processing Systems.","author":"Gordon G. J.","year":"1995","unstructured":"G. J. Gordon. 1995. Stable fitted reinforcement learning. In Proceedings of the 9th International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_128_2","volume-title":"Proceedings of the European Symposium on Artificial Neural Networks.","author":"Gorse D.","year":"2011","unstructured":"D. Gorse. 2011. Application of stochastic recurrent reinforcement learning to index trading. In Proceedings of the European Symposium on Artificial Neural Networks."},{"key":"e_1_3_4_129_2","volume-title":"The Intelligent Investor (4th rev. ed.). Harpers & Row","author":"Graham B.","unstructured":"B. Graham. 1973. The Intelligent Investor (4th rev. ed.). Harpers & Row, New York."},{"key":"e_1_3_4_130_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2012.2218595"},{"key":"e_1_3_4_131_2","volume-title":"Proceedings of the SICE Annual Conference","author":"Gu Y.","year":"2011","unstructured":"Y. Gu, S. Mabu, Y. Yang, J. Li, and K. Hirasawa. 2011. Trading rules on stock markets using genetic network programming-sarsa learning with plural subroutines. In Proceedings of the SICE Annual Conference 2011. IEEE, 143\u2013148."},{"key":"e_1_3_4_132_2","doi-asserted-by":"publisher","DOI":"10.1080\/1350486X.2020.1714455"},{"key":"e_1_3_4_133_2","volume-title":"Proceedings of the 15th International Conference on Neural Information Processing Systems.","author":"Guestrin C.","unstructured":"C. Guestrin, D. Koller, and R. Parr. 2001. Multiagent planning with factored MDPs. 
In Proceedings of the 15th International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_134_2","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR","author":"Haarnoja T.","unstructured":"T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning. PMLR, 1861\u20131870."},{"key":"e_1_3_4_135_2","doi-asserted-by":"publisher","DOI":"10.3905\/jod.2020.1.108"},{"key":"e_1_3_4_136_2","doi-asserted-by":"publisher","DOI":"10.1111\/mafi.12382"},{"key":"e_1_3_4_137_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.58871"},{"key":"e_1_3_4_138_2","doi-asserted-by":"crossref","unstructured":"P. R. Hansen A. Lunde and J. M. Nason. 2005. Testing the significance of calendar effects. Federal Reserve Bank of Atlanta Working Paper 2005\u201302.","DOI":"10.2139\/ssrn.388601"},{"key":"e_1_3_4_139_2","doi-asserted-by":"crossref","unstructured":"C. R. Harvey and Y. Liu. 2019. A census of the factor zoo. Available at SSRN 3341728.","DOI":"10.2139\/ssrn.3341728"},{"key":"e_1_3_4_140_2","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence (30","author":"Hasselt H.","year":"2016","unstructured":"H. Hasselt, A. Guez, and D. Silver. 2016. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence (30 (2016), 1)."},{"key":"e_1_3_4_141_2","volume-title":"Proceedings of the 30th International Conference on Neural Information Processing Systems.","author":"Hasselt H. P.","unstructured":"H. P. Hasselt, A. Guez, M. Hessel, V. Mnih, and D. Silver. 2016. Learning values across many orders of magnitude. 
In Proceedings of the 30th International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_142_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770\u2013778","author":"He K.","unstructured":"K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770\u2013778."},{"key":"e_1_3_4_143_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1512.04455"},{"key":"e_1_3_4_144_2","doi-asserted-by":"publisher","unstructured":"M. Henaff J. Bruna and Y. LeCun. 2015. Deep convolutional networks on graph-structured data. arXiv e-print 1506.05163. 10.48550\/arXiv.1506.05163","DOI":"10.48550\/arXiv.1506.05163"},{"key":"e_1_3_4_145_2","volume-title":"Proceedings of the 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr). IEEE, 457\u2013464","author":"Hendricks D.","unstructured":"D. Hendricks and D. Wilcox. 2014. A reinforcement learning extension to the Almgren-Chriss framework for optimal trade execution. In Proceedings of the 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr). IEEE, 457\u2013464."},{"key":"e_1_3_4_146_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10614-006-9064-0"},{"key":"e_1_3_4_147_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2006.18.7.1527"},{"key":"e_1_3_4_148_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1503.02531"},{"key":"e_1_3_4_149_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_4_150_2","volume-title":"Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence","author":"Holland J. H.","unstructured":"J. H. Holland. 1992. 
Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press."},{"key":"e_1_3_4_151_2","doi-asserted-by":"publisher","DOI":"10.1257\/000282802762024700"},{"key":"e_1_3_4_152_2","doi-asserted-by":"publisher","DOI":"10.1080\/00207720412331303697"},{"key":"e_1_3_4_153_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1807.02787"},{"key":"e_1_3_4_154_2","doi-asserted-by":"crossref","first-page":"e0263689","DOI":"10.1371\/journal.pone.0263689","article-title":"MSPM: A modularized and scalable multi-agent reinforcement learning-based system for financial portfolio management","volume":"17","author":"Huang Z.","year":"2022","unstructured":"Z. Huang and F. Tanaka. 2022. MSPM: A modularized and scalable multi-agent reinforcement learning-based system for financial portfolio management. Plos One 17, 2 (2022), e0263689.","journal-title":"Plos One"},{"key":"e_1_3_4_155_2","doi-asserted-by":"publisher","DOI":"10.1098\/rsos.171377"},{"key":"e_1_3_4_156_2","volume-title":"Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability","author":"Hutter M.","unstructured":"M. Hutter. 2005. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer Science and Business Media."},{"key":"e_1_3_4_157_2","doi-asserted-by":"publisher","DOI":"10.5555\/2968618.2968694"},{"key":"e_1_3_4_158_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2005.10.009"},{"key":"e_1_3_4_159_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540-6261.1993.tb04702.x"},{"key":"e_1_3_4_160_2","doi-asserted-by":"publisher","unstructured":"G. Jeong and H. Y. Kim. 2019. Improving financial trading decisions using deep Q-learning: Predicting the number of shares action strategies and transfer learning. Expert Systems with Applications 117 C (2019) 125\u2013138. 
10.1016\/j.eswa.2018.09.036","DOI":"10.1016\/j.eswa.2018.09.036"},{"key":"e_1_3_4_161_2","volume-title":"Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 1\u20138.","author":"Jia W. U.","year":"2019","unstructured":"W. U. Jia, W. A. N. G. Chen, L. Xiong and S. U. N. Hongyong. 2019. Quantitative trading on stock market based on deep reinforcement learning. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 1\u20138."},{"key":"e_1_3_4_162_2","doi-asserted-by":"publisher","unstructured":"Z. Jiang D. Xu and J. Liang. 2017. A deep reinforcement learning framework for the financial portfolio management problem. arXiv e-print 1706.10059. 10.48550\/arXiv.1706.10059","DOI":"10.48550\/arXiv.1706.10059"},{"key":"e_1_3_4_163_2","volume-title":"Proceedings of the 2017 Intelligent Systems Conference (IntelliSys). IEEE, 905\u2013913","author":"Jiang Z.","unstructured":"Z. Jiang and J. Liang. 2017. Cryptocurrency portfolio management with deep reinforcement learning. In Proceedings of the 2017 Intelligent Systems Conference (IntelliSys). IEEE, 905\u2013913."},{"key":"e_1_3_4_164_2","unstructured":"O. Jin and H. El-Saawy. 2016. Portfolio Management Using Reinforcement Learning. Stanford University."},{"key":"e_1_3_4_165_2","doi-asserted-by":"publisher","DOI":"10.1111\/jofi.12121"},{"key":"e_1_3_4_166_2","volume-title":"Conference on Robot Learning (CoRL\u201918)","author":"Kalashnikov D.","year":"2018","unstructured":"D. Kalashnikov, A. Irpan, P. Pastor, et\u00a0al. 2018. QT-Opt: Scalable deep reinforcement learning for vision-based robotic manipulation. In Conference on Robot Learning (CoRL\u201918). 651\u2013673. arXiv e-print 1806.10293. https:\/\/proceedings.mlr.press\/v87\/kalashnikov18a.html"},{"key":"e_1_3_4_167_2","volume-title":"Proceedings of the 1st ACM International Conference on AI in Finance. 1\u20137.","author":"Karpe M.","unstructured":"M. Karpe, J. Fang, Z. Ma, and C. Wang. 2020. 
Multi-agent reinforcement learning in a realistic limit order book market simulation. In Proceedings of the 1st ACM International Conference on AI in Finance. 1\u20137."},{"key":"e_1_3_4_168_2","doi-asserted-by":"crossref","unstructured":"M. Katongo and R. Bhattacharyya. 2021. The use of deep reinforcement learning in tactical asset allocation. Available at SSRN 3812609.","DOI":"10.2139\/ssrn.3812609"},{"key":"e_1_3_4_169_2","unstructured":"S. Kaur. 2017. Algorithmic trading using sentiment analysis and reinforcement learning. Positions."},{"key":"e_1_3_4_170_2","volume-title":"Proceedings of ICNN\u201995-International Conference on Neural Networks.","volume":"4","author":"Kennedy J.","year":"1942","unstructured":"J. Kennedy and R. Eberhart. 1995. Particle swarm optimization. In Proceedings of ICNN\u201995-International Conference on Neural Networks. Vol. 4 . IEEE, 1942\u20131948."},{"key":"e_1_3_4_171_2","volume-title":"Complexity","author":"Kim T.","year":"2019","unstructured":"T. Kim and H. Y. Kim. 2019. Optimizing the pairs-trading strategy using deep reinforcement learning with trading and stop-loss boundaries. Complexity, 2019."},{"key":"e_1_3_4_172_2","doi-asserted-by":"publisher","DOI":"10.3390\/app12030944"},{"key":"e_1_3_4_173_2","doi-asserted-by":"publisher","DOI":"10.2307\/1927031"},{"key":"e_1_3_4_174_2","doi-asserted-by":"publisher","DOI":"10.3905\/jfds.2019.1.1.159"},{"key":"e_1_3_4_175_2","volume-title":"Proceedings of the 13th International Conference on Neural Information Processing Systems.","author":"Konda V.","unstructured":"V. Konda and J. Tsitsiklis. 1999. Actor-critic algorithms. In Proceedings of the 13th International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_176_2","doi-asserted-by":"publisher","DOI":"10.1214\/105051604000000116"},{"key":"e_1_3_4_177_2","doi-asserted-by":"crossref","unstructured":"P. Koratamaddi K. Wadhwani M. Gupta and S. G. Sanjeevi. 2021. 
Market sentiment-aware deep reinforcement learning approach for stock portfolio allocation. Engineering Science and Technology an International Journal 24 4 (2021) 848\u2013859.","DOI":"10.1016\/j.jestch.2021.01.007"},{"key":"e_1_3_4_178_2","doi-asserted-by":"publisher","DOI":"10.2469\/faj.v66.n5.3"},{"key":"e_1_3_4_179_2","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_4_180_2","volume-title":"InProceedings of the 2022 14th International Conference on Contemporary Computing. 417\u2013428","author":"Kumar B.","unstructured":"B. Kumar, A. Roshan, A. Baranwal, S. Rajendran, S. Sharma, A. Mishra, and O. P. Vyas. 2022. Optimised forex trading using ensemble of deep q-learning agents. InProceedings of the 2022 14th International Conference on Contemporary Computing. 417\u2013428."},{"key":"e_1_3_4_181_2","doi-asserted-by":"publisher","DOI":"10.5555\/945365.964290"},{"key":"e_1_3_4_182_2","volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence.","author":"Lample G.","unstructured":"G. Lample and D. S. Chaplot. 2017. Playing FPS games with deep reinforcement learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_4_183_2","doi-asserted-by":"crossref","unstructured":"M. Lavko T. Klein and T. Walther. 2023. Reinforcement learning and portfolio allocation: Challenging traditional allocation methods. Queen\u2019s Management School Working Paper 1.","DOI":"10.2139\/ssrn.4346043"},{"key":"e_1_3_4_184_2","volume-title":"Reinforcement Learning: State-of-the-Art","author":"Lazaric A.","unstructured":"A. Lazaric. 2012. Transfer in reinforcement learning: A framework and a survey. In Reinforcement Learning: State-of-the-Art. 
Springer Berlin Heidelberg, Berlin, 143\u2013173."},{"key":"e_1_3_4_185_2","doi-asserted-by":"publisher","DOI":"10.1086\/296565"},{"key":"e_1_3_4_186_2","first-page":"1995","article-title":"Convolutional networks for images, speech, and time series","volume":"3361","author":"LeCun Y.","year":"1995","unstructured":"Y. LeCun and Y. Bengio. 1995. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks 3361, 10 (1995), 1995.","journal-title":"The Handbook of Brain Theory and Neural Networks"},{"key":"e_1_3_4_187_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35289-8_3"},{"key":"e_1_3_4_188_2","volume-title":"Proceedings of the International Conference on Database and Expert Systems Applications. Springer","author":"Lee J. W.","unstructured":"J. W. Lee and O. Jangmin. 2002. A multi-agent Q-learning framework for optimizing stock trading systems. In Proceedings of the International Conference on Database and Expert Systems Applications. Springer, Berlin, 153\u2013162."},{"key":"e_1_3_4_189_2","volume-title":"Proceedings of the 19th International Conference on Machine Learning. 451\u2013458","author":"Lee J. W.","unstructured":"J. W. Lee and B. T. Zhang. 2002. Stock trading system using reinforcement learning with cooperative agents. In Proceedings of the 19th International Conference on Machine Learning. 451\u2013458."},{"key":"e_1_3_4_190_2","first-page":"296","article-title":"An intelligent stock trading system based on reinforcement learning","volume":"86","author":"Lee J. W.","year":"2003","unstructured":"J. W. Lee, S. D. Kim, J. Lee, and J. Chae. 2003. An intelligent stock trading system based on reinforcement learning. 
IEICE TRANSACTIONS on Information and Systems 86, 2 (2003), 296\u2013305.","journal-title":"IEICE TRANSACTIONS on Information and Systems"},{"key":"e_1_3_4_191_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCA.2007.904825"},{"key":"e_1_3_4_192_2","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR, 5757\u20135766","author":"Lee K.","unstructured":"K. Lee, Y. Seo, S. Lee, H. Lee, and J. Shin. 2020. Context-aware dynamics model for generalization in model-based reinforcement learning. In Proceedings of the International Conference on Machine Learning. PMLR, 5757\u20135766."},{"key":"e_1_3_4_193_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2007.05402"},{"key":"e_1_3_4_194_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0236178"},{"key":"e_1_3_4_195_2","doi-asserted-by":"publisher","unstructured":"K. Lei B. Zhang Y. Li M. Yang and Y. Shen. 2020. Time-driven feature-aware jointly deep reinforcement learning for financial signal representation and algorithmic trading. Expert Systems with Applications 140 C (2020) 112872. 10.1016\/j.eswa.2019.112872","DOI":"10.1016\/j.eswa.2019.112872"},{"key":"e_1_3_4_196_2","volume-title":"International Conference on Learning Representations (ICLR\u201919)","author":"Levy A.","unstructured":"A. Levy, G. D. Konidaris, R. Platt, and K. Saenko. 2019. Learning multi-level hierarchies with hindsight. In International Conference on Learning Representations (ICLR\u201919). arXiv e-print 1712.00948. https:\/\/openreview.net\/forum?id=ryzECoAcY7"},{"key":"e_1_3_4_197_2","volume-title":"InProceedings of the 2006 IEEE International Joint Conference on Neural Network Proceedings . IEEE, 534\u2013541","author":"Li J.","unstructured":"J. Li and L. Chan. 2006. Reward adjustment reinforcement learning for risk-averse asset allocation. InProceedings of the 2006 IEEE International Joint Conference on Neural Network Proceedings . 
IEEE, 534\u2013541."},{"key":"e_1_3_4_198_2","volume-title":"Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. IEEE, 233\u2013240","author":"Li H.","unstructured":"H. Li, C. H. Dagli, and D. Enke. 2007. Short-term stock market timing prediction under reinforcement learning schemes. In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. IEEE, 233\u2013240."},{"key":"e_1_3_4_199_2","volume-title":"Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS\u201909), D. van Dyk and M. Welling (Eds.). Proceedings of Machine Learning Research","volume":"5","author":"Li Y.","unstructured":"Y. Li, C. Szepesv\u00e1ri, and D. Schuurmans. 2009. Learning exercise policies for American options. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS\u201909), D. van Dyk and M. Welling (Eds.). Proceedings of Machine Learning Research, Vol. 5. Hilton Clearwater Beach Resort, Clearwater Beach, FL, USA, 352\u2013359. https:\/\/proceedings.mlr.press\/v5\/li09d.html"},{"key":"e_1_3_4_200_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1907.01503"},{"key":"e_1_3_4_201_2","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems.","author":"Li S.","unstructured":"S. Li, R. Wang, M. Tang, and C. Zhang. 2019. Hierarchical reinforcement learning with advantage-based auxiliary rewards. In Proceedings of the 33rd International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_202_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-019-00773-w"},{"key":"e_1_3_4_203_2","doi-asserted-by":"publisher","DOI":"10.1145\/3490354.3494376"},{"key":"e_1_3_4_204_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10479-016-2377-z"},{"key":"e_1_3_4_205_2","doi-asserted-by":"publisher","unstructured":"Z. Liang H. 
Chen J. Zhu K. Jiang and Y. Li. 2018. Adversarial deep reinforcement learning in portfolio management. arXiv e-print 1808.09940. 10.48550\/arXiv.1808.09940","DOI":"10.48550\/arXiv.1808.09940"},{"key":"e_1_3_4_206_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1509.02971"},{"key":"e_1_3_4_207_2","volume-title":"ESANN 2018-Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. ESANN, 521\u2013526","author":"Lim Y. S.","unstructured":"Y. S. Lim and D. Gorse. 2018. Reinforcement learning for high-frequency market making. In ESANN 2018-Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. ESANN, 521\u2013526."},{"key":"e_1_3_4_208_2","doi-asserted-by":"publisher","DOI":"10.1109\/12.106218"},{"key":"e_1_3_4_209_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992699"},{"key":"e_1_3_4_210_2","volume-title":"Proceedings of the International Conference on International Joint Conferences on Artificial Intelligence. 4548\u20134554","author":"Lin S.","unstructured":"S. Lin and P. A. Beling. 2020. An end-to-end optimal trade execution framework based on proximal policy optimization. In Proceedings of the International Conference on International Joint Conferences on Artificial Intelligence. 4548\u20134554."},{"key":"e_1_3_4_211_2","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence 34","author":"Liu Y.","year":"2020","unstructured":"Y. Liu, Q. Liu, H. Zhao, Z. Pan, and C. Liu. 2020. Adaptive quantitative trading: An imitative deep reinforcement learning approach. 
In Proceedings of the AAAI Conference on Artificial Intelligence 34, 02 (2020), 2128\u20132135."},{"key":"e_1_3_4_212_2","doi-asserted-by":"publisher","DOI":"10.1093\/rfs\/1.1.41"},{"key":"e_1_3_4_213_2","doi-asserted-by":"publisher","DOI":"10.2307\/2938368"},{"key":"e_1_3_4_214_2","doi-asserted-by":"publisher","DOI":"10.1093\/rfs\/14.1.113"},{"key":"e_1_3_4_215_2","doi-asserted-by":"publisher","unstructured":"D. W. Lu. 2017. Agent-inspired trading using recurrent reinforcement learning and LSTM neural networks. arXiv e-print 1707.07338. 10.48550\/arXiv.1707.07338","DOI":"10.48550\/arXiv.1707.07338"},{"key":"e_1_3_4_216_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-021-04013-x"},{"key":"e_1_3_4_217_2","volume-title":"Proceedings of the 15th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2019. Springer International Publishing, 247\u2013258","year":"2019","unstructured":"Lucarelli Giorgio and Matteo Borrotti. 2019. A deep reinforcement learning approach for automated cryptocurrency trading. In Proceedings of the 15th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2019. Springer International Publishing, 247\u2013258."},{"key":"e_1_3_4_218_2","doi-asserted-by":"publisher","DOI":"10.1111\/jofi.12196"},{"key":"e_1_3_4_219_2","doi-asserted-by":"publisher","DOI":"10.2307\/2235156"},{"key":"e_1_3_4_220_2","volume-title":"Proceedings of the 2007 IEEE Congress on Evolutionary Computation. IEEE, 508\u2013515","author":"Mabu S.","unstructured":"S. Mabu, Y. Chen, K. Hirasawa, and J. Hu. 2007. Stock trading rules using genetic network programming with actor-critic. In Proceedings of the 2007 IEEE Congress on Evolutionary Computation. IEEE, 508\u2013515."},{"key":"e_1_3_4_221_2","volume-title":"Proceedings of the European Conference on the Applications of Evolutionary Computation . Springer","author":"Maringer D.","unstructured":"D. Maringer and T. Ramtohul. 2010. 
Threshold recurrent reinforcement learning model for automated trading. In Proceedings of the European Conference on the Applications of Evolutionary Computation . Springer, Berlin, 212\u2013221."},{"key":"e_1_3_4_222_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10287-011-0131-1"},{"key":"e_1_3_4_223_2","volume-title":"Proceedings of the 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr). IEEE, 407\u2013413","author":"Maringer D.","unstructured":"D. Maringer and J. Zhang. 2014. Transition variable selection for regime switching recurrent reinforcement learning. In Proceedings of the 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr). IEEE, 407\u2013413."},{"key":"e_1_3_4_224_2","doi-asserted-by":"publisher","DOI":"10.2307\/2975974"},{"key":"e_1_3_4_225_2","volume-title":"Proceedings of the 2009 International Joint Conference on Neural Networks. IEEE","author":"Martinez L. C.","unstructured":"L. C. Martinez, D. N. da Hora, J. R. D. M. Palotti, W. Meira, and G. L. Pappa. 2009. From an artificial neural network to a stock market day-trading system: A case study on the BM&F BOVESPA. In Proceedings of the 2009 International Joint Conference on Neural Networks. IEEE, 2006\u20132013."},{"key":"e_1_3_4_226_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540-6261.2010.01578.x"},{"key":"e_1_3_4_227_2","doi-asserted-by":"publisher","DOI":"10.2307\/3003143"},{"key":"e_1_3_4_228_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1017940631555"},{"key":"e_1_3_4_229_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1312.5602"},{"key":"e_1_3_4_230_2","doi-asserted-by":"crossref","unstructured":"V. Mnih K. Kavukcuoglu D. Silver A. A. Rusu J. Veness M. G. Bellemare A. Graves M. Riedmiller A. K. Fidjeland G. Ostrovski and S. Petersen. 2015. Human-level control through deep reinforcement learning. 
Nature 518 7540 (2015) 529\u2013533.","DOI":"10.1038\/nature14236"},{"key":"e_1_3_4_231_2","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR","author":"Mnih V.","unstructured":"V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning. PMLR, 1928\u20131937."},{"key":"e_1_3_4_232_2","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2750356"},{"key":"e_1_3_4_233_2","volume-title":"Proceedings of the IEEE\/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr) . IEEE, 300\u2013307","author":"Moody J.","unstructured":"J. Moody and L. Wu. 1997. Optimization of trading systems and portfolios. In Proceedings of the IEEE\/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr) . IEEE, 300\u2013307."},{"key":"e_1_3_4_234_2","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1099-131X(1998090)17:5\/6<441::AID-FOR707>3.0.CO;2-#"},{"key":"e_1_3_4_235_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-5625-1_10"},{"key":"e_1_3_4_236_2","doi-asserted-by":"publisher","DOI":"10.1109\/72.935097"},{"key":"e_1_3_4_237_2","doi-asserted-by":"publisher","DOI":"10.3390\/math8101640"},{"key":"e_1_3_4_238_2","volume-title":"Fair Division and Collective Welfare","author":"Moulin H.","unstructured":"H. Moulin. 2004. Fair Division and Collective Welfare. MIT Press."},{"key":"e_1_3_4_239_2","volume-title":"Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications","author":"Murphy J. J.","unstructured":"J. J. Murphy. 1999. Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications. Penguin."},{"key":"e_1_3_4_240_2","volume-title":"Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA). 
IEEE, 6292\u20136299","author":"Nair A.","unstructured":"A. Nair, B. McGrew, M. Andrychowicz, W. Zaremba, and P. Abbeel. 2018. Overcoming exploration in reinforcement learning with demonstrations. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 6292\u20136299."},{"key":"e_1_3_4_241_2","volume-title":"Proceedings of the International Conference on Database and Expert Systems Applications . Springer, Cham, 167\u2013180","author":"Nan A.","unstructured":"A. Nan, A. Perumal, and O. R. Zaiane. 2022. Sentiment and knowledge based algorithmic trading with deep reinforcement learning. In Proceedings of the International Conference on Database and Expert Systems Applications . Springer, Cham, 167\u2013180."},{"key":"e_1_3_4_242_2","volume-title":"Proceedings of the International Conference on Neural Information Processing Systems. 952\u2013958","author":"Neuneier R.","year":"1996","unstructured":"R. Neuneier. 1996. Optimal asset allocation using adaptive dynamic programming. Proceedings of the International Conference on Neural Information Processing Systems. 952\u2013958."},{"key":"e_1_3_4_243_2","volume-title":"Proceedings of the International Conference on Neural Information Processing Systems. 936\u2013942","author":"Neuneier R.","year":"1998","unstructured":"R. Neuneier. 1998. Enhancing q-learning for optimal asset allocation. In Proceedings of the International Conference on Neural Information Processing Systems. 936\u2013942."},{"key":"e_1_3_4_244_2","volume-title":"Proceedings of the 23rd International Conference on Machine Learning. 673\u2013680","author":"Nevmyvaka Y.","unstructured":"Y. Nevmyvaka, Y. Feng, and M. Kearns. 2006. Reinforcement learning for optimised trade execution. In Proceedings of the 23rd International Conference on Machine Learning. 673\u2013680."},{"key":"e_1_3_4_245_2","volume-title":"Proceedings of the International Conference on Machine Learning (1","author":"Ng A. 
Y.","year":"2000","unstructured":"A. Y. Ng and S. Russell. 2000. Algorithms for inverse reinforcement learning. In Proceedings of the International Conference on Machine Learning (1 (2000), 2)."},{"key":"e_1_3_4_246_2","doi-asserted-by":"publisher","DOI":"10.1080\/1350486X.2022.2077783"},{"key":"e_1_3_4_247_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1755-053X.2012.01190.x"},{"key":"e_1_3_4_248_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.614"},{"key":"e_1_3_4_249_2","doi-asserted-by":"publisher","unstructured":"A. M. Ozbayoglu M. U. Gudelek and O. B. Sezer. 2020. Deep learning for financial applications: A survey. Applied Soft Computing 93 C (2020) 106384. 10.1016\/j.asoc.2020.106384","DOI":"10.1016\/j.asoc.2020.106384"},{"key":"e_1_3_4_250_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.191"},{"key":"e_1_3_4_251_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-6419.2007.00519.x"},{"key":"e_1_3_4_252_2","doi-asserted-by":"publisher","unstructured":"H. Park M. K. Sim and D. G. Choi. 2020. An intelligent financial portfolio trading strategy using deep Q-learning. Expert Systems with Applications 158 C (2020) 113573. 10.1016\/j.eswa.2020.113573","DOI":"10.1016\/j.eswa.2020.113573"},{"key":"e_1_3_4_253_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1812.10252"},{"key":"e_1_3_4_254_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6730\u20136739","author":"Pei W.","unstructured":"W. Pei, T. Baltrusaitis, D. M. Tax, and L. P. Morency. 2017. Temporal attention-gated model for robust sequence classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6730\u20136739."},{"key":"e_1_3_4_255_2","doi-asserted-by":"publisher","unstructured":"P. C. Pendharkar and P. Cusatis. 2018. Trading financial indices with reinforcement learning agents. Expert Systems with Applications 103 C (2018) 1\u201313. 
10.1016\/j.eswa.2018.02.032","DOI":"10.1016\/j.eswa.2018.02.032"},{"key":"e_1_3_4_256_2","unstructured":"J. Peters D. Janzing and B. Sch\u00f6lkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms . The MIT Press 288."},{"key":"e_1_3_4_257_2","unstructured":"D. N. Perkins and G. Salomon. 1992. Transfer of learning. In International Encyclopedia of Education (2nd ed. Vol. 2) Torsten Hus\u00e9n and T. Neville Postlethwaite (Eds.). Pergamon Press Oxford UK 6452\u20136457."},{"key":"e_1_3_4_258_2","volume-title":"Fractal Market Analysis: Applying Chaos Theory to Investment and Economics","author":"Peters E. E.","unstructured":"E. E. Peters. 1994. Fractal Market Analysis: Applying Chaos Theory to Investment and Economics, Vol. 24. John Wiley & Sons."},{"key":"e_1_3_4_259_2","unstructured":"J. P\u00e9zier and A. White. 2006. The Relative Merits of Investable Hedge Fund Indices and of Funds of Hedge Funds in Optimal Passive Portfolios (No. icma-dp2006-10). Henley Business School Reading University."},{"key":"e_1_3_4_260_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2201.02135"},{"key":"e_1_3_4_261_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCAS.2006.1688199"},{"key":"e_1_3_4_262_2","doi-asserted-by":"publisher","DOI":"10.1134\/S1064226919120131"},{"key":"e_1_3_4_263_2","doi-asserted-by":"publisher","unstructured":"Y. Qiu Y. Qiu Y. Yuan Z. Chen and R. Lee. 2021. QF-TraderNet: Intraday trading via deep reinforcement with quantum price-level.based profit-and-loss control. Frontiers in Artificial Intelligence 4 Article 749878 (Oct. 2021). 10.3389\/frai.2021.749878","DOI":"10.3389\/frai.2021.749878"},{"key":"e_1_3_4_264_2","volume-title":"InProceedings of the Conference on Robot Learning. PMLR, 3654\u20133671","author":"Rafailov R.","unstructured":"R. Rafailov, K. B. Hatch, V. Kolev, J. D. Martin, M. Phielipp, and C. Finn. 2023. MOTO: Offline pre-training to online fine-tuning for model-based robot learning. 
InProceedings of the Conference on Robot Learning. PMLR, 3654\u20133671."},{"key":"e_1_3_4_265_2","doi-asserted-by":"publisher","DOI":"10.1109\/18.119724"},{"key":"e_1_3_4_266_2","doi-asserted-by":"publisher","DOI":"10.21314\/JOR.2000.038"},{"key":"e_1_3_4_267_2","volume-title":"Proceedings of the 13th International Conference on Artificial Intelligence and Statistics . JMLR Workshop and Conference Proceedings, 661\u2013668","author":"Ross S.","unstructured":"S. Ross and D. Bagnell. 2010. Efficient reductions for imitation learning. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics . JMLR Workshop and Conference Proceedings, 661\u2013668."},{"key":"e_1_3_4_268_2","volume-title":"Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 627\u2013635","author":"Ross S.","unstructured":"S. Ross, G. Gordon, and D. Bagnell. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 627\u2013635."},{"key":"e_1_3_4_269_2","doi-asserted-by":"crossref","unstructured":"D. E. Rumelhart G. E. Hinton and R. J. Williams. 1985. Learning internal representations by error propagation. California Univ San Diego La Jolla Inst for Cognitive Science.","DOI":"10.21236\/ADA164453"},{"key":"e_1_3_4_270_2","unstructured":"G. A. Rummery and M. Niranjan. 1994. On-Line Q-Learning Using Connectionist Systems. Vol. 37. 
University of Cambridge Department of Engineering Cambridge England 20."},{"key":"e_1_3_4_271_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1511.06295"},{"key":"e_1_3_4_272_2","doi-asserted-by":"publisher","DOI":"10.3390\/app10041506"},{"key":"e_1_3_4_273_2","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 4018\u20134030","author":"Sawhney R.","unstructured":"R. Sawhney, A. Wadhwa, S. Agarwal, and R. Shah. 2021. Quantitative day trading from natural language using reinforcement learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 4018\u20134030."},{"key":"e_1_3_4_274_2","doi-asserted-by":"publisher","unstructured":"T. Schaul J. Quan I. Antonoglou and D. Silver. 2015. Prioritised experience replay. arXiv e-print 1511.05952. 10.48550\/arXiv.1511.05952","DOI":"10.48550\/arXiv.1511.05952"},{"key":"e_1_3_4_276_2","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR","author":"Schulman J.","unstructured":"J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz. 2015. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning. PMLR, 1889\u20131897."},{"key":"e_1_3_4_277_2","doi-asserted-by":"publisher","unstructured":"J. Schulman F. Wolski P. Dhariwal A. Radford and O. Klimov. 2017. Proximal policy optimisation algorithms. arXiv e-print 1707.06347. 10.48550\/arXiv.1707.06347","DOI":"10.48550\/arXiv.1707.06347"},{"key":"e_1_3_4_278_2","volume-title":"Proceedings of the 34th Conference on Neural Information Processing Systems. 12968\u201312979","author":"Seo Y.","unstructured":"Y. Seo, K. Lee, I. Clavera Gilaberte, T. Kurutach, J. Shin, and P. Abbeel. 2020. Trajectory-wise multiple choice learning for dynamics generalization in reinforcement learning. 
In Proceedings of the 34th Conference on Neural Information Processing Systems. 12968\u201312979."},{"key":"e_1_3_4_279_2","doi-asserted-by":"publisher","DOI":"10.3905\/jpm.1994.409501"},{"key":"e_1_3_4_280_2","doi-asserted-by":"publisher","unstructured":"A. Shavandi and M. Khedmati. 2022. A multi-agent deep reinforcement learning framework for algorithmic trading in financial markets. Expert Systems with Applications 208 C (2022) 118124. 10.1016\/j.eswa.2022.118124","DOI":"10.1016\/j.eswa.2022.118124"},{"key":"e_1_3_4_281_2","volume-title":"Proceedings of the 2014 IEEE Conference on Computational Intelligence for Financial Engineering and Economics (CIFEr). IEEE, 391\u2013398","author":"Shen Y.","unstructured":"Y. Shen, R. Huang, C. Yan, and K. Obermayer. 2014. Risk-averse reinforcement learning for algorithmic trading. In Proceedings of the 2014 IEEE Conference on Computational Intelligence for Financial Engineering and Economics (CIFEr). IEEE, 391\u2013398."},{"key":"e_1_3_4_282_2","volume-title":"Proceedings of the International Workshop on Agent-Mediated Electronic Commerce. Springer","author":"Sherstov A. A.","unstructured":"A. A. Sherstov and P. Stone. 2004. Three automated stock-trading agents: A comparative study. In Proceedings of the International Workshop on Agent-Mediated Electronic Commerce. Springer, Berlin, 173\u2013187."},{"key":"e_1_3_4_283_2","volume-title":"Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1613\u20131622","author":"Shi S.","unstructured":"S. Shi, J. Li, G. Li, and P. Pan. 2019. A multi-scale temporal feature aggregation convolutional neural network for portfolio management. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1613\u20131622."},{"key":"e_1_3_4_284_2","volume-title":"Proceedings of the 2017 10th International Symposium on Computational Intelligence and Design (ISCID).","volume":"2","author":"Si W.","unstructured":"W. Si, J. 
Li, P. Ding, and R. Rao. 2017. A multi-objective deep reinforcement learning approach for stock index future\u2019s intraday trading. In Proceedings of the 2017 10th International Symposium on Computational Intelligence and Design (ISCID). Vol. 2, IEEE, 431\u2013436."},{"key":"e_1_3_4_285_2","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR, 387\u2013395","author":"Silver D.","unstructured":"D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller. 2014. Deterministic policy gradient algorithms. In Proceedings of the International Conference on Machine Learning. PMLR, 387\u2013395."},{"key":"e_1_3_4_286_2","volume-title":"J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, and S. Dieleman.","author":"Silver D.","year":"2016","unstructured":"D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, and S. Dieleman. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484\u2013489."},{"key":"e_1_3_4_287_2","doi-asserted-by":"crossref","unstructured":"D. Silver T. Hubert J. Schrittwieser I. Antonoglou M. Lai A. Guez M. Lanctot L. Sifre D. Kumaran T. Graepel and T. Lillicrap. 2018. A general reinforcement learning algorithm that masters chess shogi and Go through self-play. Science 362 6419 (2018) 1140\u20131144.","DOI":"10.1126\/science.aar6404"},{"key":"e_1_3_4_288_2","doi-asserted-by":"publisher","unstructured":"T. Spooner J. Fearnley R. Savani and A. Koukorinis. 2018. Market making via reinforcement learning. arXiv e-print 1804.04216. 10.48550\/arXiv.1804.04216","DOI":"10.48550\/arXiv.1804.04216"},{"key":"e_1_3_4_289_2","doi-asserted-by":"publisher","unstructured":"T. Spooner and R. Savani. 2020. Robust market making via adversarial reinforcement learning. arXiv e-print 2003.01820. 
10.48550\/arXiv.2003.01820","DOI":"10.48550\/arXiv.2003.01820"},{"key":"e_1_3_4_290_2","doi-asserted-by":"crossref","unstructured":"P. Spirtes C. N. Glymour R. Scheines and D. Heckerman. 2000. Causation Prediction and Search. MIT Press.","DOI":"10.7551\/mitpress\/1754.001.0001"},{"key":"e_1_3_4_291_2","doi-asserted-by":"publisher","unstructured":"F. Soleymani and E. Paquet. 2021. Deep graph convolutional reinforcement learning for financial portfolio management\u2013DeepPocket. Expert Systems with Applications 182 C (2021) 115127. 10.1016\/j.eswa.2021.115127","DOI":"10.1016\/j.eswa.2021.115127"},{"key":"e_1_3_4_292_2","article-title":"Robust FOREX trading with deep q network (DQN)","volume":"39","author":"Sornmayura S.","year":"2019","unstructured":"S. Sornmayura. 2019. Robust FOREX trading with deep q network (DQN). ABAC Journal 39, 1 (2019).","journal-title":"ABAC Journal"},{"key":"e_1_3_4_293_2","doi-asserted-by":"publisher","DOI":"10.3905\/joi.3.3.59"},{"key":"e_1_3_4_294_2","doi-asserted-by":"publisher","DOI":"10.3905\/jwm.1999.320359"},{"key":"e_1_3_4_295_2","first-page":"393","article-title":"Credit rationing in markets with imperfect information","volume":"71","author":"Stiglitz J. E.","year":"1981","unstructured":"J. E. Stiglitz and A. Weiss. 1981. Credit rationing in markets with imperfect information. The American Economic Review 71, 3 (1981), 393\u2013410.","journal-title":"The American Economic Review"},{"key":"e_1_3_4_296_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1712.06567"},{"key":"e_1_3_4_297_2","doi-asserted-by":"publisher","DOI":"10.1111\/0022-1082.00163"},{"key":"e_1_3_4_298_2","volume-title":"Temporal Credit Assignment in Reinforcement Learning","author":"Sutton R. S.","unstructured":"R. S. Sutton. 1984. Temporal Credit Assignment in Reinforcement Learning. 
University of Massachusetts Amherst."},{"key":"e_1_3_4_299_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022633531479"},{"key":"e_1_3_4_300_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton R. S.","year":"2018","unstructured":"R. S. Sutton and A. G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press."},{"key":"e_1_3_4_301_2","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems.","author":"Sutskever I.","unstructured":"I. Sutskever, O. Vinyals, and Q. V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_302_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1\u20139.","author":"Szegedy C.","unstructured":"C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1\u20139."},{"key":"e_1_3_4_303_2","doi-asserted-by":"publisher","unstructured":"M. Taghian A. Asadi and R. Safabakhsh. 2022. Learning financial asset-specific trading rules via deep reinforcement learning. Expert Systems with Applications 195 C (2022) 116523. 10.1016\/j.eswa.2022.116523","DOI":"10.1016\/j.eswa.2022.116523"},{"key":"e_1_3_4_304_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"e_1_3_4_305_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2010.09.001"},{"key":"e_1_3_4_306_2","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence.","volume":"32","author":"Tavakoli A.","unstructured":"A. Tavakoli, F. Pardo, and P. Kormushev. 2018. Action branching architectures for deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 
32, 1."},{"key":"e_1_3_4_307_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540-6261.2007.01232.x"},{"key":"e_1_3_4_308_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540-6261.2008.01362.x"},{"key":"e_1_3_4_309_2","doi-asserted-by":"publisher","unstructured":"T. Th\u00e9ate and D. Ernst. 2021. An application of deep reinforcement learning to algorithmic trading. Expert Systems with Applications 173 C (2021) 114632. 10.1016\/j.eswa.2021.114632","DOI":"10.1016\/j.eswa.2021.114632"},{"key":"e_1_3_4_310_2","volume-title":"Proceedings of the 9th International Conference on Neural Information Processing Systems.","author":"Thrun S.","year":"1995","unstructured":"S. Thrun. 1995. Is learning the n-th thing any easier than learning the first? In Proceedings of the 9th International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_311_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2997523"},{"key":"e_1_3_4_312_2","doi-asserted-by":"publisher","unstructured":"A. Tsantekidis N. Passalis and A. Tefas. 2021. Diversity-driven knowledge distillation for financial trading using deep reinforcement learning. Neural Networks 140 C (2021) 193\u2013202. 10.1016\/j.neunet.2021.02.026","DOI":"10.1016\/j.neunet.2021.02.026"},{"key":"e_1_3_4_313_2","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems.","author":"Vaswani A.","unstructured":"A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, \u0141. Kaiser, and I. Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_314_2","doi-asserted-by":"crossref","unstructured":"O. Vinyals I. Babuschkin W. M. Czarnecki M. Mathieu A. Dudzik J. Chung D. H. Choi R. Powell T. Ewalds P. Georgiev and J. Oh. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. 
Nature 575 7782 (2019) 350\u2013354.","DOI":"10.1038\/s41586-019-1724-z"},{"key":"e_1_3_4_315_2","volume-title":"Proceedings of the 1st ACM International Conference on AI in Finance. 1\u20138.","author":"Vittori E.","unstructured":"E. Vittori, M. Trapletti, and M. Restelli. 2020. Option hedging with risk averse reinforcement learning. In Proceedings of the 1st ACM International Conference on AI in Finance. 1\u20138."},{"key":"e_1_3_4_316_2","volume-title":"In Proceedings of the Workshops at the 29th AAAI Conference on Artificial Intelligence.","author":"Wang Z.","unstructured":"Z. Wang and T. Oates. 2015. Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In Proceedings of the Workshops at the 29th AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_4_317_2","volume-title":"In Proceedings of the International Conference on Machine Learning. PMLR","author":"Wang Z.","unstructured":"Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, and N. Freitas. 2016. Dueling network architectures for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning. PMLR, 1995\u20132003."},{"key":"e_1_3_4_318_2","volume-title":"Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 499\u2013508","author":"Wang J.","unstructured":"J. Wang, Q. Gu, J. Wu, G. Liu, and Z. Xiong. 2016. Traffic speed prediction and congestion source exploration: A deep learning method. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 499\u2013508."},{"key":"e_1_3_4_319_2","volume-title":"et\u00a0al","author":"Wang Y.","year":"2017","unstructured":"Y. Wang, D. Wang, S. Zhang, et\u00a0al. 2017. Deep Q-trading. Tech. 
Rep., CSLT, Tsinghua Univ., Beijing, China."},{"key":"e_1_3_4_320_2","volume-title":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2437\u20132446","author":"Wang J.","unstructured":"J. Wang, Z. Wang, J. Li, and J. Wu. 2018. Multilevel wavelet decomposition network for interpretable time series analysis. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2437\u20132446."},{"key":"e_1_3_4_321_2","volume-title":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1900\u20131908","author":"Wang J.","unstructured":"J. Wang, Y. Zhang, K. Tang, J. Wu, and Z. Xiong. 2019. Alphastock: A buying-winners-and-selling-losers investment strategy using interpretable deep reinforcement attention networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1900\u20131908."},{"key":"e_1_3_4_322_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1907.11718"},{"key":"e_1_3_4_323_2","doi-asserted-by":"publisher","DOI":"10.1111\/mafi.12281"},{"key":"e_1_3_4_324_2","volume-title":"In Proceedings of the AAAI Conference on Artificial Intelligence 35","author":"Wang Z.","year":"2021","unstructured":"Z. Wang, B. Huang, S. Tu, K. Zhang, and L. Xu. 2021. DeepTrader: A deep reinforcement learning approach for risk-return balanced portfolio management with market conditions embedding. In Proceedings of the AAAI Conference on Artificial Intelligence 35, 1 (2021), 643\u2013650."},{"key":"e_1_3_4_325_2","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence 35","author":"Wang R.","year":"2021","unstructured":"R. Wang, H. Wei, B. An, Z. Feng, and J. Yao. 2021. Commission fee is not enough: A hierarchical reinforced framework for portfolio management. 
In Proceedings of the AAAI Conference on Artificial Intelligence 35, 1 (2021), 626\u2013633."},{"key":"e_1_3_4_326_2","volume-title":"Proceedings of the 2021 International Conference on Applied Artificial Intelligence (ICAPAI). IEEE, 1\u20137.","author":"Wang C.","unstructured":"C. Wang, P. Sand\u00e5s, and P. Beling. 2021. Improving pairs trading strategies via reinforcement learning. In Proceedings of the 2021 International Conference on Applied Artificial Intelligence (ICAPAI). IEEE, 1\u20137."},{"key":"e_1_3_4_327_2","volume-title":"Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 365\u2013372","author":"Wang H.","unstructured":"H. Wang and S. Yu. 2021. Robo-advising: Enhancing investment with inverse optimization and deep reinforcement learning. In Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 365\u2013372."},{"key":"e_1_3_4_329_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"e_1_3_4_330_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1910.03743"},{"key":"e_1_3_4_331_2","doi-asserted-by":"publisher","unstructured":"L. Weng X. Sun M. Xia J. Liu and Y. Xu. 2020. Portfolio trading system of digital currencies: A deep reinforcement learning with multidimensional attention gating mechanism. Neurocomputing 402 C (2020) 171\u2013182. 10.1016\/j.neucom.2020.04.004","DOI":"10.1016\/j.neucom.2020.04.004"},{"key":"e_1_3_4_332_2","doi-asserted-by":"publisher","DOI":"10.1016\/0165-4896(81)90018-4"},{"key":"e_1_3_4_333_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_4_334_2","doi-asserted-by":"publisher","DOI":"10.1145\/2601248.2601268"},{"key":"e_1_3_4_335_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0893-6080(05)80023-1"},{"key":"e_1_3_4_336_2","doi-asserted-by":"publisher","unstructured":"X. Wu H. Chen J. Wang L. Troiano V. Loia and H. Fujita. 2020. 
Adaptive stock trading strategies with deep reinforcement learning methods. Information Sciences 538 C (2020) 142\u2013158. 10.1016\/j.ins.2020.05.066","DOI":"10.1016\/j.ins.2020.05.066"},{"key":"e_1_3_4_337_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1811.07522"},{"key":"e_1_3_4_338_2","volume-title":"Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence. 4647\u20134653","author":"Xu K.","unstructured":"K. Xu, Y. Zhang, D. Ye, P. Zhao, and M. Tan. 2021. Relation-aware transformer for portfolio policy learning. In Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence. 4647\u20134653."},{"key":"e_1_3_4_339_2","doi-asserted-by":"publisher","DOI":"10.1086\/209650"},{"key":"e_1_3_4_340_2","doi-asserted-by":"publisher","unstructured":"S. Y. Yang Y. Yu and S. Almahdi. 2018. An investor sentiment reward-based trading system using Gaussian inverse reinforcement learning algorithm. Expert Systems with Applications 114 C (2018) 388\u2013401. 10.1016\/j.eswa.2018.07.056","DOI":"10.1016\/j.eswa.2018.07.056"},{"key":"e_1_3_4_341_2","volume-title":"Proceedings of the 1st ACM International Conference on AI in Finance. 1\u20138.","author":"Yang H.","unstructured":"H. Yang, X. Y. Liu, S. Zhong, and A. Walid. 2020. Deep reinforcement learning for automated stock trading: An ensemble strategy. In Proceedings of the 1st ACM International Conference on AI in Finance. 1\u20138."},{"key":"e_1_3_4_342_2","volume-title":"In Proceedings of the AAAI Conference on Artificial Intelligence 34","author":"Ye Y.","year":"2020","unstructured":"Y. Ye, H. Pei, B. Wang, P. Y. Chen, Y. Zhu, J. Xiao, and B. Li. 2020. Reinforcement-learning based portfolio management with augmented asset movement prediction states. In Proceedings of the AAAI Conference on Artificial Intelligence 34, 01 (2020), 1112\u20131119."},{"key":"e_1_3_4_343_2","doi-asserted-by":"publisher","unstructured":"P. 
Yu J. S. Lee I. Kulyatin Z. Shi and S. Dasgupta. 2019. Model-based deep reinforcement learning for dynamic portfolio optimisation. arXiv e-print 1901.08740. 10.48550\/arXiv.1901.08740","DOI":"10.48550\/arXiv.1901.08740"},{"key":"e_1_3_4_344_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ribaf.2023.101879"},{"key":"e_1_3_4_345_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics9091384"},{"key":"e_1_3_4_346_2","volume-title":"Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 3067\u20133071","author":"Zarkias K. S.","unstructured":"K. S. Zarkias, N. Passalis, A. Tsantekidis, and A. Tefas. 2019. Deep reinforcement learning for financial trading using price trailing. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 3067\u20133071."},{"key":"e_1_3_4_347_2","volume-title":"Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation. 1757\u20131758","author":"Zhang J.","unstructured":"J. Zhang and D. Maringer. 2013. Indicator selection for daily equity trading with recurrent reinforcement learning. In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation. 1757\u20131758."},{"key":"e_1_3_4_348_2","volume-title":"Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1449\u20131453","author":"Zhang J.","unstructured":"J. Zhang and D. Maringer. 2014. Two parameter update schemes for recurrent reinforcement learning. In Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC). 
IEEE, 1449\u20131453."},{"key":"e_1_3_4_349_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10614-015-9490-y"},{"key":"e_1_3_4_350_2","doi-asserted-by":"publisher","DOI":"10.3905\/jfds.2020.1.030"},{"key":"e_1_3_4_351_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.2979700"},{"key":"e_1_3_4_352_2","volume-title":"Proceedings of the 2nd ACM International Conference on AI in Finance. 1\u20139.","author":"Zhao M.","unstructured":"M. Zhao and V. Linetsky. 2021. High frequency automated market making algorithms with adverse selection risk control via reinforcement learning. In Proceedings of the 2nd ACM International Conference on AI in Finance. 1\u20139."},{"key":"e_1_3_4_353_2","volume-title":"Proceedings of the ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems.","author":"Zhao K.","unstructured":"K. Zhao, Y. Ma, J. Liu, J. Hao, Y. Zheng, and Z. Meng. 2023. Improving offline-to-online reinforcement learning with q-ensembles. In Proceedings of the ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems."},{"key":"e_1_3_4_354_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2020.3004555"},{"key":"e_1_3_4_355_2","volume-title":"Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence. 4461\u20134468","author":"Zhong Y.","unstructured":"Y. Zhong, Y. M. Bergstrom, and A. Ward. 2021. Data-driven market-making via model-free learning. In Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence. 4461\u20134468."},{"key":"e_1_3_4_356_2","volume-title":"Ensemble methods: Foundations and algorithms","author":"Zhou Z. H.","unstructured":"Z. H. Zhou. 2012. Ensemble methods: Foundations and algorithms. CRC Press."},{"key":"e_1_3_4_357_2","volume-title":"Proceedings of the International Conference on Neural Information Processing. Springer, Cham, 335\u2013346","author":"Zhu Y.","unstructured":"Y. Zhu, H. 
Yang, J. Jiang, and Q. Huang. 2018. An adaptive box-normalization stock index trading strategy based on reinforcement learning. In Proceedings of the International Conference on Neural Information Processing. Springer, Cham, 335\u2013346."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3733714","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T12:54:56Z","timestamp":1749646496000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3733714"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,11]]},"references-count":353,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3733714"],"URL":"https:\/\/doi.org\/10.1145\/3733714","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,11]]},"assertion":[{"value":"2023-07-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}