{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T19:35:21Z","timestamp":1774553721718,"version":"3.50.1"},"reference-count":31,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,7,28]],"date-time":"2019-07-28T00:00:00Z","timestamp":1564272000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Recently there has been an exponential increase in the use of artificial intelligence for trading in financial markets such as stock and forex. Reinforcement learning has become of particular interest to financial traders ever since the program AlphaGo defeated the strongest human contemporary Go board game player Lee Sedol in 2016. We systematically reviewed all recent stock\/forex prediction or trading articles that used reinforcement learning as their primary machine learning method. All reviewed articles had some unrealistic assumptions such as no transaction costs, no liquidity issues and no bid or ask spread issues. Transaction costs had significant impacts on the profitability of the reinforcement learning algorithms compared with the baseline algorithms tested. Despite showing statistically significant profitability when reinforcement learning was used in comparison with baseline models in many studies, some showed no meaningful level of profitability, in particular with large changes in the price pattern between the system training and testing data. Furthermore, few performance comparisons between reinforcement learning and other sophisticated machine\/deep learning models were provided. The impact of transaction costs, including the bid\/ask spread on profitability has also been assessed. In conclusion, reinforcement learning in stock\/forex trading is still in its early development and further research is needed to make it a reliable method in this domain.<\/jats:p>","DOI":"10.3390\/data4030110","type":"journal-article","created":{"date-parts":[[2019,7,29]],"date-time":"2019-07-29T03:06:58Z","timestamp":1564369618000},"page":"110","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":110,"title":["Reinforcement Learning in Financial Markets"],"prefix":"10.3390","volume":"4","author":[{"given":"Terry Lingze","family":"Meng","sequence":"first","affiliation":[{"name":"School of Computer Science, Building J12, University of Sydney, 1 Cleveland Street, Darlington, NSW 2006, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7792-2327","authenticated-orcid":false,"given":"Matloob","family":"Khushi","sequence":"additional","affiliation":[{"name":"School of Computer Science, Building J12, University of Sydney, 1 Cleveland Street, Darlington, NSW 2006, Australia"}]}],"member":"1968","published-online":{"date-parts":[[2019,7,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Khushi, M., Dean, I.M., Teber, E.T., Chircop, M., Arthur, J.W., and Flores-Rodriguez, N. (2017). Automated classification and characterization of the mitotic spindle following knockdown of a mitosis-related protein. BMC Bioinform., 18.","DOI":"10.1186\/s12859-017-1966-4"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1561\/0600000035","article-title":"Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning","volume":"7","author":"Criminisi","year":"2012","journal-title":"Found. Trends\u00ae Comput. Graph. Vis."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Khalid, S., Khalil, T., and Nasreen, S. (2014, January 7\u201310). A survey of feature selection and feature extraction techniques in machine learning. Proceedings of the 2014 Science and Information Conference, Warsaw, Poland.","DOI":"10.1109\/SAI.2014.6918213"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Khushi, M., Choudhury, N., Arthur, J.W., Clarke, C.L., and Graham, J.D. (2018). Predicting Functional Interactions Among DNA-Binding Proteins. 25th International Conference on Neural Information Processing, Springer.","DOI":"10.1007\/978-3-030-04221-9_7"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"8259","DOI":"10.1007\/s00500-017-2768-3","article-title":"Forecasting financial indicators by generalized behavioral learning method","volume":"22","year":"2018","journal-title":"Soft Comput."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1109\/72.935097","article-title":"Learning to Trade via Direct Reinforcement","volume":"12","year":"2001","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_7","unstructured":"Saffell, J.M. (1999). Reinforcement Learning for Trading. Advances in Neural Information Processing Systems 11, MIT Press."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1002\/(SICI)1099-131X(1998090)17:5\/6<441::AID-FOR707>3.0.CO;2-#","article-title":"Performance functions and reinforcement learning for trading systems and portfolios","volume":"17","author":"Moody","year":"1998","journal-title":"J. Forecast."},{"key":"ref_9","unstructured":"Kanwar, N. (2019). Deep Reinforcement Learning-Based Portfolio Management, The University of Texas at Arlington."},{"key":"ref_10","unstructured":"Cumming, J. (2015). An Investigation into the Use of Reinforcement Learning Techniques within the Algorithmic Trading Domain, Imperial College London."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.eswa.2018.02.032","article-title":"Trading financial indices with reinforcement learning agents","volume":"103","author":"Pendharkar","year":"2018","journal-title":"Expert Syst. Appl."},{"key":"ref_12","first-page":"1032","article-title":"Estimating the Maximum Expected Value through Gaussian Approximation","volume":"48","author":"Restelli","year":"2016","journal-title":"Int. Conf. Mach. Learn."},{"key":"ref_13","first-page":"1","article-title":"Q-Learning and SARSA: A comparison between two intelligent stochastic control approaches for financial trading","volume":"15","year":"2015","journal-title":"Univ. Ca\u2019 Foscari Venice Dept. Econ. Res. Pap."},{"key":"ref_14","first-page":"15","article-title":"Robust forex trading with deep q network (dqn)","volume":"39","author":"Sornmayura","year":"2019","journal-title":"Assumpt. Bus. Adm. Coll."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"864","DOI":"10.1109\/TSMCA.2007.904825","article-title":"A Multiagent Approach to Q-Learning for Daily Stock Trading","volume":"37","author":"Lee","year":"2007","journal-title":"IEEE Trans. Syst. ManCybern. -Part A Syst. Hum."},{"key":"ref_16","unstructured":"Elder, T. (2008). Creating Algorithmic Traders with Hierarchical Reinforcement Learning, University of Edinburgh."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Dietmar Maringer, T.R. (2010). Threshold Recurrent Reinforcement Learning Model for Automated Trading, Springer.","DOI":"10.1007\/978-3-642-12242-2_22"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Li, H., Dagli, C.H., and Enke, D. (2007, January 1\u20135). Short-term Stock Market Timing Prediction under Reinforcement Learning Schemes. Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, Honolulu, HI, USA.","DOI":"10.1109\/ADPRL.2007.368193"},{"key":"ref_19","unstructured":"Faratin, P. (2004). Three automated stock-trading agents: A comparative study. Agent-Mediated Electronic Commerce VI, Springer."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1016\/j.eswa.2019.03.055","article-title":"Continuous-time reinforcement learning approach for portfolio management with time penalization","volume":"129","author":"Carsteanu","year":"2019","journal-title":"Elsevier Expert Syst. Appl."},{"key":"ref_21","unstructured":"Lee, J.W. (2001, January 12\u201316). Stock price prediction using reinforcement learning. Proceedings of the 2001 IEEE International Symposium on Industrial Electronics, Pusan, Korea."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/j.eswa.2018.09.036","article-title":"Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies, and transfer learning","volume":"117","author":"Jeong","year":"2018","journal-title":"Expert Syst. Appl."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"783","DOI":"10.1016\/j.asoc.2018.09.017","article-title":"Reinforcement learning applied to Forex trading","volume":"73","author":"Neves","year":"2018","journal-title":"Appl. Soft Comput."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"4741","DOI":"10.1016\/j.eswa.2010.09.001","article-title":"Stock trading with cycles: A financial application of ANFIS and reinforcement learning","volume":"38","author":"Tan","year":"2011","journal-title":"Expert Syst. Appl."},{"key":"ref_25","unstructured":"Lu, D.W. (2017). Agent Inspired Trading Using Recurrent Reinforcement Learning and LSTM Neural Networks. Arxiv Quant. Financ. arXiv."},{"key":"ref_26","unstructured":"Huang, C.-Y. (2018). Financial Trading as a Game: A Deep Reinforcement Learning Approach. Arxiv Quant. Financ. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"763","DOI":"10.1080\/00207720412331303697","article-title":"System for foreign exchange trading using genetic algorithms and reinforcement learning","volume":"35","author":"Hryshko","year":"2004","journal-title":"Int. J. Syst. Sci."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2121","DOI":"10.1016\/j.ins.2005.10.009","article-title":"Adaptive stock trading with dynamic asset allocation using reinforcement learning","volume":"176","author":"O","year":"2006","journal-title":"Inf. Sci."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Gabrielsson, P., and Johansson, U. (2015, January 7\u201310). High-Frequency Equity Index Futures Trading Using Recurrent Reinforcement Learning with Candlesticks. Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence, Cape Town, South Africa.","DOI":"10.1109\/SSCI.2015.111"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhang, J., and Maringer, D. (2014, January 20). Two Parameter Update Schemes for Recurrent Reinforcement Learning. Proceedings of the 2014 IEEE Congress on Evolutionary Computation, Beijing, China.","DOI":"10.1109\/CEC.2014.6900330"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhang, J., and Maringer, D. (2013, January 6\u201310). Indicator selection for daily equity trading with recurrent reinforcement learning. Proceedings of the 15th annual conference companion on Genetic and evolutionary computation, Amsterdam, The Netherlands.","DOI":"10.1145\/2464576.2480773"}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/4\/3\/110\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:10:37Z","timestamp":1760188237000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/4\/3\/110"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,28]]},"references-count":31,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,9]]}},"alternative-id":["data4030110"],"URL":"https:\/\/doi.org\/10.3390\/data4030110","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,7,28]]}}}