{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T16:01:21Z","timestamp":1773072081824,"version":"3.50.1"},"reference-count":89,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2021,11,16]],"date-time":"2021-11-16T00:00:00Z","timestamp":1637020800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/L016796\/1"],"award-info":[{"award-number":["EP\/L016796\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Deep reinforcement learning (DRL) has achieved significant results in many machine learning (ML) benchmarks. In this short survey, we provide an overview of DRL applied to trading on financial markets with the purpose of unravelling common structures used in the trading community using DRL, as well as discovering common issues and limitations of such approaches. We include also a short corpus summarization using Google Scholar. Moreover, we discuss how one can use hierarchy for dividing the problem space, as well as using model-based RL to learn a world model of the trading environment which can be used for prediction. In addition, multiple risk measures are defined and discussed, which not only provide a way of quantifying the performance of various algorithms, but they can also act as (dense) reward-shaping mechanisms for the agent. We discuss in detail the various state representations used for financial markets, which we consider critical for the success and efficiency of such DRL agents. 
The market in focus for this survey is the cryptocurrency market; the results of this survey are two-fold: firstly, to find the most promising directions for further research and secondly, to show how a lack of consistency in the community can significantly impede research and the development of DRL agents for trading.<\/jats:p>","DOI":"10.3390\/data6110119","type":"journal-article","created":{"date-parts":[[2021,11,16]],"date-time":"2021-11-16T11:32:03Z","timestamp":1637062323000},"page":"119","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":34,"title":["Deep Reinforcement Learning for Trading\u2014A Critical Survey"],"prefix":"10.3390","volume":"6","author":[{"given":"Adrian","family":"Millea","sequence":"first","affiliation":[{"name":"Imperial College London, London SW7 2AZ, UK"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_2","unstructured":"Sato, Y. (2019). Model-free reinforcement learning for financial portfolios: A brief survey. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Hu, Z., Zhao, Y., and Khushi, M. (2021). A survey of forex and stock price prediction using deep learning. Appl. Syst. Innov., 4.","DOI":"10.3390\/asi4010009"},{"key":"ref_4","unstructured":"Fischer, T.G. (2018). Reinforcement Learning in Financial Markets-a Survey, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics. Technical Report."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Mosavi, A., Faghan, Y., Ghamisi, P., Duan, P., Ardabili, S.F., Salwana, E., and Band, S.S. (2020). Comprehensive review of deep reinforcement learning methods and applications in economics. 
Mathematics, 8.","DOI":"10.31224\/osf.io\/5qfex"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Meng, T.L., and Khushi, M. (2019). Reinforcement learning in financial markets. Data, 4.","DOI":"10.3390\/data4030110"},{"key":"ref_7","first-page":"21260","article-title":"A peer-to-peer electronic cash system","volume":"4","author":"Nakamoto","year":"2008","journal-title":"Decentralized Bus. Rev."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Islam, M.R., Nor, R.M., Al-Shaikhli, I.F., and Mohammad, K.S. (2018, January 23\u201325). Cryptocurrency vs. Fiat Currency: Architecture, Algorithm, Cashflow &amp; Ledger Technology on Emerging Economy: The Influential Facts of Cryptocurrency and Fiat Currency. Proceedings of the 2018 International Conference on Information and Communication Technology for the Muslim World (ICT4M), Kuala Lumpur, Malaysia.","DOI":"10.1109\/ICT4M.2018.00022"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"101075","DOI":"10.1016\/j.frl.2018.12.023","article-title":"On the speculative nature of cryptocurrencies: A study on Garman and Klass volatility measure","volume":"32","author":"Tan","year":"2020","journal-title":"Financ. Res. Lett."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wang, J., Sun, T., Liu, B., Cao, Y., and Wang, D. (2018, January 17\u201320). Financial markets prediction with deep learning. Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA.","DOI":"10.1109\/ICMLA.2018.00022"},{"key":"ref_11","unstructured":"Song, Y.G., Zhou, Y.L., and Han, R.J. (2018). Neural networks for stock price prediction. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Selvin, S., Vinayakumar, R., Gopalakrishnan, E., Menon, V.K., and Soman, K. (2017, January 13\u201316). Stock price prediction using LSTM, RNN and CNN-sliding window model.
Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (Icacci), Manipal, India.","DOI":"10.1109\/ICACCI.2017.8126078"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/j.jfds.2018.04.003","article-title":"Stock price prediction using support vector regression on daily and up to the minute prices","volume":"4","author":"Henrique","year":"2018","journal-title":"J. Financ. Data Sci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1016\/j.procs.2020.03.326","article-title":"Stock closing price prediction using machine learning techniques","volume":"167","author":"Vijh","year":"2020","journal-title":"Procedia Comput. Sci."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Rathan, K., Sai, S.V., and Manikanta, T.S. (2019, January 23\u201325). Crypto-currency price prediction using decision tree and regression techniques. Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India.","DOI":"10.1109\/ICOEI.2019.8862585"},{"key":"ref_16","unstructured":"Ke, N.R., Singh, A., Touati, A., Goyal, A., Bengio, Y., Parikh, D., and Batra, D. (May, January 30). Modeling the long term future in model-based reinforcement learning. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_17","unstructured":"Moerland, T.M., Broekens, J., and Jonker, C.M. (2020). Model-based reinforcement learning: A survey. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Pant, D.R., Neupane, P., Poudel, A., Pokhrel, A.K., and Lama, B.K. (2018, January 25\u201327). Recurrent neural network based bitcoin price prediction by twitter sentiment analysis. 
Proceedings of the 2018 IEEE 3rd International Conference on Computing, Communication and Security (ICCCS), Kathmandu, Nepal.","DOI":"10.1109\/CCCS.2018.8586824"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"47","DOI":"10.18178\/ijke.2019.5.2.116","article-title":"Sentiment Analysis of News for Effective Cryptocurrency Price Prediction","volume":"5","author":"Vo","year":"2019","journal-title":"Int. J. Knowl. Eng."},{"key":"ref_20","unstructured":"Clements, W.R., Van Delft, B., Robaglia, B.M., Slaoui, R.B., and Toth, S. (2019). Estimating risk and uncertainty in deep reinforcement learning. arXiv."},{"key":"ref_21","first-page":"1","article-title":"Forecasting and trading cryptocurrencies with machine learning under changing market conditions","volume":"7","author":"Godinho","year":"2021","journal-title":"Financ. Innov."},{"key":"ref_22","unstructured":"Suri, K., and Saurav, S. (2021, October 05). Attentive Hierarchical Reinforcement Learning for Stock Order Executions. Available online: https:\/\/github.com\/karush17\/Hierarchical-Attention-Reinforcement-Learning."},{"key":"ref_23","unstructured":"Yu, P., Lee, J.S., Kulyatin, I., Shi, Z., and Dasgupta, S. (2019). Model-based deep reinforcement learning for dynamic portfolio optimization. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"17229","DOI":"10.1007\/s00521-020-05359-8","article-title":"A deep Q-learning portfolio management framework for the cryptocurrency market","volume":"32","author":"Lucarelli","year":"2020","journal-title":"Neural Comput. Appl."},{"key":"ref_25","unstructured":"Wang, R., Wei, H., An, B., Feng, Z., and Yao, J. (2020). Commission Fee is not Enough: A Hierarchical Reinforced Framework for Portfolio Management. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Gao, Y., Gao, Z., Hu, Y., Song, S., Jiang, Z., and Su, J. (2021, January 4\u20136). A Framework of Hierarchical Deep Q-Network for Portfolio Management. 
Proceedings of the ICAART (2), Online Streaming.","DOI":"10.5220\/0010233201320140"},{"key":"ref_27","unstructured":"Jiang, Z., Xu, D., and Liang, J. (2017). A deep reinforcement learning framework for the financial portfolio management problem. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Shi, S., Li, J., Li, G., and Pan, P. (2019, January 3\u20137). A Multi-Scale Temporal Feature Aggregation Convolutional Neural Network for Portfolio Management. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China.","DOI":"10.1145\/3357384.3357961"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Itoh, Y., and Adachi, M. (September, January 29). Chaotic time series prediction by combining echo-state networks and radial basis function networks. Proceedings of the 2010 IEEE International Workshop on Machine Learning for Signal Processing, Kittila, Finland.","DOI":"10.1109\/MLSP.2010.5589260"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"132495","DOI":"10.1016\/j.physd.2020.132495","article-title":"Data-driven predictions of the Lorenz system","volume":"408","author":"Dubois","year":"2020","journal-title":"Phys. D"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Mehtab, S., and Sen, J. (2020). Stock price prediction using convolutional neural networks on a multivariate timeseries. arXiv.","DOI":"10.36227\/techrxiv.15088734"},{"key":"ref_32","unstructured":"Briola, A., Turiel, J., Marcaccioli, R., and Aste, T. (2021). Deep Reinforcement Learning for Active High Frequency Trading. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Boukas, I., Ernst, D., Th\u00e9ate, T., Bolland, A., Huynen, A., Buchwald, M., Wynants, C., and Corn\u00e9lusse, B. (2020). A deep reinforcement learning framework for continuous intraday market bidding. 
arXiv.","DOI":"10.1007\/s10994-021-06020-8"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Conegundes, L., and Pereira, A.C.M. (2020, January 19\u201324). Beating the Stock Market with a Deep Reinforcement Learning Day Trading System. Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK.","DOI":"10.1109\/IJCNN48605.2020.9206938"},{"key":"ref_35","unstructured":"Sadighian, J. (2020). Extending Deep Reinforcement Learning Frameworks in Cryptocurrency Market Making. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"534","DOI":"10.1016\/j.asoc.2015.07.008","article-title":"Application of evolutionary computation for rule discovery in stock algorithmic trading: A literature review","volume":"36","author":"Hu","year":"2015","journal-title":"Appl. Soft Comput."},{"key":"ref_37","unstructured":"Taghian, M., Asadi, A., and Safabakhsh, R. (2020). Learning Financial Asset-Specific Trading Rules via Deep Reinforcement Learning. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Bisht, K., and Kumar, A. (2020, January 1\u20133). Deep Reinforcement Learning based Multi-Objective Systems for Financial Trading. Proceedings of the 2020 5th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE), Online.","DOI":"10.1109\/ICRAIE51050.2020.9358319"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"114632","DOI":"10.1016\/j.eswa.2021.114632","article-title":"An application of deep reinforcement learning to algorithmic trading","volume":"173","author":"Ernst","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Bu, S.J., and Cho, S.B. (2018, January 21\u201323). Learning optimal Q-function using deep Boltzmann machine for reliable trading of cryptocurrency. 
Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning, Madrid, Spain.","DOI":"10.1007\/978-3-030-03493-1_49"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Cover, T.M. (2011). Universal portfolios. The Kelly Capital Growth Investment Criterion: Theory and Practice, World Scientific.","DOI":"10.1142\/9789814293501_0015"},{"key":"ref_42","unstructured":"Li, B., and Hoi, S.C. (2012). On-line portfolio selection with moving average reversion. arXiv."},{"key":"ref_43","unstructured":"Moon, S.H., Kim, Y.H., and Moon, B.R. (2019). Empirical investigation of state-of-the-art mean reversion strategies for equity markets. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1086\/294846","article-title":"Mutual fund performance","volume":"39","author":"Sharpe","year":"1966","journal-title":"J. Bus."},{"key":"ref_45","unstructured":"Moody, J., and Wu, L. (1997, January 24\u201325). Optimization of trading systems and portfolios. Proceedings of the IEEE\/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr), New York, NY, USA."},{"key":"ref_46","unstructured":"Gran, P.K., Holm, A.J.K., and S\u00f8g\u00e5rd, S.G. (2019). A Deep Reinforcement Learning Approach to Stock Trading. [Master\u2019s Thesis, NTNU]."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Yang, H., Liu, X.Y., Zhong, S., and Walid, A. (2020). Deep reinforcement learning for automated stock trading: An ensemble strategy. SSRN.","DOI":"10.2139\/ssrn.3690996"},{"key":"ref_48","unstructured":"Magdon-Ismail, M., and Atiya, A.F. (2015). An analysis of the maximum drawdown risk measure. Citeseer."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_50","unstructured":"Li, Y. (2017). Deep reinforcement learning: An overview. 
arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Mousavi, S.S., Schukat, M., and Howley, E. (2016, January 21\u201322). Deep reinforcement learning: An overview. Proceedings of the SAI Intelligent Systems Conference, London, UK.","DOI":"10.1007\/978-3-319-56991-8_32"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1109\/MSP.2017.2743240","article-title":"Deep reinforcement learning: A brief survey","volume":"34","author":"Arulkumaran","year":"2017","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Narasimhan, K., Kulkarni, T., and Barzilay, R. (2015). Language understanding for text-based games using deep reinforcement learning. arXiv.","DOI":"10.18653\/v1\/D15-1001"},{"key":"ref_54","unstructured":"Foerster, J.N., Assael, Y.M., de Freitas, N., and Whiteson, S. (2016). Learning to communicate to solve riddles with deep distributed recurrent q-networks. arXiv."},{"key":"ref_55","unstructured":"Heravi, J.R. (2019). Learning Representations in Reinforcement Learning, University of California."},{"key":"ref_56","unstructured":"Stooke, A., Lee, K., Abbeel, P., and Laskin, M. (2021, January 18\u201324). Decoupling representation learning from reinforcement learning. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_57","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Grefenstette, E., Blunsom, P., De Freitas, N., and Hermann, K.M. (2014). A deep architecture for semantic parsing. arXiv.","DOI":"10.3115\/v1\/W14-2405"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Ren, H., Xu, B., Wang, Y., Yi, C., Huang, C., Kou, X., Xing, T., Yang, M., Tong, J., and Zhang, Q. (2019, January 4\u20138). 
Time-series anomaly detection service at Microsoft. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA.","DOI":"10.1145\/3292500.3330680"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1016\/j.neucom.2020.03.011","article-title":"Probabilistic forecasting with temporal convolutional neural network","volume":"399","author":"Chen","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_61","unstructured":"Yashaswi, K. (2021). Deep Reinforcement Learning for Portfolio Optimization using Latent Feature State Space (LFSS) Module. arXiv."},{"key":"ref_62","unstructured":"(2021, June 21). Technical Indicators. Available online: https:\/\/www.tradingtechnologies.com\/xtrader-help\/x-study\/technical-indicator-definitions\/list-of-technical-indicators\/."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1016\/j.ins.2020.05.066","article-title":"Adaptive stock trading strategies with deep reinforcement learning methods","volume":"538","author":"Wu","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_64","unstructured":"Chakraborty, S. (2019). Capturing financial markets to apply deep reinforcement learning. arXiv."},{"key":"ref_65","unstructured":"Jia, W., Chen, W., Xiong, L., and Hongyong, S. (2019, January 14\u201319). Quantitative trading on stock market based on deep reinforcement learning. Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Rundo, F. (2019). Deep LSTM with reinforcement learning layer for financial trend prediction in FX high frequency trading systems. Appl. 
Sci., 9.","DOI":"10.3390\/app9204460"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"130","DOI":"10.3390\/axioms9040130","article-title":"Deep reinforcement learning agent for S&P 500 stock selection","volume":"9","author":"Huotari","year":"2020","journal-title":"Axioms"},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.neunet.2021.02.026","article-title":"Diversity-driven knowledge distillation for financial trading using Deep Reinforcement Learning","volume":"140","author":"Tsantekidis","year":"2021","journal-title":"Neural Netw."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Lucarelli, G., and Borrotti, M. (2019, January 24\u201326). A deep reinforcement learning approach for automated cryptocurrency trading. Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Crete, Greece.","DOI":"10.1007\/978-3-030-19823-7_20"},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"8119","DOI":"10.1007\/s10489-021-02262-0","article-title":"Portfolio management system in equity market neutral using reinforcement learning","volume":"51","author":"Wu","year":"2021","journal-title":"Appl. Intell."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1016\/j.neucom.2020.04.004","article-title":"Portfolio trading system of digital currencies: A deep reinforcement learning with multidimensional attention gating mechanism","volume":"402","author":"Weng","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_72","unstructured":"Suri, K., Shi, X.Q., Plataniotis, K., and Lawryshyn, Y. (2021). TradeR: Practical Deep Hierarchical Reinforcement Learning for Trade Execution. arXiv."},{"key":"ref_73","unstructured":"Wei, H., Wang, Y., Mangu, L., and Decker, K. (2019). Model-based reinforcement learning for predictions and control for limit order books. arXiv."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Leem, J., and Kim, H.Y. 
(2020). Action-specialized expert ensemble trading system with extended discrete action space using deep reinforcement learning. PLoS ONE, 15.","DOI":"10.1371\/journal.pone.0236178"},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/j.eswa.2018.09.036","article-title":"Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies, and transfer learning","volume":"117","author":"Jeong","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"112872","DOI":"10.1016\/j.eswa.2019.112872","article-title":"Time-driven feature-aware jointly deep reinforcement learning for financial signal representation and algorithmic trading","volume":"140","author":"Lei","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"114553","DOI":"10.1016\/j.eswa.2020.114553","article-title":"Deep reinforcement learning based trading agents: Risk curiosity driven learning for financial rules-based policy","volume":"170","author":"Hirchoua","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_78","unstructured":"Deisenroth, M., and Rasmussen, C.E. (July, January 28). PILCO: A model-based and data-efficient approach to policy search. Proceedings of the 28th International Conference on machine learning (ICML-11), Citeseer, Bellevue, WA, USA."},{"key":"ref_79","first-page":"3537","article-title":"Model-based relative entropy stochastic search","volume":"28","author":"Abdolmaleki","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_80","unstructured":"Levine, S., and Koltun, V. (2013, January 16\u201321). Guided policy search. 
Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA."},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1038\/nature14540","article-title":"Reinforcement learning improves behaviour from evaluative feedback","volume":"521","author":"Littman","year":"2015","journal-title":"Nature"},{"key":"ref_82","first-page":"3","article-title":"Autoencoders, minimum description length, and Helmholtz free energy","volume":"6","author":"Hinton","year":"1994","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_83","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_84","unstructured":"Jaderberg, M., Mnih, V., Czarnecki, W.M., Schaul, T., Leibo, J.Z., Silver, D., and Kavukcuoglu, K. (2016). Reinforcement learning with unsupervised auxiliary tasks. arXiv."},{"key":"ref_85","unstructured":"Xu, Z., van Hasselt, H., and Silver, D. (2018). Meta-gradient reinforcement learning. arXiv."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"106622","DOI":"10.1016\/j.knosys.2020.106622","article-title":"AutoML: A Survey of the State-of-the-Art","volume":"212","author":"He","year":"2021","journal-title":"Knowl.-Based Syst."},{"key":"ref_87","unstructured":"Zhang, Z. (2020). Hierarchical Modelling for Financial Data. [Ph.D. Thesis, University of Oxford]."},{"key":"ref_88","unstructured":"Filos, A. (2019). Reinforcement Learning for Portfolio Management. [Master\u2019s Thesis, Imperial College London]."},{"key":"ref_89","unstructured":"De Quinones, P.C.F., Perez-Muelas, V.L., and Mari, J.M. Reinforcement Learning in Stock Market. 
[Master\u2019s Thesis, University of Valencia]."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/11\/119\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:31:18Z","timestamp":1760167878000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/11\/119"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,16]]},"references-count":89,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["data6110119"],"URL":"https:\/\/doi.org\/10.3390\/data6110119","relation":{"has-preprint":[{"id-type":"doi","id":"10.20944\/preprints202111.0044.v1","asserted-by":"object"}]},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,16]]}}}