{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T12:45:25Z","timestamp":1780317925411,"version":"3.54.1"},"reference-count":51,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,3,17]],"date-time":"2024-03-17T00:00:00Z","timestamp":1710633600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,17]],"date-time":"2024-03-17T00:00:00Z","timestamp":1710633600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep reinforcement learning (DRL) can be used to extract deep features that can be incorporated into reinforcement learning systems to enable improved decision-making; DRL can therefore also be used for managing stock portfolios. Traditional methods cannot fully exploit the advantages of DRL because they are generally based on real-time stock quotes, which do not have sufficient features for making comprehensive decisions. In this study, in addition to stock quotes, we introduced stock financial indices as additional stock features. Moreover, we used Markowitz mean-variance theory for determining stock correlation. A three-agent deep reinforcement learning model called Collaborative Multi-agent reinforcement learning-based stock Portfolio management System (CMPS) was designed and trained based on fused data. In CMPS, each agent was implemented with a deep Q-network to obtain the features of time-series stock data, and a self-attention network was used to combine the output of each agent. We added a risk-free asset strategy to CMPS to prevent risks and referred to this model as CMPS-Risk Free (CMPS-RF). We conducted experiments under different market conditions using the stock data of China Shanghai Stock Exchange 50 and compared our model with the state-of-the-art models. The results showed that CMPS could obtain better profits than the compared benchmark models, and CMPS-RF was able to accurately recognize the market risk and achieved the best Sharpe and Calmar ratios. The study findings are expected to aid in the development of an efficient investment-trading strategy.<\/jats:p>","DOI":"10.1007\/s11063-024-11582-4","type":"journal-article","created":{"date-parts":[[2024,3,17]],"date-time":"2024-03-17T09:01:19Z","timestamp":1710666079000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Deep Reinforcement Learning Model for Stock Portfolio Management Based on Data Fusion"],"prefix":"10.1007","volume":"56","author":[{"given":"Haifeng","family":"Li","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mo","family":"Hai","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,3,17]]},"reference":[{"issue":"4","key":"11582_CR1","doi-asserted-by":"publisher","first-page":"5","DOI":"10.2469\/faj.v55.n4.2281","volume":"55","author":"HM Markowitz","year":"1999","unstructured":"Markowitz HM (1999) The early history of portfolio theory: 1600\u20131960. Financ Anal J 55(4):5\u201316","journal-title":"Financ Anal J"},{"issue":"1","key":"11582_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.jempfin.2005.12.001","volume":"14","author":"A Ang","year":"2007","unstructured":"Ang A, Chen J (2007) Capm over the long run: 1926\u20132001. J Empir Financ 14(1):1\u201340","journal-title":"J Empir Financ"},{"issue":"1","key":"11582_CR3","doi-asserted-by":"publisher","first-page":"75","DOI":"10.2469\/faj.v51.n1.1861","volume":"51","author":"EF Fama","year":"1995","unstructured":"Fama EF (1995) Random walks in stock market prices. Financ Anal J 51(1):75\u201380","journal-title":"Financ Anal J"},{"issue":"5","key":"11582_CR4","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1093\/icc\/11.5.895","volume":"11","author":"JD Farmer","year":"2002","unstructured":"Farmer JD (2002) Market force, ecology and evolution. Ind Corp Chang 11(5):895\u2013953","journal-title":"Ind Corp Chang"},{"key":"11582_CR5","doi-asserted-by":"crossref","unstructured":"Ladosz P, Weng L, Kim M, Oh H (2022) Exploration in deep reinforcement learning: a survey. Inf Fus","DOI":"10.1016\/j.inffus.2022.03.003"},{"issue":"7587","key":"11582_CR6","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den\u00a0Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484\u2013489","journal-title":"Nature"},{"key":"11582_CR7","unstructured":"Moody J, Saffell M (1998) Reinforcement learning for trading. In: Advances in neural information processing systems, vol 11"},{"issue":"15","key":"11582_CR8","doi-asserted-by":"publisher","first-page":"2121","DOI":"10.1016\/j.ins.2005.10.009","volume":"176","author":"O Jangmin","year":"2006","unstructured":"Jangmin O, Lee J, Lee JW, Zhang B-T (2006) Adaptive stock trading with dynamic asset allocation using reinforcement learning. Inf Sci 176(15):2121\u20132147","journal-title":"Inf Sci"},{"key":"11582_CR9","doi-asserted-by":"crossref","unstructured":"Bertoluzzo F, Corazza M (2007) Making financial trading by recurrent reinforcement learning. In: International conference on knowledge-based and intelligent information and engineering systems. Springer, Berlin, pp 619\u2013626","DOI":"10.1007\/978-3-540-74827-4_78"},{"key":"11582_CR10","doi-asserted-by":"crossref","unstructured":"Maringer D, Ramtohul T (2010) Threshold recurrent reinforcement learning model for automated trading. In: European conference on the applications of evolutionary computation. Springer, Berlin, pp 212\u2013221","DOI":"10.1007\/978-3-642-12242-2_22"},{"issue":"1","key":"11582_CR11","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1007\/s10287-011-0131-1","volume":"9","author":"D Maringer","year":"2012","unstructured":"Maringer D, Ramtohul T (2012) Regime-switching recurrent reinforcement learning for investment decision making. CMS 9(1):89\u2013107","journal-title":"CMS"},{"key":"11582_CR12","doi-asserted-by":"crossref","unstructured":"Bertoluzzo F, Corazza M (2012) Reinforcement learning for automatic financial trading: introduction and some applications. University Ca\u2019Foscari of Venice, Department of Economics Research Paper Series No 33","DOI":"10.2139\/ssrn.2192034"},{"key":"11582_CR13","unstructured":"Du X, Zhai J, Lv K (2016) Algorithm trading using q-learning and recurrent reinforcement learning. Positions 1(1)"},{"key":"11582_CR14","unstructured":"Sutton RS, McAllester D, Singh S, Mansour Y (1999) Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems, vol 12"},{"key":"11582_CR15","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1016\/j.dss.2014.04.011","volume":"64","author":"D Eilers","year":"2014","unstructured":"Eilers D, Dunis CL, Mettenheim H-J, Breitner MH (2014) Intelligent trading of seasonal effects: a decision support algorithm based on reinforcement learning. Decis Support Syst 64:100\u2013108","journal-title":"Decis Support Syst"},{"issue":"6","key":"11582_CR16","doi-asserted-by":"publisher","first-page":"1153","DOI":"10.1016\/j.jedc.2010.01.015","volume":"34","author":"SD Bekiros","year":"2010","unstructured":"Bekiros SD (2010) Heterogeneous trading strategies with adaptive fuzzy actor-critic reinforcement learning: A behavioral approach. J Econ Dyn Control 34(6):1153\u20131170","journal-title":"J Econ Dyn Control"},{"issue":"3","key":"11582_CR17","doi-asserted-by":"publisher","first-page":"653","DOI":"10.1109\/TNNLS.2016.2522401","volume":"28","author":"Y Deng","year":"2016","unstructured":"Deng Y, Bao F, Kong Y, Ren Z, Dai Q (2016) Deep direct reinforcement learning for financial signal representation and trading. IEEE Trans Neural Netw Learn Syst 28(3):653\u2013664","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"11582_CR18","unstructured":"Jiang Z, Xu D, Liang J (2017) A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059"},{"key":"11582_CR19","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971"},{"key":"11582_CR20","unstructured":"O\u2019Shea K, Nash R (2015) An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458"},{"key":"11582_CR21","unstructured":"Zaremba W, Sutskever I, Vinyals O (2014) Recurrent neural network regularization. arXiv preprint arXiv:1409.2329"},{"issue":"7","key":"11582_CR22","doi-asserted-by":"publisher","first-page":"1235","DOI":"10.1162\/neco_a_01199","volume":"31","author":"Y Yu","year":"2019","unstructured":"Yu Y, Si X, Hu C, Zhang J (2019) A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput 31(7):1235\u20131270","journal-title":"Neural Comput"},{"key":"11582_CR23","unstructured":"Xiong Z, Liu X-Y, Zhong S, Yang H, Walid A (2018) Practical deep reinforcement learning approach for stock trading. arXiv preprint arXiv:1811.07522"},{"key":"11582_CR24","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347"},{"key":"11582_CR25","doi-asserted-by":"crossref","unstructured":"Yang H, Liu X-Y, Zhong S, Walid A (2020) Deep reinforcement learning for automated stock trading: an ensemble strategy. In: Proceedings of the first ACM international conference on AI in finance, pp 1\u20138","DOI":"10.1145\/3383455.3422540"},{"key":"11582_CR26","unstructured":"Liang Z, Chen H, Zhu J, Jiang K, Li Y (2018) Adversarial deep reinforcement learning in portfolio management. arXiv preprint arXiv:1808.09940"},{"key":"11582_CR27","doi-asserted-by":"crossref","unstructured":"Liu X-Y, Yang H, Chen Q, Zhang R, Yang L, Xiao B, Wang CD (2020) FinRL: a deep reinforcement learning library for automated stock trading in quantitative finance. arXiv preprint arXiv:2011.09607","DOI":"10.2139\/ssrn.3737859"},{"key":"11582_CR28","doi-asserted-by":"crossref","unstructured":"Wang J, Zhang Y, Tang K, Wu J, Xiong Z (2019) Alphastock: A buying-winners-and-selling-losers investment strategy using interpretable deep reinforcement attention networks. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 1900\u20131908","DOI":"10.1145\/3292500.3330647"},{"key":"11582_CR29","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30"},{"key":"11582_CR30","doi-asserted-by":"crossref","unstructured":"Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 701\u2013710","DOI":"10.1145\/2623330.2623732"},{"key":"11582_CR31","doi-asserted-by":"crossref","unstructured":"Ye Y, Pei H, Wang B, Chen P-Y, Zhu Y, Xiao J, Li B (2020) Reinforcement-learning based portfolio management with augmented asset movement prediction states. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 1112\u20131119","DOI":"10.1609\/aaai.v34i01.5462"},{"key":"11582_CR32","doi-asserted-by":"crossref","unstructured":"Daiya D, Lin C (2021) Stock movement prediction and portfolio management via multimodal learning with transformer. In: ICASSP 2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 3305\u20133309","DOI":"10.1109\/ICASSP39728.2021.9414893"},{"key":"11582_CR33","doi-asserted-by":"crossref","unstructured":"Wang Z, Huang B, Tu S, Zhang K, Xu L (2021) Deeptrader: a deep reinforcement learning approach for risk-return balanced portfolio management with market conditions embedding. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 643\u2013650","DOI":"10.1609\/aaai.v35i1.16144"},{"key":"11582_CR34","unstructured":"Yu F, Koltun V (2015) Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122"},{"key":"11582_CR35","doi-asserted-by":"crossref","unstructured":"Wu Z, Pan S, Long G, Jiang J, Zhang C (2019) Graph wavenet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121","DOI":"10.24963\/ijcai.2019\/264"},{"key":"11582_CR36","doi-asserted-by":"crossref","unstructured":"Lee J, Kim R, Yi S-W, Kang J (2020) Maps: Multi-agent reinforcement learning-based portfolio management system. arXiv preprint arXiv:2007.05402","DOI":"10.24963\/ijcai.2020\/623"},{"issue":"7540","key":"11582_CR37","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529\u2013533","journal-title":"Nature"},{"issue":"2","key":"11582_CR38","doi-asserted-by":"publisher","first-page":"0263689","DOI":"10.1371\/journal.pone.0263689","volume":"17","author":"Z Huang","year":"2022","unstructured":"Huang Z, Tanaka F (2022) MSPM: a modularized and scalable multi-agent reinforcement learning-based system for financial portfolio management. PLoS ONE 17(2):0263689","journal-title":"PLoS ONE"},{"issue":"12","key":"11582_CR39","doi-asserted-by":"publisher","first-page":"7877","DOI":"10.1007\/s00500-021-05801-6","volume":"25","author":"U Pham","year":"2021","unstructured":"Pham U, Luu Q, Tran H (2021) Multi-agent reinforcement learning approach for hedging portfolio problem. Soft Comput 25(12):7877\u20137885","journal-title":"Soft Comput"},{"issue":"1","key":"11582_CR40","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1007\/s10614-020-10038-w","volume":"57","author":"J Lussange","year":"2021","unstructured":"Lussange J, Lazarevich I, Bourgeois-Gironde S, Palminteri S, Gutkin B (2021) Modelling stock markets by multi-agent reinforcement learning. Comput Econ 57(1):113\u2013147","journal-title":"Comput Econ"},{"key":"11582_CR41","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.114517","volume":"169","author":"I Yaman","year":"2021","unstructured":"Yaman I, Dalk\u0131l\u0131\u00e7 TE (2021) A hybrid approach to cardinality constraint portfolio selection problem based on nonlinear neural network and genetic algorithm. Expert Syst Appl 169:114517","journal-title":"Expert Syst Appl"},{"key":"11582_CR42","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s11432-020-2894-9","volume":"64","author":"AT Khan","year":"2021","unstructured":"Khan AT, Cao X, Li S, Hu B, Katsikis VN (2021) Quantum beetle antennae search: a novel technique for the constrained portfolio optimization problem. SCIENCE CHINA Inf Sci 64:1\u201314","journal-title":"SCIENCE CHINA Inf Sci"},{"key":"11582_CR43","doi-asserted-by":"crossref","unstructured":"Cao X, Peng C, Zheng Y, Li S, Ha TT, Shutyaev V, Katsikis V, Stanimirovic P (2023) Neural networks for portfolio analysis in high-frequency trading. IEEE Trans Neural Netw Learn Syst","DOI":"10.1109\/TNNLS.2023.3311169"},{"key":"11582_CR44","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.120934","volume":"233","author":"X Cao","year":"2023","unstructured":"Cao X, Francis A, Pu X, Zhang Z, Katsikis V, Stanimirovic P, Brajevic I, Li S (2023) A novel recurrent neural network based online portfolio analysis for high frequency trading. Expert Syst Appl 233:120934","journal-title":"Expert Syst Appl"},{"issue":"7","key":"11582_CR45","doi-asserted-by":"publisher","first-page":"609","DOI":"10.1057\/s41260-019-00145-1","volume":"21","author":"Z Ding","year":"2020","unstructured":"Ding Z, Martin RD, Yang C (2020) Portfolio turnover when IC is time-varying. J Asset Manag 21(7):609\u2013622","journal-title":"J Asset Manag"},{"key":"11582_CR46","unstructured":"Kevin S (2022) Security analysis and portfolio management. PHI Learning Pvt. Ltd."},{"key":"11582_CR47","doi-asserted-by":"crossref","unstructured":"Cao X, Li S (2023) A novel dynamic neural system for nonconvex portfolio optimization with cardinality restrictions. IEEE Trans Syst Man Cybernet Syst 53(11):6943\u20136952","DOI":"10.1109\/TSMC.2023.3288224"},{"key":"11582_CR48","doi-asserted-by":"publisher","unstructured":"Cao X, Li S (2023) Neural networks for portfolio analysis with cardinality constraints. IEEE Trans Neural Netw Learn Syst. https:\/\/doi.org\/10.1109\/TNNLS.2023.3307192","DOI":"10.1109\/TNNLS.2023.3307192"},{"issue":"8","key":"11582_CR49","doi-asserted-by":"publisher","first-page":"716","DOI":"10.1073\/pnas.38.8.716","volume":"38","author":"R Bellman","year":"1952","unstructured":"Bellman R (1952) On the theory of dynamic programming. Proc Natl Acad Sci 38(8):716\u2013719","journal-title":"Proc Natl Acad Sci"},{"issue":"3","key":"11582_CR50","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"CJ Watkins","year":"1992","unstructured":"Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3):279\u2013292","journal-title":"Mach Learn"},{"key":"11582_CR51","doi-asserted-by":"crossref","unstructured":"Ross SA (2005) Mutual fund separation in financial theory-the separating distributions, pp 309\u2013356","DOI":"10.1142\/9789812701022_0010"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11582-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-024-11582-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11582-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T20:39:35Z","timestamp":1715891975000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-024-11582-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,17]]},"references-count":51,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,4]]}},"alternative-id":["11582"],"URL":"https:\/\/doi.org\/10.1007\/s11063-024-11582-4","relation":{},"ISSN":["1573-773X"],"issn-type":[{"value":"1573-773X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,17]]},"assertion":[{"value":"23 February 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 March 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"No potential conflict of interest was reported by the authors.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"108"}}