{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T16:57:17Z","timestamp":1762361837717,"version":"build-2065373602"},"reference-count":16,"publisher":"Institution of Engineering and Technology (IET)","issue":"1","license":[{"start":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T00:00:00Z","timestamp":1721260800000},"content-version":"vor","delay-in-days":199,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004921","name":"Shanghai Jiao Tong University","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004921","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Signal Processing"],"published-print":{"date-parts":[[2024,1]]},"abstract":"<jats:p>Reinforcement learning (RL) has been applied to financial portfolio management in recent years. Current studies mostly focus on profit accumulation without much consideration of risk. Some risk\u2010return balanced studies extract features from price and volume data only, which are highly correlated and lack any representation of risk features. To tackle these problems, we propose a weight control unit (WCU) to effectively manage the portfolio position under different market conditions. A loss penalty term is also designed in the reward function to prevent sharp drawdown during trading. Moreover, stock spatial interrelation representing the correlation between two different stocks is captured by a graph convolution network based on fundamental data. Temporal interrelation is also captured by a temporal convolutional network based on new factors designed with price and volume data. Both spatial and temporal interrelations improve feature extraction from historical data and also make the model more interpretable. 
Finally, a deep deterministic policy gradient actor\u2013critic RL algorithm is applied to explore the optimal policy in portfolio management. We evaluate our approach in a challenging non\u2010short\u2010selling market, and the experimental results show that our method outperforms state\u2010of\u2010the\u2010art methods on both profit and risk criteria. Specifically, it achieves a 6.72% improvement in annualized rate of return, a 7.72% decrease in maximum drawdown, and a better annualized Sharpe ratio of 0.112. Also, the loss penalty and WCU offer new directions for future work in risk control.<\/jats:p>","DOI":"10.1049\/2024\/5399392","type":"journal-article","created":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T00:34:49Z","timestamp":1721349289000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["A Deep Reinforcement Learning Approach for Portfolio Management in Non\u2010Short\u2010Selling Market"],"prefix":"10.1049","volume":"2024","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0058-4088","authenticated-orcid":false,"given":"Ruidan","family":"Su","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3449-016X","authenticated-orcid":false,"given":"Chun","family":"Chi","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6270-0449","authenticated-orcid":false,"given":"Shikui","family":"Tu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2752-1573","authenticated-orcid":false,"given":"Lei","family":"Xu","sequence":"additional","affiliation":[]}],"member":"265","published-online":{"date-parts":[[2024,7,18]]},"reference":[{"key":"e_1_2_8_1_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-012-5281-z"},{"key":"e_1_2_8_2_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbankfin.2007.04.008"},{"key":"e_1_2_8_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-9236(03)00082-4"},{"key":"e_1_2_8_4_2","doi-asserte
d-by":"publisher","DOI":"10.1007\/978-3-030-50502-8_10"},{"key":"e_1_2_8_5_2","unstructured":"Feng F., Chen H., He X., Ding J., Sun M., and Chua T.-S., Enhancing stock movement prediction with adversarial training, 2018, arXiv preprint arXiv:1810.09936."},{"key":"e_1_2_8_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2016.2522401"},{"key":"e_1_2_8_7_2","doi-asserted-by":"crossref","unstructured":"Wang Z., Huang B., Tu S., Zhang K., and Xu L., Deeptrader: a deep reinforcement learning approach for risk-return balanced portfolio management with market conditions embedding, Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, May 2021, AAAI, 643\u2013650.","DOI":"10.1609\/aaai.v35i1.16144"},{"key":"e_1_2_8_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.115127"},{"key":"e_1_2_8_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/s13748-020-00225-z"},{"key":"e_1_2_8_10_2","first-page":"236","article-title":"Cost-sensitive portfolio selection via deep reinforcement learning","volume":"34","author":"Zhang Y.","year":"2022","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_2_8_11_2","doi-asserted-by":"crossref","unstructured":"Pigorsch U. and Sch\u00e4fer S., High-dimensional stock portfolio trading with deep reinforcement learning, 2022 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr), 2022, IEEE, 1\u20138.","DOI":"10.1109\/CIFEr52523.2022.9776121"},{"key":"e_1_2_8_12_2","doi-asserted-by":"crossref","unstructured":"Yang H., Liu X.-Y., Zhong S., and Walid A., Deep reinforcement learning for automated stock trading: an ensemble strategy, Proceedings of the First ACM International Conference on AI in Finance, October 2020, New York, NY, United States, Association for Computing Machinery, 1\u20138.","DOI":"10.1145\/3383455.3422540"},{"key":"e_1_2_8_13_2","doi-asserted-by":"crossref","unstructured":"Wang J., Zhang Y., Tang K., Wu J., and Xiong Z., 
Alphastock: a buying-winners-and-selling-losers investment strategy using interpretable deep reinforcement attention networks, Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 2019, New York, NY, United States, Association for Computing Machinery, 1900\u20131908.","DOI":"10.1145\/3292500.3330647"},{"key":"e_1_2_8_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.10.092"},{"key":"e_1_2_8_15_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40535-018-0052-y"},{"key":"e_1_2_8_16_2","unstructured":"WindData, Wind data, 2022, accessed May 2022, https:\/\/www.wind.com.cn\/."}],"container-title":["IET Signal Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/2024\/5399392","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T16:53:06Z","timestamp":1762361586000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/2024\/5399392"}},"subtitle":[],"editor":[{"given":"Tianyuan","family":"Liu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,1]]},"references-count":16,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1]]}},"alternative-id":["10.1049\/2024\/5399392"],"URL":"https:\/\/doi.org\/10.1049\/2024\/5399392","archive":["Portico"],"relation":{},"ISSN":["1751-9675","1751-9683"],"issn-type":[{"type":"print","value":"1751-9675"},{"type":"electronic","value":"1751-9683"}],"subject":[],"published":{"date-parts":[[2024,1]]},"assertion":[{"value":"2023-12-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-07-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"5399392"}}