{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T23:28:50Z","timestamp":1767828530142,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":49,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T00:00:00Z","timestamp":1665964800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,17]]},"DOI":"10.1145\/3511808.3557412","type":"proceedings-article","created":{"date-parts":[[2022,10,16]],"date-time":"2022-10-16T01:22:22Z","timestamp":1665883342000},"page":"406-415","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Optimal Action Space Search"],"prefix":"10.1145","author":[{"given":"Zhongjie","family":"Duan","sequence":"first","affiliation":[{"name":"East China Normal University, Shanghai, China"}]},{"given":"Cen","family":"Chen","sequence":"additional","affiliation":[{"name":"East China Normal University, Shanghai, China"}]},{"given":"Dawei","family":"Cheng","sequence":"additional","affiliation":[{"name":"Tongji University, Shanghai, China"}]},{"given":"Yuqi","family":"Liang","sequence":"additional","affiliation":[{"name":"Seek Data Group, Emoney Inc., Shanghai, China"}]},{"given":"Weining","family":"Qian","sequence":"additional","affiliation":[{"name":"East China Normal University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,17]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-42297-8_40"},{"key":"e_1_3_2_2_2_1","volume-title":"Anvitha GK Bhat, and Mamatha V Jadhav","author":"Azhikodan Akhil Raj","year":"2019","unstructured":"Akhil Raj Azhikodan , Anvitha 
GK Bhat, and Mamatha V Jadhav . 2019 . Stock trading bot using deep reinforcement learning. In Innovations in Computer Science and Engineering. Springer , 41--49. Akhil Raj Azhikodan, Anvitha GK Bhat, and Mamatha V Jadhav. 2019. Stock trading bot using deep reinforcement learning. In Innovations in Computer Science and Engineering. Springer, 41--49."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jedc.2010.01.015"},{"key":"e_1_3_2_2_4_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment","volume":"4","author":"Chaslot Guillaume","year":"2008","unstructured":"Guillaume Chaslot , Sander Bakkes , Istvan Szita , and Pieter Spronck . 2008 . Monte-carlo tree search: A new framework for game ai . In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment , Vol. 4 . 216--217. Guillaume Chaslot, Sander Bakkes, Istvan Szita, and Pieter Spronck. 2008. Monte-carlo tree search: A new framework for game ai. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Vol. 4. 216--217."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSESS47205.2019.9040728"},{"key":"e_1_3_2_2_6_1","volume-title":"International conference on computer science, applied mathematics and applications. Springer, 311--322","author":"Dang Quang-Vinh","year":"2019","unstructured":"Quang-Vinh Dang . 2019 . Reinforcement learning in stock trading . In International conference on computer science, applied mathematics and applications. Springer, 311--322 . Quang-Vinh Dang. 2019. Reinforcement learning in stock trading. In International conference on computer science, applied mathematics and applications. 
Springer, 311--322."},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/0047-2727(79)90043-4"},{"key":"e_1_3_2_2_8_1","volume-title":"Deep direct reinforcement learning for financial signal representation and trading","author":"Deng Yue","year":"2016","unstructured":"Yue Deng , Feng Bao , Youyong Kong , Zhiquan Ren , and Qionghai Dai . 2016. Deep direct reinforcement learning for financial signal representation and trading . IEEE transactions on neural networks and learning systems, Vol. 28 , 3 ( 2016 ), 653--664. Yue Deng, Feng Bao, Youyong Kong, Zhiquan Ren, and Qionghai Dai. 2016. Deep direct reinforcement learning for financial signal representation and trading. IEEE transactions on neural networks and learning systems, Vol. 28, 3 (2016), 653--664."},{"key":"e_1_3_2_2_9_1","volume-title":"John JF Sherrerd, et al","author":"Dixit Avinash K","year":"1990","unstructured":"Avinash K Dixit , John JF Sherrerd, et al . 1990 . Optimization in economic theory. Oxford University Press on Demand. Avinash K Dixit, John JF Sherrerd, et al. 1990. Optimization in economic theory. Oxford University Press on Demand."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1002\/asmb.2399"},{"key":"e_1_3_2_2_11_1","volume-title":"Universal trading for order execution with oracle policy distillation. arXiv preprint arXiv:2103.10860","author":"Fang Yuchen","year":"2021","unstructured":"Yuchen Fang , Kan Ren , Weiqing Liu , Dong Zhou , Weinan Zhang , Jiang Bian , Yong Yu , and Tie-Yan Liu . 2021. Universal trading for order execution with oracle policy distillation. arXiv preprint arXiv:2103.10860 ( 2021 ). Yuchen Fang, Kan Ren, Weiqing Liu, Dong Zhou, Weinan Zhang, Jiang Bian, Yong Yu, and Tie-Yan Liu. 2021. Universal trading for order execution with oracle policy distillation. arXiv preprint arXiv:2103.10860 (2021)."},{"key":"e_1_3_2_2_13_1","volume-title":"Game theory","author":"Fudenberg Drew","unstructured":"Drew Fudenberg and Jean Tirole . 1991. 
Game theory . MIT press . Drew Fudenberg and Jean Tirole. 1991. Game theory. MIT press."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2018.05.050"},{"key":"e_1_3_2_2_15_1","volume-title":"Long short-term memory. Neural computation","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber . 1997. Long short-term memory. Neural computation , Vol. 9 , 8 ( 1997 ), 1735--1780. Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation, Vol. 9, 8 (1997), 1735--1780."},{"key":"e_1_3_2_2_16_1","volume-title":"Financial trading as a game: A deep reinforcement learning approach. arXiv preprint arXiv:1807.02787","author":"Huang Chien Yi","year":"2018","unstructured":"Chien Yi Huang . 2018. Financial trading as a game: A deep reinforcement learning approach. arXiv preprint arXiv:1807.02787 ( 2018 ). Chien Yi Huang. 2018. Financial trading as a game: A deep reinforcement learning approach. arXiv preprint arXiv:1807.02787 (2018)."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2018.09.036"},{"key":"e_1_3_2_2_18_1","volume-title":"2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 1--8.","author":"Jia WU","year":"2019","unstructured":"WU Jia , WANG Chen , Lidong Xiong , and SUN Hongyong . 2019 . Quantitative trading on stock market based on deep reinforcement learning . In 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 1--8. WU Jia, WANG Chen, Lidong Xiong, and SUN Hongyong. 2019. Quantitative trading on stock market based on deep reinforcement learning. In 2019 International Joint Conference on Neural Networks (IJCNN). 
IEEE, 1--8."},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/IntelliSys.2017.8324237"},{"key":"e_1_3_2_2_20_1","volume-title":"Nature","volume":"596","author":"Jumper John","year":"2021","unstructured":"John Jumper , Richard Evans , Alexander Pritzel , Tim Green , Michael Figurnov , Olaf Ronneberger , Kathryn Tunyasuvunakool , Russ Bates , Augustin \u017didek , Anna Potapenko , 2021 . Highly accurate protein structure prediction with AlphaFold . Nature , Vol. 596 , 7873 (2021), 583--589. John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin \u017didek, Anna Potapenko, et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature, Vol. 596, 7873 (2021), 583--589."},{"key":"e_1_3_2_2_21_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_2_22_1","volume-title":"Actor-critic algorithms. Advances in neural information processing systems","author":"Konda Vijay","year":"1999","unstructured":"Vijay Konda and John Tsitsiklis . 1999. Actor-critic algorithms. Advances in neural information processing systems , Vol. 12 ( 1999 ). Vijay Konda and John Tsitsiklis. 1999. Actor-critic algorithms. Advances in neural information processing systems, Vol. 12 (1999)."},{"key":"e_1_3_2_2_23_1","volume-title":"Visualizing the loss landscape of neural nets. Advances in neural information processing systems","author":"Li Hao","year":"2018","unstructured":"Hao Li , Zheng Xu , Gavin Taylor , Christoph Studer , and Tom Goldstein . 2018. Visualizing the loss landscape of neural nets. 
Advances in neural information processing systems , Vol. 31 ( 2018 ). Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. 2018. Visualizing the loss landscape of neural nets. Advances in neural information processing systems, Vol. 31 (2018)."},{"key":"e_1_3_2_2_24_1","volume-title":"Adversarial deep reinforcement learning in portfolio management. arXiv preprint arXiv:1808.09940","author":"Liang Zhipeng","year":"2018","unstructured":"Zhipeng Liang , Hao Chen , Junhao Zhu , Kangkang Jiang , and Yanran Li. 2018. Adversarial deep reinforcement learning in portfolio management. arXiv preprint arXiv:1808.09940 ( 2018 ). Zhipeng Liang, Hao Chen, Junhao Zhu, Kangkang Jiang, and Yanran Li. 2018. Adversarial deep reinforcement learning in portfolio management. arXiv preprint arXiv:1808.09940 (2018)."},{"key":"e_1_3_2_2_25_1","unstructured":"Timothy P Lillicrap Jonathan J Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. In ICLR (Poster).  Timothy P Lillicrap Jonathan J Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. In ICLR (Poster)."},{"key":"e_1_3_2_2_26_1","volume-title":"The effects of memory replay in reinforcement learning. In 2018 56th annual allerton conference on communication, control, and computing (Allerton)","author":"Liu Ruishan","unstructured":"Ruishan Liu and James Zou . 2018. The effects of memory replay in reinforcement learning. In 2018 56th annual allerton conference on communication, control, and computing (Allerton) . IEEE , 478--485. Ruishan Liu and James Zou. 2018. The effects of memory replay in reinforcement learning. In 2018 56th annual allerton conference on communication, control, and computing (Allerton). 
IEEE, 478--485."},{"key":"e_1_3_2_2_27_1","volume-title":"Christina Dan Wang, and Jian Guo","author":"Liu Xiao-Yang","year":"2021","unstructured":"Xiao-Yang Liu , Jingyang Rui , Jiechao Gao , Liuqing Yang , Hongyang Yang , Zhaoran Wang , Christina Dan Wang, and Jian Guo . 2021 . FinRL-Meta: A Universe of Near-Real Market Environments for Data-Driven Deep Reinforcement Learning in Quantitative Finance . arXiv preprint arXiv:2112.06753 (2021). Xiao-Yang Liu, Jingyang Rui, Jiechao Gao, Liuqing Yang, Hongyang Yang, Zhaoran Wang, Christina Dan Wang, and Jian Guo. 2021. FinRL-Meta: A Universe of Near-Real Market Environments for Data-Driven Deep Reinforcement Learning in Quantitative Finance. arXiv preprint arXiv:2112.06753 (2021)."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"crossref","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Andrei A Rusu Joel Veness Marc G Bellemare Alex Graves Martin Riedmiller Andreas K Fidjeland Georg Ostrovski etal 2015. Human-level control through deep reinforcement learning. nature Vol. 518 7540 (2015) 529--533.  Volodymyr Mnih Koray Kavukcuoglu David Silver Andrei A Rusu Joel Veness Marc G Bellemare Alex Graves Martin Riedmiller Andreas K Fidjeland Georg Ostrovski et al. 2015. Human-level control through deep reinforcement learning. nature Vol. 518 7540 (2015) 529--533.","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_2_29_1","first-page":"1","article-title":"Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey","volume":"21","author":"Narvekar Sanmit","year":"2020","unstructured":"Sanmit Narvekar , Bei Peng , Matteo Leonetti , Jivko Sinapov , Matthew E Taylor , and Peter Stone . 2020 . Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey . Journal of Machine Learning Research , Vol. 21 (2020), 1 -- 50 . Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E Taylor, and Peter Stone. 2020. 
Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey. Journal of Machine Learning Research, Vol. 21 (2020), 1--50.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.36.1.48"},{"key":"e_1_3_2_2_31_1","volume-title":"Forecasting the equity risk premium: the role of technical indicators. Management science","author":"Neely Christopher J","year":"2014","unstructured":"Christopher J Neely , David E Rapach , Jun Tu , and Guofu Zhou . 2014. Forecasting the equity risk premium: the role of technical indicators. Management science , Vol. 60 , 7 ( 2014 ), 1772--1791. Christopher J Neely, David E Rapach, Jun Tu, and Guofu Zhou. 2014. Forecasting the equity risk premium: the role of technical indicators. Management science, Vol. 60, 7 (2014), 1772--1791."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2020.106384"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-405X(88)90021-9"},{"key":"e_1_3_2_2_34_1","volume-title":"Deep reinforcement learning in quantitative algorithmic trading: A review. arXiv preprint arXiv:2106.00123","author":"Pricope Tidor-Vlad","year":"2021","unstructured":"Tidor-Vlad Pricope . 2021. Deep reinforcement learning in quantitative algorithmic trading: A review. arXiv preprint arXiv:2106.00123 ( 2021 ). Tidor-Vlad Pricope. 2021. Deep reinforcement learning in quantitative algorithmic trading: A review. arXiv preprint arXiv:2106.00123 (2021)."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.18626"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.3390\/app9204460"},{"key":"e_1_3_2_2_38_1","volume-title":"Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman , Filip Wolski , Prafulla Dhariwal , Alec Radford , and Oleg Klimov . 2017. 
Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 ( 2017 ). John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)."},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.275.5306.1593"},{"key":"e_1_3_2_2_40_1","volume-title":"Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al.","author":"Silver David","year":"2016","unstructured":"David Silver , Aja Huang , Chris J Maddison , Arthur Guez , Laurent Sifre , George Van Den Driessche , Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016 . Mastering the game of Go with deep neural networks and tree search. nature, Vol. 529 , 7587 (2016), 484--489. David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. nature, Vol. 529, 7587 (2016), 484--489."},{"key":"e_1_3_2_2_41_1","volume-title":"Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research","author":"Srivastava Nitish","year":"2014","unstructured":"Nitish Srivastava , Geoffrey Hinton , Alex Krizhevsky , Ilya Sutskever , and Ruslan Salakhutdinov . 2014. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research , Vol. 15 , 1 ( 2014 ), 1929--1958. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, Vol. 15, 1 (2014), 1929--1958."},{"key":"e_1_3_2_2_42_1","volume-title":"Reinforcement learning: An introduction","author":"Sutton Richard S","unstructured":"Richard S Sutton and Andrew G Barto . 2018. 
Reinforcement learning: An introduction . MIT press . Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction. MIT press."},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbankfin.2009.08.004"},{"key":"e_1_3_2_2_44_1","article-title":"Visualizing data using t-SNE","volume":"9","author":"der Maaten Laurens Van","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton . 2008 . Visualizing data using t-SNE . Journal of machine learning research , Vol. 9 , 11 (2008). Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research, Vol. 9, 11 (2008).","journal-title":"Journal of machine learning research"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i5.16568"},{"key":"e_1_3_2_2_46_1","volume-title":"The optimal reward baseline for gradient-based reinforcement learning. arXiv preprint arXiv:1301.2315","author":"Weaver Lex","year":"2013","unstructured":"Lex Weaver and Nigel Tao . 2013. The optimal reward baseline for gradient-based reinforcement learning. arXiv preprint arXiv:1301.2315 ( 2013 ). Lex Weaver and Nigel Tao. 2013. The optimal reward baseline for gradient-based reinforcement learning. arXiv preprint arXiv:1301.2315 (2013)."},{"key":"e_1_3_2_2_47_1","volume-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning","author":"Williams Ronald J","year":"1992","unstructured":"Ronald J Williams . 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning , Vol. 8 , 3 ( 1992 ), 229--256. Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, Vol. 
8, 3 (1992), 229--256."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2020.05.066"},{"key":"e_1_3_2_2_49_1","volume-title":"Practical deep reinforcement learning approach for stock trading. arXiv preprint arXiv:1811.07522","author":"Xiong Zhuoran","year":"2018","unstructured":"Zhuoran Xiong , Xiao-Yang Liu , Shan Zhong , Hongyang Yang , and Anwar Walid . 2018. Practical deep reinforcement learning approach for stock trading. arXiv preprint arXiv:1811.07522 ( 2018 ). Zhuoran Xiong, Xiao-Yang Liu, Shan Zhong, Hongyang Yang, and Anwar Walid. 2018. Practical deep reinforcement learning approach for stock trading. arXiv preprint arXiv:1811.07522 (2018)."},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2019.2907260"},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.3905\/jfds.2020.1.030"}],"event":{"name":"CIKM '22: The 31st ACM International Conference on Information and Knowledge Management","location":"Atlanta GA USA","acronym":"CIKM '22","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 31st ACM International Conference on Information & Knowledge Management"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3511808.3557412","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3511808.3557412","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:55Z","timestamp":1750182535000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3511808.3557412"}},"subtitle":["An Effective Deep Reinforcement Learning Method for Algorithmic 
Trading"],"short-title":[],"issued":{"date-parts":[[2022,10,17]]},"references-count":49,"alternative-id":["10.1145\/3511808.3557412","10.1145\/3511808"],"URL":"https:\/\/doi.org\/10.1145\/3511808.3557412","relation":{},"subject":[],"published":{"date-parts":[[2022,10,17]]},"assertion":[{"value":"2022-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}