{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T15:52:45Z","timestamp":1781884365165,"version":"3.54.5"},"publisher-location":"New York, NY, USA","reference-count":16,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,1,5]],"date-time":"2023-01-05T00:00:00Z","timestamp":1672876800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,1,5]]},"DOI":"10.1145\/3583788.3583802","type":"proceedings-article","created":{"date-parts":[[2023,6,4]],"date-time":"2023-06-04T22:12:05Z","timestamp":1685916725000},"page":"96-101","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Joint Action Representation and Prioritized Experience Replay for Reinforcement Learning in Large Discrete Action Spaces"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0109-3360","authenticated-orcid":false,"given":"Xueyu","family":"Wei","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Anhui University of Technology, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0082-462X","authenticated-orcid":false,"given":"Wei","family":"Xue","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Anhui University of Technology, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9799-4635","authenticated-orcid":false,"given":"Wei","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Anhui University of Technology, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1184-1270","authenticated-orcid":false,"given":"Yuanxia","family":"Shen","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Anhui University of Technology, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1546-3951","authenticated-orcid":false,"given":"Gaohang","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Science, Hangzhou Dianzi University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,6,4]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"[1] Barth-Maron G. Hoffman M. W. Budden D. Dabney W. Horgan D. Tb D. Muldal A. Heess N. and Lillicrap T. 2018. Distributed distributional deterministic policy gradients. arXiv preprint arXiv:1804.08617."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"[2] Clifton J. and Laber E. 2020. Q-learning: theory and applications. Annual Review of Statistics and Its Application. Vol. 7.pp.279\u2013301.","DOI":"10.1146\/annurev-statistics-031219-041220"},{"key":"e_1_3_2_1_3_1","unstructured":"[3] Dulac-Arnold G. Evans R. Van Hasselt H. Sunehag P. Lillicrap T. Hunt J. Mann T. Weber T. Degris T. and Coppin B. 2015. 
Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"crossref","unstructured":"[4] Gao C. Lei W. He X. de Rijke M. and Chua T.S. 2021. Advances and challenges in conversational recommender systems: A survey. AI Open. Vol. 2.pp.100\u2013126.","DOI":"10.1016\/j.aiopen.2021.06.002"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2012.2218595"},{"key":"e_1_3_2_1_6_1","first-page":"321","volume-title":"Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics.","author":"Hou Y.","unstructured":"[6] Hou, Y., Liu, L., Wei, Q., Xu, X. and Chen, C., 2017. A novel DDPG method with prioritized experience replay. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics. pp.316\u2013321."},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of The Network and Distributed System Security Symposium.","author":"Kwon Y.","unstructured":"[7] Kwon, Y., Saltaformaggio, B., Kim, I.L., Lee, K.H., Zhang, X. and Xu, D., 2017. A2c: Self destructing exploit executions via input perturbation. In Proceedings of The Network and Distributed System Security Symposium."},{"key":"e_1_3_2_1_8_1","unstructured":"[8] Lillicrap T.P. Hunt J.J. Pritzel A. Heess N. Erez T. Tassa Y. Silver D. 
and Wierstra D. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971."},{"key":"e_1_3_2_1_9_1","volume-title":"Finrl: A deep reinforcement learning library for automated stock trading in quantitative finance. arXiv preprint arXiv:2011.09607.","author":"Liu X.Y.","year":"2020","unstructured":"[9] Liu, X.Y., Yang, H., Chen, Q., Zhang, R., Yang, L., Xiao, B. and Wang, C.D., 2020. Finrl: A deep reinforcement learning library for automated stock trading in quantitative finance. arXiv preprint arXiv:2011.09607."},{"key":"e_1_3_2_1_10_1","unstructured":"[10] Melis G. Dyer C. and Blunsom P. 2017. On the state of the art of evaluation in neural language models. arXiv preprint arXiv:1707.05589."},{"key":"e_1_3_2_1_11_1","unstructured":"[11] Mnih V. Kavukcuoglu K. Silver D. Graves A. Antonoglou I. Wierstra D. and Riedmiller M. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602."},{"key":"e_1_3_2_1_12_1","volume-title":"Markov decision processes: discrete stochastic dynamic programming","author":"Puterman M.L.","unstructured":"
[12] Puterman, M.L., 2014. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, Inc., New York."},{"key":"e_1_3_2_1_13_1","unstructured":"[13] Schaul T. Quan J. Antonoglou I. and Silver D. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952."},{"key":"e_1_3_2_1_14_1","unstructured":"[14] Schulman J. Wolski F. Dhariwal P. Radford A. and Klimov O. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347."},{"key":"e_1_3_2_1_15_1","first-page":"4131","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a032","author":"Tavakoli A.","unstructured":"[15] Tavakoli, A., Pardo, F. and Kormushev, P., 2018. Action branching architectures for deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a032. pp.4131-4138."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"[16] Wang F Wang X and Sun S. 2022. A Reinforcement Learning Level-based Particle Swarm Optimization Algorithm for Large-scale Optimization. Information Sciences Vol.\u00a0602. 
pp.298-312.","DOI":"10.1016\/j.ins.2022.04.053"}],"event":{"name":"ICMLSC 2023: 2023 The 7th International Conference on Machine Learning and Soft Computing","location":"Chongqing China","acronym":"ICMLSC 2023"},"container-title":["2023 The 7th International Conference on Machine Learning and Soft Computing (ICMLSC)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3583788.3583802","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3583788.3583802","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:59Z","timestamp":1750178819000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3583788.3583802"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,5]]},"references-count":16,"alternative-id":["10.1145\/3583788.3583802","10.1145\/3583788"],"URL":"https:\/\/doi.org\/10.1145\/3583788.3583802","relation":{},"subject":[],"published":{"date-parts":[[2023,1,5]]},"assertion":[{"value":"2023-06-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}