{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,9]],"date-time":"2025-05-09T05:06:22Z","timestamp":1746767182524,"version":"3.37.3"},"reference-count":56,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,8,3]],"date-time":"2023-08-03T00:00:00Z","timestamp":1691020800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,8,3]],"date-time":"2023-08-03T00:00:00Z","timestamp":1691020800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004721","name":"The University of Tokyo","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004721","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["World Wide Web"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This study proposes a new efficient parameter tuning method for multi-agent simulation (MAS) using deep reinforcement learning. MAS is currently a useful tool for social sciences, but is hard to realize realistic simulations due to its computational burden for parameter tuning. This study proposes efficient parameter tuning to address this issue using deep reinforcement learning methods. To improve compatibility with the tuning task, our proposed method employs actor-critic-based deep reinforcement learning, such as deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). In addition to the customized version of DDPG and SAC for our task, we also propose three additional components to stabilize the learning: an action converter (DDPG only), a redundant full neural network actor, and a seed fixer. 
For experimental verification, we employ a parameter tuning task in an artificial financial market simulation, comparing our proposed model, its ablations, and the Bayesian estimation-based baseline. The results demonstrate that our model outperforms the baseline in terms of tuning performance, indicating that the additional components of the proposed method are essential. Moreover, the critic of our model works effectively as a surrogate model, that is, as an approximate function of the simulation, which allows the actor to tune the parameters appropriately. We have also found that the SAC-based method exhibits the best and fastest convergence, which we assume is achieved by the high exploration capability of SAC.<\/jats:p>","DOI":"10.1007\/s11280-023-01197-5","type":"journal-article","created":{"date-parts":[[2023,8,3]],"date-time":"2023-08-03T03:21:41Z","timestamp":1691032901000},"page":"3535-3559","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Neural-network-based parameter tuning for multi-agent simulation using deep reinforcement learning"],"prefix":"10.1007","volume":"26","author":[{"given":"Masanori","family":"Hirano","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kiyoshi","family":"Izumi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,8,3]]},"reference":[{"key":"1197_CR1","doi-asserted-by":"publisher","unstructured":"Kurahashi, S.: Estimating Effectiveness of Preventing Measures for 2019 Novel Coronavirus Diseases (COVID-19). Proceeding of 2020 9th Int. Congress Adv. Appl. Inf. 487\u2013492 (2020). 
https:\/\/doi.org\/10.1109\/IIAI-AAI50415.2020.00103","DOI":"10.1109\/IIAI-AAI50415.2020.00103"},{"issue":"1\u20132","key":"1197_CR2","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1002\/isaf.1374","volume":"23","author":"T Mizuta","year":"2016","unstructured":"Mizuta, T., Kosugi, S., Kusumoto, T., Matsumoto, W., Izumi, K., Yagi, I., Yoshimura, S.: Effects of Price Regulations and Dark Pools on Financial Market Stability: An Investigation by Multiagent Simulations. Intell. Syst. Account. Finance Manag. 23(1\u20132), 97\u2013120 (2016). https:\/\/doi.org\/10.1002\/isaf.1374","journal-title":"Intell. Syst. Account. Finance Manag."},{"issue":"4","key":"1197_CR3","doi-asserted-by":"publisher","first-page":"75","DOI":"10.3390\/jrfm13040075","volume":"13","author":"M Hirano","year":"2020","unstructured":"Hirano, M., Izumi, K., Shimada, T., Matsushima, H., Sakaji, H.: Impact Analysis of Financial Regulation on Multi-Asset Markets Using Artificial Market Simulations. J. Risk Financial Manag. 13(4), 75 (2020). https:\/\/doi.org\/10.3390\/jrfm13040075","journal-title":"J. Risk Financial Manag."},{"key":"1197_CR4","doi-asserted-by":"crossref","unstructured":"Sajjad, M., Singh, K., Paik, E., Ahn, C.W.: A data-driven approach for agent-based modeling: Simulating the dynamics of family formation. J. Art. Soc. Soc. Simul. 19(1), 9 (2016). https:\/\/doi.org\/10.18564\/jasss.2988","DOI":"10.18564\/jasss.2988"},{"issue":"9","key":"1197_CR5","doi-asserted-by":"publisher","first-page":"1779","DOI":"10.1541\/ieejeiss.133.1779","volume":"133","author":"Y Nonaka","year":"2013","unstructured":"Nonaka, Y., Onishi, M., Yamashita, T., Okada, T., Shimada, A., Taniguchi, R.I.: Walking velocity model for accurate and massive pedestrian simulator. IEEJ Trans. Electron. Inf. Syst. 133(9), 1779\u20131786 (2013). https:\/\/doi.org\/10.1541\/ieejeiss.133.1779","journal-title":"IEEJ Trans. Electron. Inf. 
Syst."},{"key":"1197_CR6","doi-asserted-by":"publisher","unstructured":"Shigenaka, S., Onishi, M., Yamashita, T., Noda, I.: Estimation of Large-Scale Pedestrian Movement Using Data Assimilation. IEICE Trans. Inf. Syst. D. J. 101(9), 1286\u20131294 (2018). https:\/\/doi.org\/10.14923\/transinfj.2017SAP0014","DOI":"10.14923\/transinfj.2017SAP0014"},{"key":"1197_CR7","unstructured":"Moss, S., Edmonds, B.: Towards Good Social Science. J. Art. Soc. Social Simul. 8(4), 13 (2005). http:\/\/jasss.soc.surrey.ac.uk\/8\/4\/13.html"},{"issue":"6","key":"1197_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1527\/TJSAI.AG-E","volume":"31","author":"H Matsushima","year":"2016","unstructured":"Matsushima, H., Uchitane, T., Tsuji, J., Yamashita, T., Ito, N., Noda, I.: Applying Design of Experiment based Significant Parameter Search and Reducing Number of Experiment to Analysis of Evacuation Simulation. Trans. Japanese Society Art. Intell. 31(6), 1\u20139 (2016). https:\/\/doi.org\/10.1527\/TJSAI.AG-E","journal-title":"Trans. Japanese Society Art. Intell."},{"key":"1197_CR9","doi-asserted-by":"publisher","unstructured":"Yamashita, Y., Shigenaka, S., Oba, D., Onishi, M.: Estimation of Large-scale Multi Agent Simulation Results Using Neural Networks [in Japanese]. In: 39th Japanese Special Interest Group on Society and Artificial Intelligence (SIG-SAI), p. 05 (2020). https:\/\/doi.org\/10.11517\/JSAISIGTWO.2020.SAI-039_05","DOI":"10.11517\/JSAISIGTWO.2020.SAI-039_05"},{"key":"1197_CR10","doi-asserted-by":"publisher","unstructured":"Ozaki, Y., Tanigaki, Y., Watanabe, S., Onishi, M.: Multiobjective tree-structured parzen estimator for computationally expensive optimization problems. In: Proceedings of 2020 Genetic and Evolutionary Computation Conference, pp. 533\u2013541 (2020). 
https:\/\/doi.org\/10.1145\/3377930.3389817","DOI":"10.1145\/3377930.3389817"},{"key":"1197_CR11","doi-asserted-by":"publisher","unstructured":"Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., Mordatch, I.: Emergent Tool Use From Multi-Agent Autocurricula. In: Proceedings of the International Conference on Learning Representations (2020). https:\/\/doi.org\/10.48550\/arxiv.1909.07528","DOI":"10.48550\/arxiv.1909.07528"},{"issue":"7256","key":"1197_CR12","doi-asserted-by":"publisher","first-page":"685","DOI":"10.1038\/460685a","volume":"460","author":"JD Farmer","year":"2009","unstructured":"Farmer, J.D., Foley, D.: The economy needs agent-based modelling. Nature 460(7256), 685\u2013686 (2009). https:\/\/doi.org\/10.1038\/460685a","journal-title":"Nature"},{"issue":"6275","key":"1197_CR13","doi-asserted-by":"publisher","first-page":"818","DOI":"10.1126\/science.aad0299","volume":"351","author":"S Battiston","year":"2016","unstructured":"Battiston, S., Farmer, J.D., Flache, A., Garlaschelli, D., Haldane, A.G., Heesterbeek, H., Hommes, C., Jaeger, C., May, R., Scheffer, M.: Complexity theory and financial regulation: Economic policy needs interdisciplinary network analysis and behavioral modeling. Science 351(6275), 818\u2013819 (2016). https:\/\/doi.org\/10.1126\/science.aad0299","journal-title":"Science"},{"issue":"6719","key":"1197_CR14","doi-asserted-by":"publisher","first-page":"498","DOI":"10.1038\/17290","volume":"397","author":"T Lux","year":"1999","unstructured":"Lux, T., Marchesi, M.: Scaling and criticality in a stochastic multi-agent model of a financial market. Nature 397(6719), 498\u2013500 (1999). https:\/\/doi.org\/10.1038\/17290","journal-title":"Nature"},{"key":"1197_CR15","doi-asserted-by":"publisher","unstructured":"Cui, W., Brabazon, A.: An agent-based modeling approach to study price impact. In: Proceedings of 2012 IEEE Conference on Computational Intelligence for Financial Engineering and Economics, pp. 
241\u2013248 (2012). https:\/\/doi.org\/10.1109\/CIFEr.2012.6327798","DOI":"10.1109\/CIFEr.2012.6327798"},{"key":"1197_CR16","doi-asserted-by":"publisher","unstructured":"Mizuta, T.: An agent-based model for designing a financial market that works well. arXiv (2019). https:\/\/doi.org\/10.48550\/arXiv.1906.06000","DOI":"10.48550\/arXiv.1906.06000"},{"issue":"2","key":"1197_CR17","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1007\/s40844-015-0024-z","volume":"12","author":"T Torii","year":"2015","unstructured":"Torii, T., Izumi, K., Yamada, K.: Shock transfer by arbitrage trading: analysis using multi-asset artificial market. Evol. Inst. Econ. Rev. 12(2), 395\u2013412 (2015). https:\/\/doi.org\/10.1007\/s40844-015-0024-z","journal-title":"Evol. Inst. Econ. Rev."},{"issue":"5","key":"1197_CR18","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1088\/1469-7688\/2\/5\/303","volume":"2","author":"C Chiarella","year":"2002","unstructured":"Chiarella, C., Iori, G.: A simulation analysis of the microstructure of double auction markets. Quantitative Finance 2(5), 346\u2013353 (2002). https:\/\/doi.org\/10.1088\/1469-7688\/2\/5\/303","journal-title":"Quantitative Finance"},{"key":"1197_CR19","doi-asserted-by":"publisher","unstructured":"Leal, S.J., Napoletano, M.: Market stability vs. market resilience: Regulatory policies experiments in an agent-based model with low- and high-frequency trading. J. Econ. Behav. Organ. 157, 15\u201341 (2019). https:\/\/doi.org\/10.1016\/j.jebo.2017.04.013","DOI":"10.1016\/j.jebo.2017.04.013"},{"key":"1197_CR20","doi-asserted-by":"publisher","unstructured":"Paddrik, M., Hayes, R., Todd, A., Yang, S., Beling, P., Scherer, W.: An agent based model of the E-Mini S&P 500 applied to flash crash analysis. In: Proceedings of 2012 IEEE Conference on Computational Intelligence for Financial Engineering and Economics, pp. 257\u2013264 (2012). 
https:\/\/doi.org\/10.1109\/CIFEr.2012.6327800","DOI":"10.1109\/CIFEr.2012.6327800"},{"issue":"3","key":"1197_CR21","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1007\/s10015-017-0368-z","volume":"22","author":"T Torii","year":"2017","unstructured":"Torii, T., Kamada, T., Izumi, K., Yamada, K.: Platform Design for Large-scale Artificial Market Simulation and Preliminary Evaluation on the K Computer. Art. Life Robotics 22(3), 301\u2013307 (2017). https:\/\/doi.org\/10.1007\/s10015-017-0368-z","journal-title":"Art. Life Robotics"},{"key":"1197_CR22","unstructured":"Torii, T., Izumi, K., Kamada, T., Yonenoh, H., Fujishima, D., Matsuura, I., Hirano, M., Takahashi, T.: Plham: Platform for Large-scale and High-frequency Artificial Market (2016). https:\/\/github.com\/plham\/plham"},{"key":"1197_CR23","unstructured":"Torii, T., Izumi, K., Kamada, T., Yonenoh, H., Fujishima, D., Matsuura, I., Hirano, M., Takahashi, T., Finnerty, P.: PlhamJ (2019). https:\/\/github.com\/plham\/plhamJ"},{"key":"1197_CR24","doi-asserted-by":"publisher","unstructured":"Sato, H., Koyama, Y., Kurumatani, K., Shiozawa, Y., Deguchi, H.: U-mart: a test bed for interdisciplinary research into agent-based artificial markets. In: Evolutionary Controversies in Economics, pp. 179\u2013190 (2001). https:\/\/doi.org\/10.1007\/978-4-431-67903-5_13","DOI":"10.1007\/978-4-431-67903-5_13"},{"key":"1197_CR25","doi-asserted-by":"publisher","unstructured":"Arthur, W.B., Holland, J.H., LeBaron, B., Palmer, R., Tayler, P.: Asset pricing under endogenous expectations in an artificial stock market. The Economy as an Evolving Complex System II, 15\u201344 (1997). https:\/\/doi.org\/10.1201\/9780429496639-2","DOI":"10.1201\/9780429496639-2"},{"key":"1197_CR26","doi-asserted-by":"publisher","unstructured":"Byrd, D., Hybinette, M., Hybinette Balch, T., Morgan, J.: ABIDES: Towards High-Fidelity Multi-Agent Market Simulation. 
In: Proceedings of the 2020 Conference on Principles of Advanced Discrete Simulation, pp. 11\u201322 (2020). https:\/\/doi.org\/10.1145\/3384441.3395986","DOI":"10.1145\/3384441.3395986"},{"key":"1197_CR27","doi-asserted-by":"publisher","unstructured":"Murase, Y., Uchitane, T., Ito, N.: A Tool for Parameter-space Explorations. Phys. Proced. 57(C), 73\u201376 (2014). https:\/\/doi.org\/10.1016\/J.PHPRO.2014.08.134","DOI":"10.1016\/J.PHPRO.2014.08.134"},{"key":"1197_CR28","doi-asserted-by":"publisher","unstructured":"Murase, Y., Matsushima, H., Noda, I., Kamada, T.: CARAVAN: A Framework for Comprehensive Simulations on Massive Parallel Machines. Massively Multi-Agent Systems II, 130\u2013143 (2019). https:\/\/doi.org\/10.1007\/978-3-030-20937-7_9","DOI":"10.1007\/978-3-030-20937-7_9"},{"issue":"2","key":"1197_CR29","doi-asserted-by":"publisher","first-page":"0263150","DOI":"10.1371\/JOURNAL.PONE.0263150","volume":"17","author":"C Angione","year":"2022","unstructured":"Angione, C., Silverman, E., Yaneske, E.: Using machine learning as a surrogate model for agent-based simulations. PLOS ONE 17(2), 0263150 (2022). https:\/\/doi.org\/10.1371\/JOURNAL.PONE.0263150","journal-title":"PLOS ONE"},{"issue":"3\u20134","key":"1197_CR30","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/bf00992698","volume":"8","author":"CJCH Watkins","year":"1992","unstructured":"Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8(3\u20134), 279\u2013292 (1992). https:\/\/doi.org\/10.1007\/bf00992698","journal-title":"Mach. Learn."},{"issue":"1","key":"1197_CR31","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1007\/BF00115009","volume":"3","author":"RS Sutton","year":"1988","unstructured":"Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3(1), 9\u201344 (1988). https:\/\/doi.org\/10.1007\/BF00115009","journal-title":"Mach. 
Learn."},{"issue":"3","key":"1197_CR32","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1145\/203330.203343","volume":"38","author":"G Tesauro","year":"1995","unstructured":"Tesauro, G.: Temporal Difference Learning and TD-Gammon. Commun. ACM 38(3), 58\u201368 (1995). https:\/\/doi.org\/10.1145\/203330.203343","journal-title":"Commun. ACM"},{"key":"1197_CR33","volume-title":"On-line Q-learning Using Connectionist Systems","author":"GA Rummery","year":"1994","unstructured":"Rummery, G.A., Niranjan, M.: On-line Q-learning Using Connectionist Systems. University of Cambridge, Department of Engineering Cambridge, England (1994)"},{"issue":"7540","key":"1197_CR34","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529\u2013533 (2015). https:\/\/doi.org\/10.1038\/nature14236","journal-title":"Nature"},{"key":"1197_CR35","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1613\/jair.3912","volume":"47","author":"MG Bellemare","year":"2013","unstructured":"Bellemare, M.G., Veness, J., Bowling, M.: The Arcade Learning Environment: An Evaluation Platform for General Agents. J. Art. Intell. Res. 47, 253\u2013279 (2013). https:\/\/doi.org\/10.1613\/jair.3912","journal-title":"J. Art. Intell. Res."},{"key":"1197_CR36","doi-asserted-by":"publisher","unstructured":"Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inf. Process. Syst. 2, 1097\u20131105 (2012). 
https:\/\/doi.org\/10.1145\/3065386","DOI":"10.1145\/3065386"},{"key":"1197_CR37","doi-asserted-by":"publisher","unstructured":"Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-Learning. In: Proceedings of 30th AAAI Conference on Artificial Intelligence, pp. 2094\u20132100 (2016). https:\/\/doi.org\/10.1609\/aaai.v30i1.10295","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"1197_CR38","unstructured":"Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., De Freitas, N.: Dueling Network Architectures for Deep Reinforcement Learning. In: Proceedings of 33rd International Conference on Machine Learning, pp. 2939\u20132947 (2016)"},{"key":"1197_CR39","doi-asserted-by":"publisher","unstructured":"Fortunato, M., Azar, M.G., Piot, B., Menick, J., Osband, I., Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin, O., Blundell, C., Legg, S.: Noisy Networks for Exploration. arXiv (2017). https:\/\/doi.org\/10.48550\/arXiv.1706.10295","DOI":"10.48550\/arXiv.1706.10295"},{"key":"1197_CR40","volume-title":"Reinforcement Learning: An Introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT press, USA (2018)"},{"key":"1197_CR41","unstructured":"OpenAI: OpenAI Baselines: ACKTR & A2C (2017). https:\/\/openai.com\/blog\/baselines-acktr-a2c\/ Accessed 2019-11-06"},{"key":"1197_CR42","doi-asserted-by":"publisher","unstructured":"Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., Silver, D.: Rainbow: Combining improvements in deep reinforcement learning. In: Proceedings of 32nd AAAI Conference on Artificial Intelligence, pp. 3215\u20133222 (2018). 
https:\/\/doi.org\/10.1609\/aaai.v32i1.11796","DOI":"10.1609\/aaai.v32i1.11796"},{"key":"1197_CR43","doi-asserted-by":"publisher","unstructured":"Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., van Hasselt, H., Silver, D.: Distributed Prioritized Experience Replay. arXiv (2018). https:\/\/doi.org\/10.48550\/arXiv.1803.00933","DOI":"10.48550\/arXiv.1803.00933"},{"key":"1197_CR44","unstructured":"Kapturowski, S., Ostrovski, G., Quan, J., Munos, R., Dabney, W.: Recurrent Experience Replay in Distributed Reinforcement Learning. In: Proceedings of International Conference on Learning Representations, pp. 1\u201315 (2019)"},{"issue":"8","key":"1197_CR45","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Comput. 9(8), 1735\u20131780 (1997). https:\/\/doi.org\/10.1162\/neco.1997.9.8.1735","journal-title":"Neural Comput."},{"issue":"6419","key":"1197_CR46","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1126\/science.aar6404","volume":"362","author":"D Silver","year":"2018","unstructured":"Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., Hassabis, D.: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Sci. 362(6419), 1140\u20131144 (2018). 
https:\/\/doi.org\/10.1126\/science.aar6404","journal-title":"Sci."},{"issue":"7676","key":"1197_CR47","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., Van Den Driessche, G., Graepel, T., Hassabis, D.: Mastering the game of Go without human knowledge. Nature 550(7676), 354\u2013359 (2017). https:\/\/doi.org\/10.1038\/nature24270","journal-title":"Nature"},{"key":"1197_CR48","doi-asserted-by":"publisher","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. In: Proceedings of 4th International Conference on Learning Representations (2015). https:\/\/doi.org\/10.48550\/arxiv.1509.02971","DOI":"10.48550\/arxiv.1509.02971"},{"key":"1197_CR49","doi-asserted-by":"publisher","unstructured":"Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., Levine, S.: Soft Actor-Critic Algorithms and Applications. arXiv (2018). https:\/\/doi.org\/10.48550\/arxiv.1812.05905","DOI":"10.48550\/arxiv.1812.05905"},{"key":"1197_CR50","doi-asserted-by":"publisher","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Proc. 35th Int. Conf. Mach. Learn. 2976\u20132989 (2018). https:\/\/doi.org\/10.48550\/arxiv.1801.01290","DOI":"10.48550\/arxiv.1801.01290"},{"issue":"5","key":"1197_CR51","doi-asserted-by":"publisher","first-page":"823","DOI":"10.1103\/PhysRev.36.823","volume":"36","author":"GE Uhlenbeck","year":"1930","unstructured":"Uhlenbeck, G.E., Ornstein, L.S.: On the theory of the brownian motion. Phys. Rev. 36(5), 823 (1930). 
https:\/\/doi.org\/10.1103\/PhysRev.36.823","journal-title":"Phys. Rev."},{"key":"1197_CR52","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1016\/j.neunet.2012.11.007","volume":"41","author":"P Wawrzy\u0144ski","year":"2013","unstructured":"Wawrzy\u0144ski, P., Tanwani, A.K.: Autonomous reinforcement learning with experience replay. Neural Netw. 41, 156\u2013167 (2013). https:\/\/doi.org\/10.1016\/j.neunet.2012.11.007","journal-title":"Neural Netw."},{"key":"1197_CR53","doi-asserted-by":"publisher","unstructured":"Frankle, J., Carbin, M.: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. Proceedings of 7th International Conference on Learning Representations (2018). https:\/\/doi.org\/10.48550\/arxiv.1803.03635","DOI":"10.48550\/arxiv.1803.03635"},{"key":"1197_CR54","doi-asserted-by":"publisher","DOI":"10.1515\/9781400884964","volume-title":"The End of Theory: Financial Crises, the Failure of Economics, and the Sweep of Human Interaction","author":"RM Bookstaber","year":"2017","unstructured":"Bookstaber, R.M.: The End of Theory: Financial Crises, the Failure of Economics, and the Sweep of Human Interaction. Princeton University Press, USA (2017)"},{"key":"1197_CR55","unstructured":"Corsi, F.: Measuring and modelling realized volatility: from tick-by-tick to long memory. PhD thesis, Universit\u00e0 della Svizzera italiana (2005)"},{"key":"1197_CR56","doi-asserted-by":"publisher","unstructured":"Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th International Conference on Knowledge Discovery & Data Mining, pp. 2623\u20132631 (2019). 
https:\/\/doi.org\/10.1145\/3292500.3330701","DOI":"10.1145\/3292500.3330701"}],"container-title":["World Wide Web"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11280-023-01197-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11280-023-01197-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11280-023-01197-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,11]],"date-time":"2023-10-11T04:28:24Z","timestamp":1696998504000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11280-023-01197-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,3]]},"references-count":56,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["1197"],"URL":"https:\/\/doi.org\/10.1007\/s11280-023-01197-5","relation":{},"ISSN":["1386-145X","1573-1413"],"issn-type":[{"type":"print","value":"1386-145X"},{"type":"electronic","value":"1573-1413"}],"subject":[],"published":{"date-parts":[[2023,8,3]]},"assertion":[{"value":"10 March 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 June 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 July 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 August 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not 
applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}},{"value":"The authors declare no conflicts of interest.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}