{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T17:40:03Z","timestamp":1745430003569,"version":"3.40.4"},"reference-count":75,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,3,25]],"date-time":"2025-03-25T00:00:00Z","timestamp":1742860800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,3,25]],"date-time":"2025-03-25T00:00:00Z","timestamp":1742860800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62032016"],"award-info":[{"award-number":["62032016"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005090","name":"Beijing Nova Program","doi-asserted-by":"publisher","award":["20220484106"],"award-info":[{"award-number":["20220484106"]}],"id":[{"id":"10.13039\/501100005090","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Hydraulic-supports alignment keeps the coal mining face in line and is heavily influenced by varying geological states. The experiences produced by the moving process are imbalanced, which prevents the agent from learning important knowledge from rare samples. This paper is the first to introduce reinforcement learning to hydraulic-supports alignment, establishing a Markov optimal decision model with the TD3 algorithm. 
To address the imbalance of these experiences, this paper proposes a segmented experience pool and three sampling replay mechanisms based on the characteristics of the moving process under various geological states. Experimental results show that the improved TD3, using the segmented experience pool with the three replay mechanisms, effectively identifies the optimal moving policy and converges well in cases of both normal and insufficient movement of hydraulic-supports, whereas the standard TD3 performs inadequately and struggles to find the optimal policy.<\/jats:p>","DOI":"10.1007\/s11063-025-11744-y","type":"journal-article","created":{"date-parts":[[2025,3,28]],"date-time":"2025-03-28T00:15:21Z","timestamp":1743120921000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Hydraulic-Supports Alignment by TD3 with Segmented Experience Pool"],"prefix":"10.1007","volume":"57","author":[{"given":"Yi","family":"Yang","sequence":"first","affiliation":[]},{"given":"Yapeng","family":"Dai","sequence":"additional","affiliation":[]},{"given":"Tian","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Wei","family":"Qian","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,3,25]]},"reference":[{"key":"11744_CR1","doi-asserted-by":"publisher","first-page":"113387","DOI":"10.1016\/j.enpol.2022.113387","volume":"174","author":"S Tiedemann","year":"2023","unstructured":"Tiedemann S, M\u00fcller-Hansen F (2023) Auctions to phase out coal power: lessons learned from Germany. 
Energy Policy 174:113387","journal-title":"Energy Policy"},{"issue":"2","key":"11744_CR2","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s40789-015-0071-4","volume":"2","author":"J Wang","year":"2015","unstructured":"Wang J, Yu B, Kang H et al (2015) Key technologies and equipment for a fully mechanized top-coal caving operation with a large mining height at ultra-thick coal seams. Int J Coal Sci Technol 2(2):97\u2013161. https:\/\/doi.org\/10.1007\/s40789-015-0071-4","journal-title":"Int J Coal Sci Technol"},{"issue":"4","key":"11744_CR3","doi-asserted-by":"publisher","first-page":"954","DOI":"10.1007\/s10230-022-00891-6","volume":"41","author":"C Li","year":"2022","unstructured":"Li C, Zuo J, Huang X et al (2022) Water inrush modes through a thick aquifuge floor in a deep coal mine and appropriate control technology: a case study from hebei, china. Mine Water Environ 41(4):954\u2013969. https:\/\/doi.org\/10.1007\/s10230-022-00891-6","journal-title":"Mine Water Environ"},{"key":"11744_CR4","doi-asserted-by":"publisher","first-page":"113977","DOI":"10.1016\/j.measurement.2023.113977","volume":"225","author":"Z Hao","year":"2024","unstructured":"Hao Z, Xie J, Wang X et al (2024) A method for reconstructing the pose of hydraulic support group based on point cloud and digital twin. Measurement 225:113977","journal-title":"Measurement"},{"issue":"18","key":"11744_CR5","doi-asserted-by":"publisher","first-page":"5878","DOI":"10.3390\/s24185878","volume":"24","author":"B Miao","year":"2024","unstructured":"Miao B, Ge S, Li Y et al (2024) Method of motion planning for digital twin navigation and cutting of shearer. Sensors 24(18):5878","journal-title":"Sensors"},{"key":"11744_CR6","doi-asserted-by":"publisher","first-page":"101694","DOI":"10.1016\/j.aei.2022.101694","volume":"53","author":"J Xie","year":"2022","unstructured":"Xie J, Li S, Wang X (2022) A digital smart product service system and a case study of the mining industry: Mspss. 
Adv Eng Inform 53:101694","journal-title":"Adv Eng Inform"},{"key":"11744_CR7","doi-asserted-by":"publisher","first-page":"111722","DOI":"10.1016\/j.measurement.2022.111722","volume":"202","author":"X Jiao","year":"2022","unstructured":"Jiao X, Xie J, Wang X et al (2022) Intelligent decision method for the position and attitude self-adjustment of hydraulic support groups driven by a digital twin system. Measurement 202:111722","journal-title":"Measurement"},{"key":"11744_CR8","doi-asserted-by":"publisher","first-page":"107743","DOI":"10.1016\/j.measurement.2020.107743","volume":"158","author":"X Ge","year":"2020","unstructured":"Ge X, Xie J, Wang X et al (2020) A virtual adjustment method and experimental study of the support attitude of hydraulic support groups in propulsion state. Measurement 158:107743","journal-title":"Measurement"},{"issue":"12","key":"11744_CR9","doi-asserted-by":"publisher","first-page":"1347","DOI":"10.1016\/S0967-0661(00)00081-2","volume":"8","author":"A Alleyne","year":"2000","unstructured":"Alleyne A, Liu R (2000) A simplified approach to force control for electro-hydraulic systems. Control Eng Pract 8(12):1347\u20131356","journal-title":"Control Eng Pract"},{"issue":"4","key":"11744_CR10","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1016\/j.ijmst.2024.04.011","volume":"34","author":"J Guo","year":"2024","unstructured":"Guo J, Huang W, Feng G et al (2024) Stability analysis of longwall top-coal caving face in extra-thick coal seams based on an innovative numerical hydraulic support model. Int J Min Sci Technol 34(4):491\u2013505","journal-title":"Int J Min Sci Technol"},{"issue":"4","key":"11744_CR11","doi-asserted-by":"publisher","first-page":"981","DOI":"10.3390\/pr11040981","volume":"11","author":"R Li","year":"2023","unstructured":"Li R, Yuan W, Ding X et al (2023) Review of research and development of hydraulic synchronous control system. 
Processes 11(4):981","journal-title":"Processes"},{"issue":"5","key":"11744_CR12","doi-asserted-by":"publisher","first-page":"1054","DOI":"10.1109\/TNN.1998.712192","volume":"9","author":"RS Sutton","year":"1998","unstructured":"Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. IEEE Trans Neural Netw 9(5):1054\u20131054. https:\/\/doi.org\/10.1109\/TNN.1998.712192","journal-title":"IEEE Trans Neural Netw"},{"key":"11744_CR13","doi-asserted-by":"publisher","unstructured":"Meng Q, Anandan PD, Rielly CD et\u00a0al (2023) Multi-agent reinforcement learning and rl-based adaptive pid control of crystallization processes. In: Kokossis AC, Georgiadis MC, Pistikopoulos E (eds) 33rd European symposium on computer aided process engineering, computer aided chemical engineering, vol\u00a052. Elsevier, pp. 1667\u20131672. https:\/\/doi.org\/10.1016\/B978-0-443-15274-0.50265-1","DOI":"10.1016\/B978-0-443-15274-0.50265-1"},{"key":"11744_CR14","unstructured":"Zhou M, Luo J, Villella J et\u00a0al (2021) Smarts: an open-source scalable multi-agent rl training school for autonomous driving. In: Kober J, Ramos F, Tomlin C (eds) Proceedings of the 2020 conference on robot learning, proceedings of machine learning research, vol 155. PMLR, pp 264\u2013285"},{"key":"11744_CR15","doi-asserted-by":"publisher","unstructured":"Budak AF, Bhansali P, Liu B et\u00a0al (2021) Dnn-opt: an rl inspired optimization for analog circuit sizing using deep neural networks. In: 2021 58th ACM\/IEEE design automation conference (DAC), pp 1219\u20131224. https:\/\/doi.org\/10.1109\/DAC18074.2021.9586139","DOI":"10.1109\/DAC18074.2021.9586139"},{"key":"11744_CR16","doi-asserted-by":"publisher","first-page":"110404","DOI":"10.1016\/j.epsr.2024.110404","volume":"233","author":"C Liu","year":"2024","unstructured":"Liu C, Rao X, Zhao B et al (2024) Deep reinforcement learning-based optimal bidding strategy for real-time multi-participant electricity market with short-term load. 
Electric Power Syst Res 233:110404","journal-title":"Electric Power Syst Res"},{"key":"11744_CR17","doi-asserted-by":"publisher","first-page":"109917","DOI":"10.1016\/j.cie.2024.109917","volume":"189","author":"F Zhang","year":"2024","unstructured":"Zhang F, Li R, Gong W (2024) Deep reinforcement learning-based memetic algorithm for energy-aware flexible job shop scheduling with multi-agv. Comput Ind Eng 189:109917","journal-title":"Comput Ind Eng"},{"key":"11744_CR18","doi-asserted-by":"publisher","first-page":"107087","DOI":"10.1016\/j.ocecoaman.2024.107087","volume":"251","author":"X Chen","year":"2024","unstructured":"Chen X, Liu S, Zhao J et al (2024) Autonomous port management based agv path planning and optimization via an ensemble reinforcement learning framework. Ocean Coast Manag 251:107087","journal-title":"Ocean Coast Manag"},{"issue":"9","key":"11744_CR19","doi-asserted-by":"publisher","first-page":"753","DOI":"10.3390\/aerospace11090753","volume":"11","author":"S Liu","year":"2024","unstructured":"Liu S, Zhou S, Miao J et al (2024) Autonomous trajectory planning method for stratospheric airship regional station-keeping based on deep reinforcement learning. Aerospace 11(9):753","journal-title":"Aerospace"},{"key":"11744_CR20","unstructured":"Fujimoto S, van Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: International conference on machine learning"},{"key":"11744_CR21","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1016\/j.neucom.2022.06.102","volume":"503","author":"H Jiang","year":"2022","unstructured":"Jiang H, Esfahani MA, Wu K et al (2022) itd3-cln: learn to navigate in dynamic scene through deep reinforcement learning. Neurocomputing 503:118\u2013128","journal-title":"Neurocomputing"},{"key":"11744_CR22","first-page":"1","volume":"8","author":"Z Lou","year":"2024","unstructured":"Lou Z, Wang Y, Shan S et al (2024) Balanced prioritized experience replay in off-policy reinforcement learning. 
Neural Comput Appl 8:1\u201317","journal-title":"Neural Comput Appl"},{"key":"11744_CR23","doi-asserted-by":"publisher","first-page":"124017","DOI":"10.1016\/j.eswa.2024.124017","volume":"251","author":"J Yu","year":"2024","unstructured":"Yu J, Li J, L\u00fc S et al (2024) Mixed experience sampling for off-policy reinforcement learning. Expert Syst Appl 251:124017","journal-title":"Expert Syst Appl"},{"issue":"6","key":"11744_CR24","doi-asserted-by":"publisher","first-page":"4255","DOI":"10.1007\/s10064-021-02198-2","volume":"80","author":"M Ghasemi","year":"2021","unstructured":"Ghasemi M, Corkum AG, Gorrell GA (2021) Ground surface rock buckling: analysis of collected cases and failure mechanisms. Bull Eng Geol Env 80(6):4255\u20134276. https:\/\/doi.org\/10.1007\/s10064-021-02198-2","journal-title":"Bull Eng Geol Env"},{"key":"11744_CR25","doi-asserted-by":"publisher","first-page":"168650","DOI":"10.1109\/ACCESS.2020.3023551","volume":"8","author":"S Jiang","year":"2020","unstructured":"Jiang S, Lv R, Wan L et al (2020) Dynamic characteristics of the chain drive system of scraper conveyor based on the speed difference. IEEE Access 8:168650\u2013168658. https:\/\/doi.org\/10.1109\/ACCESS.2020.3023551","journal-title":"IEEE Access"},{"issue":"12","key":"11744_CR26","doi-asserted-by":"publisher","first-page":"125103","DOI":"10.1088\/1361-6501\/aceb0e","volume":"34","author":"N Chen","year":"2023","unstructured":"Chen N, Fang X, Feng H et al (2023) Scraper conveyor shape sensing technology based on orthogonal optical fiber strain. Meas Sci Technol 34(12):125103. https:\/\/doi.org\/10.1088\/1361-6501\/aceb0e","journal-title":"Meas Sci Technol"},{"issue":"17","key":"11744_CR27","doi-asserted-by":"publisher","first-page":"6399","DOI":"10.3390\/s22176399","volume":"22","author":"Y Song","year":"2022","unstructured":"Song Y, Fang X, Wu G et al (2022) Research on straightness perception compensation model of fbg scraper conveyor based on rotation error angle. 
Sensors 22(17):6399","journal-title":"Sensors"},{"issue":"19","key":"11744_CR28","doi-asserted-by":"publisher","first-page":"3545","DOI":"10.3390\/math10193545","volume":"10","author":"J Lv","year":"2022","unstructured":"Lv J, Shi P, Wan Z et al (2022) Research on a real-time monitoring method for the three-dimensional straightness of a scraper conveyor based on binocular vision. Mathematics 10(19):3545","journal-title":"Mathematics"},{"key":"11744_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TIM.2023.3267344","volume":"72","author":"Q Zeng","year":"2023","unstructured":"Zeng Q, Xu W, Gao K (2023) Measurement method and experiment of hydraulic support group attitude and straightness based on binocular vision. IEEE Trans Instrum Meas 72:1\u201314. https:\/\/doi.org\/10.1109\/TIM.2023.3267344","journal-title":"IEEE Trans Instrum Meas"},{"key":"11744_CR30","doi-asserted-by":"publisher","first-page":"113264","DOI":"10.1016\/j.measurement.2023.113264","volume":"219","author":"Q Chen","year":"2023","unstructured":"Chen Q, Chen H, Chen H et al (2023) Measurement of displacement and top beam attitude angle of advanced hydraulic support based on visual detection. Measurement 219:113264","journal-title":"Measurement"},{"issue":"6","key":"11744_CR31","doi-asserted-by":"publisher","first-page":"8623","DOI":"10.1109\/TNNLS.2022.3230978","volume":"35","author":"W Qian","year":"2024","unstructured":"Qian W, Lu D, Guo S et al (2024) Distributed state estimation for mixed delays system over sensor networks with multichannel random attacks and markov switching topology. IEEE Trans Neural Netw Learn Syst 35(6):8623\u20138637. https:\/\/doi.org\/10.1109\/TNNLS.2022.3230978","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"05","key":"11744_CR32","first-page":"645","volume":"38","author":"Y Wang","year":"2021","unstructured":"Wang Y, Chang Z, Gao F et al (2021) Study on the alignment method of hydraulic support. 
Mech Electr Eng Mag 38(05):645\u2013649","journal-title":"Mech Electr Eng Mag"},{"issue":"11","key":"11744_CR33","doi-asserted-by":"publisher","first-page":"119","DOI":"10.13272\/j.issn.1671-251x.2022020030","volume":"48","author":"D Song","year":"2022","unstructured":"Song D, Lu C, Tao X et al (2022) Hydraulic support straightening method based on maximum correntropy Kalman filtering algorithm. J Mine Autom 48(11):119\u2013124. https:\/\/doi.org\/10.13272\/j.issn.1671-251x.2022020030","journal-title":"J Mine Autom"},{"issue":"01","key":"11744_CR34","doi-asserted-by":"publisher","first-page":"168","DOI":"10.13247\/j.cnki.jcumt.20210671","volume":"52","author":"X Wang","year":"2023","unstructured":"Wang X, Wang S, Wang S et al (2023) Prediction model of straightness error of scraper conveyor. J China Univ Min Technol 52(01):168\u2013177. https:\/\/doi.org\/10.13247\/j.cnki.jcumt.20210671","journal-title":"J China Univ Min Technol"},{"key":"11744_CR35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.13225\/j.cnki.jccs.2023.1516","volume":"8","author":"M Sun","year":"2023","unstructured":"Sun M, Wang Y, Chang Y et al (2023) Distributed cooperative control of consistency of hydraulic support robot group moving frame. J China Coal Soc 8:1\u201316. https:\/\/doi.org\/10.13225\/j.cnki.jccs.2023.1516","journal-title":"J China Coal Soc"},{"issue":"10","key":"11744_CR36","first-page":"39","volume":"35","author":"B Hu","year":"2014","unstructured":"Hu B, Lian Z (2014) Study on hydraulic support straightening system based on support vector machine and genetic algorithm. Coal Miner Mach 35(10):39\u201341","journal-title":"Coal Miner Mach"},{"key":"11744_CR37","doi-asserted-by":"publisher","first-page":"142","DOI":"10.19614\/j.cnki.jsks.202008023","volume":"08","author":"X Wu","year":"2020","unstructured":"Wu X (2020) Research on fuzzy pid control straightening method of scraper conveyor in mine. Metal Mine 08:142\u2013146. 
https:\/\/doi.org\/10.19614\/j.cnki.jsks.202008023","journal-title":"Metal Mine"},{"issue":"02","key":"11744_CR38","doi-asserted-by":"publisher","first-page":"652","DOI":"10.13225\/j.cnki.jccs.XR20.1897","volume":"46","author":"X Wang","year":"2021","unstructured":"Wang X, Li S, Xie J et al (2021) Straightening method of scraper conveyor driven by robot kinematics and time series prediction. J China Coal Soc 46(02):652\u2013666. https:\/\/doi.org\/10.13225\/j.cnki.jccs.XR20.1897","journal-title":"J China Coal Soc"},{"issue":"05","key":"11744_CR39","doi-asserted-by":"publisher","first-page":"1043","DOI":"10.13545\/j.cnki.jmse.2023.0238","volume":"40","author":"X Fang","year":"2023","unstructured":"Fang X, Chen N, FENG H et al (2023) Key technologies of optical fiber accurate perception and straightening of straightness of the scraper conveyor. J Min Saf Eng 40(05):1043\u20131056. https:\/\/doi.org\/10.13545\/j.cnki.jmse.2023.0238","journal-title":"J Min Saf Eng"},{"key":"11744_CR40","unstructured":"Dehghan S, Yanikoglu B (2024) Evaluating ChatGPT\u2019s ability to detect hate speech in Turkish tweets. In: H\u00fcrriyeto\u011flu A, Tanev H, Thapa S et\u00a0al (eds) Proceedings of the 7th workshop on challenges and applications of automated extraction of socio-political events from text (CASE 2024). Association for Computational Linguistics, St. Julians, Malta, pp 54\u201359. https:\/\/aclanthology.org\/2024.case-1.6\/"},{"key":"11744_CR41","unstructured":"Dehghan S, Yan\u0131ko\u011flu B (2024) Multi-domain hate speech detection using dual contrastive learning and paralinguistic features. In: Calzolari N, Kan MY, Hoste V et\u00a0al (eds) Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024). ELRA and ICCL, Torino, Italia, pp 11745\u201311755. 
https:\/\/aclanthology.org\/2024.lrec-main.1025\/"},{"key":"11744_CR42","doi-asserted-by":"publisher","DOI":"10.3390\/app13031913","author":"S Dehghan","year":"2023","unstructured":"Dehghan S, Amasyali MF (2023) Selfccl: curriculum contrastive learning by transferring self-taught knowledge for fine-tuning bert. Appl Sci. https:\/\/doi.org\/10.3390\/app13031913","journal-title":"Appl Sci"},{"key":"11744_CR43","doi-asserted-by":"crossref","unstructured":"Dehghan S, Amasyali MF (2022) Supmpn: supervised multiple positives and negatives contrastive learning model for semantic textual similarity. Appl Sci 12(19). https:\/\/www.mdpi.com\/2076-3417\/12\/19\/9659","DOI":"10.3390\/app12199659"},{"key":"11744_CR44","doi-asserted-by":"publisher","unstructured":"Hester T, Quinlan M, Stone P (2012) Rtmba: a real-time model-based reinforcement learning architecture for robot control. In: 2012 IEEE international conference on robotics and automation, pp 85\u201390. https:\/\/doi.org\/10.1109\/ICRA.2012.6225072","DOI":"10.1109\/ICRA.2012.6225072"},{"key":"11744_CR45","doi-asserted-by":"publisher","unstructured":"Johannink T, Bahl S, Nair A et\u00a0al (2019) Residual reinforcement learning for robot control. In: 2019 International conference on robotics and automation (ICRA), pp 6023\u20136029. https:\/\/doi.org\/10.1109\/ICRA.2019.8794127","DOI":"10.1109\/ICRA.2019.8794127"},{"key":"11744_CR46","doi-asserted-by":"publisher","first-page":"153171","DOI":"10.1109\/ACCESS.2021.3126658","volume":"9","author":"E Salvato","year":"2021","unstructured":"Salvato E, Fenu G, Medvet E et al (2021) Crossing the reality gap: a survey on sim-to-real transferability of robot controllers in reinforcement learning. IEEE Access 9:153171\u2013153187. 
https:\/\/doi.org\/10.1109\/ACCESS.2021.3126658","journal-title":"IEEE Access"},{"issue":"2","key":"11744_CR47","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1007\/s10514-009-9132-0","volume":"27","author":"N Vlassis","year":"2009","unstructured":"Vlassis N, Toussaint M, Kontes G et al (2009) Learning model-free robot control by a Monte Carlo Em algorithm. Auton Robot 27(2):123\u2013130. https:\/\/doi.org\/10.1007\/s10514-009-9132-0","journal-title":"Auton Robot"},{"issue":"1","key":"11744_CR48","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1109\/TCIAIG.2009.2037972","volume":"2","author":"H Wang","year":"2010","unstructured":"Wang H, Gao Y, Chen X (2010) Rl-dot: a reinforcement learning npc team for playing domination games. IEEE Trans Comput Intell AI Games 2(1):17\u201326. https:\/\/doi.org\/10.1109\/TCIAIG.2009.2037972","journal-title":"IEEE Trans Comput Intell AI Games"},{"key":"11744_CR49","doi-asserted-by":"publisher","unstructured":"Wender S, Watson I (2012) Applying reinforcement learning to small scale combat in the real-time strategy game starcraft:broodwar. In: 2012 IEEE conference on computational intelligence and games (CIG), pp 402\u2013408. https:\/\/doi.org\/10.1109\/CIG.2012.6374183","DOI":"10.1109\/CIG.2012.6374183"},{"key":"11744_CR50","doi-asserted-by":"publisher","unstructured":"Bergdahl J, Gordillo C, Tollmar K et\u00a0al (2020) Augmenting automated game testing with deep reinforcement learning. In: 2020 IEEE conference on games (CoG), pp 600\u2013603. https:\/\/doi.org\/10.1109\/CoG47356.2020.9231552","DOI":"10.1109\/CoG47356.2020.9231552"},{"issue":"1","key":"11744_CR51","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1109\/JRPROC.1961.287775","volume":"49","author":"M Minsky","year":"1961","unstructured":"Minsky M (1961) Steps toward artificial intelligence. Proc IRE 49(1):8\u201330. 
https:\/\/doi.org\/10.1109\/JRPROC.1961.287775","journal-title":"Proc IRE"},{"issue":"3","key":"11744_CR52","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1023\/A:1022676722315","volume":"8","author":"C Watkins","year":"1992","unstructured":"Watkins C, Dayan P (1992) Technical note: Q-learning. Mach Learn 8(3):279\u2013292. https:\/\/doi.org\/10.1023\/A:1022676722315","journal-title":"Mach Learn"},{"issue":"2","key":"11744_CR53","doi-asserted-by":"publisher","first-page":"683","DOI":"10.1002\/asjc.3075","volume":"26","author":"Z Zhang","year":"2024","unstructured":"Zhang Z, Chen J, Zhao W (2024) Multi-agv route planning in automated warehouse system based on shortest-time q-learning algorithm. Asian J Control 26(2):683\u2013702","journal-title":"Asian J Control"},{"key":"11744_CR54","doi-asserted-by":"publisher","first-page":"127204","DOI":"10.1016\/j.neucom.2023.127204","volume":"572","author":"X Li","year":"2024","unstructured":"Li X, Tang B, Li H (2024) Adaer: an adaptive experience replay approach for continual lifelong learning. Neurocomputing 572:127204","journal-title":"Neurocomputing"},{"key":"11744_CR55","doi-asserted-by":"publisher","first-page":"108583","DOI":"10.1016\/j.compchemeng.2024.108583","volume":"182","author":"H Shi","year":"2024","unstructured":"Shi H, Gao W, Jiang X et al (2024) Two-dimensional model-free q-learning-based output feedback fault-tolerant control for batch processes. Comput Chem Eng 182:108583","journal-title":"Comput Chem Eng"},{"key":"11744_CR56","unstructured":"Mnih V, Kavukcuoglu K, Silver D et\u00a0al (2013) Playing atari with deep reinforcement learning. arXiv:1312.5602"},{"issue":"4","key":"11744_CR57","doi-asserted-by":"publisher","first-page":"2852","DOI":"10.1109\/TII.2020.3000502","volume":"17","author":"SR Pokhrel","year":"2020","unstructured":"Pokhrel SR, Garg S (2020) Multipath communication with deep q-network for industry 4.0 automation and orchestration. 
IEEE Trans Ind Inf 17(4):2852\u20132859","journal-title":"IEEE Trans Ind Inf"},{"issue":"12","key":"11744_CR58","doi-asserted-by":"publisher","first-page":"9138","DOI":"10.1109\/JIOT.2021.3093346","volume":"9","author":"F Liang","year":"2021","unstructured":"Liang F, Yu W, Liu X et al (2021) Toward deep q-network-based resource allocation in industrial internet of things. IEEE Internet Things J 9(12):9138\u20139150","journal-title":"IEEE Internet Things J"},{"key":"11744_CR59","unstructured":"Silver D, Lever G, Heess N et\u00a0al (2014) Deterministic policy gradient algorithms. In: Proceedings of the 31st international conference on international conference on machine learning, vol 32. JMLR.org, ICML\u201914, pp I-387\u2013I-395"},{"key":"11744_CR60","unstructured":"Lillicrap T, Hunt J, Pritzel A et al (2015) Continuous control with deep reinforcement learning. CoRR"},{"key":"11744_CR61","unstructured":"Mnih V, Badia AP, Mirza M et\u00a0al (2016) Asynchronous methods for deep reinforcement learning. In: Proceedings of the 33rd international conference on international conference on machine learning, vol 48. JMLR.org, ICML\u201916, pp 1928\u20131937"},{"key":"11744_CR62","unstructured":"Schulman J, Wolski F, Dhariwal P et\u00a0al (2017) Proximal policy optimization algorithms. arXiv:1707.06347"},{"key":"11744_CR63","unstructured":"Haarnoja T, Zhou A, Abbeel P et al (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv:1801.01290"},{"issue":"3","key":"11744_CR64","doi-asserted-by":"publisher","first-page":"2007","DOI":"10.3934\/jimo.2024160","volume":"21","author":"Z Ren","year":"2025","unstructured":"Ren Z, Zhu Y, Feng Z et al (2025) Autonomous injection molding parameter tuning via enhanced td3-based reinforcement learning with behavior cloning. 
J Ind Manag Optim 21(3):2007\u20132031","journal-title":"J Ind Manag Optim"},{"issue":"18","key":"11744_CR65","doi-asserted-by":"publisher","first-page":"2350305","DOI":"10.1142\/S021812662350305X","volume":"32","author":"Y Gu","year":"2023","unstructured":"Gu Y, Zhu Z, Chu Y et al (2023) D3-td3: deep dense dueling architectures in td3 algorithm for robot path planning based on 3d point cloud. J Circuits Syst Comput 32(18):2350305","journal-title":"J Circuits Syst Comput"},{"issue":"8","key":"11744_CR66","doi-asserted-by":"publisher","first-page":"2525","DOI":"10.3390\/s24082525","volume":"24","author":"B Huang","year":"2024","unstructured":"Huang B, Xie J, Yan J (2024) Inspection robot navigation based on improved td3 algorithm. Sensors 24(8):2525","journal-title":"Sensors"},{"key":"11744_CR67","doi-asserted-by":"publisher","first-page":"130564","DOI":"10.1016\/j.energy.2024.130564","volume":"293","author":"Y Zhou","year":"2024","unstructured":"Zhou Y, Huang Y, Mao X et al (2024) Research on energy management strategy of fuel cell hybrid power via an improved td3 deep reinforcement learning. Energy 293:130564","journal-title":"Energy"},{"key":"11744_CR68","unstructured":"Schaul T, Quan J, Antonoglou I et\u00a0al (2015) Prioritized experience replay. CoRR arXiv:1511.05952"},{"key":"11744_CR69","doi-asserted-by":"publisher","unstructured":"Ramicic M, Bonarini A (2017) Attention-based experience replay in deep q-learning. In: Proceedings of the 9th international conference on machine learning and computing. association for computing machinery, New York, NY, USA, ICMLC \u201917, pp 476\u2013481. https:\/\/doi.org\/10.1145\/3055635.3056621","DOI":"10.1145\/3055635.3056621"},{"key":"11744_CR70","doi-asserted-by":"publisher","DOI":"10.3389\/fnbot.2018.00032","author":"T George Karimpanal","year":"2018","unstructured":"George Karimpanal T, Bouffanais R (2018) Experience replay using transition sequences. Front Neurorobot. 
https:\/\/doi.org\/10.3389\/fnbot.2018.00032","journal-title":"Front Neurorobot"},{"key":"11744_CR71","unstructured":"Fang M, Zhou C, Shi B et al (2019) DHER: hindsight experience replay for dynamic goals. In: International conference on learning representations"},{"key":"11744_CR72","unstructured":"Buzzega P, Boschini M, Porrello A et al (2020) Dark experience for general continual learning: a strong, simple baseline. In: Larochelle H, Ranzato M, Hadsell R, et\u00a0al (eds) Advances in neural information processing systems, vol\u00a033. Curran Associates, Inc., pp 15920\u201315930"},{"key":"11744_CR73","doi-asserted-by":"publisher","unstructured":"Cicek DC, Duran E, Saglam B et al (2021) Off-policy correction for deep deterministic policy gradient algorithms via batch prioritized experience replay. In: 2021 IEEE 33rd international conference on tools with artificial intelligence (ICTAI), pp 1255\u20131262. https:\/\/doi.org\/10.1109\/ICTAI52525.2021.00199","DOI":"10.1109\/ICTAI52525.2021.00199"},{"issue":"11","key":"11744_CR74","doi-asserted-by":"publisher","first-page":"1541","DOI":"10.1631\/FITEE.2300084","volume":"24","author":"S Wang","year":"2023","unstructured":"Wang S, Zhao B, Zhang Z et al (2023) Embedding expert demonstrations into clustering buffer for effective deep reinforcement learning. Front Inf Technol Electron Eng 24(11):1541\u20131556. https:\/\/doi.org\/10.1631\/FITEE.2300084","journal-title":"Front Inf Technol Electron Eng"},{"issue":"1","key":"11744_CR75","first-page":"35","volume":"43","author":"L Zhang","year":"2023","unstructured":"Zhang L, Feng Y, Wang R et al (2023) Efficient experience replay architecture for offline reinforcement learning. 
Robot Intell Autom 43(1):35\u201343","journal-title":"Robot Intell Autom"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-025-11744-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-025-11744-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-025-11744-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T16:59:59Z","timestamp":1745427599000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-025-11744-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,25]]},"references-count":75,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["11744"],"URL":"https:\/\/doi.org\/10.1007\/s11063-025-11744-y","relation":{},"ISSN":["1573-773X"],"issn-type":[{"type":"electronic","value":"1573-773X"}],"subject":[],"published":{"date-parts":[[2025,3,25]]},"assertion":[{"value":"24 February 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 March 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"35"}}