{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T17:35:28Z","timestamp":1776101728455,"version":"3.50.1"},"reference-count":57,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2023,3,17]],"date-time":"2023-03-17T00:00:00Z","timestamp":1679011200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,3,17]],"date-time":"2023-03-17T00:00:00Z","timestamp":1679011200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005004","name":"Ekonomiaren Garapen eta Lehiakortasun Saila, Eusko Jaurlaritza","doi-asserted-by":"publisher","award":["KK-2021\/00033 TREBEZIA"],"award-info":[{"award-number":["KK-2021\/00033 TREBEZIA"]}],"id":[{"id":"10.13039\/501100005004","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001872","name":"Centre for Industrial Technological Development","doi-asserted-by":"publisher","award":["CER-20211007"],"award-info":[{"award-number":["CER-20211007"]}],"id":[{"id":"10.13039\/501100001872","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int. J. Mach. Learn. &amp; Cyber."],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This work focuses on the operation of picking an object on a table with a mobile manipulator. We use deep reinforcement learning (DRL) to learn a positioning policy for the robot\u2019s base by considering the reachability constraints of the arm. This work extends our first proof-of-concept with the ultimate goal of validating the method on a real robot. 
The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is used to model the base controller, which is optimised using feedback from the MoveIt!-based arm planner. The idea is to encourage the base controller to position itself in areas where the arm reaches the object. Following a simulation-to-reality approach, we first create a realistic simulation of the robotic environment in Unity and integrate it with the Robot Operating System (ROS). The drivers for both the base and the arm are also implemented. The DRL-based agent is trained in simulation, and both the robot and target poses are randomised to make the learnt base controller robust to uncertainties. We propose a task-specific setup for TD3, which includes the state\/action spaces, reward function and neural architectures. We compare the proposed method with the baseline work and show that the combination of TD3 and the proposed setup leads to an <jats:inline-formula><jats:alternatives><jats:tex-math>$$11\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mn>11<\/mml:mn><mml:mo>%<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula> higher success rate than the baseline, with an overall success rate of <jats:inline-formula><jats:alternatives><jats:tex-math>$$97\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mn>97<\/mml:mn><mml:mo>%<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>. 
Finally, the learnt agent is deployed and validated in the real robotic system, where we obtain a promising success rate of <jats:inline-formula><jats:alternatives><jats:tex-math>$$75\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mn>75<\/mml:mn><mml:mo>%<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>.<\/jats:p>","DOI":"10.1007\/s13042-023-01815-8","type":"journal-article","created":{"date-parts":[[2023,3,17]],"date-time":"2023-03-17T03:02:48Z","timestamp":1679022168000},"page":"3003-3023","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["Learning positioning policies for mobile manipulation operations with deep reinforcement learning"],"prefix":"10.1007","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2760-435X","authenticated-orcid":false,"given":"Ander","family":"Iriondo","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7653-6210","authenticated-orcid":false,"given":"Elena","family":"Lazkano","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9777-9564","authenticated-orcid":false,"given":"Ander","family":"Ansuategi","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8550-5312","authenticated-orcid":false,"given":"Andoni","family":"Rivera","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9192-3879","authenticated-orcid":false,"given":"Iker","family":"Lluvia","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3763-5312","authenticated-orcid":false,"given":"Carlos","family":"Tub\u00edo","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,3,17]]},"reference":[{"issue":"2","key":"1815_CR1","doi-asserted-by":"publisher","first-page":"97","DOI":"10.3390\/machines10020097","volume":"10","autho
r":"T Sandakalum","year":"2022","unstructured":"Sandakalum T, Ang MH Jr (2022) Motion planning for mobile manipulators-a systematic review. Machines 10(2):97. https:\/\/doi.org\/10.3390\/machines10020097","journal-title":"Machines"},{"key":"1815_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1613\/jair.3451","volume":"43","author":"F Stulp","year":"2012","unstructured":"Stulp F, Fedrizzi A, M\u00f6senlechner L et al (2012) Learning and reasoning with action-related places for robust mobile manipulation. J Artif Intell Res 43:1\u201342. https:\/\/doi.org\/10.1613\/jair.3451","journal-title":"J Artif Intell Res"},{"key":"1815_CR3","doi-asserted-by":"publisher","unstructured":"Kappler D, Pastor P, Kalakrishnan M, et\u00a0al (2015) Data-driven online decision making for autonomous manipulation. In: Robotics: science and systems, https:\/\/doi.org\/10.15607\/RSS.2015.XI.044","DOI":"10.15607\/RSS.2015.XI.044"},{"issue":"6","key":"1815_CR4","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1109\/MSP.2017.2743240","volume":"34","author":"K Arulkumaran","year":"2017","unstructured":"Arulkumaran K, Deisenroth MP, Brundage M et al (2017) Deep reinforcement learning: a brief survey. IEEE Signal Process Mag 34(6):26\u201338. https:\/\/doi.org\/10.1109\/MSP.2017.2743240","journal-title":"IEEE Signal Process Mag"},{"key":"1815_CR5","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2021.3105426","author":"X Yang","year":"2021","unstructured":"Yang X, Xu Y, Kuang L et al (2021) An information fusion approach to intelligent traffic signal control using the joint methods of multiagent reinforcement learning and artificial intelligence of things. IEEE Trans Intell Transp Syst. 
https:\/\/doi.org\/10.1109\/TITS.2021.3105426","journal-title":"IEEE Trans Intell Transp Syst"},{"issue":"2","key":"1815_CR6","doi-asserted-by":"publisher","first-page":"348","DOI":"10.3390\/app9020348","volume":"9","author":"A Iriondo","year":"2019","unstructured":"Iriondo A, Lazkano E, Susperregi L et al (2019) Pick and place operations in logistics using a mobile manipulator controlled with deep reinforcement learning. Appl Sci 9(2):348. https:\/\/doi.org\/10.3390\/app9020348","journal-title":"Appl Sci"},{"key":"1815_CR7","unstructured":"Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: International Conference on Machine Learning, PMLR, p 1587\u20131596, https:\/\/proceedings.mlr.press\/v80\/fujimoto18a.html"},{"key":"1815_CR8","unstructured":"Juliani A, Berges VP, Teng E, et\u00a0al (2018) Unity: a general platform for intelligent agents. arXiv preprint arXiv:1809.02627"},{"key":"1815_CR9","unstructured":"Quigley M, Conley K, Gerkey B, et\u00a0al (2009) Ros: an open-source robot operating system. In: ICRA workshop on open source software, Kobe, Japan, p\u00a05, http:\/\/robotics.stanford.edu\/~ang\/papers\/icraoss09-ROS.pdf"},{"key":"1815_CR10","unstructured":"Brockman G, Cheung V, Pettersson L, et\u00a0al (2016) Openai gym. arXiv preprint arXiv:1606.01540"},{"key":"1815_CR11","unstructured":"Siciliano B, Khatib O (2016) Springer handbook of robotics. Springer, https:\/\/link.springer.com\/content\/pdf\/10.1007%2F978-3-319-32552-1.pdf"},{"key":"1815_CR12","doi-asserted-by":"publisher","unstructured":"Marder-Eppstein E, Berger E, Foote T, et\u00a0al (2010) The office marathon: robust navigation in an indoor office environment. 
In: IEEE international conference on robotics and automation, IEEE, p 300\u2013307, https:\/\/doi.org\/10.1109\/ROBOT.2010.5509725","DOI":"10.1109\/ROBOT.2010.5509725"},{"key":"1815_CR13","doi-asserted-by":"publisher","unstructured":"Coleman D, Sucan I, Chitta S, et\u00a0al (2014) Reducing the barrier to entry of complex robotic software: a moveit! case study. arXiv preprint arXiv:1404.3785https:\/\/doi.org\/10.6092\/JOSER_2014_05_01_p3","DOI":"10.6092\/JOSER_2014_05_01_p3"},{"issue":"4","key":"1815_CR14","doi-asserted-by":"publisher","first-page":"172988141771858","DOI":"10.1177\/1729881417718588","volume":"14","author":"A D\u00f6mel","year":"2017","unstructured":"D\u00f6mel A, Kriegel S, Ka\u00dfecker M et al (2017) Toward fully autonomous mobile manipulation for industrial environments. Int J Adv Robot Syst 14(4):1729881417718588. https:\/\/doi.org\/10.1177\/1729881417718588","journal-title":"Int J Adv Robot Syst"},{"key":"1815_CR15","doi-asserted-by":"publisher","unstructured":"Xu J, Harada K, Wan W, et\u00a0al (2020) Planning an efficient and robust base sequence for a mobile manipulator performing multiple pick-and-place tasks. In: IEEE International Conference on Robotics and Automation (ICRA), IEEE, p. 11018\u201311024, https:\/\/doi.org\/10.1109\/ICRA40945.2020.9196999","DOI":"10.1109\/ICRA40945.2020.9196999"},{"key":"1815_CR16","doi-asserted-by":"crossref","unstructured":"Padois V, Fourquet JY, Chiron P (2006) From robotic arms to mobile manipulation: On coordinated motion schemes. In: Intelligent Production Machines and Systems. Elsevier, p 572\u2013577, https:\/\/hal.archives-ouvertes.fr\/hal-00624374\/file\/2006ACTI1475.pdf","DOI":"10.1016\/B978-008045157-2\/50100-0"},{"issue":"5","key":"1815_CR17","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1177\/0278364903022005004","volume":"22","author":"J Tan","year":"2003","unstructured":"Tan J, Xi N, Wang Y (2003) Integrated task planning and control for mobile manipulators. 
Int J Robot Res 22(5):337\u2013354. https:\/\/doi.org\/10.1177\/0278364903022005004","journal-title":"Int J Robot Res"},{"key":"1815_CR18","doi-asserted-by":"publisher","unstructured":"Berntorp K, Arz\u00e9n KE, Robertsson A (2012) Mobile manipulation with a kinematically redundant manipulator for a pick-and-place scenario. In: Control Applications (CCA), 2012 IEEE International Conference on, IEEE, p 1596\u20131602, https:\/\/doi.org\/10.1109\/CCA.2012.6402361","DOI":"10.1109\/CCA.2012.6402361"},{"key":"1815_CR19","doi-asserted-by":"publisher","unstructured":"Meeussen W, Wise M, Glaser S, et\u00a0al (2010) Autonomous door opening and plugging in with a personal robot. In: Robotics and Automation (ICRA), IEEE International Conference on, IEEE, p 729\u2013736, https:\/\/doi.org\/10.1109\/ROBOT.2010.5509556","DOI":"10.1109\/ROBOT.2010.5509556"},{"issue":"19","key":"1815_CR20","doi-asserted-by":"publisher","first-page":"6620","DOI":"10.3390\/s21196620","volume":"21","author":"A Ibarguren","year":"2021","unstructured":"Ibarguren A, Daelman P (2021) Path driven dual arm mobile co-manipulation architecture for large part manipulation in industrial environments. Sensors 21(19):6620. https:\/\/doi.org\/10.3390\/s21196620","journal-title":"Sensors"},{"issue":"5","key":"1815_CR21","doi-asserted-by":"publisher","first-page":"1121","DOI":"10.1109\/72.950141","volume":"12","author":"S Lin","year":"2001","unstructured":"Lin S, Goldenberg AA (2001) Neural-network control of mobile manipulators. IEEE Trans Neural Netw 12(5):1121\u20131133. https:\/\/doi.org\/10.1109\/72.950141","journal-title":"IEEE Trans Neural Netw"},{"key":"1815_CR22","doi-asserted-by":"publisher","unstructured":"Konidaris G, Kuindersma S, Grupen R, et\u00a0al (2011) Autonomous skill acquisition on a mobile manipulator. 
In: Twenty-Fifth AAAI Conference on Artificial Intelligence, https:\/\/doi.org\/10.1609\/aaai.v25i1.7982","DOI":"10.1609\/aaai.v25i1.7982"},{"issue":"4\u20135","key":"1815_CR23","doi-asserted-by":"publisher","first-page":"698","DOI":"10.1177\/0278364920987859","volume":"40","author":"J Ibarz","year":"2021","unstructured":"Ibarz J, Tan J, Finn C et al (2021) How to train your robot with deep reinforcement learning: lessons we have learned. Int J Robot Res 40(4\u20135):698\u2013721. https:\/\/doi.org\/10.1177\/0278364920987859","journal-title":"Int J Robot Res"},{"key":"1815_CR24","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3027923","author":"MQ Mohammed","year":"2020","unstructured":"Mohammed MQ, Chung KL, Chyi CS (2020) Review of deep reinforcement learning-based object grasping: techniques, open challenges and recommendations. IEEE Access. https:\/\/doi.org\/10.1109\/ACCESS.2020.3027923","journal-title":"IEEE Access"},{"key":"1815_CR25","doi-asserted-by":"publisher","unstructured":"Hansen J, Hogan F, Rivkin D, et\u00a0al (2022) Visuotactile-rl: learning multimodal manipulation policies with deep reinforcement learning. In: 2022 International Conference on Robotics and Automation (ICRA), IEEE, p 8298\u20138304, https:\/\/doi.org\/10.1109\/ICRA46639.2022.9812019","DOI":"10.1109\/ICRA46639.2022.9812019"},{"issue":"5","key":"1815_CR26","doi-asserted-by":"publisher","first-page":"674","DOI":"10.26599\/TST.2021.9010012","volume":"26","author":"K Zhu","year":"2021","unstructured":"Zhu K, Zhang T (2021) Deep reinforcement learning based mobile robot navigation: a review. Tsinghua Sci Technol 26(5):674\u2013691. https:\/\/doi.org\/10.26599\/TST.2021.9010012","journal-title":"Tsinghua Sci Technol"},{"key":"1815_CR27","doi-asserted-by":"crossref","unstructured":"Haarnoja T, Ha S, Zhou A, et\u00a0al (2018) Learning to walk via deep reinforcement learning. 
arXiv preprint arXiv:1812.11103","DOI":"10.15607\/RSS.2019.XV.011"},{"issue":"4","key":"1815_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3072959.3073602","volume":"36","author":"XB Peng","year":"2017","unstructured":"Peng XB, Berseth G, Yin K et al (2017) Deeploco: dynamic locomotion skills using hierarchical deep reinforcement learning. ACM Trans Graphics (TOG) 36(4):1\u201313. https:\/\/doi.org\/10.1145\/3072959.3073602","journal-title":"ACM Trans Graphics (TOG)"},{"key":"1815_CR29","unstructured":"Kalashnikov D, Irpan A, Pastor P, et\u00a0al (2018) Scalable deep reinforcement learning for vision-based robotic manipulation. In: Conference on Robot Learning, PMLR, p 651\u2013673, https:\/\/proceedings.mlr.press\/v87\/kalashnikov18a.html"},{"key":"1815_CR30","doi-asserted-by":"publisher","unstructured":"Jangir R, Aleny\u00e0 G, Torras C (2020) Dynamic cloth manipulation with deep reinforcement learning. In: IEEE International Conference on Robotics and Automation (ICRA), IEEE, p 4630\u20134636, https:\/\/doi.org\/10.1109\/ICRA40945.2020.9196659","DOI":"10.1109\/ICRA40945.2020.9196659"},{"key":"1815_CR31","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, et\u00a0al (2016) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971"},{"issue":"2","key":"1815_CR32","doi-asserted-by":"publisher","first-page":"575","DOI":"10.3390\/app10020575","volume":"10","author":"M Kim","year":"2020","unstructured":"Kim M, Han DK, Park JH et al (2020) Motion planning of robot manipulators for a smoother path using a twin delayed deep deterministic policy gradient with hindsight experience replay. Appl Sci 10(2):575. 
https:\/\/doi.org\/10.3390\/app10020575","journal-title":"Appl Sci"},{"issue":"7","key":"1815_CR33","doi-asserted-by":"publisher","first-page":"627","DOI":"10.1177\/0278364906067174","volume":"25","author":"D Hsu","year":"2006","unstructured":"Hsu D, Latombe JC, Kurniawati H (2006) On the probabilistic foundations of probabilistic roadmap planning. Int J Robot Res 25(7):627\u2013643. https:\/\/doi.org\/10.1177\/0278364906067174","journal-title":"Int J Robot Res"},{"key":"1815_CR34","doi-asserted-by":"publisher","unstructured":"Tai L, Paolo G, Liu M (2017) Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, p 31\u201336, https:\/\/doi.org\/10.1109\/IROS.2017.8202134","DOI":"10.1109\/IROS.2017.8202134"},{"issue":"3","key":"1815_CR35","doi-asserted-by":"publisher","first-page":"2124","DOI":"10.1109\/TVT.2018.2890773","volume":"68","author":"C Wang","year":"2019","unstructured":"Wang C, Wang J, Shen Y et al (2019) Autonomous navigation of UAVs in large-scale complex environments: a deep reinforcement learning approach. IEEE Trans Veh Technol 68(3):2124\u20132136. https:\/\/doi.org\/10.1109\/TVT.2018.2890773","journal-title":"IEEE Trans Veh Technol"},{"key":"1815_CR36","doi-asserted-by":"publisher","unstructured":"Dankwa S, Zheng W (2019) Modeling a continuous locomotion behavior of an intelligent agent using deep reinforcement technique. 
In: IEEE 2nd International Conference on Computer and Communication Engineering Technology (CCET), p 172\u2013175, https:\/\/doi.org\/10.1109\/CCET48361.2019.8989177","DOI":"10.1109\/CCET48361.2019.8989177"},{"issue":"30","key":"1815_CR37","doi-asserted-by":"publisher","first-page":"2460","DOI":"10.17485\/IJST\/v14i30.1030","volume":"14","author":"P Khoi","year":"2021","unstructured":"Khoi P, Giang N, Tan H (2021) Control and simulation of a 6-DOF biped robot based on twin delayed deep deterministic policy gradient algorithm. Indian J Sci Technol 14(30):2460\u20132471. https:\/\/doi.org\/10.17485\/IJST\/v14i30.1030","journal-title":"Indian J Sci Technol"},{"key":"1815_CR38","unstructured":"Kindle J, Furrer F, Novkovic T, et\u00a0al (2020) Whole-body control of a mobile manipulator using end-to-end reinforcement learning. arXiv preprint arXiv:2003.02637"},{"issue":"3","key":"1815_CR39","doi-asserted-by":"publisher","first-page":"939","DOI":"10.3390\/s20030939","volume":"20","author":"C Wang","year":"2020","unstructured":"Wang C, Zhang Q, Tian Q et al (2020) Learning mobile manipulation through deep reinforcement learning. Sensors 20(3):939. https:\/\/doi.org\/10.3390\/s20030939","journal-title":"Sensors"},{"key":"1815_CR40","unstructured":"Schulman J, Wolski F, Dhariwal P, et\u00a0al (2017) Proximal policy optimization algorithms. p 1\u201312. 06347 arXiv preprint arXiv:1707.06347"},{"key":"1815_CR41","unstructured":"Bischof M (2018) ROS-SHARP. https:\/\/github.com\/siemens\/ros-sharp, Accessed 16 Jan 2023"},{"key":"1815_CR42","doi-asserted-by":"publisher","unstructured":"Qian W, Xia Z, Xiong J, et\u00a0al (2014) Manipulation task simulation using ROS and gazebo. In: IEEE International Conference on Robotics and Biomimetics (ROBIO 2014), IEEE, p 2594\u20132598, https:\/\/doi.org\/10.1109\/ROBIO.2014.7090732","DOI":"10.1109\/ROBIO.2014.7090732"},{"key":"1815_CR43","doi-asserted-by":"crossref","unstructured":"Chitta S, Marder-Eppstein E, Meeussen W, et al. 
(2017) ros_control: a generic and simple control framework for ROS. The Journal of Open Source Software. https:\/\/doi.org\/10.21105\/joss.00456","DOI":"10.21105\/joss.00456"},{"key":"1815_CR44","unstructured":"Sutton RS, Barto AG (1998) Reinforcement learning: an introduction, vol\u00a01. MIT press Cambridge, https:\/\/web.stanford.edu\/class\/psych209\/Readings\/SuttonBartoIPRLBook2ndEd.pdf"},{"key":"1815_CR45","doi-asserted-by":"publisher","unstructured":"Foote T (2013) tf: The transform library. In: Technologies for Practical Robot Applications (TePRA), 2013 IEEE International Conference on, Open-Source Software workshop, p 1\u20136, https:\/\/doi.org\/10.1109\/TePRA.2013.6556373","DOI":"10.1109\/TePRA.2013.6556373"},{"key":"1815_CR46","unstructured":"Silver D, Lever G, Heess N, et\u00a0al (2014) Deterministic policy gradient algorithms. In: Proceedings of the 31st International Conference on Machine Learning, Vol. 32, ICML\u201914, p I-387-I-395, http:\/\/proceedings.mlr.press\/v32\/silver14.pdf"},{"issue":"7540","key":"1815_CR47","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529\u2013533. https:\/\/doi.org\/10.1038\/nature14236","journal-title":"Nature"},{"issue":"1","key":"1815_CR48","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1007\/BF00115009","volume":"3","author":"RS Sutton","year":"1988","unstructured":"Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3(1):9\u201344. https:\/\/doi.org\/10.1007\/BF00115009","journal-title":"Mach Learn"},{"key":"1815_CR49","unstructured":"Hill A, Raffin A, Ernestus M, et\u00a0al (2018) Stable baselines. 
https:\/\/github.com\/hill-a\/stable-baselines"},{"issue":"5\u20136","key":"1815_CR50","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1016\/0925-2312(91)90023-5","volume":"2","author":"F Murtagh","year":"1991","unstructured":"Murtagh F (1991) Multilayer perceptrons for classification and regression. Neurocomputing 2(5\u20136):183\u2013197. https:\/\/doi.org\/10.1016\/0925-2312(91)90023-5","journal-title":"Neurocomputing"},{"key":"1815_CR51","unstructured":"Ng AY, Harada D, Russell S (1999) Policy invariance under reward transformations: theory and application to reward shaping. In: ICML, p 278\u2013287"},{"key":"1815_CR52","unstructured":"Chan SC, Fishman S, Canny J, et\u00a0al (2020) Measuring the reliability of reinforcement learning algorithms. In: International Conference on Learning Representations, Addis Ababa, Ethiopia, https:\/\/openreview.net\/pdf?id=SJlpYJBKvH"},{"issue":"3","key":"1815_CR53","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1002\/qre.1598","volume":"31","author":"M Riaz","year":"2015","unstructured":"Riaz M (2015) On enhanced interquartile range charting for process dispersion. Qual Reliab Eng Int 31(3):389\u2013398. https:\/\/doi.org\/10.1002\/qre.1598","journal-title":"Qual Reliab Eng Int"},{"issue":"2","key":"1815_CR54","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1111\/1468-0300.00091","volume":"31","author":"C Acerbi","year":"2002","unstructured":"Acerbi C, Tasche D (2002) Expected shortfall: a natural coherent alternative to value at risk. Econ Notes 31(2):379\u2013388. https:\/\/doi.org\/10.1111\/1468-0300.00091","journal-title":"Econ Notes"},{"issue":"01","key":"1815_CR55","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1142\/S0219024905002767","volume":"8","author":"A Chekhlov","year":"2005","unstructured":"Chekhlov A, Uryasev S, Zabarankin M (2005) Drawdown measure in portfolio optimization. Int J Theor Appl Financ 8(01):13\u201358. 
https:\/\/doi.org\/10.1142\/S0219024905002767","journal-title":"Int J Theor Appl Financ"},{"key":"1815_CR56","doi-asserted-by":"crossref","unstructured":"Fox D, Burgard W, Dellaert F, et\u00a0al (1999) Monte carlo localization: Efficient position estimation for mobile robots. AAAI\/IAAI (343-349):2\u20132. http:\/\/robots.stanford.edu\/papers\/fox.aaai99.pdf","DOI":"10.1109\/ROBOT.1999.772544"},{"issue":"1","key":"1815_CR57","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1109\/TRO.2006.889486","volume":"23","author":"G Grisetti","year":"2007","unstructured":"Grisetti G, Stachniss C, Burgard W (2007) Improved techniques for grid mapping with Rao-Blackwellized particle filters. IEEE Trans Robot 23(1):34\u201346. https:\/\/doi.org\/10.1109\/TRO.2006.889486","journal-title":"IEEE Trans Robot"}],"container-title":["International Journal of Machine Learning and Cybernetics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13042-023-01815-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13042-023-01815-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13042-023-01815-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T15:27:07Z","timestamp":1729092427000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13042-023-01815-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,17]]},"references-count":57,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["1815"],"URL":"https:\/\/doi.org\/10.1007\/s13042-023-01815-8","relation":{},"ISSN":["1868-8071","1868-808X"],"issn-type":[{"val
ue":"1868-8071","type":"print"},{"value":"1868-808X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,17]]},"assertion":[{"value":"24 February 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 February 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 March 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}