{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T15:59:37Z","timestamp":1778687977303,"version":"3.51.4"},"reference-count":27,"publisher":"World Scientific Pub Co Pte Ltd","issue":"03n04","funder":[{"DOI":"10.13039\/501100000196","name":"Canada Foundation for Innovation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000196","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000024","name":"Canadian Institutes of Health Research","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000024","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Alberta Jobs, Economy and Innovation Ministry's Major Initiatives Fund to the Center for Autonomous Systems in Strengthening Future Communities"},{"DOI":"10.13039\/501100004543","name":"China Scholarship Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004543","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Med. Robot. Res."],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:p> Soft-tissue needle steering, where a deformable needle is inserted into the tissue to guide its tip to a desired position, is a common minimally invasive surgery (MIS) procedure. The diverse types of needles and complex tissue dynamics limit the use of existing approaches that utilize models of the needle and the tissue for automating the task. In this work, we employ a data-driven approach using deep reinforcement learning (DRL) to achieve autonomous needle steering by viewing it as a multi-goal reinforcement learning problem. Human interventions are incorporated during training to accelerate learning and reduce catastrophic failures. Generative adversarial imitation learning (GAIL) is combined with regular DRL by utilizing a hindsight relabeling scheme for human interventions to encourage the agent to imitate human behavior. To emulate the sim-to-real process, an agent is first trained in a simplistic simulation environment for needle steering and then transferred to a sophisticated one considered as the real world with fine-tuning (sim-to-sim). Experimental results show that with human interventions, the proposed method outperforms the other compared DRL approaches and can achieve good performance with only 2,000 training steps in the complex simulation environment, achieving an average return comparable to that of a 55,000-step agent trained from scratch. <\/jats:p>","DOI":"10.1142\/s2424905x24400105","type":"journal-article","created":{"date-parts":[[2024,6,10]],"date-time":"2024-06-10T07:28:26Z","timestamp":1718004506000},"source":"Crossref","is-referenced-by-count":1,"title":["Autonomous Soft-Tissue Needle Steering Using Reinforcement Learning Guided by Human Input"],"prefix":"10.1142","volume":"09","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2989-9991","authenticated-orcid":false,"given":"Yafei","family":"Ou","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Alberta, 9211-116 Street NW, Edmonton, AB, T6G 1H9, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7427-6961","authenticated-orcid":false,"given":"Mahdi","family":"Tavakoli","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Alberta, 9211-116 Street NW, Edmonton, AB, T6G 1H9, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2024,7,24]]},"reference":[{"key":"S2424905X24400105BIB001","doi-asserted-by":"publisher","DOI":"10.1016\/j.conengprac.2017.03.004"},{"key":"S2424905X24400105BIB002","doi-asserted-by":"publisher","DOI":"10.1177\/0278364906065388"},{"key":"S2424905X24400105BIB003","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2013.2271098"},{"key":"S2424905X24400105BIB004","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2018.11.018"},{"key":"S2424905X24400105BIB005","doi-asserted-by":"publisher","DOI":"10.1142\/S2424905X18420047"},{"key":"S2424905X24400105BIB006","doi-asserted-by":"publisher","DOI":"10.1109\/TRA.2003.817044"},{"key":"S2424905X24400105BIB007","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2011.6094969"},{"key":"S2424905X24400105BIB008","doi-asserted-by":"publisher","DOI":"10.1142\/S2424905X16400079"},{"key":"S2424905X24400105BIB009","doi-asserted-by":"publisher","DOI":"10.1007\/s10439-014-1203-5"},{"key":"S2424905X24400105BIB010","doi-asserted-by":"publisher","DOI":"10.1007\/s11517-016-1599-1"},{"key":"S2424905X24400105BIB011","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2016.7759365"},{"key":"S2424905X24400105BIB012","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561673"},{"key":"S2424905X24400105BIB013","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3056057"},{"key":"S2424905X24400105BIB014","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561177"},{"key":"S2424905X24400105BIB015","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2023.3254860"},{"key":"S2424905X24400105BIB016","doi-asserted-by":"publisher","DOI":"10.1007\/s11548-019-02098-7"},{"key":"S2424905X24400105BIB017","doi-asserted-by":"publisher","DOI":"10.1109\/IROS47612.2022.9981164"},{"key":"S2424905X24400105BIB018","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01117"},{"key":"S2424905X24400105BIB019","first-page":"262","volume-title":"Conf. Robot Learning","author":"Rusu A. A.","year":"2017"},{"key":"S2424905X24400105BIB021","first-page":"410","volume-title":"Conf. Robot Learning","author":"Wang F.","year":"2018"},{"key":"S2424905X24400105BIB022","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2022.05.017"},{"key":"S2424905X24400105BIB023","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3177685"},{"key":"S2424905X24400105BIB025","doi-asserted-by":"publisher","DOI":"10.1109\/ISMR57123.2023.10130214"},{"key":"S2424905X24400105BIB026","doi-asserted-by":"publisher","DOI":"10.1142\/S2424905X23400044"},{"key":"S2424905X24400105BIB029","volume":"30","author":"Andrychowicz M.","year":"2017","journal-title":"Adv. Neural Inform. Process. Systems"},{"key":"S2424905X24400105BIB030","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2016.2527065"},{"key":"S2424905X24400105BIB031","volume":"32","author":"Ding Y.","year":"2019","journal-title":"Adv. Neural Inform. Process. Syst."}],"container-title":["Journal of Medical Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S2424905X24400105","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,14]],"date-time":"2024-11-14T01:07:12Z","timestamp":1731546432000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S2424905X24400105"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,24]]},"references-count":27,"journal-issue":{"issue":"03n04","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["10.1142\/S2424905X24400105"],"URL":"https:\/\/doi.org\/10.1142\/s2424905x24400105","relation":{},"ISSN":["2424-905X","2424-9068"],"issn-type":[{"value":"2424-905X","type":"print"},{"value":"2424-9068","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,24]]},"article-number":"2440010"}}