{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T23:14:45Z","timestamp":1780355685248,"version":"3.54.1"},"reference-count":62,"publisher":"SAGE Publications","issue":"2-3","license":[{"start":{"date-parts":[[2018,7,25]],"date-time":"2018-07-25T00:00:00Z","timestamp":1532476800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2019,3]]},"abstract":"<jats:p>We present sequential windowed inverse reinforcement learning (SWIRL), a policy search algorithm that is a hybrid of exploration and demonstration paradigms for robot learning. We apply unsupervised learning to a small number of initial expert demonstrations to structure future autonomous exploration. SWIRL approximates a long time horizon task as a sequence of local reward functions and subtask transition conditions. Over this approximation, SWIRL applies Q-learning to compute a policy that maximizes rewards. Experiments suggest that SWIRL requires significantly fewer rollouts than pure reinforcement learning and fewer expert demonstrations than behavioral cloning to learn a policy. We evaluate SWIRL in two simulated control tasks, parallel parking and a two-link pendulum. On the parallel parking task, SWIRL achieves the maximum reward on the task with 85% fewer rollouts than Q-learning, and one-eighth of the demonstrations needed by behavioral cloning. We also consider physical experiments on surgical tensioning and cutting deformable sheets using a da Vinci surgical robot. 
On the deformable tensioning task, SWIRL achieves a 36% relative improvement in reward compared with a baseline of behavioral cloning with segmentation.<\/jats:p>","DOI":"10.1177\/0278364918784350","type":"journal-article","created":{"date-parts":[[2018,7,25]],"date-time":"2018-07-25T09:20:32Z","timestamp":1532510432000},"page":"126-145","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":51,"title":["SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards"],"prefix":"10.1177","volume":"38","author":[{"given":"Sanjay","family":"Krishnan","sequence":"first","affiliation":[{"name":"AUTOLAB, University of California, Berkeley, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Animesh","family":"Garg","sequence":"additional","affiliation":[{"name":"AUTOLAB, University of California, Berkeley, USA"},{"name":"Stanford University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Richard","family":"Liaw","sequence":"additional","affiliation":[{"name":"AUTOLAB, University of California, Berkeley, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Brijen","family":"Thananjeyan","sequence":"additional","affiliation":[{"name":"AUTOLAB, University of California, Berkeley, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lauren","family":"Miller","sequence":"additional","affiliation":[{"name":"AUTOLAB, University of California, Berkeley, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Florian T","family":"Pokorny","sequence":"additional","affiliation":[{"name":"RPL\/CSC, KTH Royal Institute of Technology, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ken","family":"Goldberg","sequence":"additional","affiliation":[{"name":"AUTOLAB, University of California, Berkeley, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"179","published-online":{"date-parts":[[2018,7,25]]},"reference":[{"key":"bibr1-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"bibr2-0278364918784350","first-page":"5074","volume-title":"NIPS\u201916 proceedings of the 30th international conference on neural information processing systems","author":"Agrawal P","year":"2016"},{"key":"bibr3-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2008.10.024"},{"key":"bibr4-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1142\/S0219843608001431"},{"key":"bibr5-0278364918784350","volume-title":"NIPS bounded optimality and rational metareasoning workshop","author":"Bacon PL","year":"2015"},{"key":"bibr6-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022140919877"},{"key":"bibr7-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.1995.478953"},{"key":"bibr8-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2008.02.009"},{"key":"bibr9-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1016\/j.cognition.2008.08.011"},{"key":"bibr10-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/JRA.1986.1087032"},{"key":"bibr11-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/URAI.2014.7057522"},{"key":"bibr12-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2004.1389828"},{"key":"bibr13-0278364918784350","first-page":"271","volume-title":"NIPS\u201992 proceedings of the 5th international conference on neural information processing systems","author":"Dayan P","year":"1992"},{"key":"bibr14-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1613\/jair.639"},{"key":"bibr15-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989324"},{"key":"bibr16-0278364918784350","first-page":"49","volume-title":"ICML\u201916 proceedings of the 33rd international conference on 
machine learning","author":"Finn C","year":"2016"},{"key":"bibr17-0278364918784350","first-page":"243","volume-title":"ICML \u201902 proceedings of the nineteenth international conference on machine learning","volume":"2","author":"Hengst B","year":"2002"},{"key":"bibr18-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1016\/S0921-8890(97)00044-4"},{"key":"bibr19-0278364918784350","first-page":"1523","volume-title":"NIPS\u201902 proceedings of the 15th international conference on neural information processing systems","author":"Ijspeert A","year":"2002"},{"key":"bibr20-0278364918784350","first-page":"1890","volume-title":"AAAI\u201914 proceedings of the twenty-eighth AAAI conference on artificial intelligence","author":"Judah K","year":"2014"},{"key":"bibr21-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-307-3.50028-9"},{"key":"bibr22-0278364918784350","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2010.VI.034"},{"key":"bibr23-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2014.6907809"},{"key":"bibr24-0278364918784350","first-page":"769","volume-title":"NIPS\u201907 proceedings of the 20th international conference on neural information processing systems","author":"Kolter JZ","year":"2007"},{"key":"bibr25-0278364918784350","first-page":"895","volume-title":"IJCAI\u201907 proceedings of the 20th international joint conference on artificial intelligence","author":"Konidaris G","year":"2007"},{"key":"bibr26-0278364918784350","volume-title":"Workshop on algorithmic foundations of robotics (WAFR)","author":"Krishnan S","year":"2016"},{"key":"bibr27-0278364918784350","volume-title":"International symposium of robotics research","author":"Krishnan S","year":"2015"},{"key":"bibr28-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/MRA.2010.936961"},{"key":"bibr29-0278364918784350","volume-title":"29th international conference on machine learning, ICML 2012","author":"Kulis 
B","year":"2012"},{"key":"bibr30-0278364918784350","first-page":"3675","volume-title":"NIPS\u201916 proceedings of the 30th international conference on neural information processing systems","author":"Kulkarni TD","year":"2016"},{"key":"bibr31-0278364918784350","first-page":"143","volume-title":"1st conference on robot learning (CoRL)","author":"Laskey M","year":"2017"},{"key":"bibr32-0278364918784350","first-page":"173","volume-title":"2016 International Symposium on Experimental Robotics. ISER 2016. Springer Proceedings in Advanced Robotics","volume":"1","author":"Levine S","year":"2016"},{"key":"bibr33-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/HUMANOIDS.2015.7363584"},{"key":"bibr34-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2015.07.005"},{"key":"bibr35-0278364918784350","first-page":"536","volume-title":"NIPS\u201998 proceedings of the 11th international conference on neural information processing systems","author":"Mika S","year":"1998"},{"key":"bibr36-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"bibr37-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.2000.0897"},{"key":"bibr38-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1007\/BF00318086"},{"key":"bibr39-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2016.7487607"},{"key":"bibr40-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2015.7139344"},{"key":"bibr41-0278364918784350","first-page":"663","volume-title":"ICML \u201900 proceedings of the seventeenth international conference on machine learning","author":"Ng AY","year":"2000"},{"key":"bibr42-0278364918784350","first-page":"278","volume-title":"ICML \u201999 proceedings of the sixteenth international conference on machine learning","author":"Ng 
AY","year":"1999"},{"key":"bibr43-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386006"},{"key":"bibr44-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1561\/2300000053"},{"key":"bibr45-0278364918784350","first-page":"1043","volume-title":"NIPS\u201997 proceedings of the 10th international conference on neural information processing systems","author":"Parr R","year":"1997"},{"key":"bibr46-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2009.5152385"},{"key":"bibr47-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2016.7487517"},{"key":"bibr48-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_1"},{"key":"bibr49-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2015.7353414"},{"key":"bibr50-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1003779"},{"key":"bibr51-0278364918784350","first-page":"01703","author":"Stadie BC","year":"2017","journal-title":"arXiv"},{"key":"bibr52-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1007\/s002210050606"},{"key":"bibr53-0278364918784350","volume-title":"Reinforcement Learning: An Introduction","volume":"1","author":"Sutton RS","year":"1998"},{"key":"bibr54-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"bibr55-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2016.2517825"},{"key":"bibr56-0278364918784350","volume-title":"Proceedings of the 1993 connectionist models summer school","author":"Thrun 
S","year":"1993"},{"key":"bibr57-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCB.2012.2185694"},{"key":"bibr58-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1037\/0096-1523.11.6.828"},{"key":"bibr59-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-7687.2006.00535.x"},{"key":"bibr60-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1162\/jocn_a_00078"},{"key":"bibr61-0278364918784350","doi-asserted-by":"publisher","DOI":"10.1145\/2166966.2166968"},{"key":"bibr62-0278364918784350","first-page":"1433","volume-title":"AAAI\u201908 proceedings of the 23rd national conference on artificial intelligence","volume":"3","author":"Ziebart BD","year":"2008"}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364918784350","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0278364918784350","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364918784350","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:15:59Z","timestamp":1777457759000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0278364918784350"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,7,25]]},"references-count":62,"journal-issue":{"issue":"2-3","published-print":{"date-parts":[[2019,3]]}},"alternative-id":["10.1177\/0278364918784350"],"URL":"https:\/\/doi.org\/10.1177\/0278364918784350","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published"
:{"date-parts":[[2018,7,25]]}}}