{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T05:28:57Z","timestamp":1776922137357,"version":"3.51.2"},"reference-count":79,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,1,18]],"date-time":"2025-01-18T00:00:00Z","timestamp":1737158400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,18]],"date-time":"2025-01-18T00:00:00Z","timestamp":1737158400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100005825","name":"National Institute of Food and Agriculture","doi-asserted-by":"publisher","award":["2022-67021-37868"],"award-info":[{"award-number":["2022-67021-37868"]}],"id":[{"id":"10.13039\/100005825","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100005825","name":"National Institute of Food and Agriculture","doi-asserted-by":"publisher","award":["2022-67021-37868"],"award-info":[{"award-number":["2022-67021-37868"]}],"id":[{"id":"10.13039\/100005825","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Robot"],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Robots can use visual imitation learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems we introduce <jats:bold>V<\/jats:bold>isual <jats:bold>I<\/jats:bold>mitation l<jats:bold>E<\/jats:bold>arning with <jats:bold>W<\/jats:bold>aypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator\u2019s intent, employing an agent-agnostic reward function for feedback on the robot\u2019s actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further accelerate learning efficiency. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. VIEW enables robots to learn manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30\u00a0min, with fewer than 20 real-world rollouts. 
Code and videos here: <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/collab.me.vt.edu\/view\/\" ext-link-type=\"uri\">https:\/\/collab.me.vt.edu\/view\/<\/jats:ext-link>\n          <\/jats:p>","DOI":"10.1007\/s10514-024-10188-y","type":"journal-article","created":{"date-parts":[[2025,1,18]],"date-time":"2025-01-18T07:38:55Z","timestamp":1737185935000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["VIEW: Visual imitation learning with waypoints"],"prefix":"10.1007","volume":"49","author":[{"given":"Ananth","family":"Jonnavittula","sequence":"first","affiliation":[]},{"given":"Sagar","family":"Parekh","sequence":"additional","affiliation":[]},{"given":"Dylan P.","family":"Losey","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,18]]},"reference":[{"key":"10188_CR1","doi-asserted-by":"crossref","unstructured":"Alakuijala, M., Dulac-Arnold, G., Mairal, J., Ponce, J., & Schmid, C. (2023). Learning reward functions for robotic manipulation by observing humans. In IEEE international conference on robotics and automation (pp. 5006\u20135012).","DOI":"10.1109\/ICRA48891.2023.10161178"},{"key":"10188_CR2","unstructured":"Amiranashvili, A., Dorka, N., Burgard, W., Koltun, V., & Brox, T. (2020). Scaling imitation learning in minecraft. arXiv preprint arXiv:2007.02701"},{"key":"10188_CR3","doi-asserted-by":"crossref","unstructured":"Bahl, S., Gupta, A., & Pathak, D. (2022) Human-to-robot imitation in the wild. In Robotics: Science and systems.","DOI":"10.15607\/RSS.2022.XVIII.026"},{"key":"10188_CR4","unstructured":"Brown, D., Goo, W., Nagarajan, P., & Niekum, S. (2019). Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations. In International conference on machine learning (pp. 783\u2013792)."},{"key":"10188_CR5","unstructured":"Brown, D. S., Goo, W., & Niekum, S. (2020). Better-than-demonstrator imitation learning via automatically-ranked demonstrations. In Conference on robot learning (pp. 330\u2013359)."},{"key":"10188_CR6","doi-asserted-by":"crossref","unstructured":"Caba\u00a0Heilbron, F., Escorcia, V., Ghanem, B., & Carlos\u00a0Niebles, J. (2015) ActivityNet: A large-scale video benchmark for human activity understanding. In IEEE conference on computer vision and pattern recognition (pp. 961\u2013970).","DOI":"10.1109\/CVPR.2015.7298698"},{"issue":"3","key":"10188_CR7","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1177\/0278364917700714","volume":"36","author":"B Calli","year":"2017","unstructured":"Calli, B., Singh, A., Bruce, J., Walsman, A., Konolige, K., Srinivasa, S., Abbeel, P., & Dollar, A. M. (2017). Yale-CMU-Berkeley dataset for robotic manipulation research. The International Journal of Robotics Research, 36(3), 261\u2013268.","journal-title":"The International Journal of Robotics Research"},{"key":"10188_CR8","unstructured":"Cetin, E., & Celiktutan, O. (2021). Domain-robust visual imitation learning with mutual information constraints. In International conference on learning representations."},{"key":"10188_CR9","doi-asserted-by":"crossref","unstructured":"Chane-Sane, E., Schmid, C., & Laptev, I. (2023) Learning video-conditioned policies for unseen manipulation tasks. In International conference on robotics and automation (pp. 
909\u2013916).","DOI":"10.1109\/ICRA48891.2023.10161336"},{"key":"10188_CR10","doi-asserted-by":"crossref","unstructured":"Chen, J., Yuan, B., & Tomizuka, M. (2019) Deep imitation learning for autonomous driving in generic urban scenarios with enhanced safety. In IEEE\/RSJ International conference on intelligent robots and systems (pp. 2884\u20132890).","DOI":"10.1109\/IROS40897.2019.8968225"},{"key":"10188_CR11","doi-asserted-by":"crossref","unstructured":"Ch\u00e9ron, G., Laptev, I., & Schmid, C. (2015) P-CNN: Pose-based CNN features for action recognition. In IEEE international conference on computer vision (pp. 3218\u20133226).","DOI":"10.1109\/ICCV.2015.368"},{"key":"10188_CR12","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1016\/j.mechmachtheory.2015.03.004","volume":"92","author":"JS Dai","year":"2015","unstructured":"Dai, J. S. (2015). Euler-Rodrigues formula variations, quaternion conjugation and intrinsic connections. Mechanism and Machine Theory, 92, 144\u2013152.","journal-title":"Mechanism and Machine Theory"},{"key":"10188_CR13","doi-asserted-by":"crossref","unstructured":"Das, P., Xu, C., Doell, R. F., & Corso, J. J. (2013). A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In IEEE conference on computer vision and pattern recognition (pp. 2634\u20132641).","DOI":"10.1109\/CVPR.2013.340"},{"key":"10188_CR14","unstructured":"Duan, J., Wang, Y. R., Shridhar, M., Fox, D., & Krishna, R. (2023) AR2-D2: Training a robot without a robot. arXiv preprint arXiv:2306.13818"},{"issue":"9","key":"10188_CR15","doi-asserted-by":"publisher","first-page":"2419","DOI":"10.1007\/s10994-021-05961-4","volume":"110","author":"G Dulac-Arnold","year":"2021","unstructured":"Dulac-Arnold, G., Levine, N., Mankowitz, D. J., Li, J., Paduraru, C., Gowal, S., & Hester, T. (2021). Challenges of real-world reinforcement learning: Definitions, benchmarks and analysis. Machine Learning, 110(9), 2419\u20132468.","journal-title":"Machine Learning"},{"key":"10188_CR16","doi-asserted-by":"publisher","first-page":"362","DOI":"10.1007\/s41315-019-00103-5","volume":"3","author":"B Fang","year":"2019","unstructured":"Fang, B., Jia, S., Guo, D., Xu, M., Wen, S., & Sun, F. (2019). Survey of imitation learning for robotic manipulation. International Journal of Intelligent Robotics and Applications, 3, 362\u2013369.","journal-title":"International Journal of Intelligent Robotics and Applications"},{"key":"10188_CR17","doi-asserted-by":"crossref","unstructured":"Fontaine, M. C., Togelius, J., Nikolaidis, S., & Hoover, A. K. (2020). Covariance matrix adaptation for the rapid illumination of behavior space. In Genetic and evolutionary computation conference (pp. 94\u2013102).","DOI":"10.1145\/3377930.3390232"},{"key":"10188_CR18","doi-asserted-by":"crossref","unstructured":"Gouda, A., Ghanem, A., & Reining, C. (2022). DoPose-6D dataset for object segmentation and 6D pose estimation. In IEEE international conference on machine learning and applications (pp. 477\u2013483).","DOI":"10.1109\/ICMLA55696.2022.00077"},{"key":"10188_CR19","unstructured":"Gouda, A., & Roidl, M. (2023) DoUnseen: Zero-shot object detection for robotic grasping. arXiv preprint arXiv:2304.02833"},{"key":"10188_CR20","doi-asserted-by":"crossref","unstructured":"Goyal, R., Ebrahimi\u00a0Kahou, S., Michalski, V., Materzynska, J., Westphal, S., Kim, H., Haenel, V., Fruend, I., Yianilos, P., & Mueller-Freitag, M. 
(2017) The\" something something\" video database for learning and evaluating visual common sense. In IEEE international conference on computer vision (pp. 5842\u20135850).","DOI":"10.1109\/ICCV.2017.622"},{"issue":"4","key":"10188_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3526107","volume":"11","author":"S Habibian","year":"2022","unstructured":"Habibian, S., Jonnavittula, A., & Losey, D. P. (2022). Here\u2019s what I\u2019ve learned: Asking questions that reveal reward learning. ACM Transactions on Human-Robot Interaction (THRI), 11(4), 1\u201328.","journal-title":"ACM Transactions on Human-Robot Interaction (THRI)"},{"key":"10188_CR22","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., & Girshick, R. (2017). Mask R-CNN. In IEEE international conference on computer vision (pp. 2961\u20132969).","DOI":"10.1109\/ICCV.2017.322"},{"issue":"2","key":"10188_CR23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3054912","volume":"50","author":"A Hussein","year":"2017","unstructured":"Hussein, A., Gaber, M. M., Elyan, E., & Jayne, C. (2017). Imitation learning: A survey of learning methods. ACM Computing Surveys, 50(2), 1\u201335.","journal-title":"ACM Computing Surveys"},{"key":"10188_CR24","doi-asserted-by":"crossref","unstructured":"Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017) Image-to-image translation with conditional adversarial networks. In IEEE conference on computer vision and pattern recognition (pp. 1125\u20131134).","DOI":"10.1109\/CVPR.2017.632"},{"key":"10188_CR25","doi-asserted-by":"crossref","unstructured":"Jain, V., Attarian, M., Joshi, N. J., Wahid, A., Driess, D., Vuong, Q., Sanketi, P. R., Sermanet, P., Welker, S., Chan, C., et\u00a0al. (2024). Vid2robot: End-to-end video-conditioned policy learning with cross-attention transformers. arXiv preprint arXiv:2403.12943","DOI":"10.15607\/RSS.2024.XX.052"},{"key":"10188_CR26","doi-asserted-by":"crossref","unstructured":"Jin, J., Petrich, L., Dehghan, M., & Jagersand, M. (2020) A geometric perspective on visual imitation learning. In IEEE\/RSJ international conference on intelligent robots and systems (pp. 5194\u20135200).","DOI":"10.1109\/IROS45743.2020.9341758"},{"key":"10188_CR27","doi-asserted-by":"crossref","unstructured":"Jonnavittula, A., & Losey, D. P. (2021). I know what you meant: Learning human objectives by (under) estimating their choice set. In IEEE international conference on robotics and automation (pp. 2747\u20132753).","DOI":"10.1109\/ICRA48506.2021.9562048"},{"key":"10188_CR28","doi-asserted-by":"crossref","unstructured":"Jonnavittula, A., & Losey, D. P. (2021). Learning to share autonomy across repeated interaction. In IEEE\/RSJ international conference on intelligent robots and systems (pp. 1851\u20131858).","DOI":"10.1109\/IROS51168.2021.9636748"},{"issue":"2","key":"10188_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3651994","volume":"13","author":"A Jonnavittula","year":"2024","unstructured":"Jonnavittula, A., Mehta, S. A., & Losey, D. P. (2024). SARI: Shared autonomy across repeated interaction. ACM Transactions on Human-Robot Interaction, 13(2), 1\u201336.","journal-title":"ACM Transactions on Human-Robot Interaction"},{"key":"10188_CR30","doi-asserted-by":"crossref","unstructured":"Kelly, M., Sidrane, C., Driggs-Campbell, K., & Kochenderfer, M. J. (2019) HG-DAgger: Interactive imitation learning with human experts. In IEEE international conference on robotics and automation (pp. 
8077\u20138083).","DOI":"10.1109\/ICRA.2019.8793698"},{"key":"10188_CR31","unstructured":"Kim, M. J., Wu, J., & Finn, C. (2023). Giving robots a hand: Learning generalizable manipulation with eye-in-hand human video demonstrations. arXiv preprint arXiv:2307.05959"},{"issue":"11","key":"10188_CR32","doi-asserted-by":"publisher","first-page":"1238","DOI":"10.1177\/0278364913495721","volume":"32","author":"J Kober","year":"2013","unstructured":"Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11), 1238\u20131274.","journal-title":"The International Journal of Robotics Research"},{"key":"10188_CR33","unstructured":"Lee, R., Abou-Chakra, J., Zhang, F., & Corke, P. (2022). Learning fabric manipulation in the real world with human videos. arXiv preprint arXiv:2211.02832"},{"key":"10188_CR34","doi-asserted-by":"crossref","unstructured":"Lee, S., Oh, S. W., Won, D., & Kim, S. J. (2019). Copy-and-paste networks for deep video inpainting. In IEEE\/CVF international conference on computer vision (pp 4413\u20134421).","DOI":"10.1109\/ICCV.2019.00451"},{"key":"10188_CR35","unstructured":"Li, J., Lu, T., Cao, X., Cai, Y., & Wang, S. (2021). Meta-imitation learning by watching video demonstrations. In International conference on learning representations."},{"key":"10188_CR36","doi-asserted-by":"crossref","unstructured":"Liu, P., Orru, Y., Paxton, C., Shafiullah, N. M. M., & Pinto, L. (2024) OK-Robot: What really matters in integrating open-knowledge models for robotics. arXiv preprint arXiv:2401.12202","DOI":"10.15607\/RSS.2024.XX.091"},{"key":"10188_CR37","doi-asserted-by":"crossref","unstructured":"Liu, Y., Gupta, A., Abbeel, P., & Levine, S. (2018) Imitation from observation: Learning to imitate behaviors from raw video via context translation. In IEEE International conference on robotics and automation (pp. 1118\u20131125).","DOI":"10.1109\/ICRA.2018.8462901"},{"key":"10188_CR38","doi-asserted-by":"crossref","unstructured":"Lynch, C., & Sermanet, P. (2020). Language conditioned imitation learning over unstructured data. In Robotics: Science and systems.","DOI":"10.15607\/RSS.2021.XVII.047"},{"key":"10188_CR39","doi-asserted-by":"crossref","unstructured":"Mehta, S. A., Habibian, S., & Losey, D. P. (2024). Waypoint-based reinforcement learning for robot manipulation tasks. arXiv preprint arXiv:2403.13281","DOI":"10.1109\/IROS58592.2024.10802681"},{"issue":"3","key":"10188_CR40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3623384","volume":"13","author":"SA Mehta","year":"2023","unstructured":"Mehta, S. A., & Losey, D. P. (2023). Unified learning from demonstrations, corrections, and preferences during physical human-robot interaction. ACM Transactions on Human-Robot Interaction, 13(3), 1\u201325.","journal-title":"ACM Transactions on Human-Robot Interaction"},{"key":"10188_CR41","doi-asserted-by":"crossref","unstructured":"Menda, K., Driggs-Campbell, K., & Kochenderfer, M. J. (2019). EnsembleDAgger: A Bayesian approach to safe imitation learning. In: IEEE\/RSJ international conference on intelligent robots and systems (pp. 5041\u20135048).","DOI":"10.1109\/IROS40897.2019.8968287"},{"issue":"12","key":"10188_CR42","doi-asserted-by":"publisher","first-page":"9434","DOI":"10.1109\/TPAMI.2021.3126682","volume":"44","author":"M Monfort","year":"2021","unstructured":"Monfort, M., Pan, B., Ramakrishnan, K., Andonian, A., McNamara, B. A., Lascelles, A., Fan, Q., Gutfreund, D., Feris, R. 
S., & Oliva, A. (2021). Multi-moments in time: Learning and interpreting models for multi-action video understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12), 9434\u20139445.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"5","key":"10188_CR43","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1007\/s11370-021-00398-z","volume":"14","author":"EF Morales","year":"2021","unstructured":"Morales, E. F., Murrieta-Cid, R., Becerra, I., & Esquivel-Basaldua, M. A. (2021). A survey on deep learning and deep reinforcement learning in robotics with a tutorial on deep reinforcement learning. Intelligent Service Robotics, 14(5), 773\u2013805.","journal-title":"Intelligent Service Robotics"},{"key":"10188_CR44","doi-asserted-by":"publisher","first-page":"435","DOI":"10.1007\/s10707-013-0184-0","volume":"18","author":"J Muckell","year":"2014","unstructured":"Muckell, J., Olsen, P. W., Hwang, J. H., Lawson, C. T., & Ravi, S. (2014). Compression of trajectory data: A comprehensive evaluation and new approach. GeoInformatica, 18, 435\u2013460.","journal-title":"GeoInformatica"},{"key":"10188_CR45","unstructured":"Padalkar, A., Pooley, A., Jain, A., Bewley, A., Herzog, A., Irpan, A., Khazatsky, A., Rai, A., Singh, A., Brohan, A., et\u00a0al. (2023). Open X-embodiment: Robotic learning datasets and RT-X models. arXiv preprint arXiv:2310.08864"},{"issue":"2\u20133","key":"10188_CR46","doi-asserted-by":"publisher","first-page":"286","DOI":"10.1177\/0278364919880273","volume":"39","author":"Y Pan","year":"2020","unstructured":"Pan, Y., Cheng, C. A., Saigol, K., Lee, K., Yan, X., Theodorou, E. A., & Boots, B. (2020). Imitation learning for agile autonomous driving. The International Journal of Robotics Research, 39(2\u20133), 286\u2013302.","journal-title":"The International Journal of Robotics Research"},{"key":"10188_CR47","doi-asserted-by":"crossref","unstructured":"Pari, J., Shafiullah, N. M., Arunachalam, S. P., & Pinto, L. (2021). The surprising effectiveness of representation learning for visual imitation. In Robotics: Science and systems.","DOI":"10.15607\/RSS.2022.XVIII.010"},{"key":"10188_CR48","unstructured":"Patel, A., Wang, A., Radosavovic, I., & Malik, J. (2022). Learning to imitate object interactions from internet videos. arXiv preprint arXiv:2211.13225"},{"issue":"1","key":"10188_CR49","doi-asserted-by":"publisher","first-page":"88","DOI":"10.1162\/neco.1991.3.1.88","volume":"3","author":"DA Pomerleau","year":"1991","unstructured":"Pomerleau, D. A. (1991). Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3(1), 88\u201397.","journal-title":"Neural Computation"},{"key":"10188_CR50","first-page":"3016","volume":"34","author":"R Rafailov","year":"2021","unstructured":"Rafailov, R., Yu, T., Rajeswaran, A., & Finn, C. (2021). Visual adversarial imitation learning using variational models. Advances in Neural Information Processing Systems, 34, 3016\u20133028.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10188_CR51","doi-asserted-by":"crossref","unstructured":"Ratliff, N., Bagnell, J. A., & Srinivasa, S. S. (2007). Imitation learning for locomotion and manipulation. In IEEE-RAS international conference on humanoid robots (pp. 
392\u2013397).","DOI":"10.1109\/ICHR.2007.4813899"},{"issue":"6","key":"10188_CR52","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"39","author":"S Ren","year":"2016","unstructured":"Ren, S., He, K., Girshick, R., & Sun, J. (2016). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137\u20131149.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"6","key":"10188_CR53","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1145\/3130800.3130883","volume":"36","author":"J Romero","year":"2017","unstructured":"Romero, J., Tzionas, D., & Black, M. J. (2017). Embodied hands: Modeling and capturing hands and bodies together. ACM Transactions on Graphics, 36(6), 245.","journal-title":"ACM Transactions on Graphics"},{"key":"10188_CR54","doi-asserted-by":"crossref","unstructured":"Rong, Y., Shiratori, T., & Joo, H. (2021). FrankMocap: A monocular 3d whole-body pose estimation system via regression and integration. In IEEE international conference on computer vision workshops (pp. 1749\u20131759).","DOI":"10.1109\/ICCVW54120.2021.00201"},{"key":"10188_CR55","unstructured":"Ross, S., Gordon, G., & Bagnell, D. (2011). A reduction of imitation learning and structured prediction to no-regret online learning. In International conference on artificial intelligence and statistics (pp. 627\u2013635)."},{"key":"10188_CR56","first-page":"1040","volume":"9","author":"S Schaal","year":"1996","unstructured":"Schaal, S. (1996). Learning from demonstration. Advances in Neural Information Processing Systems, 9, 1040\u20131046.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10188_CR57","unstructured":"Sch\u00e4fer, L., Jones, L., Kanervisto, A., Cao, Y., Rashid, T., Georgescu, R., Bignell, D., Sen, S., Gavito, A. T., & Devlin, S. (2023). Visual encoders for data-efficient imitation learning in modern video games. arXiv preprint arXiv:2312.02312"},{"key":"10188_CR58","unstructured":"Scheller, C., Schraner, Y., & Vogel, M. (2020) Sample efficient reinforcement learning through learning from demonstrations in minecraft. In NeurIPS competition and demonstration track (pp. 67\u201376)."},{"key":"10188_CR59","doi-asserted-by":"crossref","unstructured":"Sermanet, P., Xu, K., & Levine, S. (2017). Unsupervised perceptual rewards for imitation learning. In Robotics: Science and systems.","DOI":"10.15607\/RSS.2017.XIII.050"},{"key":"10188_CR60","unstructured":"Shafiullah, N. M. M., Rai, A., Etukuru, H., Liu, Y., Misra, I., Chintala, S., & Pinto, L. (2023). On bringing robots home. arXiv preprint arXiv:2311.16098"},{"key":"10188_CR61","doi-asserted-by":"crossref","unstructured":"Shan, D., Geng, J., Shu, M., & Fouhey, D. F. (2020) Understanding human hands in contact at internet scale. In IEEE\/CVF conference on computer vision and pattern recognition (pp. 9869\u20139878).","DOI":"10.1109\/CVPR42600.2020.00989"},{"key":"10188_CR62","unstructured":"Sharma, P., Pathak, D., & Gupta, A. (2019). Third-person visual imitation learning via decoupled hierarchical controller. In Advances in neural information processing systems (vol.\u00a032)."},{"issue":"4","key":"10188_CR63","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1177\/02783649241227559","volume":"43","author":"K Shaw","year":"2024","unstructured":"Shaw, K., Bahl, S., Sivakumar, A., Kannan, A., & Pathak, D. (2024). 
Learning dexterity from human hand motion in internet videos. The International Journal of Robotics Research, 43(4), 513\u2013532.","journal-title":"The International Journal of Robotics Research"},{"key":"10188_CR64","doi-asserted-by":"crossref","unstructured":"Shi, L. X., Hu, Z., Zhao, T. Z., Sharma, A., Pertsch, K., Luo, J., Levine, S., & Finn, C. (2024) Yell at your robot: Improving on-the-fly from language corrections. arXiv preprint arXiv:2403.12910","DOI":"10.15607\/RSS.2024.XX.025"},{"key":"10188_CR65","unstructured":"Shi, L. X., Sharma, A., Zhao, T. Z., & Finn, C. (2023) Waypoint-based imitation learning for robotic manipulation. In Conference on Robot learning"},{"key":"10188_CR66","unstructured":"Sieb, M., Xian, Z., Huang, A., Kroemer, O., & Fragkiadaki, K. (2020) Graph-structured visual imitation. In Conference on Robot learning (pp. 979\u2013989)."},{"key":"10188_CR67","doi-asserted-by":"crossref","unstructured":"Smith, L., Dhawan, N., Zhang, M., Abbeel, P., & Levine, S. (2020) AVID: Learning multi-stage tasks via pixel-level translation of human videos. In Robotics: Science and systems.","DOI":"10.15607\/RSS.2020.XVI.024"},{"key":"10188_CR68","unstructured":"Snoek, J., Larochelle, H., Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in neural information processing systems (vol.\u00a025)."},{"issue":"3","key":"10188_CR69","doi-asserted-by":"publisher","first-page":"4978","DOI":"10.1109\/LRA.2020.3004787","volume":"5","author":"S Song","year":"2020","unstructured":"Song, S., Zeng, A., Lee, J., & Funkhouser, T. (2020). Grasping in the wild: Learning 6dof closed-loop grasping from low-cost demonstrations. IEEE Robotics and Automation Letters, 5(3), 4978\u20134985.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"10188_CR70","unstructured":"Taranovic, A., Kupcsik, A. G., Freymuth, N., & Neumann, G. (2022). Adversarial imitation learning with preferences. In International conference on learning representations."},{"key":"10188_CR71","doi-asserted-by":"crossref","unstructured":"Tremblay, J., To, T., & Birchfield, S. (2018). Falling things: A synthetic dataset for 3d object detection and pose estimation. In IEEE conference on computer vision and pattern recognition workshops (pp. 2038\u20132041).","DOI":"10.1109\/CVPRW.2018.00275"},{"issue":"4","key":"10188_CR72","doi-asserted-by":"publisher","first-page":"623","DOI":"10.1109\/TEVC.2017.2735550","volume":"22","author":"V Vassiliades","year":"2017","unstructured":"Vassiliades, V., Chatzilygeroudis, K., & Mouret, J. B. (2017). Using centroidal Voronoi tessellations to scale up the multidimensional archive of phenotypic elites algorithm. IEEE Transactions on Evolutionary Computation, 22(4), 623\u2013630.","journal-title":"IEEE Transactions on Evolutionary Computation"},{"issue":"6","key":"10188_CR73","first-page":"1","volume":"39","author":"J Wang","year":"2020","unstructured":"Wang, J., Mueller, F., Bernard, F., Sorli, S., Sotnychenko, O., Qian, N., Otaduy, M. A., Casas, D., & Theobalt, C. (2020). Rgb2hands: Real-time tracking of 3D hand interactions from monocular RGB video. ACM Transactions on Graphics, 39(6), 1\u201316.","journal-title":"ACM Transactions on Graphics"},{"key":"10188_CR74","doi-asserted-by":"crossref","unstructured":"Wen, B., Lian, W., Bekris, K., & Schaal, S. (2022) You only demonstrate once: Category-level manipulation from single visual demonstration. 
In Robotics: Science and systems.","DOI":"10.15607\/RSS.2022.XVIII.044"},{"key":"10188_CR75","doi-asserted-by":"crossref","unstructured":"Wen, B., Tremblay, J., Blukis, V., Tyree, S., M\u00fcller, T., Evans, A., Fox, D., Kautz, J., & Birchfield, S. (2023). BundleSDF: Neural 6-DOF tracking and 3D reconstruction of unknown objects. In IEEE\/CVF conference on computer vision and pattern recognition (pp. 606\u2013617).","DOI":"10.1109\/CVPR52729.2023.00066"},{"key":"10188_CR76","unstructured":"Wen, C., Lin, J., Qian, J., Gao, Y., & Jayaraman, D. (2021). Keyframe-focused visual imitation learning. In International conference on machine learning (vol. 139, pp. 11123\u201311133)."},{"key":"10188_CR77","doi-asserted-by":"crossref","unstructured":"Xiong, H., Li, Q., Chen, Y. C., Bharadhwaj, H., Sinha, S., & Garg, A. (2021). Learning by watching: Physical imitation of manipulation skills from human videos. In IEEE\/RSJ international conference on intelligent robots and systems (pp. 7827\u20137834).","DOI":"10.1109\/IROS51168.2021.9636080"},{"key":"10188_CR78","unstructured":"Young, S., Gandhi, D., Tulsiani, S., Gupta, A., Abbeel, P., & Pinto, L. (2021). Visual imitation made easy. In Conference on Robot learning."},{"key":"10188_CR79","unstructured":"Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C. L., & Grundmann, M. (2020). Mediapipe hands: On-device real-time hand tracking. In CVPR workshop on computer vision for augmented and virtual reality."}],"container-title":["Autonomous Robots"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10514-024-10188-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10514-024-10188-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10514-024-10188-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,18]],"date-time":"2025-01-18T07:39:24Z","timestamp":1737185964000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10514-024-10188-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,18]]},"references-count":79,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["10188"],"URL":"https:\/\/doi.org\/10.1007\/s10514-024-10188-y","relation":{},"ISSN":["0929-5593","1573-7527"],"issn-type":[{"value":"0929-5593","type":"print"},{"value":"1573-7527","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,18]]},"assertion":[{"value":"26 April 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 December 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"All physical experiments that relied on interactions with humans were conducted under university guidelines 
and followed the protocol of Virginia Tech IRB-755.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Statement"}}],"article-number":"5"}}
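The abstract in this record describes VIEW's pipeline only in prose: compress the demonstration into a condensed waypoint prior, score rollouts with an agent-agnostic reward, and explore locally around the extracted waypoints. As a reading aid, here is a minimal Python sketch of two of those ingredients. Everything in it is an assumption made for illustration: the Ramer-Douglas-Peucker-style compression (one plausible reading, given the record's citation of trajectory-compression work, Muckell et al., 2014), the Gaussian local search, and all names and parameters (rdp_waypoints, eps, refine_waypoints, sigma, reward_fn) are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def rdp_waypoints(traj, eps=0.05):
    """Compress a dense T x D trajectory into sparse waypoints with a
    Ramer-Douglas-Peucker-style rule: keep the interior point farthest
    from each chord whenever it deviates by more than eps.
    (Illustrative stand-in for VIEW's trajectory-prior extraction.)"""
    keep = {0, len(traj) - 1}
    stack = [(0, len(traj) - 1)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:
            continue
        chord = traj[hi] - traj[lo]
        rel = traj[lo + 1:hi] - traj[lo]
        # Perpendicular distance of each interior point to the lo->hi chord.
        t = (rel @ chord) / (chord @ chord + 1e-12)
        dist = np.linalg.norm(rel - t[:, None] * chord, axis=1)
        i = int(np.argmax(dist))
        if dist[i] > eps:
            split = lo + 1 + i
            keep.add(split)
            stack.extend([(lo, split), (split, hi)])
    return traj[sorted(keep)]

def refine_waypoints(waypoints, reward_fn, sigma=0.01, iters=20, seed=0):
    """Hill-climb by sampling Gaussian perturbations around the current
    waypoints and keeping whichever candidate scores best under reward_fn,
    a black-box stand-in for an agent-agnostic reward computed from rollouts."""
    rng = np.random.default_rng(seed)
    best, best_r = waypoints, reward_fn(waypoints)
    for _ in range(iters):
        cand = best + rng.normal(0.0, sigma, size=best.shape)
        r = reward_fn(cand)
        if r > best_r:
            best, best_r = cand, r
    return best

# Toy usage: compress a synthetic 3D reach, then nudge the waypoints
# toward the demonstration's final position with a trivial reward.
t = np.linspace(0.0, 1.0, 200)[:, None]
demo = np.hstack([t, t ** 2, np.sin(3.0 * t)])
waypoints = rdp_waypoints(demo, eps=0.05)
refined = refine_waypoints(waypoints, lambda w: -np.linalg.norm(w[-1] - demo[-1]))
print(f"{len(demo)} demo points -> {len(waypoints)} waypoints")
```

Under this sketch a 200-point demonstration collapses to a handful of waypoints, which is what makes the local search sample-efficient: exploration happens in the small waypoint space rather than over full trajectories. VIEW's actual reward function and its grasp/task segmentation are specified in the paper itself, not reproduced here.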