{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,1,22]],"date-time":"2025-01-22T05:04:47Z","timestamp":1737522287181,"version":"3.33.0"},"reference-count":31,"publisher":"Cambridge University Press (CUP)","issue":"10","license":[{"start":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T00:00:00Z","timestamp":1731024000000},"content-version":"unspecified","delay-in-days":38,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotica"],"published-print":{"date-parts":[[2024,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Imitation from Observation (IfO) prompts the robot to imitate tasks from unlabeled videos via reinforcement learning (RL). The performance of the IfO algorithm depends on its ability to extract task-relevant representations since images are informative. Existing IfO algorithms extract image representations by using a simple encoding network or pre-trained network. Due to the lack of action labels, it is challenging to design a supervised task-relevant proxy task to train the simple encoding network. Representations extracted by a pre-trained network such as Resnet are often task-irrelevant. In this article, we propose a new approach for robot IfO via multimodal observations. Different modalities describe the same information from different sides, which can be used to design an unsupervised proxy task. Our approach contains two modules: the unsupervised cross-modal representation (UCMR) module and a self-behavioral cloning (self-BC)-based RL module. The UCMR module learns to extract task-relevant representations via a multimodal unsupervised proxy task. The Self-BC for further offline policy optimization collects successful experiences during the RL training. We evaluate our approach on the real robot pouring water task, quantitative pouring task, and pouring sand task. 
The robot achieves state-of-the-art performance.<\/jats:p>","DOI":"10.1017\/s0263574724000626","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T07:29:05Z","timestamp":1731050945000},"page":"3247-3262","source":"Crossref","is-referenced-by-count":0,"title":["Robot imitation from multimodal observation with unsupervised cross-modal representation"],"prefix":"10.1017","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0394-8713","authenticated-orcid":false,"given":"Xuanhui","family":"Xu","sequence":"first","affiliation":[]},{"given":"Mingyu","family":"You","sequence":"additional","affiliation":[]},{"given":"Hongjun","family":"Zhou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3193-6269","authenticated-orcid":false,"given":"Bin","family":"He","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"S0263574724000626_ref29","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3103980"},{"key":"S0263574724000626_ref23","doi-asserted-by":"publisher","DOI":"10.3390\/app10031073"},{"key":"S0263574724000626_ref5","doi-asserted-by":"crossref","unstructured":"[5] Karnan, H. , Warnell, G. , Xiao, X. S. and Stone, P. , \u201cVOILA: Visual-Observation-Only Imitation Learning for Autonomous Navigation,\u201d IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, USA (2022) pp. 
2497\u20132503.","DOI":"10.1109\/ICRA46639.2022.9812316"},{"key":"S0263574724000626_ref31","doi-asserted-by":"crossref","first-page":"1018","DOI":"10.1017\/S0263574723001881","article-title":"Design, simulation, control of a hybrid pouring robot: Enhancing automation level in the foundry industry","volume":"42","author":"Wang","year":"2024","journal-title":"Robotica"},{"volume-title":"Generative adversarial imitation from observation","author":"Torabi","key":"S0263574724000626_ref4"},{"key":"S0263574724000626_ref13","doi-asserted-by":"crossref","unstructured":"[13] Liu, Y. , Gupta, A. , Abbeel, P. and Levine, S. , \u201cImitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation,\u201d 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, (2018) pp. 1118\u20131125.","DOI":"10.1109\/ICRA.2018.8462901"},{"key":"S0263574724000626_ref17","doi-asserted-by":"crossref","unstructured":"[17] Tremblay, J. F. , Manderson, T. , Noca, A. , Dudek, G. and Meger, D. , \u201cMultimodal dynamics modeling for off-road autonomous vehicles,\u201d 2021 IEEE International Conference on Robotics and Automation (ICRA), Xian, China (2021) pp. 1796\u20131802.","DOI":"10.1109\/ICRA48506.2021.9561910"},{"key":"S0263574724000626_ref25","unstructured":"[25] Schulman, J. , Levine, S. , Moritz, P. , Jordan, M. and Abbeel, P. , \u201cTrust Region Policy Optimization,\u201d International Conference on Machine Learning (ICML), Lille, France (2015) pp. 1889\u20131897."},{"key":"S0263574724000626_ref10","doi-asserted-by":"crossref","unstructured":"[10] Sermanet, P. , Lynch, C. , Chebotar, Y. , Hsu, J. , Jang, E. , Schaal, S. and Levine, S. 
, \u201cTime-Contrastive Networks: Self-Supervised Learning from Video,\u201d 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia (2018) pp.1134\u20131141.","DOI":"10.1109\/ICRA.2018.8462891"},{"key":"S0263574724000626_ref12","unstructured":"[12] Cobbe, K. , Klimov, O. , Hesse, C. , Kim, T. and Schulman, J. , \u201cQuantifying Generalization in Reinforcement Learning,\u201d International Conference on Machine Learning (ICML), Long Beach, CA, USA (2019) pp. 1282\u20131289."},{"key":"S0263574724000626_ref18","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574721000023"},{"key":"S0263574724000626_ref22","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01549"},{"key":"S0263574724000626_ref8","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3062004"},{"key":"S0263574724000626_ref14","doi-asserted-by":"publisher","DOI":"10.1145\/3422622"},{"key":"S0263574724000626_ref16","doi-asserted-by":"crossref","unstructured":"[16] Lee, M. , Tan, M. , Zhu, Y. and Bohg, J. , \u201cDetect, Reject, Correct: Crossmodal Compensation of Corrupted Sensors,\u201d IEEE International Conference on Robotics and Automation (ICRA), Xian, China (2021) pp. 909\u2013916.","DOI":"10.1109\/ICRA48506.2021.9561847"},{"key":"S0263574724000626_ref3","doi-asserted-by":"crossref","unstructured":"[3] Hermann, L. , Argus, M. , Eitel, A. , Amiranashvili, A. , Burgard, W. and Brox, T. . \u201cAdaptive Curriculum Generation from Demonstrations for Sim-to-Real Visuomotor Control,\u201d IEEE International Conference on Robotics and Automation (ICRA), Paris, France (2020) pp. 6498\u20136505.","DOI":"10.1109\/ICRA40945.2020.9197108"},{"key":"S0263574724000626_ref20","doi-asserted-by":"publisher","DOI":"10.1146\/annurev-control-042920-020211"},{"key":"S0263574724000626_ref15","first-page":"1","article-title":"Sample-efficient adversarial imitation learning from observation","volume":"25","author":"Torabi","year":"2024","journal-title":"J. Mach. Learn. 
Res."},{"key":"S0263574724000626_ref27","doi-asserted-by":"publisher","DOI":"10.1017\/S026357471800111X"},{"key":"S0263574724000626_ref6","unstructured":"[6] Shah, R. and Kumar, V. , \u201cRRL: Resnet as Representation for Reinforcement Learning,\u201d 2021 In International Conference on Machine Learning (ICML), (2021) pp. 9465\u20139476."},{"key":"S0263574724000626_ref30","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"S0263574724000626_ref7","doi-asserted-by":"crossref","unstructured":"[7] Cole, E. , Yang, X. , Wilber, K. , Aodha, O. M. and Belongie, S. , \u201cWhen Does Contrastive Visual Representation Learning Work?,\u201d IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA (2022) pp.14755\u201314764.","DOI":"10.1109\/CVPR52688.2022.01434"},{"key":"S0263574724000626_ref21","first-page":"473","volume-title":"Medical Image Computing and Computer Assisted Intervention\u2013MICCAI 2020: 23rd International Conference","author":"Saha","year":"2020"},{"key":"S0263574724000626_ref24","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.2021.3098451"},{"key":"S0263574724000626_ref2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364919880273"},{"key":"S0263574724000626_ref28","unstructured":"[28] Ruder, S. , \u201cAn overview of gradient descent optimization algorithms,\u201d arXiv preprint arXiv: 1609.04747, (2016)."},{"key":"S0263574724000626_ref1","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574722001230"},{"key":"S0263574724000626_ref9","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574723001613"},{"key":"S0263574724000626_ref11","doi-asserted-by":"crossref","unstructured":"[11] Torabi, F. , Warnell, G. and Stone, P. , \u201cBehavioral Cloning from Observation,\u201d 27th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden (2018), pp. 
4950\u20134957.","DOI":"10.24963\/ijcai.2018\/687"},{"key":"S0263574724000626_ref19","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2022.3172469"},{"key":"S0263574724000626_ref26","first-page":"4213","article-title":"Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient","volume":"33","author":"Li","year":"2019","journal-title":"Proc. Sym. Edu. Adva. Artifi. Intel. (AAAI)"}],"container-title":["Robotica"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0263574724000626","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,21]],"date-time":"2025-01-21T05:33:32Z","timestamp":1737437612000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0263574724000626\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10]]},"references-count":31,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10]]}},"alternative-id":["S0263574724000626"],"URL":"https:\/\/doi.org\/10.1017\/s0263574724000626","relation":{},"ISSN":["0263-5747","1469-8668"],"issn-type":[{"type":"print","value":"0263-5747"},{"type":"electronic","value":"1469-8668"}],"subject":[],"published":{"date-parts":[[2024,10]]}}}