{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T10:09:06Z","timestamp":1764842946652,"version":"3.27.0"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643685489","type":"electronic"}],"license":[{"start":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T00:00:00Z","timestamp":1729036800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,10,16]]},
"abstract":"<jats:p>Learning behavior in legged robots presents a significant challenge due to its inherent instability and complex constraints. Recent research has proposed the use of a large language model (LLM) to generate reward functions in reinforcement learning, thereby replacing the need for manually designed rewards by experts. However, this approach, which relies on textual descriptions to define learning objectives, fails to achieve controllable and precise behavior learning with clear directionality. In this paper, we introduce a new video2reward method, which directly generates reward functions from videos depicting the behaviors to be mimicked and learned. Specifically, we first process videos containing the target behaviors, converting the motion information of individuals in the videos into keypoint trajectories represented as coordinates through a video2text transforming module. These trajectories are then fed into an LLM to generate the reward function, which in turn is used to train the policy. To enhance the quality of the reward function, we develop a video-assisted iterative reward refinement scheme that visually assesses the learned behaviors and provides textual feedback to the LLM. This feedback guides the LLM to continually refine the reward function, ultimately facilitating more efficient behavior learning. Experimental results on tasks involving bipedal and quadrupedal robot motion control demonstrate that our method surpasses the performance of state-of-the-art LLM-based reward generation methods by over 37.6% in terms of human normalized score. More importantly, by switching video inputs, we find our method can rapidly learn diverse motion behaviors such as walking and running.<\/jats:p>",
"DOI":"10.3233\/faia241014","type":"book-chapter","created":{"date-parts":[[2024,10,17]],"date-time":"2024-10-17T13:55:25Z","timestamp":1729173325000},"source":"Crossref","is-referenced-by-count":1,"title":["Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning"],"prefix":"10.3233",
"author":[{"given":"Runhao","family":"Zeng","sequence":"first","affiliation":[{"name":"Artificial Intelligence Research Institute, Shenzhen MSU-BIT University, China"}]},{"given":"Dingjie","family":"Zhou","sequence":"additional","affiliation":[{"name":"College of Mechatronics and Control Engineering, Shenzhen University"}]},{"given":"Qiwei","family":"Liang","sequence":"additional","affiliation":[{"name":"College of Mechatronics and Control Engineering, Shenzhen University"}]},{"given":"Junlin","family":"Liu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Software Engineering, Shenzhen University"}]},{"given":"Hui","family":"Li","sequence":"additional","affiliation":[{"name":"College of Mechatronics and Control Engineering, Shenzhen University"}]},{"given":"Changxin","family":"Huang","sequence":"additional","affiliation":[{"name":"National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University"}]},{"given":"Jianqiang","family":"Li","sequence":"additional","affiliation":[{"name":"National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University"}]},{"given":"Xiping","family":"Hu","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Research Institute, Shenzhen MSU-BIT University, China"}]},{"given":"Fuchun","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University"}]}],
"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2024"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA241014","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,17]],"date-time":"2024-10-17T13:55:26Z","timestamp":1729173326000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA241014"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,16]]},"ISBN":["9781643685489"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia241014","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,16]]}}}