{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T23:36:56Z","timestamp":1761176216954,"version":"build-2065373602"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686318","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,21]]},"abstract":"<jats:p>Current embodied robots depend heavily on pre-trained models, whose capabilities are inherently constrained by the data they were originally trained on. However, truly intelligent robots are expected to improve themselves autonomously when encountering novel environments where these pre-trained models fall short. This is the capability we define as self-evolving ability. In this paper, we investigate the self-evolving capacity of robotic vision models. Specifically, we simulate this process using the R3ED dataset and propose a training framework in which a policy learns to navigate through unfamiliar environments to collect informative data that can be used to refine the vision model. Our training pipeline is built upon the GRPO algorithm and incorporates historical states into the policy design to enhance contextual awareness. Furthermore, we introduce a novel reward mechanism based on supervision discrepancy to guide effective data collection. Experimental results validate the effectiveness of our proposed reinforcement training strategy. Our work highlights the potential of designing intelligent robots that can improve themselves without human intervention.
Nevertheless, we acknowledge that robotic self-evolution remains a nascent and underexplored area, with significant room for future research and the discovery of more optimal approaches.<\/jats:p>","DOI":"10.3233\/faia251129","type":"book-chapter","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:52:19Z","timestamp":1761126739000},"source":"Crossref","is-referenced-by-count":0,"title":["Training Robotic Self-Evolving with GRPO"],"prefix":"10.3233","author":[{"given":"Qinpeng","family":"Yi","sequence":"first","affiliation":[{"name":"South China University of Technology"}]},{"given":"Ping","family":"Zhang","sequence":"additional","affiliation":[{"name":"South China University of Technology"}]},{"given":"Junwei","family":"Chen","sequence":"additional","affiliation":[{"name":"South China University of Technology"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2025"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA251129","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:52:19Z","timestamp":1761126739000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA251129"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"ISBN":["9781643686318"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia251129","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}