{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T20:03:10Z","timestamp":1777492990918,"version":"3.51.4"},"reference-count":208,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T00:00:00Z","timestamp":1757376000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Robot. AI"],"abstract":"<jats:p>Diffusion generative models have demonstrated remarkable success in visual domains such as image and video generation. They have also recently emerged as a promising approach in robotics, especially in robotic manipulation. Diffusion models leverage a probabilistic framework, and they stand out for their ability to model multi-modal distributions and their robustness to high-dimensional input and output spaces. This survey provides a comprehensive review of state-of-the-art diffusion models in robotic manipulation, including grasp learning, trajectory planning, and data augmentation. Diffusion models for scene and image augmentation lie at the intersection of robotics and computer vision for vision-based tasks to enhance generalizability and mitigate data scarcity. This paper also presents the two main frameworks of diffusion models and their integration with imitation learning and reinforcement learning. 
In addition, it discusses the common architectures and benchmarks and points out the challenges and advantages of current state-of-the-art diffusion-based methods.<\/jats:p>","DOI":"10.3389\/frobt.2025.1606247","type":"journal-article","created":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T04:23:20Z","timestamp":1757391800000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Diffusion models for robotic manipulation: a survey"],"prefix":"10.3389","volume":"12","author":[{"given":"Rosa","family":"Wolf","sequence":"first","affiliation":[]},{"given":"Yitian","family":"Shi","sequence":"additional","affiliation":[]},{"given":"Sheng","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Rania","family":"Rayyes","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,9,9]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"3116","DOI":"10.1109\/LRA.2024.3363530","article-title":"Diffusion policies for out-of-distribution generalization in offline reinforcement learning","volume":"9","author":"Ada","year":"2024","journal-title":"IEEE Robotics Automation Lett."},{"key":"B2","article-title":"Is conditional generative modeling all you need for decision-making?","author":"Ajay","year":"2023"},{"key":"B3","article-title":"Hindsight experience replay","volume":"30","author":"Andrychowicz","year":"2017","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"B4","doi-asserted-by":"publisher","first-page":"164621","DOI":"10.1109\/ACCESS.2024.3492118","article-title":"GraspLDM: generative 6-DoF grasp synthesis using latent diffusion models","volume":"12","author":"Barad","year":"2024","journal-title":"IEEE Access"},{"key":"B5","first-page":"6904","article-title":"Towards generalizable zero-shot manipulation via translating human interaction plans","author":"Bharadhwaj","year":""},{"key":"B6","first-page":"306","article-title":"Track2Act: predicting point tracks from internet videos enables generalizable robot manipulation","author":"Bharadhwaj","year":""},{"key":"B7","volume-title":"\u03c0","author":"Black","year":""},{"key":"B8","article-title":"Zero-shot robotic manipulation with pretrained image-editing diffusion models","author":"Black","year":""},{"key":"B9","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1109\/tro.2013.2289018","article-title":"Data-driven grasp synthesis\u2014a survey","volume":"30","author":"Bohg","year":"2013","journal-title":"IEEE Trans. robotics"},{"key":"B10","first-page":"5144","article-title":"Riemannian flow matching policy for robot motion learning","author":"Braun","year":"2024"},{"key":"B11","first-page":"63818","article-title":"EDGI: equivariant diffusion for planning with embodied agents","volume":"36","author":"Brehmer","year":"2023","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"B12","volume-title":"RT-2: vision-Language-action models transfer web knowledge to robotic control","author":"Brohan","year":""},{"key":"B13","volume-title":"RT-1: robotics transformer for real-world control at scale","author":"Brohan","year":""},{"key":"B14","volume-title":"Multi-modal diffusion for hand-object grasp generation","author":"Cao","year":"2024"},{"key":"B15","first-page":"1916","article-title":"Motion planning diffusion: learning and planning of robot motions with diffusion models","author":"Carvalho","year":"2023"},{"key":"B16","volume-title":"Grasp diffusion network: learning grasp generators from partial point clouds with diffusion models in SO (3) xR3","author":"Carvalho","year":"2024"},{"key":"B17","volume-title":"Text2Grasp: grasp synthesis by text prompts of object grasping parts","author":"Chang","year":"2024"},{"key":"B18","volume-title":"GR-2: a generative video-language-action model with web-scale knowledge for robot manipulation","author":"Cheang","year":"2024"},{"key":"B19","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2402.16075","article-title":"Don\u2019t start from scratch: behavioral refinement via interpolant-based policy diffusion","author":"Chen","year":"","journal-title":"Robotics Sci. Syst."},{"key":"B20","first-page":"2012","article-title":"PlayFusion: Skill acquisition via diffusion from language-annotated play","volume":"229","author":"Chen","year":"","journal-title":"Proc. 7th Conf. Robot Learn."},{"key":"B21","article-title":"Rovi-aug: robot and viewpoint augmentation for cross-embodiment robot learning","author":"Chen","year":""},{"key":"B22","volume-title":"GenAug: retargeting behaviors to unseen situations via generative augmentation","author":"Chen","year":""},{"key":"B23","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2303.04137","article-title":"Diffusion policy: visuomotor policy learning via action diffusion","author":"Chi","year":"2023","journal-title":"Robotics Sci. Syst. 
(RSS)"},{"key":"B24","first-page":"8780","article-title":"Diffusion models beat GANs on image synthesis","volume":"34","author":"Dhariwal","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B25","article-title":"Consistency models as a rich and efficient policy class for reinforcement learning","author":"Ding","year":"2023"},{"key":"B26","volume-title":"Diffusion augmented agents: a framework for efficient exploration and transfer learning","author":"Di Palo","year":"2024"},{"key":"B27","article-title":"An image is worth 16x16 words: transformers for image recognition at scale","author":"Dosovitskiy","year":"2021"},{"key":"B28","first-page":"9156","article-title":"Learning universal policies via text-guided video generation","volume":"36","author":"Du","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B29","first-page":"11444","article-title":"Graspnet-1billion: a large-scale benchmark for general object grasping","author":"Fang","year":"2020"},{"key":"B30","volume-title":"FFHFlow: a flow-based variational approach for multi-fingered grasp synthesis in real time","author":"Feng","year":"2024"},{"key":"B31","doi-asserted-by":"publisher","first-page":"701","DOI":"10.1177\/02783649241281508","article-title":"Foundation models in robotics: applications, challenges, and the future","volume":"44","author":"Firoozi","year":"2025","journal-title":"Int. J. Robotics Res."},{"key":"B32","first-page":"158","article-title":"Implicit behavioral cloning","volume":"164","author":"Florence","year":"2022","journal-title":"Proc. Mach. Learn. 
Res."},{"key":"B33","article-title":"One step diffusion via shortcut models","author":"Frans","year":"2025"},{"key":"B34","doi-asserted-by":"publisher","first-page":"2694","DOI":"10.1109\/LRA.2025.3534065","article-title":"Diffusion for multi-embodiment grasping","volume":"10","author":"Freiberg","year":"2025","journal-title":"IEEE Robotics Automation Lett."},{"key":"B35","volume-title":"D4RL: datasets for deep data-driven reinforcement learning","author":"Fu","year":"2020"},{"key":"B36","first-page":"356","article-title":"Diffusion policies as multi-agent reinforcement learning strategies","volume-title":"Lecture notes in computer science","author":"Geng","year":"2023"},{"key":"B37","first-page":"3949","article-title":"Act3D: 3D feature field transformers for multi-task robotic manipulation","volume":"229","author":"Gervet","year":"2023","journal-title":"Proc. 7th Conf. Robot Learn."},{"key":"B38","doi-asserted-by":"publisher","first-page":"2302","DOI":"10.1109\/tase.2023.3328964","article-title":"Metagraspnetv2: all-in-one dataset enabling fast and reliable robotic bin picking via object relationship reasoning and dexterous grasping","volume":"21","author":"Gilles","year":"2023","journal-title":"IEEE Trans. Automation Sci. 
Eng."},{"key":"B39","doi-asserted-by":"publisher","first-page":"3644","DOI":"10.1109\/LRA.2025.3544083","article-title":"MetaMVUC: active learning for sample-efficient sim-to-real domain adaptation in robotic grasping","volume":"10","author":"Gilles","year":"2025","journal-title":"IEEE Robotics Automation Lett."},{"key":"B40","first-page":"694","article-title":"Rvt: robotic view transformer for 3d object manipulation","author":"Goyal","year":"2023"},{"key":"B41","first-page":"10686","article-title":"Vector quantized diffusion model for text-to-image synthesis","author":"Gu","year":"2022"},{"key":"B42","first-page":"1025","article-title":"Relay policy learning: solving long-horizon tasks via imitation and reinforcement learning","author":"Gupta","year":"2020","journal-title":"Proc. Mach. Learn. Res."},{"key":"B43","first-page":"3766","article-title":"Scaling up and distilling Down: language-guided robot skill acquisition","volume":"229","author":"Ha","year":"2023","journal-title":"Proc. 7th Conf. 
Robot Learn."},{"key":"B44","article-title":"Generative adversarial imitation learning","volume-title":"Advances in neural information processing systems","author":"Ho","year":"2016"},{"key":"B45","article-title":"Denoising diffusion probabilistic models","author":"Ho","year":"2020"},{"key":"B46","article-title":"Classifier-free diffusion guidance","author":"Ho","year":"2021"},{"key":"B47","volume-title":"HGDiffuser: efficient task-oriented grasp generation via human-guided grasp diffusion models","author":"Huang","year":""},{"key":"B48","first-page":"3882","article-title":"Edge grasp network: a graph-based se (3)-invariant approach to grasp detection","author":"Huang","year":"2023"},{"key":"B49","article-title":"An embodied generalist agent in 3D world","author":"Huang","year":""},{"key":"B50","first-page":"478","article-title":"Diffusion reward: learning rewards via conditional video diffusion","author":"Huang","year":""},{"key":"B51","first-page":"16489","article-title":"Subgoal diffuser: coarse-to-fine subgoal generation to guide model predictive control for robot manipulation","author":"Huang","year":""},{"key":"B52","first-page":"7590","article-title":"Multimodal diffusion segmentation model for object segmentation from manipulation instructions","author":"Iioka","year":"2023"},{"key":"B53","first-page":"7406","article-title":"Diffusionnocs: managing symmetry and uncertainty in sim2real multi-modal category-level pose estimation","author":"Ikeda","year":"2024"},{"key":"B54","doi-asserted-by":"publisher","first-page":"3019","DOI":"10.1109\/LRA.2020.2974707","article-title":"RLBench: the robot learning benchmark & learning environment","volume":"5","author":"James","year":"2020","journal-title":"IEEE Robotics Automation Lett."},{"key":"B55","first-page":"9902","article-title":"Planning with diffusion for flexible behavior synthesis","volume":"162","author":"Janner","year":"2022","journal-title":"Proc. 39th Int. Conf. Mach. 
Learn."},{"key":"B56","volume-title":"Synergies between affordance and geometry: 6-dof grasp detection via implicit representations","author":"Jiang","year":"2021"},{"key":"B57","article-title":"Gotta Go fast with score-based generative models","author":"Jolicoeur-Martineau","year":"2021"},{"key":"B58","first-page":"67195","article-title":"Efficient diffusion policies for offline reinforcement learning","volume-title":"Advances in neural information processing systems","author":"Kang","year":"2023"},{"key":"B59","first-page":"4796","article-title":"Dream2Real: zero-shot 3D object rearrangement with vision-language models","author":"Kapelyukh","year":"2024"},{"key":"B60","doi-asserted-by":"publisher","first-page":"3956","DOI":"10.1109\/LRA.2023.3272516","article-title":"DALL-E-Bot: introducing web-scale diffusion models to robotics","volume":"8","author":"Kapelyukh","year":"2023","journal-title":"IEEE Robotics Automation Lett."},{"key":"B61","first-page":"26565","article-title":"Elucidating the design space of diffusion-based generative models","volume":"35","author":"Karras","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B62","first-page":"2713","article-title":"RIC: rotate-inpaint-complete for generalizable scene reconstruction","author":"Kasahara","year":"2024"},{"key":"B63","first-page":"6672","article-title":"Gen2Sim: scaling up robot learning in simulation with generative models","author":"Katara","year":"2024"},{"key":"B64","article-title":"3D diffuser actor: policy diffusion with 3D scene representations","author":"Ke","year":"2024"},{"key":"B65","article-title":"OpenVLA: an open-source vision-language-action model","author":"Kim","year":""},{"key":"B66","doi-asserted-by":"publisher","first-page":"13160","DOI":"10.1609\/aaai.v38i12.29215","article-title":"Stitching sub-trajectories with conditional diffusion model for goal-conditioned offline RL","volume":"38","author":"Kim","year":"","journal-title":"Proc. AAAI Conf. Artif. 
Intell."},{"key":"B67","doi-asserted-by":"publisher","first-page":"13177","DOI":"10.1609\/aaai.v38i12.29217","article-title":"Robust policy learning via offline skill diffusion","volume":"38","author":"Kim","year":"","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"B68","first-page":"4015","article-title":"Segment anything","author":"Kirillov","year":"2023"},{"key":"B69","article-title":"Learning to act from actionless videos through dense correspondences","author":"Ko","year":"2024"},{"key":"B70","first-page":"1","article-title":"Generative adversarial networks","author":"Krichen","year":"2023"},{"key":"B71","volume-title":"Offline reinforcement learning: tutorial, review, and perspectives on open problems","author":"Levine","year":"2020"},{"key":"B72","volume-title":"Language-guided object-centric diffusion policy for generalizable and collision-aware robotic manipulation","author":"Li","year":"2025"},{"key":"B73","first-page":"273","article-title":"ClickDiff: click to induce semantic contact map for controllable grasp generation with diffusion models","author":"Li","year":""},{"key":"B74","volume-title":"CogACT: a foundational vision-language-action model for synergizing cognition and action in robotic manipulation","author":"Li","year":""},{"key":"B75","first-page":"20035","article-title":"Hierarchical diffusion for offline decision making","volume":"202","author":"Li","year":"2023","journal-title":"Proc. 40th Int. Conf. Mach. Learn."},{"key":"B76","first-page":"16841","article-title":"Crossway diffusion: improving diffusion-based visuomotor policy via self-supervised learning","author":"Li","year":""},{"key":"B77","first-page":"4328","article-title":"Diffusion-LM improves controllable text generation","volume":"35","author":"Li","year":"2022","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"B78","volume-title":"ALDM-Grasping: diffusion-aided zero-shot sim-to-real transfer for robot grasping","author":"Li","year":""},{"key":"B79","first-page":"20725","article-title":"AdaptDiffuser: diffusion models as adaptive self-evolving planners","volume":"202","author":"Liang","year":"2023","journal-title":"Proc. 40th Int. Conf. Mach. Learn."},{"key":"B80","first-page":"16467","article-title":"SkillDiffuser: interpretable hierarchical planning via skill abstractions in diffusion-based task execution","author":"Liang","year":"2024"},{"key":"B81","article-title":"EquiGraspFlow: SE (3)-Equivariant 6-DoF grasp pose generative flows","author":"Lim","year":"2024"},{"key":"B82","article-title":"Flow matching for generative modeling","author":"Lipman","year":"2023"},{"key":"B83","first-page":"44776","article-title":"LIBERO: benchmarking knowledge transfer for lifelong robot learning","volume":"36","author":"Liu","year":"","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B84","volume-title":"RDT-1B: a diffusion foundation model for bimanual manipulation","author":"Liu","year":"2024"},{"key":"B85","first-page":"38","article-title":"Grounding dino: marrying dino with grounded pre-training for open-set object detection","author":"Liu","year":"2025"},{"key":"B86","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2023.XIX.031","article-title":"StructDiffusion: language-guided creation of physically-valid structures using unseen objects","author":"Liu","year":"","journal-title":"Robotics Sci. Syst"},{"key":"B87","first-page":"1300","article-title":"Composable part-based manipulation","volume":"229","author":"Liu","year":"","journal-title":"Proc. 7th Conf. 
Robot Learn."},{"key":"B88","first-page":"5775","article-title":"DPM-Solver: a fast ODE solver for diffusion probabilistic model sampling in around 10 steps","volume-title":"Advances in neural information processing systems","author":"Lu","year":"2022"},{"key":"B89","first-page":"414","article-title":"Ugg: unified generative grasping","author":"Lu","year":"2025"},{"key":"B90","article-title":"Are GANs created equal? A large-scale study","volume":"30","author":"Lucic","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B91","volume-title":"DexDiff: towards extrinsic dexterity manipulation of ungraspable objects in unrestricted environments","author":"Ma","year":""},{"key":"B92","first-page":"18081","article-title":"Hierarchical diffusion policy for kinematics-aware multi-task robotic manipulation","author":"Ma","year":""},{"key":"B93","article-title":"CACTI: a framework for scalable multi-task multi-scene visual imitation learning","author":"Mandi","year":"2022"},{"key":"B94","first-page":"1","article-title":"LapGym-An open source framework for reinforcement learning in robot-assisted laparoscopic surgery","volume":"24","author":"Maria Scheikl","year":"2023","journal-title":"J. Mach. Learn. Res."},{"key":"B95","doi-asserted-by":"publisher","DOI":"10.14569\/IJACSA.2023.0140399","article-title":"Rapidly exploring random trees for autonomous navigation in observable and uncertain environments","volume":"14","author":"Martinez","year":"2023","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"B96","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1109\/MCS.2011.940571","article-title":"Receding horizon control","volume":"31","author":"Mattingley","year":"2011","journal-title":"IEEE Control Syst. 
Mag."},{"key":"B97","doi-asserted-by":"publisher","first-page":"7327","DOI":"10.1109\/LRA.2022.3180108","article-title":"CALVIN: a benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks","volume":"7","author":"Mees","year":"2022","journal-title":"IEEE Robotics Automation Lett."},{"key":"B98","first-page":"8748","article-title":"Learning transferable visual models from natural language supervision","author":"Meila","year":"2021"},{"key":"B99","first-page":"2134","article-title":"Embodied lifelong learning for task and motion planning","volume":"229","author":"Mendez-Mendez","year":"2023","journal-title":"Proc. 7th Conf. Robot Learn."},{"key":"B100","doi-asserted-by":"crossref","DOI":"10.14428\/esann\/2022.ES2022-100","volume-title":"Hyperspectral wavelength analysis with u-net for larynx cancer detection","author":"Meyer-Veit","year":""},{"key":"B101","doi-asserted-by":"publisher","first-page":"682","DOI":"10.1007\/978-3-031-15937-4_57","article-title":"Hyperspectral endoscopy using deep learning for laryngeal cancer segmentation","volume":"2022","author":"Meyer-Veit","year":"","journal-title":"Artif. Neural Netw. Mach. Learn. \u2013 ICANN"},{"key":"B102","first-page":"405","article-title":"NeRF: representing scenes as neural radiance fields for view synthesis","author":"Mildenhall","year":"2020"},{"key":"B103","first-page":"10867","article-title":"ReorientDiff: diffusion model based reorientation for object manipulation","author":"Mishra","year":"2024"},{"key":"B104","first-page":"2905","article-title":"Generative skill chaining: long-horizon skill planning with diffusion models","volume":"229","author":"Mishra","year":"2023","journal-title":"Proc. 7th Conf. 
Robot Learn."},{"key":"B105","volume-title":"Mish: a self regularized non-monotonic activation function","author":"Misra","year":"2019"},{"key":"B106","first-page":"2901","article-title":"6-DOF GraspNet: variational grasp generation for object manipulation","author":"Mousavian","year":"2019"},{"key":"B107","doi-asserted-by":"publisher","first-page":"3994","DOI":"10.1109\/tro.2023.3280597","article-title":"Deep learning approaches to grasp synthesis: a review","volume":"39","author":"Newbury","year":"2023","journal-title":"IEEE Trans. Robotics"},{"key":"B108","volume-title":"FlowMP: learning motion fields for robot planning with conditional flow matching","author":"Nguyen","year":"2025"},{"key":"B109","first-page":"13719","article-title":"Lightweight language-driven grasp detection using conditional consistency model","author":"Nguyen","year":""},{"key":"B110","first-page":"3071","article-title":"Language-conditioned affordance-pose detection in 3D point clouds","author":"Nguyen","year":""},{"key":"B111","first-page":"8162","article-title":"Improved denoising diffusion probabilistic models","volume":"139","author":"Nichol","year":"2021","journal-title":"Proc. 38th Int. Conf. Mach. 
Learn."},{"key":"B112","volume-title":"Dinov2: learning robust visual features without supervision","author":"Oquab","year":"2023"},{"key":"B113","volume-title":"Vision-language-action model and diffusion policy switching enables dexterous control of an anthropomorphic hand","author":"Pan","year":""},{"key":"B114","first-page":"13341","article-title":"Exploiting priors from 3D diffusion models for RGB-based one-shot view planning","author":"Pan","year":""},{"key":"B115","volume-title":"Dm-osvp++: one-shot view planning using 3d diffusion models for active rgb-based object reconstruction","author":"Pan","year":"2025"},{"key":"B116","article-title":"Imitating human behaviour with diffusion models","author":"Pearce","year":"2022"},{"key":"B117","first-page":"4172","article-title":"Scalable diffusion models with transformers","author":"Peebles","year":"2023"},{"key":"B118","volume-title":"FiLM: visual reasoning with a general conditioning layer","author":"Perez","year":"2018"},{"key":"B119","volume-title":"FAST: efficient action tokenization for vision-language-action models","author":"Pertsch","year":"2025"},{"key":"B120","first-page":"1820","article-title":"On the sample complexity of imitation learning for smoothed model predictive control","author":"Pfrommer","year":"2024"},{"key":"B121","article-title":"Sampling constrained trajectories using composable diffusion models","author":"Power","year":"2023"},{"key":"B122","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2405.07503","article-title":"Consistency policy accelerated visuomotor policies via consistency distillation","author":"Prasad","year":"2024","journal-title":"Robotics Sci. 
Syst."},{"key":"B123","volume-title":"Ec-diffuser: multi-object manipulation via entity-centric behavior generation","author":"Qi","year":"2025"},{"key":"B124","article-title":"ThinkGrasp: a vision-language system for strategic part grasping in clutter","author":"Qian","year":"2024"},{"key":"B125","first-page":"8748","article-title":"Learning transferable visual models from natural language supervision","author":"Radford","year":"2021"},{"key":"B126","volume-title":"Learning complex dexterous manipulation with deep reinforcement learning and demonstrations","author":"Rajeswaran","year":"2017"},{"key":"B127","volume-title":"Hierarchical text-conditional image generation with CLIP latents","author":"Ramesh","year":"2022"},{"key":"B128","article-title":"Diffusion policy policy optimization","author":"Ren","year":"2024"},{"key":"B129","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2407.05996","article-title":"Multimodal diffusion transformer: learning versatile behavior from multimodal goals","author":"Reuss","year":"","journal-title":"Robotics Sci. Syst."},{"key":"B130","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2304.02532","article-title":"Goal-conditioned imitation learning using score-based diffusion policies","author":"Reuss","year":"2023","journal-title":"Robotics Sci. 
Syst."},{"key":"B131","volume-title":"Efficient diffusion transformer policies with mixture of expert denoisers for multitask learning","author":"Reuss","year":""},{"key":"B132","first-page":"10684","article-title":"High-resolution image synthesis with latent diffusion models","author":"Rombach","year":""},{"key":"B133","first-page":"10684","article-title":"High-resolution image synthesis with latent diffusion models","author":"Rombach","year":""},{"key":"B134","volume-title":"Diffusion predictive control with constraints","author":"R\u00f6mer","year":"2024"},{"key":"B135","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1007\/978-3-319-24574-4_28","article-title":"U-Net: Convolutional networks for biomedical image segmentation","volume":"2015","author":"Ronneberger","year":"2015","journal-title":"Med. Image Comput. Computer-Assisted Intervention \u2013 MICCAI"},{"key":"B136","first-page":"661","article-title":"Efficient reductions for imitation learning","volume":"9","author":"Ross","year":"2010","journal-title":"Proc. Thirteen. Int. Conf. Artif. Intell. 
Statistics"},{"key":"B137","first-page":"528","article-title":"Flow matching imitation learning for multi-support manipulation","author":"Rouxel","year":"2024"},{"key":"B138","first-page":"18007","article-title":"Diffusion-EDFs: Bi-equivariant denoising generative modeling on SE(3) for visual robotic manipulation","author":"Ryu","year":"2024"},{"key":"B139","article-title":"Equivariant descriptor fields: SE(3)-equivariant energy-based models for end-to-end visual robotic manipulation learning","author":"Ryu","year":"2023"},{"key":"B140","first-page":"10351","article-title":"EDMP: ensemble-of-costs-guided diffusion for motion planning","author":"Saha","year":"2024"},{"key":"B141","article-title":"Progressive distillation for fast sampling of diffusion models","author":"Salimans","year":"2022"},{"key":"B142","doi-asserted-by":"publisher","first-page":"5338","DOI":"10.1109\/LRA.2024.3382529","article-title":"Movement primitive diffusion: learning gentle robotic manipulation of deformable objects","volume":"9","author":"Scheikl","year":"2024","journal-title":"IEEE Robotics Automation Lett."},{"key":"B143","volume-title":"SE (3)-Equivariant robot learning and control: a tutorial survey","author":"Seo","year":"2025"},{"key":"B144","first-page":"8539","article-title":"From LLMs to actions: latent codes as bridges in hierarchical robot control","author":"Shentu","year":"2024"},{"key":"B145","first-page":"2195","article-title":"Waypoint-based imitation learning for robotic manipulation","volume":"229","author":"Shi","year":"2023","journal-title":"Proc. 7th Conf. 
Robot Learn."},{"key":"B146","volume-title":"vMF-Contact: uncertainty-aware evidential learning for probabilistic contact-grasp in noisy clutter","author":"Shi","year":"2024"},{"key":"B147","volume-title":"VISO-Grasp: vision-language informed spatial object-centric 6-DoF active view planning and grasping in clutter and invisibility","author":"Shi","year":"2025"},{"key":"B148","doi-asserted-by":"crossref","DOI":"10.15607\/RSS.2024.XX.128","article-title":"Tilde: teleoperation for dexterous In-Hand manipulation learning with a DeltaHand","author":"Si","year":"2024","journal-title":"Robotics Sci. Syst"},{"key":"B149","article-title":"Shelving, stacking, hanging: relational pose diffusion for multi-modal rearrangement","author":"Simeonov","year":"2023"},{"key":"B150","first-page":"7344","article-title":"Constrained 6-DoF grasp generation on complex shapes for improved dual-arm manipulation","author":"Singh","year":"2024"},{"key":"B151","first-page":"2256","article-title":"Deep unsupervised learning using nonequilibrium thermodynamics","volume":"37","author":"Sohl-Dickstein","year":"2015","journal-title":"Proc. 32nd Int. Conf. Mach. Learn."},{"key":"B152","article-title":"Denoising diffusion implicit models","author":"Song","year":""},{"key":"B153","article-title":"Implicit grasp diffusion: bridging the gap between dense prediction and sampling-based grasping","author":"Song","year":"2024"},{"key":"B154","article-title":"Generative modeling by estimating gradients of the data distribution","volume":"32","author":"Song","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B155","article-title":"Score-based generative modeling through stochastic differential equations","author":"Song","year":""},{"key":"B156","first-page":"2878","article-title":"Fighting uncertainty with gradients: offline reinforcement learning via diffusion score matching","volume":"229","author":"Suh","year":"2023","journal-title":"Proc. 7th Conf. 
Robot Learn."},{"key":"B157","article-title":"Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results","volume":"30","author":"Tarvainen","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B158","volume-title":"Octo: an open-source generalist robot policy","author":"Team","year":"2024"},{"key":"B159","first-page":"23","article-title":"Domain randomization for transferring deep neural networks from simulation to the real world","author":"Tobin","year":"2017"},{"key":"B160","first-page":"1082","article-title":"Training deep networks with synthetic data: bridging the reality gap by domain randomization","author":"Tremblay","year":"2018"},{"key":"B161","first-page":"11610","article-title":"Click to grasp: zero-shot precise manipulation via visual diffusion descriptors","author":"Tsagkas","year":"2024"},{"key":"B162","first-page":"5923","article-title":"SE(3)-DiffusionFields: learning smooth cost functions for joint grasp and motion optimization through diffusion","author":"Urain","year":"2023"},{"key":"B163","volume-title":"Reasoning with latent diffusion in offline reinforcement learning","author":"Venkatraman","year":"2023"},{"key":"B164","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2024.XX.051","article-title":"Render and diffuse: aligning image and action spaces for diffusion-based behaviour cloning","author":"Vosylius","year":"2024","journal-title":"Robotics Sci. 
Syst."},{"key":"B165","first-page":"17902","article-title":"Language-driven grasp detection","author":"Vuong","year":"2024"},{"key":"B166","doi-asserted-by":"crossref","DOI":"10.15607\/RSS.2024.XX.043","article-title":"DexCap: scalable and portable mocap data collection system for dexterous manipulation","author":"Wang","year":""},{"key":"B167","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2402.02511","article-title":"PoCo: policy composition from and for heterogeneous robot learning","author":"Wang","year":"","journal-title":"Robotics Sci. Syst."},{"key":"B168","first-page":"831","article-title":"Single-view scene point cloud human grasp generation","author":"Wang","year":""},{"key":"B169","volume-title":"Diffusion policies as an expressive policy class for offline reinforcement learning","author":"Wang","year":""},{"key":"B170","first-page":"3277","article-title":"Cold diffusion on the replay buffer: learning to plan from known good States","volume":"229","author":"Wang","year":"","journal-title":"Proc. 7th Conf. 
Robot Learn."},{"key":"B171","article-title":"Learning fast samplers for diffusion models by differentiating through sample quality","author":"Watson","year":"2022"},{"key":"B172","volume-title":"Interactive imitation learning for dexterous robotic manipulation: challenges and perspectives \u2013 a survey","author":"Welte","year":"2025"},{"key":"B173","volume-title":"Diffusion-VLA: scaling robot foundation models via unified diffusion and autoregression","author":"Wen","year":"2024"},{"key":"B174","doi-asserted-by":"publisher","first-page":"3988","DOI":"10.1109\/LRA.2025.3544909","article-title":"TinyVLA: toward fast, data-efficient vision-language-action models for robotic manipulation","volume":"10","author":"Wen","year":"2025","journal-title":"IEEE Robotics Automation Lett."},{"key":"B175","doi-asserted-by":"publisher","first-page":"11834","DOI":"10.1109\/LRA.2024.3498776","article-title":"DexDiffuser: generating dexterous grasps with diffusion models","volume":"9","author":"Weng","year":"2024","journal-title":"IEEE Robotics Automation Lett."},{"key":"B176","volume-title":"UniDexFPM: universal dexterous functional pre-grasp manipulation via diffusion policy","author":"Wu","year":""},{"key":"B177","first-page":"22132","article-title":"Learning score-based grasping primitive for human-assisting dexterous grasping","volume":"36","author":"Wu","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B178","article-title":"Learning score-based grasping primitive for human-assisting dexterous grasping","volume":"36","author":"Wu","year":"","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B179","article-title":"ChainedDiffuser: unifying trajectory diffusion and keypose prediction for robotic manipulation","author":"Xian","year":"2023"},{"key":"B180","first-page":"3536","article-title":"XSkill: cross embodiment skill discovery","volume":"229","author":"Xu","year":"2023","journal-title":"Proc. 7th Conf. 
Robot Learn."},{"key":"B181","volume-title":"\u201cset it up!\u201d: functional object arrangement with compositional generative models","author":"Xu","year":"2024"},{"key":"B182","article-title":"Learning interactive real-world simulators","author":"Yang","year":"2024"},{"key":"B183","first-page":"3242","article-title":"Compositional diffusion-based continuous constraint solvers","volume":"229","author":"Yang","year":"2023","journal-title":"Proc. 7th Conf. Robot Learn."},{"key":"B184","first-page":"1911","article-title":"G-HOP: generative hand-object prior for interaction reconstruction and grasp synthesis","author":"Ye","year":"2024"},{"key":"B185","first-page":"25702","article-title":"Latent diffusion energy-based model for interpretable text modelling","volume":"162","author":"Yu","year":"2022","journal-title":"Proc. 39th Int. Conf. Mach. Learn."},{"key":"B186","first-page":"1094","article-title":"Meta-world: a benchmark and evaluation for multi-task and meta reinforcement learning","volume":"100","author":"Yu","year":"2020","journal-title":"Proc. Conf. Robot. Learn."},{"key":"B187","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2211.04604","article-title":"Scaling robot learning with semantic data augmentation through diffusion models","author":"Yu","year":"2023","journal-title":"Robotics Sci. Syst."},{"key":"B188","doi-asserted-by":"publisher","first-page":"7173","DOI":"10.1109\/tcyb.2024.3395626","article-title":"A survey of imitation learning: Algorithms, recent developments, and challenges","volume":"54","author":"Zare","year":"2024","journal-title":"IEEE Trans. Cybern."},{"key":"B189","first-page":"284","article-title":"GNFactor: multi-task real robot learning with generalizable neural feature fields","volume":"229","author":"Ze","year":"2023","journal-title":"Proc. 7th Conf. 
Robot Learn."},{"key":"B190","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2403.0395","article-title":"3D diffusion policy: generalizable visuomotor policy learning via simple 3D representations","author":"Ze","year":"2024","journal-title":"Robotics Sci. Syst."},{"key":"B191","doi-asserted-by":"publisher","first-page":"8258","DOI":"10.1109\/LRA.2024.3438036","article-title":"LVDiffusor: distilling functional rearrangement priors from large models into diffusor","volume":"9","author":"Zeng","year":"2024","journal-title":"IEEE Robotics Automation Lett."},{"key":"B192","article-title":"Language control diffusion: efficiently scaling through space, time, and tasks","author":"Zhang","year":""},{"key":"B193","volume-title":"Affordance-based robot manipulation with flow matching","author":"Zhang","year":"2025"},{"key":"B194","article-title":"DexGraspNet 2.0: learning generative dexterous grasping in large-scale synthetic cluttered scenes","author":"Zhang","year":""},{"key":"B195","volume-title":"NaVid: video-based VLM plans the next step for vision-and-language navigation","author":"Zhang","year":""},{"key":"B196","volume-title":"ManiDext: hand-object manipulation synthesis via continuous correspondence embeddings and residual-guided diffusion","author":"Zhang","year":""},{"key":"B197","volume-title":"VLABench: a large-scale benchmark for language-conditioned robotics manipulation with long-horizon reasoning tasks","author":"Zhang","year":""},{"key":"B198","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2402.17768","article-title":"Diffusion meets DAgger: supercharging eye-in-hand imitation learning","author":"Zhang","year":"","journal-title":"Robotics Sci. Syst."},{"key":"B199","first-page":"80178","article-title":"PLANNER: generating diversified paragraph via latent language diffusion model","volume":"36","author":"Zhang","year":"2023","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"B200","doi-asserted-by":"publisher","first-page":"284","DOI":"10.1007\/978-3-031-73390-1_17","article-title":"NL2Contact: natural language guided 3D hand-object contact modeling with diffusion model","volume":"2024","author":"Zhang","year":"2025","journal-title":"Comput. Vis. \u2013 ECCV"},{"key":"B201","volume-title":"GRAPE: generalizing robot policy via preference alignment","author":"Zhang","year":""},{"key":"B202","volume-title":"DexGrasp-Diffusion: diffusion-based unified functional grasp synthesis method for multi-dexterous robotic hands","author":"Zhang","year":""},{"key":"B203","volume-title":"AnyPlace: learning generalized object placement for robot manipulation","author":"Zhao","year":"2025"},{"key":"B204","first-page":"61229","article-title":"3D-VLA: a 3D vision-language-action generative world model","volume":"235","author":"Zhen","year":"2024","journal-title":"Proc. 41st Int. Conf. Mach. Learn."},{"key":"B205","volume-title":"GAGrasp: geometric algebra diffusion for dexterous grasping","author":"Zhong","year":"2025"},{"key":"B206","article-title":"Variational distillation of diffusion policies into mixture of experts","author":"Zhou","year":""},{"key":"B207","article-title":"RoboDreamer: learning compositional world models for robot imagination","author":"Zhou","year":""},{"key":"B208","first-page":"44000","article-title":"Adaptive online replanning with diffusion models","volume":"36","author":"Zhou","year":"2023","journal-title":"Adv. Neural Inf. Process. 
Syst."}],"container-title":["Frontiers in Robotics and AI"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2025.1606247\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T04:24:12Z","timestamp":1757391852000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2025.1606247\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,9]]},"references-count":208,"alternative-id":["10.3389\/frobt.2025.1606247"],"URL":"https:\/\/doi.org\/10.3389\/frobt.2025.1606247","relation":{},"ISSN":["2296-9144"],"issn-type":[{"value":"2296-9144","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,9]]},"article-number":"1606247"}}