{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T22:08:28Z","timestamp":1777586908223,"version":"3.51.4"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T00:00:00Z","timestamp":1766966400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T00:00:00Z","timestamp":1766966400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["TRR-169"],"award-info":[{"award-number":["TRR-169"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010896","name":"International Cooperation and Exchange Programme","doi-asserted-by":"publisher","award":["6206113600"],"award-info":[{"award-number":["6206113600"]}],"id":[{"id":"10.13039\/501100010896","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2026,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Existing task-oriented grasping approaches often rely on 2D pixel-wise affordance segmentation or predefined part annotations, limiting their applicability in unstructured 3D environments and constraining the grasp planning space. To overcome these limitations, we introduce a novel affordance-labeled grasp dataset constructed on simulation, capturing diverse functional interactions across object categories in a 6-DoF space. Building on this foundation, we propose a unified, language-guided grasping framework that takes partial point clouds and natural language instructions as input to generate semantically meaningful and geometrically feasible grasp poses. Specifically, a vision-language affordance grounding module produces dense 3D affordance maps aligned with task semantics, and a task-oriented grasp pipeline predicts coarse grasp candidates with implicit affordance cues. The coarse grasp proposals are subsequently refined based on visual affordance guidance, significantly enhancing both semantic alignment and grasp practicality. Extensive experiments in synthetic and real-world scenarios demonstrate that our method outperforms state-of-the-art approaches, effectively generalizing across diverse objects and tasks.<\/jats:p>","DOI":"10.1007\/s40747-025-02169-0","type":"journal-article","created":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T13:25:13Z","timestamp":1767014713000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Enhancing task-oriented robotic grasping via 3D affordance grounding from vision-language models"],"prefix":"10.1007","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0169-8896","authenticated-orcid":false,"given":"Wenkai","family":"Chen","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shang-Ching","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qingdu","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yung-Hui","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianwei","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,12,29]]},"reference":[{"issue":"2\u20133","key":"2169_CR1","doi-asserted-by":"publisher","first-page":"202","DOI":"10.1177\/0278364919872545","volume":"39","author":"K Fang","year":"2020","unstructured":"Fang K, Zhu Y, Garg A, Kurenkov A, Mehta V, Fei-Fei L, Savarese S (2020) Learning task-oriented grasping for tool manipulation from simulated self-supervision. Int J Robot Res 39(2\u20133):202\u2013216","journal-title":"Int J Robot Res"},{"key":"2169_CR2","unstructured":"Murali A, Liu W, Marino K, Chernova S, Gupta A (2021) Same object, different grasps: Data and semantic knowledge for task-oriented grasping. In: Conference on robot learning. PMLR, pp 1540\u20131557"},{"key":"2169_CR3","doi-asserted-by":"publisher","DOI":"10.1016\/j.rcim.2022.102371","volume":"77","author":"W Hu","year":"2022","unstructured":"Hu W, Wang C, Liu F, Peng X, Sun P, Tan J (2022) A grasps-generation-and-selection convolutional neural network for a digital twin of intelligent robotic grasping. Robot Comput Integr Manuf 77:102371","journal-title":"Robot Comput Integr Manuf"},{"key":"2169_CR4","doi-asserted-by":"crossref","unstructured":"Liang H, Ma X, Li S, G\u00f6rner M, Tang S, Fang B, Sun F, Zhang J (2019) PointNetGPD: detecting grasp configurations from point sets. In: IEEE international conference on robotics and automation (ICRA), pp 3629\u20133635","DOI":"10.1109\/ICRA.2019.8794435"},{"issue":"3","key":"2169_CR5","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1007\/s10846-022-01586-4","volume":"104","author":"W Chen","year":"2022","unstructured":"Chen W, Liang H, Chen Z, Sun F, Zhang J (2022) Improving object grasp performance via transformer-based sparse shape completion. J Intell Robot Syst 104(3):45","journal-title":"J Intell Robot Syst"},{"key":"2169_CR6","doi-asserted-by":"crossref","unstructured":"Mousavian A, Eppner C, Fox D (2019) 6-DOF GraspNet: variational grasp generation for object manipulation. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 2901\u20132910","DOI":"10.1109\/ICCV.2019.00299"},{"issue":"5","key":"2169_CR7","doi-asserted-by":"publisher","first-page":"3929","DOI":"10.1109\/TRO.2023.3281153","volume":"39","author":"H-S Fang","year":"2023","unstructured":"Fang H-S, Wang C, Fang H, Gou M, Liu J, Yan H, Liu W, Xie Y, Lu C (2023) AnyGrasp: robust and efficient grasp perception in spatial and temporal domains. IEEE Trans Robot 39(5):3929\u20133945","journal-title":"IEEE Trans Robot"},{"issue":"2","key":"2169_CR8","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1109\/LRA.2019.2894439","volume":"4","author":"F-J Chu","year":"2019","unstructured":"Chu F-J, Xu R, Vela PA (2019) Learning affordance segmentation for real-world robotic manipulation via synthetic images. IEEE Robot Autom Lett 4(2):1140\u20131147","journal-title":"IEEE Robot Autom Lett"},{"key":"2169_CR9","doi-asserted-by":"crossref","unstructured":"Chen W, Liang H, Chen Z, Sun F, Zhang J (2022) Learning 6-DOF task-oriented grasp detection via implicit estimation and visual affordance. In: 2022 IEEE\/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 762\u2013769","DOI":"10.1109\/IROS47612.2022.9981900"},{"issue":"8","key":"2169_CR10","doi-asserted-by":"publisher","first-page":"10304","DOI":"10.1109\/TII.2024.3393007","volume":"20","author":"W Chen","year":"2024","unstructured":"Chen W, Liu S-C, Zhang J (2024) EHoA: a benchmark for task-oriented hand-object action recognition via event vision. IEEE Trans Ind Inf 20(8):10304\u201310313","journal-title":"IEEE Trans Ind Inf"},{"key":"2169_CR11","unstructured":"Jang Y, Xu D, Su H, Finn C (2024) GLOVER: generalizable open-vocabulary affordance reasoning for task-oriented grasping. arXiv http:\/\/arxiv.org\/abs\/2411.12286. Accessed 2025-03-12"},{"key":"2169_CR12","doi-asserted-by":"crossref","unstructured":"Tang Y, Zhang S, Hao X, Wang P, Wu J, Wang Z, Zhang S (2025) AffordGrasp: in-context affordance reasoning for open-vocabulary task-oriented grasping in clutter. arXiv http:\/\/arxiv.org\/abs\/2503.00778. Accessed 2025-03-12","DOI":"10.1109\/IROS60139.2025.11245995"},{"issue":"3","key":"2169_CR13","doi-asserted-by":"publisher","first-page":"2683","DOI":"10.1109\/TIE.2023.3270537","volume":"71","author":"Y Wu","year":"2023","unstructured":"Wu Y, Fu Y, Wang S (2023) Information-theoretic exploration for adaptive robotic grasping in clutter based on real-time pixel-level grasp detection. IEEE Trans Ind Electron 71(3):2683\u20132693","journal-title":"IEEE Trans Ind Electron"},{"issue":"2","key":"2169_CR14","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1109\/LRA.2019.2894439","volume":"4","author":"F-J Chu","year":"2019","unstructured":"Chu F-J, Xu R, Vela PA (2019) Learning affordance segmentation for real-world robotic manipulation via synthetic images. IEEE Robot Autom Lett 4(2):1140\u20131147. https:\/\/doi.org\/10.1109\/LRA.2019.2894439","journal-title":"IEEE Robot Autom Lett"},{"key":"2169_CR15","doi-asserted-by":"crossref","unstructured":"Deng S, Xu X, Wu C, Chen K, Jia K (2021) 3D AffordanceNet: a benchmark for visual object affordance understanding. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1778\u20131787","DOI":"10.1109\/CVPR46437.2021.00182"},{"key":"2169_CR16","unstructured":"Xu C, Chen Y, Wang H, Zhu S-C, Zhu Y, Huang S (2022) PartAfford: part-level affordance discovery from 3D objects. arXiv http:\/\/arxiv.org\/abs\/2202.13519. Accessed 2024-08-12"},{"key":"2169_CR17","doi-asserted-by":"crossref","unstructured":"Wang Y, Wu R, Mo K, Ke J, Fan Q, Guibas LJ, Dong H (2022) AdaAfford: learning to adapt manipulation affordance for 3D articulated objects via few-shot interactions. In: European conference on computer vision. Springer, pp 90\u2013107","DOI":"10.1007\/978-3-031-19818-2_6"},{"key":"2169_CR18","doi-asserted-by":"crossref","unstructured":"Chen D, Kong D, Li J, Yin B (2025) MaskPrompt: open-vocabulary affordance segmentation with object shape mask prompts. In: Proceedings of the AAAI conference on artificial intelligence, vol 39, pp 2034\u20132042","DOI":"10.1609\/aaai.v39i2.32200"},{"issue":"13\u201314","key":"2169_CR19","first-page":"1455","volume":"36","author":"A Pas","year":"2017","unstructured":"Pas A, Gualtieri M, Saenko K, Platt R (2017) Grasp pose detection in point clouds. Int J Robot Res 36(13\u201314):1455\u20131473","journal-title":"Int J Robot Res"},{"key":"2169_CR20","doi-asserted-by":"publisher","first-page":"15200","DOI":"10.1109\/TASE.2025.3566461","volume":"22","author":"Y Song","year":"2025","unstructured":"Song Y, Sun P, Jin P, Ren Y, Zheng Y, Li Z, Chu X, Zhang Y, Li T, Gu J (2025) Learning 6-dof fine-grained grasp detection based on part affordance grounding. IEEE Trans Autom Sci Eng 22:15200\u201315214. https:\/\/doi.org\/10.1109\/TASE.2025.3566461","journal-title":"IEEE Trans Autom Sci Eng"},{"key":"2169_CR21","unstructured":"Tziafas G, Kasaei H (2025) Towards open-world grasping with large vision-language models. In: Conference on robot learning. PMLR, pp 3304\u20133332"},{"issue":"12","key":"2169_CR22","doi-asserted-by":"publisher","first-page":"11834","DOI":"10.1109\/LRA.2024.3498776","volume":"9","author":"Z Weng","year":"2024","unstructured":"Weng Z, Lu H, Kragic D, Lundell J (2024) DexDiffuser: generating dexterous grasps with diffusion models. IEEE Robot Autom Lett 9(12):11834\u201311840. https:\/\/doi.org\/10.1109\/LRA.2024.3498776","journal-title":"IEEE Robot Autom Lett"},{"key":"2169_CR23","first-page":"46881","volume":"37","author":"Y-L Wei","year":"2024","unstructured":"Wei Y-L, Jiang J-J, Xing C, Tan X-T, Wu X-M, Li H, Cutkosky M, Zheng W-S (2024) Grasp as you say: language-guided dexterous grasp generation. Adv Neural Inf Process Syst 37:46881\u201346907","journal-title":"Adv Neural Inf Process Syst"},{"key":"2169_CR24","doi-asserted-by":"crossref","unstructured":"Nguyen T, Vu MN, Huang B, Van\u00a0Vo T, Truong V, Le N, Vo T, Le B, Nguyen A (2024) Language-conditioned affordance-pose detection in 3D point clouds. In: 2024 IEEE international conference on robotics and automation (ICRA). IEEE, pp 3071\u20133078","DOI":"10.1109\/ICRA57147.2024.10610008"},{"key":"2169_CR25","doi-asserted-by":"crossref","unstructured":"Eppner C, Mousavian A, Fox D (2021) Acronym: a large-scale grasp dataset based on simulation. In: 2021 IEEE international conference on robotics and automation (ICRA). IEEE, pp 6222\u20136227","DOI":"10.1109\/ICRA48506.2021.9560844"},{"issue":"2","key":"2169_CR26","doi-asserted-by":"publisher","first-page":"2870","DOI":"10.1109\/LRA.2021.3062560","volume":"6","author":"R Xu","year":"2021","unstructured":"Xu R, Chu F-J, Tang C, Liu W, Vela PA (2021) An affordance keypoint detection network for robot manipulation. IEEE Robot Autom Lett 6(2):2870\u20132877","journal-title":"IEEE Robot Autom Lett"},{"key":"2169_CR27","unstructured":"Chang AX, Funkhouser T, Guibas L, Hanrahan P, Huang Q, Li Z, Savarese S, Savva M, Song S, Su H et al (2015) ShapeNet: an information-rich 3D model repository. arXiv:1512.03012. Accessed 2025-03-12"},{"key":"2169_CR28","unstructured":"Han J, Zhang R, Shao W, Gao P, Xu P, Xiao H, Zhang K, Liu C, Wen S, Guo Z, Lu X, Ren S, Wen Y, Chen X, Yue X, Li H, Qiao Y (2024) ImageBind-LLM: multi-modality instruction tuning. arXiv:2309.03905. Accessed 2024-08-12"},{"key":"2169_CR29","doi-asserted-by":"crossref","unstructured":"Yu X, Tang L, Rao Y, Huang T, Zhou J, Lu J (2021) Point-BERT: pre-training 3D point cloud transformers with masked point modeling. In: IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 19291\u201319300","DOI":"10.1109\/CVPR52688.2022.01871"},{"key":"2169_CR30","doi-asserted-by":"crossref","unstructured":"He S, Ding H, Jiang X, Wen B (2024) SegPoint: segment any point cloud via large language model. In: European conference on computer vision. Springer, pp 349\u2013367","DOI":"10.1007\/978-3-031-72670-5_20"},{"key":"2169_CR31","unstructured":"AI@Meta: Llama 3 Model Card. https:\/\/github.com\/meta-llama\/llama3\/blob\/main\/model_card.md"},{"key":"2169_CR32","doi-asserted-by":"crossref","unstructured":"Qian S, Chen W, Bai M, Zhou X, Tu Z, Li LE (2024) AffordanceLLM: grounding affordance from vision language models. In: IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 7587\u20137597","DOI":"10.1109\/CVPRW63382.2024.00754"},{"key":"2169_CR33","unstructured":"Guo Z, Zhang R, Zhu X, Tang Y, Ma X, Han J, Chen K, Gao P, Li X, Li H, Heng P-A (2024) Point-bind and point-LLM: aligning point cloud with multi-modality for 3D understanding, generation, and instruction following. arXiv:2309.00615. Accessed 2024-08-12"},{"key":"2169_CR34","unstructured":"Zhang R, Han J, Liu C, Gao P, Zhou A, Hu X, Yan S, Lu P, Li H, Qiao Y (2023) LLaMA-Adapter: efficient fine-tuning of language models with zero-init attention. arXiv:2303.16199. Accessed 2024-08-12"},{"key":"2169_CR35","doi-asserted-by":"crossref","unstructured":"Deng S, Xu X, Wu C, Chen K, Jia K (2021) 3D AffordanceNet: a benchmark for visual object affordance understanding. In: IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 1778\u20131787","DOI":"10.1109\/CVPR46437.2021.00182"},{"key":"2169_CR36","unstructured":"Li K, Malik J (2018) Implicit maximum likelihood estimation. arXiv:1809.09087. Accessed 2024-08-12"},{"key":"2169_CR37","unstructured":"Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27"},{"key":"2169_CR38","unstructured":"Qi CR, Yi L, Su H, Guibas LJ (2017) PointNet++: deep hierarchical feature learning on point sets in a metric space. Adv Neural Inf Process Syst 30"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-025-02169-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-025-02169-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-025-02169-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T11:49:18Z","timestamp":1769773758000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-025-02169-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,29]]},"references-count":38,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1]]}},"alternative-id":["2169"],"URL":"https:\/\/doi.org\/10.1007\/s40747-025-02169-0","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,29]]},"assertion":[{"value":"17 July 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 November 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 December 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no Conflict of interest to declare that are relevant to the content of this article. Financial or non-financial interests: The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This study does not involve any human participants or animals.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}}],"article-number":"42"}}