{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T14:12:15Z","timestamp":1777299135742,"version":"3.51.4"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T00:00:00Z","timestamp":1775088000000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100020950","name":"National Science and Technology Council","doi-asserted-by":"publisher","award":["NSTC114-2221-E-992-055"],"award-info":[{"award-number":["NSTC114-2221-E-992-055"]}],"id":[{"id":"10.13039\/501100020950","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The advancement of generative artificial intelligence, particularly with the advent of diffusion models and three-dimensional (3D) Gaussian Splatting (3DGS), has introduced novel avenues for manipulating and synthesizing 3D models. However, current 3D editing methods primarily focus on global style transfers or constrained geometric deformations. They face significant challenges in executing fine-grained, part-level manipulations guided by text prompts, especially for complex tasks that require simultaneous changes to both geometry and appearance. Many existing approaches operate at the rendering level, which hinders the creation of new geometric structures. To overcome these limitations, we propose Mask2-3D, a diffusion-based framework for prompt-driven, part-level 3D editing. The core of our framework is a learnable, multi-view mask generator that predicts a coherent editing region rather than just segmenting existing contours. 
This unique mechanism provides the flexibility to create new shape architectures and undergo significant geometric modifications. Furthermore, the system integrates a LoRA-finetuned diffusion model to facilitate high-fidelity content synthesis and style transfer within these designated regions, while a subsequent re-rendering process ensures multi-view consistency. By implementing this innovative workflow, Mask2-3D enables precise, flexible, and structurally sound local editing of 3D models via natural language commands, significantly enhancing the intuitiveness and creative freedom of the 3D content creation process.<\/jats:p>","DOI":"10.1093\/jcde\/qwag035","type":"journal-article","created":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T11:45:50Z","timestamp":1775043950000},"page":"273-289","source":"Crossref","is-referenced-by-count":0,"title":["A diffusion framework based on prompt-driven masks for part-level 3D editing"],"prefix":"10.1093","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-7579-4713","authenticated-orcid":false,"given":"Jian-Ru","family":"Zhu","sequence":"first","affiliation":[{"name":"National Kaohsiung University of Science and Technology Department of Electrical Engineering, , No. 415, Jiangong Rd., Sanmin Dist., Kaohsiung City 807618 ,","place":["Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-1577-948X","authenticated-orcid":false,"given":"Hong-Yu","family":"Jian","sequence":"additional","affiliation":[{"name":"National Kaohsiung University of Science and Technology Department of Electrical Engineering, , No. 
415, Jiangong Rd., Sanmin Dist., Kaohsiung City 807618 ,","place":["Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8875-678X","authenticated-orcid":false,"given":"Tyng-Yeu","family":"Liang","sequence":"additional","affiliation":[{"name":"National Kaohsiung University of Science and Technology Department of Electrical Engineering, , No. 415, Jiangong Rd., Sanmin Dist., Kaohsiung City 807618 ,","place":["Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2026,4,2]]},"reference":[{"key":"2026042709231835900_bib1","first-page":"40","article-title":"Learning representations and generative models for 3D point clouds","volume-title":"International Conference on Machine Learning","author":"Achlioptas","year":"2018"},{"key":"2026042709231835900_bib2","doi-asserted-by":"publisher","first-page":"18392","DOI":"10.1109\/CVPR52729.2023.01764","article-title":"InstructPix2Pix: Learning to follow image editing instructions","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Brooks","year":"2023"},{"key":"2026042709231835900_bib3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1512.03012","article-title":"ShapeNet: An information-rich 3D model repository","volume-title":"preprint arXiv: 1512.03012","author":"Chang","year":"2015"},{"key":"2026042709231835900_bib4","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1007\/978-3-030-20893-6_7","article-title":"Text2Shape: Generating shapes from natural language by learning joint embedding","volume-title":"Computer Vision\u2014ACCV 2018","author":"Chen","year":"2019"},{"key":"2026042709231835900_bib5","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1706.05587","article-title":"Rethinking atrous convolution for semantic image segmentation","volume-title":"arXiv 
preprint","author":"Chen","year":"2017"},{"key":"2026042709231835900_bib6","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1007\/978-3-031-72904-1_5","article-title":"DGE: Direct Gaussian 3D editing by consistent multi-view editing","volume":"15132","author":"Chen","year":"2024","journal-title":"Computer Vision\u2014ECCV 2024 (Lecture Notes in Computer Science)"},{"key":"2026042709231835900_bib7","doi-asserted-by":"publisher","first-page":"18593","DOI":"10.1109\/CVPR52688.2022.01806","article-title":"UNIST: Unpaired neural implicit shape translation network","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Chen","year":"2022"},{"key":"2026042709231835900_bib8","doi-asserted-by":"publisher","first-page":"21476","DOI":"10.1109\/CVPR52733.2024.02029","article-title":"GaussianEditor: Swift and controllable 3D editing with Gaussian splatting","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Chen","year":"2024"},{"key":"2026042709231835900_bib9","article-title":"3DGS-Drag: Dragging Gaussians for intuitive point-based 3D editing","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"Dong","year":"2025"},{"key":"2026042709231835900_bib10","article-title":"An image is worth 16\u00d716 words: Transformers for image recognition at scale","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR)","author":"Dosovitskiy","year":"2021"},{"key":"2026042709231835900_bib11","doi-asserted-by":"publisher","first-page":"162","DOI":"10.1093\/jcde\/qwaf091","article-title":"Automatic reconstruction of 3D topology optimization to editable CAD model with rotation minimizing frames","volume":"12","author":"Feng","year":"2025","journal-title":"Journal of Computational Design and 
Engineering"},{"key":"2026042709231835900_bib12","first-page":"8882","article-title":"ShapeCrafter: A recursive text-conditioned 3D shape generation model","volume":"35","author":"Fu","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2026042709231835900_bib13","doi-asserted-by":"publisher","first-page":"19683","DOI":"10.1109\/ICCV51070.2023.01808","article-title":"Instruct-NeRF2NeRF: Editing 3D scenes with instructions","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Haque","year":"2023"},{"key":"2026042709231835900_bib14","doi-asserted-by":"publisher","first-page":"2980","DOI":"10.1109\/ICCV.2017.322","article-title":"Mask R-CNN","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV)","author":"He","year":"2017"},{"key":"2026042709231835900_bib15","article-title":"LoRA: Low-rank adaptation of large language models","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Hu","year":"2022"},{"key":"2026042709231835900_bib16","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2405.15491","article-title":"GSDeformer: Direct, real-time and extensible cage-based deformation for 3D Gaussian splatting","volume-title":"arXiv preprint","author":"Huang","year":"2024"},{"key":"2026042709231835900_bib17","doi-asserted-by":"publisher","first-page":"857","DOI":"10.1109\/CVPR52688.2022.00094","article-title":"Zero-shot text-guided object generation with Dream Fields","volume-title":"2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Jain","year":"2022"},{"key":"2026042709231835900_bib18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1093\/jcde\/qwaf018","article-title":"Sketch-based modeling with perception-aware extraction and intention-aware snapping of contours","volume":"12","author":"Jin","year":"2025","journal-title":"Journal of Computational Design 
and Engineering"},{"key":"2026042709231835900_bib19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3592433","article-title":"3D Gaussian splatting for real-time radiance field rendering","volume":"42","author":"Kerbl","year":"2023","journal-title":"ACM Transactions on Graphics"},{"key":"2026042709231835900_bib20","doi-asserted-by":"publisher","first-page":"3992","DOI":"10.1109\/ICCV51070.2023.00371","article-title":"Segment anything","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Kirillov","year":"2023"},{"key":"2026042709231835900_bib21","doi-asserted-by":"publisher","first-page":"1931","DOI":"10.1109\/CVPR52729.2023.00192","article-title":"Multi-concept customization of text-to-image diffusion","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Kumari","year":"2023"},{"key":"2026042709231835900_bib22","doi-asserted-by":"publisher","first-page":"11362","DOI":"10.1609\/aaai.v34i07.6798","article-title":"Learning part generation and assembly for structure-aware shape synthesis","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Li","year":"2020"},{"key":"2026042709231835900_bib23","article-title":"VoxHammer: Training-free precise and coherent 3D editing in native 3D space","volume-title":"Proceedings of the International Conference on 3D Vision (3DV)","author":"Li","year":"2026"},{"key":"2026042709231835900_bib24","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1093\/jcde\/qwaf102","article-title":"CADCL: Reconstruct parametric CAD models from B-rep via contrastive learning","volume":"12","author":"Liang","year":"2025","journal-title":"Journal of Computational Design and Engineering"},{"key":"2026042709231835900_bib25","doi-asserted-by":"publisher","first-page":"300","DOI":"10.1109\/CVPR52729.2023.00037","article-title":"Magic3D: High-resolution text-to-3D content 
creation","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Lin","year":"2023"},{"key":"2026042709231835900_bib26","doi-asserted-by":"publisher","first-page":"9264","DOI":"10.1109\/ICCV51070.2023.00853","article-title":"Zero-1-to-3: Zero-shot one image to 3D object","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Liu","year":"2023"},{"key":"2026042709231835900_bib27","doi-asserted-by":"publisher","first-page":"7076","DOI":"10.1109\/CVPR52688.2022.00695","article-title":"Image segmentation using text and image prompts","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"L\u00fcddecke","year":"2022"},{"key":"2026042709231835900_bib28","doi-asserted-by":"publisher","first-page":"12663","DOI":"10.1109\/CVPR52729.2023.01218","article-title":"Latent-NeRF for shape-guided generation of 3D shapes and textures","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Metzer","year":"2023"},{"key":"2026042709231835900_bib29","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1145\/3503250","article-title":"NeRF: Representing scenes as neural radiance fields for view synthesis","volume":"65","author":"Mildenhall","year":"2020","journal-title":"Communications of the ACM"},{"key":"2026042709231835900_bib30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.cag.2021.02.006","article-title":"LPMNet: Latent part modification and generation for 3D point clouds","volume":"96","author":"\u00d6ng\u00fcn","year":"2021","journal-title":"Computers & Graphics"},{"key":"2026042709231835900_bib31","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1093\/jcde\/qwae099","article-title":"Three-dimensional hullform reconstruction from two-dimensional drawings based on image processing 
techniques","volume":"11","author":"Park","year":"2024","journal-title":"Journal of Computational Design and Engineering"},{"key":"2026042709231835900_bib32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3588432.3591513","article-title":"Zero-shot image-to-image translation","volume-title":"Proceedings of the ACM SIGGRAPH 2023 Conference","author":"Parmar","year":"2023"},{"key":"2026042709231835900_bib33","doi-asserted-by":"publisher","first-page":"3942","DOI":"10.1609\/aaai.v32i1.11671","article-title":"FiLM: Visual reasoning with a general conditioning layer","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Perez","year":"2018"},{"key":"2026042709231835900_bib34","first-page":"1862","article-title":"SDXL: Improving latent diffusion models for high-resolution image synthesis","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Podell","year":"2024"},{"key":"2026042709231835900_bib35","first-page":"8748","article-title":"Learning transferable visual models from natural language supervision","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","author":"Radford","year":"2021"},{"key":"2026042709231835900_bib36","article-title":"SAM 2: Segment anything in images and videos","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR)","author":"Ravi","year":"2025"},{"key":"2026042709231835900_bib37","doi-asserted-by":"publisher","first-page":"10674","DOI":"10.1109\/CVPR52688.2022.01042","article-title":"High-resolution image synthesis with latent diffusion models","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Rombach","year":"2022"},{"key":"2026042709231835900_bib38","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1007\/978-3-319-24574-4_28","article-title":"U-Net: Convolutional networks for biomedical image 
segmentation","volume-title":"Medical Image Computing and Computer-Assisted Intervention\u2014MICCAI 2015 (Lecture Notes in Computer Science)","author":"Ronneberger","year":"2015"},{"key":"2026042709231835900_bib39","doi-asserted-by":"publisher","first-page":"22500","DOI":"10.1109\/CVPR52729.2023.02155","article-title":"DreamBooth: Fine-tuning text-to-image diffusion models for subject-driven generation","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Ruiz","year":"2023"},{"key":"2026042709231835900_bib40","doi-asserted-by":"publisher","first-page":"18582","DOI":"10.1109\/CVPR52688.2022.01805","article-title":"CLIP-Forge: Towards zero-shot text-to-shape generation","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Sanghi","year":"2022"},{"key":"2026042709231835900_bib41","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2011.13388","article-title":"3DSNet: Unsupervised shape-to-shape 3D style transfer","volume-title":"arXiv preprint","author":"Segu","year":"2021"},{"key":"2026042709231835900_bib42","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2310.15110","article-title":"Zero123++: A single image to consistent multi-view diffusion base model","volume-title":"arXiv preprint","author":"Shi","year":"2023"},{"key":"2026042709231835900_bib43","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3588432.3591506","article-title":"Key-locked rank one editing for text-to-image personalization","volume-title":"Proceedings of the ACM SIGGRAPH 2023 Conference","author":"Tewel","year":"2023"},{"key":"2026042709231835900_bib44","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2504.12800","article-title":"CAGE-GS: High-fidelity cage-based 3D Gaussian splatting deformation","volume-title":"arXiv 
preprint","author":"Tong","year":"2025"},{"key":"2026042709231835900_bib45","doi-asserted-by":"publisher","first-page":"21469","DOI":"10.1109\/CVPR52734.2025.02000","article-title":"Structured 3D latents for scalable and versatile 3D generation","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Xiang","year":"2025"},{"key":"2026042709231835900_bib46","doi-asserted-by":"publisher","first-page":"20908","DOI":"10.1109\/CVPR52729.2023.02003","article-title":"Dream3D: Zero-shot text-to-3D synthesis using 3D shape prior and text-to-image diffusion models","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Xu","year":"2023"},{"key":"2026042709231835900_bib47","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1093\/jcde\/qwaf116","article-title":"DVS-3D: Diffusion-based novel view synthesis and 3D object reconstruction from a single image","volume":"12","author":"Xu","year":"2025","journal-title":"Journal of Computational Design and Engineering"},{"key":"2026042709231835900_bib48","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2306.03908","article-title":"SAM3D: Segment anything in 3D scenes","volume-title":"arXiv preprint","author":"Yang","year":"2023"},{"key":"2026042709231835900_bib49","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3355089.3356494","article-title":"LOGAN: Unpaired shape transform in latent overcomplete space","volume":"38","author":"Yin","year":"2019","journal-title":"ACM Transactions on Graphics"},{"key":"2026042709231835900_bib50","doi-asserted-by":"publisher","first-page":"e70104","DOI":"10.1049\/ipr2.70104","article-title":"PartConverter: A part-oriented transformation framework for point clouds","volume":"19","author":"Zeng","year":"2025","journal-title":"IET Image 
Processing"},{"key":"2026042709231835900_bib51","doi-asserted-by":"publisher","first-page":"3813","DOI":"10.1109\/ICCV51070.2023.00355","article-title":"Adding conditional control to text-to-image diffusion models","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Zhang","year":"2023"},{"key":"2026042709231835900_bib52","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2306.05668","article-title":"RePaint-NeRF: NeRF editing via semantic masks and diffusion models","volume-title":"arXiv preprint","author":"Zhou","year":"2023"},{"key":"2026042709231835900_bib53","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3658205","article-title":"TIP-Editor: An accurate 3D editor following both text-prompts and image-prompts","volume":"43","author":"Zhuang","year":"2024","journal-title":"ACM Transactions on Graphics"}],"container-title":["Journal of Computational Design and Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jcde\/advance-article-pdf\/doi\/10.1093\/jcde\/qwag035\/67725526\/qwag035.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jcde\/article-pdf\/13\/4\/273\/67725526\/qwag035.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jcde\/article-pdf\/13\/4\/273\/67725526\/qwag035.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T13:23:30Z","timestamp":1777296210000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jcde\/article\/13\/4\/273\/8572514"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,1]]},"references-count":53,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2026,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\
/jcde\/qwag035","relation":{},"ISSN":["2288-5048"],"issn-type":[{"value":"2288-5048","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,4]]},"published":{"date-parts":[[2026,4,1]]}}}