{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T09:13:33Z","timestamp":1774602813021,"version":"3.50.1"},"reference-count":81,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T00:00:00Z","timestamp":1772064000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T00:00:00Z","timestamp":1772064000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2026,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    The limited availability of 3D medical image datasets, due to privacy concerns and high collection or annotation costs, poses significant challenges in the field of medical imaging. There are few solutions for realistic 3D medical image synthesis due to difficulties in backbone design and fewer 3D training samples compared to 2D counterparts. In this paper, we propose\n                    <jats:bold>GEM-3D<\/jats:bold>\n                    , a novel generative approach to the synthesis of 3D medical images and the enhancement of existing datasets using conditional diffusion models. Our method begins with a 2D slice, noted as the informed slice to serve the patient prior, and propagates the generation process using a 3D segmentation mask. By decomposing the 3D medical images into editable masks and patient prior information, GEM-3D offers a flexible yet effective solution for generating versatile 3D images from existing datasets. 
Moreover, as the informed slice contains patient-wise information, GEM-3D can also facilitate counterfactual image synthesis and dataset-level de-enhancement with desired control. Experiments on brain MRI and abdomen CT images demonstrate that GEM-3D is capable of synthesizing high-quality 3D medical images with volumetric consistency, offering a straightforward solution for dataset enhancement during inference. The code is available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/HKU-MedAI\/GEM-3D\" ext-link-type=\"uri\">https:\/\/github.com\/HKU-MedAI\/GEM-3D<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1007\/s11263-026-02789-0","type":"journal-article","created":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T19:09:05Z","timestamp":1772132945000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Generative Enhancement for 3D Medical Images"],"prefix":"10.1007","volume":"134","author":[{"given":"Lingting","family":"Zhu","sequence":"first","affiliation":[]},{"given":"Noel","family":"Codella","sequence":"additional","affiliation":[]},{"given":"Dongdong","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Zhenchao","family":"Jin","sequence":"additional","affiliation":[]},{"given":"Lu","family":"Yuan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9315-6527","authenticated-orcid":false,"given":"Lequan","family":"Yu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,2,26]]},"reference":[{"issue":"1","key":"2789_CR1","doi-asserted-by":"publisher","first-page":"4128","DOI":"10.1038\/s41467-022-30695-9","volume":"13","author":"M Antonelli","year":"2022","unstructured":"Antonelli, M., Reinke, A., Bakas, S., Farahani, K., Kopp-Schneider, A., Landman, B. 
A., Litjens, G., Menze, B., Ronneberger, O., Summers, R. M., et al. (2022). The medical segmentation decathlon. Nature Communications, 13(1), 4128.","journal-title":"Nature Communications"},{"key":"2789_CR2","unstructured":"Azizi, S., Kornblith, S., Saharia, C., Norouzi, M., & Fleet, D. J. (2023). Synthetic data from diffusion models improves imagenet classification. arXiv preprint arXiv:2304.08466."},{"key":"2789_CR3","unstructured":"Bao, F., Nie, S., Xue, K., Li, C., Pu, S., Wang, Y., Yue, G., Cao, Y., Su, H., & Zhu, J. (2023). One transformer fits all distributions in multi-modal diffusion at scale. arXiv preprint arXiv:2303.06555."},{"issue":"3","key":"2789_CR4","doi-asserted-by":"publisher","first-page":"908","DOI":"10.1002\/jmri.27908","volume":"55","author":"VM Bashyam","year":"2022","unstructured":"Bashyam, V. M., Doshi, J., Erus, G., Srinivasan, D., Abdulkadir, A., Singh, A., Habes, M., Fan, Y., Masters, C. L., Maruff, P., et al. (2022). Deep generative medical image harmonization for improving cross-site generalization in deep learning predictors. Journal of Magnetic Resonance Imaging, 55(3), 908\u2013916.","journal-title":"Journal of Magnetic Resonance Imaging"},{"key":"2789_CR5","doi-asserted-by":"crossref","unstructured":"Blattmann, A., Rombach, R., Ling, H., Dockhorn, T., Kim, S. W., Fidler, S., & Kreis, K. (2023). Align your latents: High-resolution video synthesis with latent diffusion models. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 22563\u201322575).","DOI":"10.1109\/CVPR52729.2023.02161"},{"key":"2789_CR6","doi-asserted-by":"crossref","unstructured":"Brooks, T., Holynski, A., & Efros, A. A. (2023). Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 
18392\u201318402).","DOI":"10.1109\/CVPR52729.2023.01764"},{"key":"2789_CR7","unstructured":"Chambon, P., Bluethgen, C., Delbrouck, J.-B., Sluijs, R., Po\u0142acin, M., Chaves, J. M. Z., Abraham, T. M., Purohit, S., Langlotz, C. P., & Chaudhari, A. (2022). Roentgen: vision-language foundation model for chest x-ray generation. arXiv preprint arXiv:2211.12737."},{"key":"2789_CR8","doi-asserted-by":"crossref","unstructured":"Chen, Q., Chen, X., Song, H., Xiong, Z., Yuille, A., Wei, C., & Zhou, Z. (2024). Towards generalizable tumor synthesis. arXiv preprint arXiv:2402.19470.","DOI":"10.1109\/CVPR52733.2024.01060"},{"key":"2789_CR9","unstructured":"Chen, Q., Zhou, X., Liu, C., Chen, H., Li, W., Jiang, Z., Huang, Z., Zhao, Y., Yu, D., He, J., et al. (2025). Scaling tumor segmentation: Best lessons from real and synthetic data. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, (pp. 24001\u201324013)."},{"key":"2789_CR10","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1251\u20131258).","DOI":"10.1109\/CVPR.2017.195"},{"key":"2789_CR11","first-page":"8780","volume":"34","author":"P Dhariwal","year":"2021","unstructured":"Dhariwal, P., & Nichol, A. (2021). Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems, 34, 8780\u20138794.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"2789_CR12","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1109\/34.824822","volume":"22","author":"JS Duncan","year":"2000","unstructured":"Duncan, J. S., & Ayache, N. (2000). Medical image analysis: Progress over two decades and the challenges ahead. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 85\u2013106.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2789_CR13","doi-asserted-by":"crossref","unstructured":"Esser, P., Rombach, R., & Ommer, B. (2021). Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 12873\u201312883).","DOI":"10.1109\/CVPR46437.2021.01268"},{"key":"2789_CR14","doi-asserted-by":"crossref","unstructured":"Feng, W., Zhu, L., & Yu, L. (2023). Cheap lunch for medical image segmentation by fine-tuning sam on few exemplars. arXiv preprint arXiv:2308.14133.","DOI":"10.1007\/978-3-031-76160-7_2"},{"key":"2789_CR15","doi-asserted-by":"crossref","unstructured":"Girdhar, R., Singh, M., Brown, A., Duval, Q., Azadi, S., Rambhatla, S. S., Shah, A., Yin, X., Parikh, D., & Misra, I. (2023). Emu video: Factorizing text-to-video generation by explicit image conditioning. arXiv preprint arXiv:2311.10709.","DOI":"10.1007\/978-3-031-73033-7_12"},{"key":"2789_CR16","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27."},{"key":"2789_CR17","unstructured":"Gu, Y., Yang, J., Usuyama, N., Li, C., Zhang, S., Lungren, M. P., Gao, J., & Poon, H. (2023). Biomedjourney: Counterfactual biomedical image generation by instruction-learning from multimodal patient journeys. arXiv preprint arXiv:2310.10765."},{"key":"2789_CR18","doi-asserted-by":"crossref","unstructured":"Guo, P., Zhao, C., Yang, D., Xu, Z., Nath, V., Tang, Y., Simon, B., Belue, M., Harmon, S., Turkbey, B., et al. (2025). Maisi: Medical ai for synthetic imaging. In 2025 IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), (pp. 4430\u20134441). 
IEEE.","DOI":"10.1109\/WACV61041.2025.00435"},{"key":"2789_CR19","doi-asserted-by":"crossref","unstructured":"Han, C., Hayashi, H., Rundo, L., Araki, R., Shimoda, W., Muramatsu, S., Furukawa, Y., Mauri, G., & Nakayama, H. (2018). Gan-based synthetic brain mr image generation. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), (pp. 734\u2013738). IEEE.","DOI":"10.1109\/ISBI.2018.8363678"},{"key":"2789_CR20","doi-asserted-by":"crossref","unstructured":"Han, C., Kitamura, Y., Kudo, A., Ichinose, A., Rundo, L., Furukawa, Y., Umemoto, K., Li, Y., & Nakayama, H. (2019). Synthesizing diverse lung nodules wherever massively: 3d multi-conditional gan-based ct image augmentation for object detection. In 2019 International Conference on 3D Vision (3DV), (pp. 729\u2013737). IEEE.","DOI":"10.1109\/3DV.2019.00085"},{"key":"2789_CR21","doi-asserted-by":"crossref","unstructured":"Han, K., Xiong, Y., You, C., Khosravi, P., Sun, S., Yan, X., Duncan, J. S., & Xie, X. (2023). Medgen3d: A deep generative framework for paired 3d image and mask generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, (pp. 759\u2013769). Springer.","DOI":"10.1007\/978-3-031-43907-0_72"},{"key":"2789_CR22","doi-asserted-by":"crossref","unstructured":"Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., Roth, H. R., & Xu, D. (2022). Unetr: Transformers for 3d medical image segmentation. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, (pp. 574\u2013584).","DOI":"10.1109\/WACV51458.2022.00181"},{"key":"2789_CR23","unstructured":"He, R., Sun, S., Yu, X., Xue, C., Zhang, W., Torr, P., Bai, S., & Qi, X. (2022). Is synthetic data from generative models ready for image recognition? arXiv preprint arXiv:2210.07574."},{"key":"2789_CR24","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 770\u2013778).","DOI":"10.1109\/CVPR.2016.90"},{"key":"2789_CR25","doi-asserted-by":"crossref","unstructured":"Hedlin, E., Sharma, G., Mahajan, S., Isack, H., Kar, A., Tagliasacchi, A., & Yi, K. M. (2024). Unsupervised semantic correspondence using stable diffusion. Advances in Neural Information Processing Systems, 36.","DOI":"10.52202\/075280-0363"},{"key":"2789_CR26","unstructured":"Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems, 30."},{"key":"2789_CR27","first-page":"6840","volume":"33","author":"J Ho","year":"2020","unstructured":"Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840\u20136851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2789_CR28","doi-asserted-by":"crossref","unstructured":"Hu, Q., Chen, Y., Xiao, J., Sun, S., Chen, J., Yuille, A. L., & Zhou, Z. (2023). Label-free liver tumor segmentation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 7422\u20137432).","DOI":"10.1109\/CVPR52729.2023.00717"},{"key":"2789_CR29","unstructured":"Huang, L., Chen, D., Liu, Y., Shen, Y., Zhao, D., & Zhou, J. (2023). Composer: Creative and controllable image synthesis with composable conditions. arXiv preprint arXiv:2302.09778."},{"issue":"2","key":"2789_CR30","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1038\/s41592-020-01008-z","volume":"18","author":"F Isensee","year":"2021","unstructured":"Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnu-net: A self-configuring method for deep learning-based biomedical image segmentation. 
Nature Methods, 18(2), 203\u2013211.","journal-title":"Nature Methods"},{"key":"2789_CR31","doi-asserted-by":"crossref","unstructured":"Isola, P., Zhu, J.-Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1125\u20131134).","DOI":"10.1109\/CVPR.2017.632"},{"key":"2789_CR32","doi-asserted-by":"crossref","unstructured":"Jiang, L., Mao, Y., Wang, X., Chen, X., & Li, C. (2023). Cola-diff: Conditional latent diffusion model for multi-modal mri synthesis. In International Conference on Medical Image Computing and Computer-Assisted Intervention, (pp. 398\u2013408). Springer.","DOI":"10.1007\/978-3-031-43999-5_38"},{"key":"2789_CR33","doi-asserted-by":"crossref","unstructured":"Khader, F., Mueller-Franzes, G., Arasteh, S. T., Han, T., Haarburger, C., Schulze-Hagen, M., Schad, P., Engelhardt, S., Baessler, B., Foersch, S., et al. (2022). Medical diffusion-denoising diffusion probabilistic models for 3d medical image generation. arXiv preprint arXiv:2211.03364.","DOI":"10.1038\/s41598-023-34341-2"},{"key":"2789_CR34","doi-asserted-by":"crossref","unstructured":"Kim, B., & Ye, J. C. (2022). Diffusion deformable model for 4d temporal medical image generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, (pp. 539\u2013548). Springer.","DOI":"10.1007\/978-3-031-16431-6_51"},{"key":"2789_CR35","unstructured":"Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114."},{"key":"2789_CR36","doi-asserted-by":"crossref","unstructured":"Kwon, G., Han, C., & Kim, D.-s. (2019). Generation of 3d brain mri using auto-encoding generative adversarial networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, (pp. 118\u2013126). 
Springer.","DOI":"10.1007\/978-3-030-32248-9_14"},{"key":"2789_CR37","doi-asserted-by":"crossref","unstructured":"Li, D., Ling, H., Kim, S. W., Kreis, K., Fidler, S., & Torralba, A. (2022). Bigdatasetgan: Synthesizing imagenet with pixel-wise annotations. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 21330\u201321340).","DOI":"10.1109\/CVPR52688.2022.02064"},{"key":"2789_CR38","doi-asserted-by":"crossref","unstructured":"Lin, C.-H., Gao, J., Tang, L., Takikawa, T., Zeng, X., Huang, X., Kreis, K., Fidler, S., Liu, M.-Y., & Lin, T.-Y. (2023). Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 300\u2013309).","DOI":"10.1109\/CVPR52729.2023.00037"},{"key":"2789_CR39","unstructured":"Liu, X., Gong, C., & Liu, Q. (2022). Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003."},{"key":"2789_CR40","doi-asserted-by":"crossref","unstructured":"Liu, M., Maiti, P., Thomopoulos, S., Zhu, A., Chai, Y., Kim, H., & Jahanshad, N. (2021). Style transfer using generative adversarial networks for multi-site mri harmonization. In Medical Image Computing and Computer Assisted Intervention-MICCAI 2021: 24th International Conference, Strasbourg, France, September 27-October 1, 2021, Proceedings, Part III 24, (pp. 313\u2013322) Springer.","DOI":"10.1007\/978-3-030-87199-4_30"},{"key":"2789_CR41","unstructured":"Loshchilov, I., & Hutter, F. (2017). Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101."},{"key":"2789_CR42","doi-asserted-by":"crossref","unstructured":"Lugmayr, A., Danelljan, M., Romero, A., Yu, F., Timofte, R., & Van Gool, L. (2022). Repaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 
11461\u201311471).","DOI":"10.1109\/CVPR52688.2022.01117"},{"issue":"10","key":"2789_CR43","doi-asserted-by":"publisher","first-page":"6695","DOI":"10.1109\/TPAMI.2021.3100536","volume":"44","author":"J Ma","year":"2021","unstructured":"Ma, J., Zhang, Y., Gu, S., Zhu, C., Ge, C., Zhang, Y., An, X., Wang, C., Wang, Q., Liu, X., et al. (2021). Abdomenct-1k: Is abdominal organ segmentation a solved problem? IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 6695\u20136714.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2789_CR44","doi-asserted-by":"crossref","unstructured":"Mei, X., Liu, Z., Robson, P. M., Marinelli, B., Huang, M., Doshi, A., Jacobi, A., Cao, C., Link, K. E., Yang, T., et al. (2022). Radimagenet: an open radiologic deep learning research dataset for effective transfer learning. Radiology: Artificial Intelligence, 4(5), Article 210315.","DOI":"10.1148\/ryai.210315"},{"issue":"10","key":"2789_CR45","doi-asserted-by":"publisher","first-page":"1993","DOI":"10.1109\/TMI.2014.2377694","volume":"34","author":"BH Menze","year":"2014","unstructured":"Menze, B. H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al. (2014). The multimodal brain tumor image segmentation benchmark (brats). IEEE Transactions on Medical Imaging, 34(10), 1993\u20132024.","journal-title":"IEEE Transactions on Medical Imaging"},{"key":"2789_CR46","unstructured":"Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784."},{"key":"2789_CR47","doi-asserted-by":"crossref","unstructured":"Mou, C., Wang, X., Xie, L., Wu, Y., Zhang, J., Qi, Z., Shan, Y., & Qie, X. (2023). T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. 
arXiv preprint arXiv:2302.08453.","DOI":"10.1609\/aaai.v38i5.28226"},{"key":"2789_CR48","doi-asserted-by":"crossref","unstructured":"Peng, W., Adeli, E., Bosschieter, T., Park, S. H., Zhao, Q., & Pohl, K. M. (2023). Generating realistic brain mris via a conditional diffusion probabilistic model. In International Conference on Medical Image Computing and Computer-Assisted Intervention, (pp. 14\u201324). Springer.","DOI":"10.1007\/978-3-031-43993-3_2"},{"key":"2789_CR49","doi-asserted-by":"crossref","unstructured":"Pinaya, W. H., Tudosiu, P.-D., Dafflon, J., Da Costa, P. F., Fernandez, V., Nachev, P., Ourselin, S., & Cardoso, M. J. (2022). Brain imaging generation with latent diffusion models. In MICCAI Workshop on Deep Generative Models, (pp. 117\u2013126). Springer.","DOI":"10.1007\/978-3-031-18576-2_12"},{"key":"2789_CR50","unstructured":"Poole, B., Jain, A., Barron, J. T., & Mildenhall, B. (2022). Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988."},{"key":"2789_CR51","unstructured":"Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, (pp. 8748\u20138763). PMLR."},{"key":"2789_CR52","unstructured":"Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 1(2), 3."},{"key":"2789_CR53","unstructured":"Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-shot text-to-image generation. In International Conference on Machine Learning, (pp. 8821\u20138831). PMLR."},{"key":"2789_CR54","unstructured":"Rogozhnikov, A. (2021). Einops: Clear and reliable tensor manipulations with einstein-like notation. 
In International Conference on Learning Representations."},{"key":"2789_CR55","doi-asserted-by":"crossref","unstructured":"Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 10684\u201310695).","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"2789_CR56","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015: 18th International Conference, Munich, Germany, October 5\u20139, 2015, Proceedings, Part III 18, (pp. 234\u2013241). Springer.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"2789_CR57","doi-asserted-by":"crossref","unstructured":"Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., & Aberman, K. (2023). Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 22500\u201322510).","DOI":"10.1109\/CVPR52729.2023.02155"},{"key":"2789_CR58","first-page":"36479","volume":"35","author":"C Saharia","year":"2022","unstructured":"Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E. L., Ghasemipour, K., Gontijo Lopes, R., Karagol Ayan, B., Salimans, T., et al. (2022). Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35, 36479\u201336494.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2789_CR59","doi-asserted-by":"crossref","unstructured":"Sankaranarayanan, S., Balaji, Y., Jain, A., Lim, S. N., & Chellappa, R. (2018). Learning from synthetic data: Addressing domain shift for semantic segmentation. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3752\u20133761).","DOI":"10.1109\/CVPR.2018.00395"},{"key":"2789_CR60","unstructured":"Singer, U., Polyak, A., Hayes, T., Yin, X., An, J., Zhang, S., Hu, Q., Yang, H., Ashual, O., Gafni, O., et al. (2022). Make-a-video: Text-to-video generation without text-video data. arXiv preprint arXiv:2209.14792."},{"issue":"18","key":"2789_CR61","doi-asserted-by":"publisher","first-page":"5097","DOI":"10.3390\/s20185097","volume":"20","author":"SP Singh","year":"2020","unstructured":"Singh, S. P., Wang, L., Gupta, S., Goli, H., Padmanabhan, P., & Guly\u00e1s, B. (2020). 3d deep learning on medical images: A review. Sensors, 20(18), 5097.","journal-title":"Sensors"},{"key":"2789_CR62","unstructured":"Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, (pp. 2256\u20132265). PMLR."},{"key":"2789_CR63","unstructured":"Song, J., Meng, C., & Ermon, S. (2020). Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502."},{"key":"2789_CR64","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, \u0141, & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30."},{"key":"2789_CR65","unstructured":"Wang, W., Bao, J., Zhou, W., Chen, D., Chen, D., Yuan, L., & Li, H. (2022). Semantic image synthesis via diffusion models. arXiv preprint arXiv:2207.00050."},{"key":"2789_CR66","doi-asserted-by":"crossref","unstructured":"Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003). Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, (vol. 2, pp. 1398\u20131402). 
IEEE.","DOI":"10.1109\/ACSSC.2003.1292216"},{"key":"2789_CR67","unstructured":"Wu, W., Zhao, Y., Chen, H., Gu, Y., Zhao, R., He, Y., Zhou, H., Shou, M. Z., & Shen, C. (2024). Datasetdm: Synthesizing data with perception annotations using diffusion models. Advances in Neural Information Processing Systems, 36."},{"key":"2789_CR68","doi-asserted-by":"crossref","unstructured":"Wu, W., Zhao, Y., Shou, M. Z., Zhou, H., & Shen, C. (2023). Diffumask: Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models. arXiv preprint arXiv:2303.11681.","DOI":"10.1109\/ICCV51070.2023.00117"},{"key":"2789_CR69","doi-asserted-by":"crossref","unstructured":"Xu, J., Liu, S., Vahdat, A., Byeon, W., Wang, X., & De Mello, S. (2023). Open-vocabulary panoptic segmentation with text-to-image diffusion models. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 2955\u20132966).","DOI":"10.1109\/CVPR52729.2023.00289"},{"key":"2789_CR70","doi-asserted-by":"crossref","unstructured":"Yang, L., Xu, X., Kang, B., Shi, Y., & Zhao, H. (2024). Freemask: synthetic images with dense annotations make stronger segmentation models. Advances in Neural Information Processing Systems, 36.","DOI":"10.52202\/075280-0819"},{"key":"2789_CR71","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2019.101552","volume":"58","author":"X Yi","year":"2019","unstructured":"Yi, X., Walia, E., & Babyn, P. (2019). Generative adversarial network in medical imaging: A review. Medical image analysis, 58, Article 101552.","journal-title":"Medical image analysis"},{"issue":"3","key":"2789_CR72","doi-asserted-by":"publisher","first-page":"1116","DOI":"10.1016\/j.neuroimage.2006.01.015","volume":"31","author":"PA Yushkevich","year":"2006","unstructured":"Yushkevich, P. A., Piven, J., Cody Hazlett, H., Gimpel Smith, R., Ho, S., Gee, J. C., & Gerig, G. (2006). 
User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. Neuroimage, 31(3), 1116\u20131128.","journal-title":"Neuroimage"},{"key":"2789_CR73","doi-asserted-by":"crossref","unstructured":"Zhang, R., Isola, P., Efros, A. A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 586\u2013595).","DOI":"10.1109\/CVPR.2018.00068"},{"key":"2789_CR74","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Ling, H., Gao, J., Yin, K., Lafleche, J.-F., Barriuso, A., Torralba, A., & Fidler, S. (2021). Datasetgan: Efficient labeled data factory with minimal human effort. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, (pp. 10145\u201310155).","DOI":"10.1109\/CVPR46437.2021.01001"},{"key":"2789_CR75","doi-asserted-by":"crossref","unstructured":"Zhang, L., Rao, A., & Agrawala, M. (2023). Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, (pp. 3836\u20133847).","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"2789_CR76","doi-asserted-by":"crossref","unstructured":"Zhang, X., Xie, W., Huang, C., Zhang, Y., Chen, X., Tian, Q., & Wang, Y. (2023). Self-supervised tumor segmentation with sim2real adaptation. IEEE Journal of Biomedical and Health Informatics.","DOI":"10.1109\/JBHI.2023.3240844"},{"issue":"1","key":"2789_CR77","doi-asserted-by":"publisher","first-page":"6486","DOI":"10.1038\/s41467-025-61754-6","volume":"16","author":"L Zhang","year":"2025","unstructured":"Zhang, L., Jindal, B., Alaa, A., Weinreb, R., Wilson, D., Segal, E., Zou, J., & Xie, P. (2025). Generative ai enables medical image segmentation in ultra low-data regimes. 
Nature Communications, 16(1), 6486.","journal-title":"Nature Communications"},{"key":"2789_CR78","unstructured":"Zhao, S., Chen, D., Chen, Y.-C., Bao, J., Hao, S., Yuan, L., & Wong, K.-Y.K. (2024). Uni-controlnet: All-in-one control to text-to-image diffusion models. Advances in Neural Information Processing Systems, 36."},{"key":"2789_CR79","doi-asserted-by":"crossref","unstructured":"Zhao, C., Guo, P., Yang, D., Tang, Y., He, Y., Simon, B., Belue, M., Harmon, S., Turkbey, B., & Xu, D. (2025). Maisi-v2: Accelerated 3d high-resolution medical image synthesis with rectified flow and region-specific contrastive loss. arXiv preprint arXiv:2508.05772.","DOI":"10.1609\/aaai.v40i15.38309"},{"key":"2789_CR80","unstructured":"Zhao, H., Sheng, D., Bao, J., Chen, D., Chen, D., Wen, F., Yuan, L., Liu, C., Zhou, W., Chu, Q., et al. (2023). X-paste: Revisiting scalable copy-paste for instance segmentation using clip and stablediffusion. In International Conference on Machine Learning (ICML 2023)."},{"key":"2789_CR81","doi-asserted-by":"crossref","unstructured":"Zhu, L., Xue, Z., Jin, Z., Liu, X., He, J., Liu, Z., & Yu, L. (2023). Make-a-volume: Leveraging latent diffusion models for cross-modality 3d brain mri synthesis. In International Conference on Medical Image Computing and Computer-Assisted Intervention, (pp. 592\u2013601). 
Springer.","DOI":"10.1007\/978-3-031-43999-5_56"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-026-02789-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-026-02789-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-026-02789-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T08:40:16Z","timestamp":1774600816000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-026-02789-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,26]]},"references-count":81,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,3]]}},"alternative-id":["2789"],"URL":"https:\/\/doi.org\/10.1007\/s11263-026-02789-0","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,26]]},"assertion":[{"value":"4 July 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 February 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 February 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"140"}}