{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T08:13:36Z","timestamp":1748333616721,"version":"3.33.0"},"reference-count":48,"publisher":"IOP Publishing","issue":"1","license":[{"start":{"date-parts":[[2025,1,23]],"date-time":"2025-01-23T00:00:00Z","timestamp":1737590400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,1,23]],"date-time":"2025-01-23T00:00:00Z","timestamp":1737590400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"NIH","award":["R01CA237269"],"award-info":[{"award-number":["R01CA237269"]}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>\n                  <jats:italic>Introduction.<\/jats:italic> Clinical datasets for training deep learning (DL) models often exhibit high levels of heterogeneity due to differences such as patient characteristics, new medical techniques, and physician preferences. In recent years, hydrogel spacers have been used in some prostate cancer patients receiving radiotherapy to separate the prostate and the rectum to better spare the rectum while achieving adequate dose coverage on the prostate. However, this substantially affects the computed tomography image appearance, which in turn reduces the contouring accuracy of auto-segmentation algorithms. This leads to a highly heterogeneous dataset. <jats:italic>Methods.<\/jats:italic> To address this issue, we propose to identify underlying clusters within the dataset and use the cluster labels for segmentation. 
We collected a clinical dataset of 909 patients, including those with two types of hydrogel spacers and those without. First, we trained a DL model to locate the prostate and limit our field of view to the local area surrounding the prostate and rectum. We then used Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction and employed k-means clustering to assign each patient to a cluster. To leverage this clustered data, we propose a text-guided segmentation model, contrastive language and image pre-training (CLIP)-UNet, which encodes the cluster information using a text encoder and combines the encoded text information with image features for segmentation. <jats:italic>Results.<\/jats:italic> The UMAP results indicated up to three clusters within the dataset. CLIP-UNet with cluster information achieved a Dice score of 86.2% compared to 84.4% from the baseline UNet. Additionally, CLIP-UNet outperformed other state-of-the-art models with or without cluster information. 
<jats:italic>Conclusion.<\/jats:italic> Automatic clustering assisted by DL can reveal hidden data clusters in clinical datasets, and CLIP-UNet effectively utilizes clustered labels and achieves higher performance.<\/jats:p>","DOI":"10.1088\/2632-2153\/ada8f3","type":"journal-article","created":{"date-parts":[[2025,1,10]],"date-time":"2025-01-10T22:58:35Z","timestamp":1736549915000},"page":"015015","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Deep unsupervised clustering for prostate auto-segmentation with and without hydrogel spacer"],"prefix":"10.1088","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6712-5823","authenticated-orcid":true,"given":"Hengrui","family":"Zhao","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6894-8645","authenticated-orcid":true,"given":"Biling","family":"Wang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9043-1490","authenticated-orcid":false,"given":"Michael","family":"Dohopolski","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6697-7434","authenticated-orcid":true,"given":"Ti","family":"Bai","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3083-6752","authenticated-orcid":true,"given":"Steve","family":"Jiang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9590-0655","authenticated-orcid":true,"given":"Dan","family":"Nguyen","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2025,1,23]]},"reference":[{"key":"mlstada8f3bib1","doi-asserted-by":"publisher","first-page":"713","DOI":"10.21037\/atm.2020.02.44","article-title":"A review of the application of deep learning in medical image classification and segmentation","volume":"8","author":"Cai","year":"2020","journal-title":"Ann. Trans. 
Med."},{"key":"mlstada8f3bib2","doi-asserted-by":"publisher","first-page":"1581","DOI":"10.7150\/ijbs.58855","article-title":"Artificial intelligence in the diagnosis of COVID-19: challenges and perspectives","volume":"17","author":"Huang","year":"2021","journal-title":"Int. J. Bio. Sci."},{"key":"mlstada8f3bib3","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1007\/s12194-019-00552-4","article-title":"AI-based computer-aided diagnosis (AI-CAD): the latest review to read first","volume":"13","author":"Fujita","year":"2020","journal-title":"Radiol. Phys. Technol."},{"key":"mlstada8f3bib4","doi-asserted-by":"publisher","DOI":"10.1177\/1533033819873922","article-title":"Artificial intelligence in radiotherapy treatment planning: present and future","volume":"18","author":"Wang","year":"2019","journal-title":"Technol. Cancer Res. Treat."},{"key":"mlstada8f3bib5","doi-asserted-by":"publisher","first-page":"4859","DOI":"10.21037\/qims-21-208","article-title":"Artificial intelligence applications in intensity modulated radiation treatment planning: an overview","volume":"11","author":"Sheng","year":"2021","journal-title":"Quant. Imaging Med. Surg."},{"key":"mlstada8f3bib6","doi-asserted-by":"publisher","DOI":"10.1016\/j.ctrv.2022.102410","article-title":"Artificial intelligence for prediction of treatment outcomes in breast cancer: systematic review of design, reporting standards, and bias","volume":"108","author":"Corti","year":"2022","journal-title":"Cancer Treat. Rev."},{"key":"mlstada8f3bib7","doi-asserted-by":"publisher","first-page":"2075","DOI":"10.1093\/ije\/dyw118","article-title":"Risk and treatment effect heterogeneity: re-analysis of individual participant data from 32 large clinical trials","volume":"45","author":"Kent","year":"2016","journal-title":"Int. J. 
Epidemiol."},{"key":"mlstada8f3bib8","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1038\/s41568-021-00418-1","article-title":"Pancreatic cancer evolution and heterogeneity: integrating omics and clinical data","volume":"22","author":"Connor","year":"2022","journal-title":"Nat. Rev. Cancer"},{"key":"mlstada8f3bib9","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2020.103491","article-title":"Heterogeneity in clinical research data quality monitoring: a national survey","volume":"108","author":"Houston","year":"2020","journal-title":"J. Biomed. Inform."},{"key":"mlstada8f3bib10","doi-asserted-by":"publisher","first-page":"5126","DOI":"10.24963\/ijcai.2017\/735","article-title":"Learning from data heterogeneity: algorithms and applications","author":"He","year":"2017"},{"key":"mlstada8f3bib11","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1016\/j.ijrobp.2020.08.034","article-title":"NRG oncology updated international consensus atlas on pelvic lymph node volumes for intact and postoperative prostate cancer","volume":"109","author":"Hall","year":"2021","journal-title":"Int. J. Radiat. Oncol. Biol. Phys."},{"key":"mlstada8f3bib12","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1016\/B978-0-12-815739-8.00014-6","article-title":"Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders","author":"Thomas","year":"2020","journal-title":"Mach. Learn."},{"key":"mlstada8f3bib13","doi-asserted-by":"publisher","first-page":"1309","DOI":"10.1016\/S0360-3016(99)00541-6","article-title":"A comparison of clinical target volumes determined by CT and MRI for the radiotherapy planning of base of skull meningiomas","volume":"46","author":"Khoo","year":"2000","journal-title":"Int. J. Radiat. Oncol. Biol. 
Phys."},{"key":"mlstada8f3bib14","doi-asserted-by":"publisher","DOI":"10.1001\/jamanetworkopen.2020.8221","article-title":"Association of the placement of a perirectal hydrogel spacer with the clinical outcomes of men receiving radiotherapy for prostate cancer: a systematic review and meta-analysis","volume":"3","author":"Miller","year":"2020","journal-title":"JAMA Netw. Open"},{"key":"mlstada8f3bib15","doi-asserted-by":"publisher","first-page":"e74","DOI":"10.1016\/j.urology.2021.05.013","article-title":"SpaceOAR hydrogel spacer for reducing radiation toxicity during radiotherapy for prostate cancer","volume":"156","author":"Armstrong","year":"2021","journal-title":"Systematic Rev. Urology"},{"key":"mlstada8f3bib16","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1016\/j.prro.2021.09.009","article-title":"Characterization of an iodinated rectal spacer for prostate photon and proton radiation therapy","volume":"12","author":"Kamran","year":"2022","journal-title":"Pract. Radiat. Oncol."},{"article-title":"An experimental study of data heterogeneity in federated learning methods for medical imaging","year":"2021","author":"Qu","key":"mlstada8f3bib17"},{"key":"mlstada8f3bib18","doi-asserted-by":"publisher","first-page":"592","DOI":"10.1016\/j.ijrobp.2007.02.005","article-title":"Automatic segmentation of pelvic structures from magnetic resonance images for prostate cancer radiotherapy","volume":"68","author":"Pasquier","year":"2007","journal-title":"Int. J. Radiat. Oncol. Biol. Phys."},{"key":"mlstada8f3bib19","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2021.102101","article-title":"A deep learning-based framework for segmenting invisible clinical target volumes with estimated uncertainties for post-operative prostate cancer radiotherapy","volume":"72","author":"Balagopal","year":"2021","journal-title":"Med. 
Image Anal."},{"article-title":"UMAP: uniform manifold approximation and projection for dimension reduction","year":"2018","author":"McInnes","key":"mlstada8f3bib20"},{"key":"mlstada8f3bib21","doi-asserted-by":"publisher","first-page":"100","DOI":"10.2307\/2346830","article-title":"Algorithm AS 136: a k-means clustering algorithm","volume":"28","author":"Hartigan","year":"1979","journal-title":"J. R. Stat. Soc. C: Appl. Stat."},{"key":"mlstada8f3bib22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1348\/000711005X48266","article-title":"K-means clustering: a half century synthesis","volume":"59","author":"Steinley","year":"2006","journal-title":"Br. J. Math. Stat. Psychol."},{"article-title":"Attention is all you need","year":"2017","author":"Vaswani","key":"mlstada8f3bib23"},{"article-title":"Learning transferable visual models from natural language supervision","year":"2021","author":"Radford","key":"mlstada8f3bib24"},{"key":"mlstada8f3bib25","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28","article-title":"U-Net: convolutional networks for biomedical image segmentation","author":"Ronneberger","year":"2015"},{"article-title":"MIS-FM 3d medical image segmentation using foundation models pretrained on a large-scale unannotated dataset","year":"2023","author":"Wang","key":"mlstada8f3bib26"},{"article-title":"nnU-Net: self-adapting framework for U-Net-based medical image segmentation","year":"2018","author":"Isensee","key":"mlstada8f3bib27"},{"key":"mlstada8f3bib28","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2021.102195","article-title":"PSA-net: deep learning\u2013based physician style\u2013aware segmentation network for postoperative prostate cancer clinical target volumes","volume":"121","author":"Balagopal","year":"2021","journal-title":"Artif. Intell. 
Med."},{"key":"mlstada8f3bib29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.17605\/OSF.IO\/UTNMW","article-title":"A comparative study of convolutional neural networks and cybernetic approaches on CIFAR-10 dataset","volume":"1","author":"Vinay","year":"2023","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"mlstada8f3bib30","doi-asserted-by":"publisher","first-page":"923","DOI":"10.1145\/3219819.3219907","article-title":"Deep learning for practical image recognition: case study on Kaggle competitions","author":"Yang","year":"2018"},{"article-title":"Pervasive label errors in test sets destabilize machine learning benchmarks","year":"2021","author":"Northcutt","key":"mlstada8f3bib31"},{"key":"mlstada8f3bib32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2342-13-7","article-title":"Quantification of heterogeneity observed in medical images","volume":"13","author":"Brooks","year":"2013","journal-title":"BMC Med. Imaging"},{"key":"mlstada8f3bib33","doi-asserted-by":"publisher","first-page":"1932","DOI":"10.1109\/TMI.2022.3233574","article-title":"Label-efficient self-supervised federated learning for tackling data heterogeneity in medical imaging","volume":"42","author":"Yan","year":"2023","journal-title":"IEEE Trans. Med. Imaging"},{"key":"mlstada8f3bib34","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2022.102424","article-title":"Handling data heterogeneity with generative replay in collaborative learning for medical imaging","volume":"78","author":"Qu","year":"2022","journal-title":"Med. Image Anal."},{"key":"mlstada8f3bib35","doi-asserted-by":"publisher","first-page":"4635","DOI":"10.1109\/JBHI.2022.3185956","article-title":"Splitavg: a heterogeneity-aware federated deep learning method for medical imaging","volume":"26","author":"Zhang","year":"2022","journal-title":"IEEE J. Biomed. 
Health Inform."},{"key":"mlstada8f3bib36","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1007\/978-3-031-51026-7_15","article-title":"Federated learning for data and model heterogeneity in medical imaging","author":"Madni","year":"2023"},{"key":"mlstada8f3bib37","doi-asserted-by":"publisher","first-page":"612","DOI":"10.1016\/j.patcog.2016.09.035","article-title":"Heterogeneous data analysis: online learning for medical-image-based diagnosis","volume":"63","author":"Motai","year":"2017","journal-title":"Pattern Recognit."},{"article-title":"An image is worth 16 \u00d7 16 words: transformers for image recognition at scale","year":"2020","author":"Dosovitskiy","key":"mlstada8f3bib38"},{"key":"mlstada8f3bib39","doi-asserted-by":"publisher","first-page":"10012","DOI":"10.48550\/arXiv.2103.14030","article-title":"Swin transformer: hierarchical vision transformer using shifted windows","author":"Liu","year":"2021"},{"key":"mlstada8f3bib40","doi-asserted-by":"publisher","first-page":"12113","DOI":"10.1109\/TPAMI.2023.3275156","article-title":"Multimodal learning with transformers: a survey","volume":"45","author":"Xu","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"article-title":"Language-driven semantic segmentation","year":"2022","author":"Li","key":"mlstada8f3bib41"},{"key":"mlstada8f3bib42","doi-asserted-by":"publisher","first-page":"18155","DOI":"10.48550\/arXiv.2112.02244","article-title":"Lavt: language-aware vision transformer for referring image segmentation","author":"Yang","year":"2022"},{"key":"mlstada8f3bib43","doi-asserted-by":"crossref","DOI":"10.1109\/ISBI56570.2024.10635823","article-title":"Language guided domain generalized medical image segmentation","author":"Kunhimon","year":"2024"},{"key":"mlstada8f3bib44","doi-asserted-by":"publisher","first-page":"13997","DOI":"10.1109\/ICRA48506.2021.9561797","article-title":"Referring image segmentation via language-driven attention","author":"Chen","year":"2021"},{"key":"mlstada8f3bib45","doi-asserted-by":"publisher","first-page":"19187","DOI":"10.48550\/arXiv.2211.11158","article-title":"Language in a bottle: language model guided concept bottlenecks for interpretable image classification","author":"Yang","year":"2023"},{"key":"mlstada8f3bib46","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1007\/978-3-031-43904-9_27","article-title":"Text-guided foundation model adaptation for pathological image classification","author":"Zhang","year":"2023"},{"key":"mlstada8f3bib47","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41551-023-01045-x","article-title":"A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics","volume":"7","author":"Zhou","year":"2023","journal-title":"Nat. Biomed. 
Eng."},{"key":"mlstada8f3bib48","doi-asserted-by":"publisher","first-page":"21152","DOI":"10.48550\/arXiv.2301.00785","article-title":"Clip-driven universal model for organ segmentation and tumor detection","author":"Liu","year":"2023"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada8f3","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada8f3\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada8f3","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada8f3\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada8f3\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada8f3\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada8f3\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada8f3\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,23]],"date-time":"2025-01-23T06:30:32Z","timestamp":1737613832000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada8f3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,
1,23]]},"references-count":48,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,1,23]]},"published-print":{"date-parts":[[2025,3,31]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ada8f3","relation":{},"ISSN":["2632-2153"],"issn-type":[{"type":"electronic","value":"2632-2153"}],"subject":[],"published":{"date-parts":[[2025,1,23]]},"assertion":[{"value":"Deep unsupervised clustering for prostate auto-segmentation with and without hydrogel spacer","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2024-11-01","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-01-10","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-01-23","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}