{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T17:42:23Z","timestamp":1770918143762,"version":"3.50.1"},"reference-count":44,"publisher":"IOP Publishing","issue":"2","license":[{"start":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T00:00:00Z","timestamp":1719273600000},"content-version":"vor","delay-in-days":24,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T00:00:00Z","timestamp":1719273600000},"content-version":"tdm","delay-in-days":24,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"NIH- NCI - National Cancer institute","award":["R01CA237269"],"award-info":[{"award-number":["R01CA237269"]}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2024,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Our study aims to explore the long-term performance patterns for deep learning (DL) models deployed in clinic and to investigate their efficacy in relation to evolving clinical practices. We conducted a retrospective study simulating the clinical implementation of our DL model involving 1328 prostate cancer patients treated between January 2006 and August 2022. We trained and validated a U-Net-based auto-segmentation model on data obtained from 2006 to 2011 and tested on data from 2012 to 2022, simulating the model\u2019s clinical deployment starting in 2012. We visualized the trends of the model performance using exponentially weighted moving average (EMA) curves. Additionally, we performed Wilcoxon Rank Sum Test and multiple linear regression to investigate Dice similarity coefficient (DSC) variations across distinct periods and the impact of clinical factors, respectively. Initially, from 2012 to 2014, the model showed high performance in segmenting the prostate, rectum, and bladder. Post-2015, a notable decline in EMA DSC was observed for the prostate and rectum, while bladder contours remained stable. Key factors impacting the prostate contour quality included physician contouring styles, using various hydrogel spacers, CT scan slice thickness, MRI-guided contouring, and intravenous (IV) contrast (<jats:italic>p<\/jats:italic> &lt; 0.0001, <jats:italic>p<\/jats:italic> &lt; 0.0001, <jats:italic>p<\/jats:italic> = 0.0085, <jats:italic>p<\/jats:italic> = 0.0012, <jats:italic>p<\/jats:italic> &lt; 0.0001, respectively). Rectum contour quality was notably influenced by factors such as slice thickness, physician contouring styles, and the use of various hydrogel spacers. The quality of the bladder contour was primarily affected by IV contrast. The deployed DL model exhibited a substantial decline in performance over time, aligning with the evolving clinical settings.<\/jats:p>","DOI":"10.1088\/2632-2153\/ad580f","type":"journal-article","created":{"date-parts":[[2024,6,13]],"date-time":"2024-06-13T22:51:00Z","timestamp":1718319060000},"page":"025077","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Performance deterioration of deep learning models after clinical deployment: a case study with auto-segmentation for definitive prostate cancer radiotherapy"],"prefix":"10.1088","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6894-8645","authenticated-orcid":true,"given":"Biling","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Dohopolski","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6697-7434","authenticated-orcid":true,"given":"Ti","family":"Bai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junjie","family":"Wu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raquibul","family":"Hannan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Neil","family":"Desai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aurelie","family":"Garant","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9590-0655","authenticated-orcid":true,"given":"Dan","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mu-Han","family":"Lin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert","family":"Timmerman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8561-6511","authenticated-orcid":true,"given":"Xinlei","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3083-6752","authenticated-orcid":true,"given":"Steve B","family":"Jiang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"266","published-online":{"date-parts":[[2024,6,25]]},"reference":[{"key":"mlstad580fbib1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10796-021-10142-8","article-title":"Responsible artificial intelligence as a secret ingredient for digital health: bibliometric analysis, insights, and research directions","volume":"25","author":"Fosso Wamba","year":"2021","journal-title":"Inf. Syst. Front."},{"key":"mlstad580fbib2","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1038\/s41591-021-01614-0","article-title":"AI in health and medicine","volume":"28","author":"Rajpurkar","year":"2022","journal-title":"Nat. Med."},{"key":"mlstad580fbib3","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1038\/s41591-021-01229-5","article-title":"DECIDE-AI: new reporting guidelines to bridge the development-to-implementation gap in clinical artificial intelligence","volume":"27","author":"Group D-AS","year":"2021","journal-title":"Nat. Med."},{"key":"mlstad580fbib4","doi-asserted-by":"publisher","first-page":"1328","DOI":"10.1038\/s41591-021-01461-z","article-title":"AI in medicine must be explainable","volume":"27","author":"Kundu","year":"2021","journal-title":"Nat. Med."},{"key":"mlstad580fbib5","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1001\/jamainternmed.2018.7117","article-title":"Deep learning in medicine-promise, progress, and challenges","volume":"179","author":"Wang","year":"2019","journal-title":"JAMA Intern. Med."},{"key":"mlstad580fbib6","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejrad.2019.108742","article-title":"A survey on the future of radiology among radiologists, medical students and surgeons: students and surgeons tend to be more skeptical about artificial intelligence and radiologists may fear that other disciplines take over","volume":"121","author":"Jasper van Hoek","year":"2019","journal-title":"Eur. J. Radiol."},{"key":"mlstad580fbib7","doi-asserted-by":"publisher","first-page":"94","DOI":"10.7861\/futurehosp.6-2-94","article-title":"The potential for artificial intelligence in healthcare","volume":"6","author":"Davenport","year":"2019","journal-title":"Future Healthcare J."},{"key":"mlstad580fbib8","doi-asserted-by":"publisher","DOI":"10.1016\/j.ibmed.2022.100050","article-title":"AI in healthcare startups and special challenges","volume":"6","author":"Young","year":"2022","journal-title":"Intell. Med."},{"key":"mlstad580fbib9","author":"Agency","year":"2013"},{"key":"mlstad580fbib10","article-title":"Generalization in deep learning","author":"Kawaguchi","year":"2017"},{"key":"mlstad580fbib11","doi-asserted-by":"publisher","first-page":"e489","DOI":"10.1016\/S2589-7500(20)30186-2","article-title":"The myth of generalisability in clinical research and machine learning in health care","volume":"2","author":"Futoma","year":"2020","journal-title":"Lancet Digit. Health"},{"key":"mlstad580fbib12","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1056\/NEJMc2104626","article-title":"The clinician and dataset shift in artificial intelligence","volume":"385","author":"Finlayson","year":"2021","journal-title":"New Engl. J. Med."},{"key":"mlstad580fbib13","doi-asserted-by":"publisher","first-page":"1065","DOI":"10.1001\/jamainternmed.2021.2626","article-title":"External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients","volume":"181","author":"Wong","year":"2021","journal-title":"JAMA Intern. Med."},{"key":"mlstad580fbib14","doi-asserted-by":"publisher","first-page":"1651","DOI":"10.1093\/jamia\/ocz130","article-title":"Predictive analytics in health care: how can we know it works?","volume":"26","author":"Van Calster","year":"2019","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"mlstad580fbib15","doi-asserted-by":"publisher","first-page":"796","DOI":"10.1016\/j.jacr.2020.01.006","article-title":"Inconsistent performance of deep learning models on mammogram classification","volume":"17","author":"Wang","year":"2020","journal-title":"J. Am. Coll. Radiol."},{"key":"mlstad580fbib16","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/abb214","article-title":"Generalizability issues with deep learning models in medicine and their potential solutions: illustrated with cone-beam computed tomography (CBCT) to computed tomography (CT) image conversion","volume":"2","author":"Liang","year":"2020","journal-title":"Mach. Learn. Sci. Technol."},{"key":"mlstad580fbib17","doi-asserted-by":"publisher","first-page":"105","DOI":"10.3389\/fcvm.2020.00105","article-title":"Improving the generalizability of convolutional neural network-based segmentation on CMR images","volume":"7","author":"Chen","year":"2020","journal-title":"Front. Cardiovasc. Med."},{"key":"mlstad580fbib18","article-title":"Neurips 2020 competition: predicting generalization in deep learning","author":"Jiang","year":"2020"},{"key":"mlstad580fbib19","doi-asserted-by":"publisher","first-page":"87","DOI":"10.3389\/frai.2021.694875","article-title":"Deep learning\u2013based COVID-19 pneumonia classification using chest CT images: model generalizability","volume":"4","author":"Nguyen","year":"2021","journal-title":"Front. Artif. Intell."},{"key":"mlstad580fbib20","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1093\/biostatistics\/kxz041","article-title":"From development to deployment: dataset shift, causality, and shift-stable models in health AI","volume":"21","author":"Subbaswamy","year":"2019","journal-title":"Biostatistics"},{"key":"mlstad580fbib21","doi-asserted-by":"publisher","first-page":"877","DOI":"10.1093\/jamia\/ocaa032","article-title":"Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network","volume":"27","author":"Kashyap","year":"2020","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"mlstad580fbib22","doi-asserted-by":"publisher","first-page":"1052","DOI":"10.1093\/jamia\/ocx030","article-title":"Calibration drift in regression and machine learning models for acute kidney injury","volume":"24","author":"Davis","year":"2017","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"mlstad580fbib23","article-title":"Calibration drift among regression and machine learning models for hospital mortality.pdf","author":"Davis"},{"key":"mlstad580fbib24","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1136\/bmjqs-2018-008370","article-title":"Artificial intelligence, bias and clinical safety","volume":"28","author":"Challen","year":"2019","journal-title":"BMJ Qual. Saf."},{"key":"mlstad580fbib25","article-title":"Feature robustness in non-stationary health records: caveats to deployable model performance in common clinical machine learning tasks","author":"Nestor","year":"2019","edition":"(ed)"},{"key":"mlstad580fbib26","doi-asserted-by":"publisher","first-page":"e279","DOI":"10.1016\/S2589-7500(20)30102-3","article-title":"Clinical applications of continual learning machine learning","volume":"2","author":"Lee","year":"2020","journal-title":"Lancet Digit. Health"},{"key":"mlstad580fbib27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-022-00611-y","article-title":"Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare","volume":"5","author":"Feng","year":"2022","journal-title":"npj Digit. Med."},{"key":"mlstad580fbib28","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-24574-4_28","article-title":"U-net: convolutional networks for biomedical image segmentation","author":"Ronneberger","year":"2015"},{"key":"mlstad580fbib29","doi-asserted-by":"publisher","first-page":"2402","DOI":"10.1001\/jama.2016.17216","article-title":"Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs","volume":"316","author":"Gulshan","year":"2016","journal-title":"JAMA"},{"key":"mlstad580fbib30","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1038\/nature21056","article-title":"Dermatologist-level classification of skin cancer with deep neural networks","volume":"542","author":"Esteva","year":"2017","journal-title":"Nature"},{"key":"mlstad580fbib31","doi-asserted-by":"publisher","first-page":"1122","DOI":"10.1016\/j.cell.2018.02.010","article-title":"Identifying medical diagnoses and treatable diseases by image-based deep learning","volume":"172","author":"Kermany","year":"2018","journal-title":"Cell"},{"key":"mlstad580fbib32","article-title":"CheXpedition: investigating generalization challenges for translation of chest x-ray algorithms to the clinical setting","author":"Rajpurkar","year":"2020"},{"key":"mlstad580fbib33","article-title":"A proof-of-concept study of artificial intelligence assisted contour revision","author":"Bai","year":"2021"},{"key":"mlstad580fbib34","article-title":"Segmentation by test-time optimization (TTO) for CBCT-based adaptive radiation therapy","author":"Liang","year":"2022"},{"key":"mlstad580fbib35","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0179845","article-title":"Nomogram to predict rectal toxicity following prostate cancer radiotherapy","volume":"12","author":"Delobel","year":"2017","journal-title":"PLoS One"},{"key":"mlstad580fbib36","doi-asserted-by":"publisher","first-page":"47","DOI":"10.4236\/ijmpcero.2021.102005","article-title":"Dosimetric effects due to inter-observer variability of organ contouring when utilizing a knowledge-based planning system for prostate cancer","volume":"10","author":"Liu","year":"2021","journal-title":"Int. J. Med. Phys. Clin. Eng. Radiat. Oncol."},{"key":"mlstad580fbib37","doi-asserted-by":"publisher","first-page":"1415","DOI":"10.1016\/j.juro.2006.06.002","article-title":"Long-term outcome of high dose intensity modulated radiation therapy for patients with clinically localized prostate cancer","volume":"176","author":"Zelefsky","year":"2006","journal-title":"J. Urol."},{"key":"mlstad580fbib38","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1016\/j.prro.2018.08.002","article-title":"Hypofractionated radiation therapy for localized prostate cancer: executive summary of an ASTRO, ASCO, and AUA evidence-based guideline","volume":"8","author":"Morgan","year":"2018","journal-title":"Pract. Radiat. Oncol."},{"key":"mlstad580fbib39","doi-asserted-by":"publisher","first-page":"1099","DOI":"10.1016\/j.ijrobp.2006.10.050","article-title":"Stereotactic hypofractionated accurate radiotherapy of the prostate (SHARP), 33.5 Gy in five fractions for localized disease: first clinical trial results","volume":"67","author":"Madsen","year":"2007","journal-title":"Int. J. Radiat. Oncol. Biol. Phys."},{"key":"mlstad580fbib40","article-title":"Spacers and prostate radiation therapy: what urologists should know","volume":"3","author":"Shore","year":"2018","journal-title":"Everyday Urol.\u2014Oncol. Insights"},{"key":"mlstad580fbib41","author":"FDA","year":"2015"},{"key":"mlstad580fbib42","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1016\/j.ijrobp.2009.02.019","article-title":"Radiographic and anatomic basis for prostate contouring errors and methods to improve prostate contouring accuracy","volume":"76","author":"McLaughlin","year":"2010","journal-title":"Int. J. Radiat. Oncol. Biol. Phys."},{"key":"mlstad580fbib43","doi-asserted-by":"publisher","DOI":"10.1088\/1361-6560\/abb71c","article-title":"Predicting lymph node metastasis in patients with oropharyngeal cancer by using a convolutional neural network with associated epistemic and aleatoric uncertainty","volume":"65","author":"Dohopolski","year":"2020","journal-title":"Phys. Med. Biol."},{"key":"mlstad580fbib44","article-title":"Dropout as a bayesian approximation: representing model uncertainty in deep learning","author":"Gal","year":"2016","edition":"(eds)"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad580f","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad580f\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad580f","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad580f\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad580f\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad580f\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad580f\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad580f\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T10:30:09Z","timestamp":1719311409000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad580f"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,1]]},"references-count":44,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,6,25]]},"published-print":{"date-parts":[[2024,6,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ad580f","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,1]]},"assertion":[{"value":"Performance deterioration of deep learning models after clinical deployment: a case study with auto-segmentation for definitive prostate cancer radiotherapy","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2024 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2024-03-12","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2024-06-13","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2024-06-25","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}