{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T01:00:58Z","timestamp":1776128458517,"version":"3.50.1"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2023,7,14]],"date-time":"2023-07-14T00:00:00Z","timestamp":1689292800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001348","name":"A*STAR","doi-asserted-by":"publisher","award":["H18\/01\/a0\/019"],"award-info":[{"award-number":["H18\/01\/a0\/019"]}],"id":[{"id":"10.13039\/501100001348","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Singapore Health Services under the Singhealth Duke-NUS Oncology ACP Programme","award":["08\/FY2020\/EX(SL)\/74-A150"],"award-info":[{"award-number":["08\/FY2020\/EX(SL)\/74-A150"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,9,25]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>To assess large language models on their ability to accurately infer cancer disease response from free-text radiology reports.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>We assembled 10\u00a0602 computed tomography reports from cancer patients seen at a single institution. All reports were classified into: no evidence of disease, partial response, stable disease, or progressive disease. We applied transformer models, a bidirectional long short-term memory model, a convolutional neural network model, and conventional machine learning methods to this task. Data augmentation using sentence permutation with consistency loss as well as prompt-based fine-tuning were used on the best-performing models. Models were validated on a hold-out test set and an external validation set based on Response Evaluation Criteria in Solid Tumors (RECIST) classifications.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The best-performing model was the GatorTron transformer which achieved an accuracy of 0.8916 on the test set and 0.8919 on the RECIST validation set. Data augmentation further improved the accuracy to 0.8976. Prompt-based fine-tuning did not further improve accuracy but was able to reduce the number of training reports to 500 while still achieving good performance.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>These models could be used by researchers to derive progression-free survival in large datasets. It may also serve as a decision support tool by providing clinicians an automated second opinion of disease response.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusions<\/jats:title>\n                  <jats:p>Large clinical language models demonstrate potential to infer cancer disease response from radiology reports at scale. Data augmentation techniques are useful to further improve performance. Prompt-based fine-tuning can significantly reduce the size of the training dataset.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocad133","type":"journal-article","created":{"date-parts":[[2023,7,14]],"date-time":"2023-07-14T23:36:56Z","timestamp":1689377816000},"page":"1657-1664","source":"Crossref","is-referenced-by-count":41,"title":["Inferring cancer disease response from radiology reports using large language models with data augmentation and prompting"],"prefix":"10.1093","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1023-5730","authenticated-orcid":false,"given":"Ryan Shea Ying Cong","family":"Tan","sequence":"first","affiliation":[{"name":"Division of Medical Oncology, National Cancer Centre Singapore , Singapore"},{"name":"Duke-NUS Medical School , Singapore"}]},{"given":"Qian","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National University of Singapore , Singapore"}]},{"given":"Guat Hwa","family":"Low","sequence":"additional","affiliation":[{"name":"Division of Medical Oncology, National Cancer Centre Singapore , Singapore"}]},{"given":"Ruixi","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National University of Singapore , Singapore"}]},{"given":"Tzer Chew","family":"Goh","sequence":"additional","affiliation":[{"name":"Institute of Systems Science, National University of Singapore , Singapore"}]},{"given":"Christopher Chu En","family":"Chang","sequence":"additional","affiliation":[{"name":"Institute of Systems Science, National University of Singapore , Singapore"}]},{"given":"Fung Fung","family":"Lee","sequence":"additional","affiliation":[{"name":"Institute of Systems Science, National University of Singapore , Singapore"}]},{"given":"Wei Yin","family":"Chan","sequence":"additional","affiliation":[{"name":"Institute of Systems Science, National University of Singapore , Singapore"}]},{"given":"Wei Chong","family":"Tan","sequence":"additional","affiliation":[{"name":"Division of Medical Oncology, National Cancer Centre Singapore , Singapore"},{"name":"Duke-NUS Medical School , Singapore"}]},{"given":"Han Jieh","family":"Tey","sequence":"additional","affiliation":[{"name":"Division of Medical Oncology, National Cancer Centre Singapore , Singapore"}]},{"given":"Fun Loon","family":"Leong","sequence":"additional","affiliation":[{"name":"Division of Medical Oncology, National Cancer Centre Singapore , Singapore"}]},{"given":"Hong Qi","family":"Tan","sequence":"additional","affiliation":[{"name":"Division of Radiation Oncology, National Cancer Centre Singapore , Singapore"}]},{"given":"Wen Long","family":"Nei","sequence":"additional","affiliation":[{"name":"Division of Radiation Oncology, National Cancer Centre Singapore , Singapore"}]},{"given":"Wen Yee","family":"Chay","sequence":"additional","affiliation":[{"name":"Division of Medical Oncology, National Cancer Centre Singapore , Singapore"},{"name":"Duke-NUS Medical School , Singapore"}]},{"given":"David Wai Meng","family":"Tai","sequence":"additional","affiliation":[{"name":"Division of Medical Oncology, National Cancer Centre Singapore , Singapore"},{"name":"Duke-NUS Medical School , Singapore"}]},{"given":"Gillianne Geet Yi","family":"Lai","sequence":"additional","affiliation":[{"name":"Division of Medical Oncology, National Cancer Centre Singapore , Singapore"},{"name":"Duke-NUS Medical School , Singapore"}]},{"given":"Lionel Tim-Ee","family":"Cheng","sequence":"additional","affiliation":[{"name":"Duke-NUS Medical School , Singapore"},{"name":"Department of Diagnostic Radiology, Singapore General Hospital, Singapore"}]},{"given":"Fuh Yong","family":"Wong","sequence":"additional","affiliation":[{"name":"Division of Radiation Oncology, National Cancer Centre Singapore , Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1648-1473","authenticated-orcid":false,"given":"Matthew Chin Heng","family":"Chua","sequence":"additional","affiliation":[{"name":"Yong Loo Lin School of Medicine, National University of Singapore , Singapore"}]},{"given":"Melvin Lee Kiang","family":"Chua","sequence":"additional","affiliation":[{"name":"Duke-NUS Medical School , Singapore"},{"name":"Division of Radiation Oncology, National Cancer Centre Singapore , Singapore"},{"name":"Data and Computational Science Core, National Cancer Centre Singapore , Singapore"}]},{"given":"Daniel Shao Weng","family":"Tan","sequence":"additional","affiliation":[{"name":"Division of Medical Oncology, National Cancer Centre Singapore , Singapore"},{"name":"Division of Clinical Trials and Epidemiological Sciences, National Cancer Centre Singapore , Singapore"}]},{"given":"Choon Hua","family":"Thng","sequence":"additional","affiliation":[{"name":"Duke-NUS Medical School , Singapore"},{"name":"Division of Oncologic Imaging, National Cancer Centre Singapore, Singapore"}]},{"given":"Iain Bee Huat","family":"Tan","sequence":"additional","affiliation":[{"name":"Division of Medical Oncology, National Cancer Centre Singapore , Singapore"},{"name":"Duke-NUS Medical School , Singapore"},{"name":"Data and Computational Science Core, National Cancer Centre Singapore , Singapore"}]},{"given":"Hwee Tou","family":"Ng","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National University of Singapore , Singapore"}]}],"member":"286","published-online":{"date-parts":[[2023,7,14]]},"reference":[{"issue":"27","key":"2023092719591030500_ocad133-B1","doi-asserted-by":"publisher","first-page":"4268","DOI":"10.1200\/JCO.2010.28.5478","article-title":"Rapid-learning system for cancer care","volume":"28","author":"Abernethy","year":"2010","journal-title":"J Clin Oncol"},{"key":"2023092719591030500_ocad133-B2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1200\/CCI.17.00060","article-title":"CancerLinQ: origins, implementation, and future directions","volume":"2","author":"Rubinstein","year":"2018","journal-title":"JCO Clin Cancer Inform"},{"issue":"16","key":"2023092719591030500_ocad133-B3","doi-asserted-by":"publisher","first-page":"1845","DOI":"10.1200\/JCO.2017.72.6414","article-title":"Untapped potential of observational research to inform clinical decision making: American Society of Clinical Oncology Research Statement","volume":"35","author":"Visvanathan","year":"2017","journal-title":"J Clin Oncol"},{"issue":"21","key":"2023092719591030500_ocad133-B4","doi-asserted-by":"publisher","first-page":"5463","DOI":"10.1158\/0008-5472.CAN-19-0579","article-title":"Use of natural language processing to extract clinical cancer phenotypes from electronic medical records","volume":"79","author":"Savova","year":"2019","journal-title":"Cancer Res"},{"key":"2023092719591030500_ocad133-B5","author":"Sun","year":"2021"},{"issue":"12","key":"2023092719591030500_ocad133-B6","doi-asserted-by":"publisher","first-page":"1553","DOI":"10.1016\/S1470-2045(20)30615-X","article-title":"Deep-learning natural language processing for oncological applications","volume":"21","author":"Sorin","year":"2020","journal-title":"Lancet Oncol"},{"issue":"6","key":"2023092719591030500_ocad133-B7","doi-asserted-by":"publisher","first-page":"797","DOI":"10.1001\/jamaoncol.2016.0213","article-title":"Natural language processing in oncology: a review","volume":"2","author":"Yim","year":"2016","journal-title":"JAMA Oncol"},{"issue":"12","key":"2023092719591030500_ocad133-B8","doi-asserted-by":"crossref","first-page":"1992","DOI":"10.1001\/jamainternmed.2015.5868","article-title":"Cancer drugs approved on the basis of a surrogate end point and subsequent overall survival: an analysis of 5 years of US Food and Drug Administration approvals","volume":"175","author":"Kim","year":"2015","journal-title":"JAMA Intern Med"},{"issue":"1","key":"2023092719591030500_ocad133-B9","doi-asserted-by":"crossref","first-page":"e32","DOI":"10.1016\/S1470-2045(14)70375-4","article-title":"Outcomes and endpoints in trials of cancer treatment: the past, present, and future","volume":"16","author":"Wilson","year":"2015","journal-title":"Lancet Oncol"},{"issue":"23","key":"2023092719591030500_ocad133-B10","doi-asserted-by":"crossref","first-page":"2293","DOI":"10.1056\/NEJMsb1609216","article-title":"Real-world evidence\u2014what is it and what can it tell us?","volume":"375","author":"Sherman","year":"2016","journal-title":"N Engl J Med"},{"issue":"3","key":"2023092719591030500_ocad133-B11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1200\/CCI.19.00013","article-title":"Characterizing the feasibility and performance of real-world tumor progression end points and their association with overall survival in a large advanced non\u2013small-cell lung cancer data set","volume":"3","author":"Griffith","year":"2019","journal-title":"JCO Clin Cancer Inform"},{"issue":"4","key":"2023092719591030500_ocad133-B12","doi-asserted-by":"publisher","first-page":"1843","DOI":"10.1007\/s12325-021-01659-0","article-title":"Characterization of a real-world response variable and comparison with RECIST-based response rates from clinical trials in advanced NSCLC","volume":"38","author":"Ma","year":"2021","journal-title":"Adv Ther"},{"issue":"8","key":"2023092719591030500_ocad133-B13","doi-asserted-by":"publisher","first-page":"2122","DOI":"10.1007\/s12325-019-00970-1","article-title":"Generating real-world tumor burden endpoints from electronic health record data: comparison of RECIST, radiology-anchored, and clinician-anchored approaches for abstracting real-world progression in non-small cell lung cancer","volume":"36","author":"Griffith","year":"2019","journal-title":"Adv Ther"},{"issue":"10","key":"2023092719591030500_ocad133-B14","doi-asserted-by":"publisher","first-page":"1421","DOI":"10.1001\/jamaoncol.2019.1800","article-title":"Assessment of deep natural language processing in ascertaining oncologic outcomes from radiology reports","volume":"5","author":"Kehl","year":"2019","journal-title":"JAMA Oncol"},{"key":"2023092719591030500_ocad133-B15","doi-asserted-by":"publisher","first-page":"680","DOI":"10.1200\/CCI.20.00020","article-title":"Natural language processing to ascertain cancer outcomes from medical oncologist notes","volume":"4","author":"Kehl","year":"2020","journal-title":"JCO Clin Cancer Inform"},{"issue":"1","key":"2023092719591030500_ocad133-B16","doi-asserted-by":"publisher","first-page":"7304","DOI":"10.1038\/s41467-021-27358-6","article-title":"Artificial intelligence-aided clinical annotation of a large multi-cancer genomic dataset","volume":"12","author":"Kehl","year":"2021","journal-title":"Nat Commun"},{"issue":"4","key":"2023092719591030500_ocad133-B17","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"2023092719591030500_ocad133-B18","author":"Liu","year":"2019"},{"key":"2023092719591030500_ocad133-B19","author":"Alsentzer","year":"2019"},{"key":"2023092719591030500_ocad133-B20","author":"Yang","year":"2020"},{"key":"2023092719591030500_ocad133-B21","author":"He","year":"2021"},{"key":"2023092719591030500_ocad133-B22","author":"Gu","year":"2021"},{"key":"2023092719591030500_ocad133-B23","author":"Shin","year":"2020"},{"issue":"4","key":"2023092719591030500_ocad133-B24","doi-asserted-by":"publisher","first-page":"e210258","DOI":"10.1148\/ryai.210258","article-title":"RadBERT: adapting transformer-based language models to radiology","volume":"4","author":"Yan","year":"2022","journal-title":"Radiol Artif Intell"},{"issue":"1","key":"2023092719591030500_ocad133-B25","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1038\/s41746-022-00742-2","article-title":"A large language model for electronic health records","volume":"5","author":"Yang","year":"2022","journal-title":"NPJ Digit Med"},{"issue":"2","key":"2023092719591030500_ocad133-B26","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1007\/s10278-009-9215-7","article-title":"Discerning tumor status from unstructured MRI reports\u2014completeness of information in existing reports and utility of automated natural language processing","volume":"23","author":"Cheng","year":"2010","journal-title":"J Digit Imaging"},{"key":"2023092719591030500_ocad133-B27","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa"},{"key":"2023092719591030500_ocad133-B28","author":"Chollet","year":"2015"},{"key":"2023092719591030500_ocad133-B29","author":"Wolf","year":"2020"},{"key":"2023092719591030500_ocad133-B30","author":"Paszke","year":"2019"},{"key":"2023092719591030500_ocad133-B31","author":"Shmueli","year":"2019"},{"key":"2023092719591030500_ocad133-B32","article-title":"Unsupervised data augmentation for consistency training","volume":"33","author":"Xie","year":"2023","journal-title":"Adv Neural Inform Process Syst"},{"key":"2023092719591030500_ocad133-B33","doi-asserted-by":"publisher","first-page":"4980","DOI":"10.18653\/v1\/2021.emnlp-main.407","author":"Tam","year":"2021"},{"key":"2023092719591030500_ocad133-B34","author":"Bommasani","year":"2022"},{"key":"2023092719591030500_ocad133-B35","doi-asserted-by":"publisher","author":"Mueller","year":"2022","DOI":"10.48550\/arXiv.2204.07128"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/30\/10\/1657\/51770129\/ocad133.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/30\/10\/1657\/51770129\/ocad133.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,27]],"date-time":"2023-09-27T20:21:43Z","timestamp":1695846103000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/30\/10\/1657\/7224523"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,14]]},"references-count":35,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2023,7,14]]},"published-print":{"date-parts":[[2023,9,25]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocad133","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,10,1]]},"published":{"date-parts":[[2023,7,14]]}}}