{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T14:35:45Z","timestamp":1781188545996,"version":"3.54.1"},"reference-count":46,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2023,7,1]],"date-time":"2023-07-01T00:00:00Z","timestamp":1688169600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Health Informatics J"],"published-print":{"date-parts":[[2023,7]]},"abstract":"<jats:p>Radiology reporting is narrative, and its content depends on the clinician\u2019s ability to interpret the images accurately. A tertiary hospital, such as anonymous institute, focuses on writing reports narratively as part of training for medical personnel. Nevertheless, free-text reports make it inconvenient to extract information for clinical audits and data mining. Therefore, we aim to convert unstructured breast radiology reports into structured formats using natural language processing (NLP) algorithm. This study used 327 de-identified breast radiology reports from the anonymous institute. The radiologist identified the significant data elements to be extracted. Our NLP algorithm achieved 97% and 94.9% accuracy in training and testing data, respectively. Henceforth, the structured information was used to build the predictive model for predicting the value of the BIRADS category. The model based on random forest generated the highest accuracy of 92%. Our study not only fulfilled the demands of clinicians by enhancing communication between medical personnel, but it also demonstrated the usefulness of mineable structured data in yielding significant insights.<\/jats:p>","DOI":"10.1177\/14604582231203763","type":"journal-article","created":{"date-parts":[[2023,9,23]],"date-time":"2023-09-23T12:07:09Z","timestamp":1695470829000},"update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":4,"title":["Natural language processing in narrative breast radiology reporting in University Malaya Medical Centre"],"prefix":"10.1177","volume":"29","author":[{"given":"Wee Ming","family":"Tan","sequence":"first","affiliation":[{"name":"Data Science and Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wei Lin","family":"Ng","sequence":"additional","affiliation":[{"name":"Department of Biomedical Imaging, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mogana Darshini","family":"Ganggayah","sequence":"additional","affiliation":[{"name":"Data Science and Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Victor Chee Wai","family":"Hoe","sequence":"additional","affiliation":[{"name":"Department of Social and Preventive Medicine, Universiti Malaya, Kuala Lumpur, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kartini","family":"Rahmat","sequence":"additional","affiliation":[{"name":"Department of Biomedical Imaging, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hana Salwani","family":"Zaini","sequence":"additional","affiliation":[{"name":"Department of Information Technology, University of Malaya Medical Centre, Kuala Lumpur, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nur Aishah","family":"Mohd Taib","sequence":"additional","affiliation":[{"name":"Department of Surgery, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1922-2044","authenticated-orcid":false,"given":"Sarinder Kaur","family":"Dhillon","sequence":"additional","affiliation":[{"name":"Data Science and Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"179","published-online":{"date-parts":[[2023,9,23]]},"reference":[{"key":"bibr1-14604582231203763","unstructured":"National Cancer Institute. Definition of electronic medical record - NCIDictionary of Cancer Terms. https:\/\/www.cancer.gov\/publications\/dictionaries\/cancer-terms\/def\/electronic-medical-record (nd, accessed 15-Nov-2021)"},{"key":"bibr2-14604582231203763","volume-title":"EMR vs EHR \u2013 what is the difference? - health ITBuzz","author":"Peter G","year":"2011"},{"key":"bibr3-14604582231203763","doi-asserted-by":"publisher","DOI":"10.5772\/intechopen.92613"},{"key":"bibr4-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1016\/j.crad.2011.05.013"},{"key":"bibr5-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1016\/j.acra.2017.08.005"},{"key":"bibr6-14604582231203763","unstructured":"Kalra S, Li L, Tizhoosh HR. Automatic classification of pathology reports using TF-IDF features.\n                      arXiv\n                      2019; arXiv:1903.07406."},{"key":"bibr7-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1001\/jama.2008.910"},{"key":"bibr8-14604582231203763","first-page":"2013","volume":"4","author":"Kumar L","year":"2013","journal-title":"J.Glob. Res. Comput. Sci"},{"key":"bibr9-14604582231203763","doi-asserted-by":"publisher","DOI":"10.3938\/NPSM.67.555"},{"key":"bibr10-14604582231203763","first-page":"1","volume":"11","author":"Jain K","year":"2021","journal-title":"Prim Health Care Open Access"},{"key":"bibr11-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1186\/s12911-018-0723-6"},{"key":"bibr12-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-15-266"},{"key":"bibr13-14604582231203763","first-page":"221","volume":"2013","author":"Rink B","year":"2013","journal-title":"AMIA Jt. Summits Transl.Sci. proceedings"},{"key":"bibr14-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1186\/s12911-019-0780-5"},{"key":"bibr15-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1111\/1754-9485.12861"},{"key":"bibr16-14604582231203763","doi-asserted-by":"publisher","DOI":"10.3174\/ajnr.A6961"},{"key":"bibr17-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1186\/s41747-019-0118-1"},{"key":"bibr18-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1002\/lrh2.10237"},{"key":"bibr19-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2021.103712"},{"key":"bibr20-14604582231203763","unstructured":"American College of Radiology. Breast imaging reporting & data system. www.acr.org\/Clinical-Resources\/Reporting-and-Data-Systems\/Bi-Rads (2013, accessed 18 May 2021)."},{"key":"bibr21-14604582231203763","volume-title":"R: a language and environment for statisticalcomputing","author":"R Core Development Team","year":"2013"},{"key":"bibr22-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2407-13-328"},{"key":"bibr23-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1186\/s40001-015-0140-6"},{"key":"bibr24-14604582231203763","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v040.i03"},{"key":"bibr25-14604582231203763","unstructured":"Python Software Foundation. Python language reference. version 2.7.18. www.python.org\/downloads\/release\/python-2718\/ (2020, accessed 26 Oct 2021)."},{"key":"bibr26-14604582231203763","unstructured":"Lundh F. 24.1. Tkinter \u2014 Python interface to Tcl\/Tk \u2014 Python 2.7 2 documentation, https:\/\/python.readthedocs.io\/en\/v2.7.2\/library\/tkinter.html (2013, accessed 27 Oct 2021)."},{"key":"bibr27-14604582231203763","unstructured":"Kuhn M, Wing J, Weston S, et al. Caret: classification and regression training, http:\/\/cran.nexr.com\/web\/packages\/caret\/index.html (2017, accessed 27 October 2021)."},{"key":"bibr28-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1016\/j.impact.2019.100179"},{"key":"bibr29-14604582231203763","doi-asserted-by":"publisher","DOI":"10.17265\/2159-5313\/2016.09.003"},{"key":"bibr30-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1016\/j.catena.2016.11.032"},{"key":"bibr31-14604582231203763","first-page":"18","volume":"2","author":"Liaw A","year":"2002","journal-title":"R News"},{"key":"bibr32-14604582231203763","unstructured":"Breiman L, Cutler A. Random forests - classification description. https:\/\/www.stat.berkeley.edu\/\u02dcbreiman\/RandomForests\/cc_home.html (2004, accessed 27 Mar 2021)."},{"key":"bibr33-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-21706-2_6"},{"key":"bibr34-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1016\/j.asej.2020.11.011"},{"key":"bibr35-14604582231203763","doi-asserted-by":"publisher","DOI":"10.3390\/molecules21080983"},{"key":"bibr36-14604582231203763","unstructured":"Chen T, He T, Benesty M, et al. xgboost: ExtremeGradient Boosting. R package version 1.3.2.1. http:\/\/cran.nexr.com\/web\/packages\/xgboost\/index.html (2018, accessed: 21 May 2021)."},{"key":"bibr37-14604582231203763","unstructured":"LeDell E, Gill N, Aiello S, et al. h2o: R interface for the 'H2O' Scalable machine learning platform. https:\/\/docs.h2o.ai\/h2o\/latest-stable\/h2o-r\/docs\/index.html (nd, accessed: 21 May 2021). https:\/\/Rproject.org\/package=h2o"},{"key":"bibr38-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1016\/0893-6080(89)90020-8"},{"key":"bibr39-14604582231203763","unstructured":"Jarek T. caTools: tools: moving window statistics, GIF, Base64, ROC AUC. Etc. https:\/\/cran.r-project.org\/web\/packages\/caTools\/index.html (2021, accessed 01 Dec 2021)"},{"key":"bibr40-14604582231203763","unstructured":"Grandini M, Bagli E, Visani G. Metrics for multi-class classification: AnOverview.\n                      arXiv\n                      2020; arXiv:2008.05756."},{"key":"bibr41-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2017.11.011"},{"key":"bibr42-14604582231203763","first-page":"670","volume-title":"3 April 2020","author":"Almeida JR"},{"key":"bibr43-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1155\/2020\/8839524"},{"key":"bibr44-14604582231203763","doi-asserted-by":"publisher","DOI":"10.11613\/BM.2012.031"},{"key":"bibr45-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1145\/3458652"},{"key":"bibr46-14604582231203763","doi-asserted-by":"publisher","DOI":"10.1007\/s00247-021-05177-7"}],"container-title":["Health Informatics Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/14604582231203763","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/14604582231203763","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/14604582231203763","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T22:28:16Z","timestamp":1777501696000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/14604582231203763"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,7]]}},"alternative-id":["10.1177\/14604582231203763"],"URL":"https:\/\/doi.org\/10.1177\/14604582231203763","relation":{},"ISSN":["1460-4582","1741-2811"],"issn-type":[{"value":"1460-4582","type":"print"},{"value":"1741-2811","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7]]},"article-number":"14604582231203763"}}