{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T21:56:03Z","timestamp":1774907763998,"version":"3.50.1"},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T00:00:00Z","timestamp":1759104000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T00:00:00Z","timestamp":1759104000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"British Society of Otology","award":["Small Grant"],"award-info":[{"award-number":["Small Grant"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Most healthcare data is in an unstructured format that requires processing to make it usable for research. Generally, this is done manually, which is both time-consuming and poorly scalable. Natural language processing (NLP) using machine learning offers a method to automate data extraction. In this paper we describe the development of a set of NLP models to extract and contextualise otology symptoms from free text documents.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Methods<\/jats:title>\n            <jats:p>A dataset of 1,148 otology clinic letters written between 2009 \u2013 2011, from a London NHS hospital, were manually annotated and used to train a hybrid dictionary and machine learning NLP model to identify six key otological symptoms: hearing loss, impairment of balance, otalgia, otorrhoea, tinnitus and vertigo. Subsequently, a set of Bidirectional-Long-Short-Term-Memory (Bi-LSTM) models were trained to extract contextual information for each symptom, for example, defining the laterality of the ear affected.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>There were 1,197 symptom annotations and 2,861 contextual annotations with 24% of patients presenting with hearing loss. The symptom extraction model achieved a macro F1 score of 0.73. The Bi-LSTM models achieved a mean macro F1 score of 0.69 for the contextualisation tasks.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>NLP models for symptom extraction and contextualisation were successfully created and shown to perform well on real life data. Refinement is needed to produce models that can run without manual review. Downstream applications for these models include deep semantic searching in electronic health records, cohort identification for clinical trials and facilitating research into hearing loss phenotypes. Further testing of the external validity of the developed models is required.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/s12911-025-03180-8","type":"journal-article","created":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T19:21:42Z","timestamp":1759173702000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Automating the extraction of otology symptoms from clinic letters: a methodological study using natural language processing"],"prefix":"10.1186","volume":"25","author":[{"given":"Nikhil","family":"Joshi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kawsar","family":"Noor","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xi","family":"Bai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marina","family":"Forbes","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Talisa","family":"Ross","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liam","family":"Barrett","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Richard J. B.","family":"Dobson","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anne G. M.","family":"Schilder","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nishchay","family":"Mehta","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Watjana","family":"Lilaonitkul","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,9,29]]},"reference":[{"key":"3180_CR1","doi-asserted-by":"publisher","unstructured":"Sun W, Cai Z, Li Y, Liu F, Fang S, Wang G. Data processing and text mining technologies on electronic medical records: a review. J Healthc Eng. 2018;2018. https:\/\/doi.org\/10.1155\/2018\/4302425.","DOI":"10.1155\/2018\/4302425"},{"key":"3180_CR2","doi-asserted-by":"publisher","first-page":"103354","DOI":"10.1016\/j.jbi.2019.103354","volume":"102","author":"JM Steinkamp","year":"2020","unstructured":"Steinkamp JM, Bala W, Sharma A, Kantrowitz JJ. Task definition, annotated dataset, and supervised natural Language processing models for symptom extraction from unstructured clinical notes. J Biomed Inf. 2020;102:103354. https:\/\/doi.org\/10.1016\/j.jbi.2019.103354.","journal-title":"J Biomed Inf"},{"key":"3180_CR3","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1016\/j.cnc.2019.02.006","volume":"31","author":"S Mills","year":"2019","unstructured":"Mills S. Electronic health records and use of clinical decision support. Crit Care Nurs Clin North Am. 2019;31:125\u201331. https:\/\/doi.org\/10.1016\/j.cnc.2019.02.006.","journal-title":"Crit Care Nurs Clin North Am"},{"key":"3180_CR4","doi-asserted-by":"publisher","first-page":"364","DOI":"10.1093\/jamia\/ocy173","volume":"26","author":"TA Koleck","year":"2019","unstructured":"Koleck TA, Dreisbach C, Bourne PE, Bakken S. Natural Language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc. 2019;26:364\u201379. https:\/\/doi.org\/10.1093\/jamia\/ocy173.","journal-title":"J Am Med Inform Assoc"},{"key":"3180_CR5","doi-asserted-by":"publisher","unstructured":"Friedman C, Hripcsak G, DuMouchel W, Johnson S, Clayton P. Natural Language processing in an operational clinical information system. Nat Lang Eng. 1995;1:83\u2013108. https:\/\/doi.org\/10.1017\/S1351324900000061.","DOI":"10.1017\/S1351324900000061"},{"key":"3180_CR6","unstructured":"Friedman C, Knirsch C, Shagina L, Hripcsak G. Automating a severity score guideline for community-acquired pneumonia employing medical language processing of discharge summaries. AMIA Annu Symp Proc. 1999;256\u201360."},{"key":"3180_CR7","unstructured":"Jackson RG, Ball M, Patel R, Hayes RD, Dobson RJB, Stewart R. TextHunter \u2013 a user friendly tool for extracting generic concepts from free text in clinical research. AMIA Annu Symp Proc. 2014;2014:729."},{"key":"3180_CR8","doi-asserted-by":"publisher","DOI":"10.1136\/bmjopen-2016-012012","author":"RG Jackson","year":"2017","unstructured":"Jackson RG, Patel R, Jayatilleke N, Kolliakou A, Ball M, Gorrell G, et al. Natural Language processing to extract symptoms of severe mental illness from clinical text: the clinical record interactive search comprehensive data extraction (CRIS-CODE) project. BMJ Open. 2017. https:\/\/doi.org\/10.1136\/bmjopen-2016-012012.","journal-title":"BMJ Open"},{"key":"3180_CR9","doi-asserted-by":"publisher","first-page":"10","DOI":"10.13063\/2327-9214.1228","volume":"4","author":"G Divita","year":"2016","unstructured":"Divita G, Carter ME, Tran L-T, Redd D, Zeng QT, Duvall S, et al. v3NLP framework: tools to build applications for extracting concepts from clinical text. eGEMs. 2016;4:10. https:\/\/doi.org\/10.13063\/2327-9214.1228.","journal-title":"eGEMs"},{"key":"3180_CR10","doi-asserted-by":"publisher","first-page":"356","DOI":"10.3233\/978-1-61499-830-3-356","volume":"245","author":"G Divita","year":"2017","unstructured":"Divita G, Luo G, Tran L-TT, Workman TE, Gundlapalli AV, Samore MH. General symptom extraction from VA electronic medical notes. Stud Health Technol Inf. 2017;245:356\u201360. https:\/\/doi.org\/10.3233\/978-1-61499-830-3-356.","journal-title":"Stud Health Technol Inf"},{"key":"3180_CR11","doi-asserted-by":"publisher","unstructured":"Kraljevic Z, Searle T, Shek A, Roguski L, Noor K, Bean D, et al. Multi-domain clinical natural language processing with medcat: the medical concept annotation toolkit. Artif Intell Med. 2021;117:102083. https:\/\/doi.org\/10.1016\/j.artmed.2021.102083.","DOI":"10.1016\/j.artmed.2021.102083"},{"key":"3180_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-024-02589-x","volume":"24","author":"N Mehta","year":"2024","unstructured":"Mehta N, Ribeyre BB, Dimitrov L, English LJ, Ewart C, Heinrich A, et al. Creating a health informatics data resource for hearing health research. BMC Med Inf Decis Mak. 2024;24:1\u20139. https:\/\/doi.org\/10.1186\/s12911-024-02589-x.","journal-title":"BMC Med Inf Decis Mak"},{"key":"3180_CR13","unstructured":"NHS England. Action plan on hearing loss. 2015. https:\/\/www.england.nhs.uk\/wp-content\/uploads\/2015\/03\/act-plan-hearing-loss-upd.pdf."},{"key":"3180_CR14","doi-asserted-by":"publisher","unstructured":"Noor K, Roguski L, Handy A, Klapaukh R, Folarin A, Romao L, et al. Deployment of a free-text analytics platform at a UK National health service research hospital. CogStack at University College London Hospitals; 2021. https:\/\/doi.org\/10.2196\/38122.","DOI":"10.2196\/38122"},{"key":"3180_CR15","doi-asserted-by":"publisher","unstructured":"Jovanovi\u0107 J, Bagheri E. Semantic annotation in biomedicine: the current landscape. J Biomed Semant. 2017;8. https:\/\/doi.org\/10.1186\/s13326-017-0153-x.","DOI":"10.1186\/s13326-017-0153-x"},{"key":"3180_CR16","doi-asserted-by":"publisher","first-page":"5929","DOI":"10.1007\/s10462-020-09838-1","volume":"53","author":"G Van Houdt","year":"2020","unstructured":"Van Houdt G, Mosquera C, N\u00e1poles G. A review on the long short-term memory model. Artif Intell Rev. 2020;53:5929\u201355. https:\/\/doi.org\/10.1007\/s10462-020-09838-1.","journal-title":"Artif Intell Rev"},{"key":"3180_CR17","first-page":"37","volume":"3","author":"K Demeester","year":"2007","unstructured":"Demeester K, van Wieringen A, Hendrickx J-J, Topsakal V, Fransen E, Van Laer L, et al. Prevalence of tinnitus and audiometric shape. B-ENT. 2007;3:37\u201349.","journal-title":"B-ENT"},{"key":"3180_CR18","doi-asserted-by":"publisher","unstructured":"Iqbal E, Mallah R, Rhodes D, Wu H, Romero A, Chang N, et al. ADEPt, a semantically-enriched pipeline for extracting adverse drug events from free-text electronic health records. PLoS One. 2017;12. https:\/\/doi.org\/10.1371\/journal.pone.0187121.","DOI":"10.1371\/journal.pone.0187121"},{"key":"3180_CR19","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1016\/j.ijmedinf.2011.11.005","volume":"81","author":"ME Matheny","year":"2012","unstructured":"Matheny ME, FitzHenry F, Speroff T, Green JK, Griffith ML, Vasilevskis EE, et al. Detection of infectious symptoms from VA emergency department and primary care clinical Documentation. Int J Med Inf. 2012;81:143\u201356. https:\/\/doi.org\/10.1016\/j.ijmedinf.2011.11.005.","journal-title":"Int J Med Inf"},{"key":"3180_CR20","doi-asserted-by":"publisher","first-page":"629","DOI":"10.3233\/978-1-61499-564-7-629","volume":"216","author":"L Zhou","year":"2015","unstructured":"Zhou L, Baughman AW, Lei VJ, Lai KH, Navathe AS, Chang F, et al. Identifying patients with depression using free-text clinical documents. Stud Health Technol Inf. 2015;216:629\u201333. https:\/\/doi.org\/10.3233\/978-1-61499-564-7-629.","journal-title":"Stud Health Technol Inf"},{"key":"3180_CR21","doi-asserted-by":"publisher","unstructured":"Yenduri G, Selvi CG, Srivastava G, Kumar Reddy Maddikunta P, Raj DG, Jhaveri RH, et al. Generative Pre-trained Transformer: a comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions arXiv.2305.10435. 2023. https:\/\/doi.org\/10.48550\/arXiv.2305.10435.","DOI":"10.48550\/arXiv.2305.10435"},{"key":"3180_CR22","doi-asserted-by":"publisher","unstructured":"Topal MO, Bas A, Van Heerden I. Exploring transformers in natural language generation: GPT, BERT, and XLNet. arXiv:2102.08036. 2021. https:\/\/doi.org\/10.48550\/arXiv.2102.08036.","DOI":"10.48550\/arXiv.2102.08036"},{"key":"3180_CR23","doi-asserted-by":"publisher","first-page":"115334","DOI":"10.1016\/j.psychres.2023.115334","volume":"326","author":"A McGowan","year":"2023","unstructured":"McGowan A, Gui Y, Dobbs M, Shuster S, Cotter M, Selloni A, et al. ChatGPT and bard exhibit spontaneous citation fabrication during psychiatry literature search. Psychiatry Res. 2023;326:115334. https:\/\/doi.org\/10.1016\/j.psychres.2023.115334.","journal-title":"Psychiatry Res"}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-025-03180-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-025-03180-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-025-03180-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T19:21:46Z","timestamp":1759173706000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-025-03180-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,29]]},"references-count":23,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["3180"],"URL":"https:\/\/doi.org\/10.1186\/s12911-025-03180-8","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,29]]},"assertion":[{"value":"4 July 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 August 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 September 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"As per national regulations by the Health Research Authority (HRA), this study did not need NHS Research Ethics Committee approval. The HRA Decision Tool was used to aid this decision. Individual patient consent to participate was deemed unnecessary according to the same national regulations. The study was approved locally by the Royal National ENT and Eastman Dental Hospital Audit Lead and registered on the divisional audit programme 2021-22. Confidential data was kept within the hospital network. The study was conducted in compliance with the Declaration of Helsinki.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Richard J.B. Dobson is a director of CogStack Ltd and Onsentia Ltd.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"353"}}