{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T17:10:50Z","timestamp":1772557850348,"version":"3.50.1"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"S1","license":[{"start":{"date-parts":[[2019,11,1]],"date-time":"2019-11-01T00:00:00Z","timestamp":1572566400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2019,11,12]],"date-time":"2019-11-12T00:00:00Z","timestamp":1573516800000},"content-version":"vor","delay-in-days":11,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Biomed Semant"],"published-print":{"date-parts":[[2019,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n<jats:sec>\n<jats:title>Background<\/jats:title>\n<jats:p>Free text in electronic health records (EHR) may contain additional phenotypic information beyond structured (coded) information. For major health events \u2013 heart attack and death \u2013 there is a lack of studies evaluating the extent to which free text in the primary care record might add information. Our objectives were to describe the contribution of free text in primary care to the recording of information about myocardial infarction (MI), including subtype, left ventricular function, laboratory results and symptoms; and recording of cause of death. We used the CALIBER EHR research platform which contains primary care data from the Clinical Practice Research Datalink (CPRD) linked to hospital admission data, the MINAP registry of acute coronary syndromes and the death registry. In CALIBER we randomly selected 2000 patients with MI and 1800 deaths. 
We implemented a rule-based natural language engine, the Freetext Matching Algorithm, on site at CPRD to analyse free text in the primary care record without raw data being released to researchers. We analysed text recorded within 90\u2009days before or 90\u2009days after the MI, and on or after the date of death.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>We extracted 10,927 diagnoses, 3658 test results, 3313 statements of negation, and 850 suspected diagnoses from the myocardial infarction patients. Inclusion of free text increased the recorded proportion of patients with chest pain in the week prior to MI from 19 to 27%, and differentiated between MI subtypes in a quarter more patients than structured data alone. Cause of death was incompletely recorded in primary care; in 36% the cause was in coded data and in 21% it was in free text. Only 47% of patients had exactly the same cause of death in primary care and the death registry, but this did not differ between coded and free text causes of death.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title>Conclusions<\/jats:title>\n<jats:p>Among patients who suffer MI or die, unstructured free text in primary care records contains much information that is potentially useful for research such as symptoms, investigation results and specific diagnoses. 
Access to large-scale unstructured data in electronic health records (millions of patients) might yield important insights.<\/jats:p>\n<\/jats:sec>","DOI":"10.1186\/s13326-019-0214-4","type":"journal-article","created":{"date-parts":[[2019,11,12]],"date-time":"2019-11-12T01:02:36Z","timestamp":1573520556000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Natural language processing for disease phenotyping in UK primary care records for research: a pilot study in myocardial infarction and death"],"prefix":"10.1186","volume":"10","author":[{"given":"Anoop D.","family":"Shah","sequence":"first","affiliation":[]},{"given":"Emily","family":"Bailey","sequence":"additional","affiliation":[]},{"given":"Tim","family":"Williams","sequence":"additional","affiliation":[]},{"given":"Spiros","family":"Denaxas","sequence":"additional","affiliation":[]},{"given":"Richard","family":"Dobson","sequence":"additional","affiliation":[]},{"given":"Harry","family":"Hemingway","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,11,12]]},"reference":[{"key":"214_CR1","doi-asserted-by":"publisher","first-page":"4","DOI":"10.4104\/pcrj.2010.00078","volume":"20","author":"D Kalra","year":"2011","unstructured":"Kalra D, Fernando B. Approaches to enhancing the validity of coded data in electronic medical records. Prim Care Respir J. 2011;20:4\u20135.","journal-title":"Prim Care Respir J"},{"key":"214_CR2","doi-asserted-by":"publisher","first-page":"h1885","DOI":"10.1136\/bmj.h1885","volume":"350","author":"KP Liao","year":"2015","unstructured":"Liao KP, Cai T, Savova GK, Murphy SN, Karlson EW, Ananthakrishnan AN, et al. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ. 2015;350:h1885. 
https:\/\/doi.org\/10.1136\/bmj.h1885.","journal-title":"BMJ."},{"key":"214_CR3","unstructured":"Natural language processing tools. eMERGE Network. https:\/\/emerge.mc.vanderbilt.edu\/natural-language-processing-nlp-tools\/. Accessed 23 May 2018."},{"issue":"3","key":"214_CR4","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1177\/2048872613487495","volume":"2","author":"E Herrett","year":"2013","unstructured":"Herrett E, George J, Denaxas S, Bhaskaran K, Timmis A, Hemingway H, Smeeth L. Type and timing of heralding in ST-elevation and non-ST-elevation myocardial infarction: an analysis of prospectively collected electronic healthcare records linked to the national registry of acute coronary syndromes. Eur Heart J Acute Cardiovasc Care. 2013;2(3):235\u201345. https:\/\/doi.org\/10.1177\/2048872613487495.","journal-title":"Eur Heart J Acute Cardiovasc Care"},{"issue":"4","key":"214_CR5","doi-asserted-by":"publisher","first-page":"666","DOI":"10.1016\/j.ahj.2006.12.022","volume":"153","author":"SS Pakhomov","year":"2007","unstructured":"Pakhomov SS, Hemingway H, Weston SA, Jacobsen SJ, Rodeheffer R, Roger VL. Epidemiology of angina pectoris: role of natural language processing of the medical record. Am Heart J. 2007;153(4):666\u201373.","journal-title":"Am Heart J"},{"issue":"1","key":"214_CR6","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1111\/j.1365-2125.2009.03537.x","volume":"69","author":"E Herrett","year":"2010","unstructured":"Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. Validation and validity of diagnoses in the general practice research database: a systematic review. Br J Clin Pharmacol. 2010;69(1):4\u201314. https:\/\/doi.org\/10.1111\/j.1365-2125.2009.03537.x.","journal-title":"Br J Clin Pharmacol"},{"issue":"1","key":"214_CR7","doi-asserted-by":"publisher","first-page":"88","DOI":"10.1186\/1472-6947-12-88","volume":"12","author":"AD Shah","year":"2012","unstructured":"Shah AD, Martinez C, Hemingway H. 
The Freetext matching algorithm: a computer program to extract diagnoses and causes of death from unstructured text in electronic health records. BMC Med Inform Decis Mak. 2012;12(1):88. https:\/\/doi.org\/10.1186\/1472-6947-12-88.","journal-title":"BMC Med Inform Decis Mak"},{"key":"214_CR8","doi-asserted-by":"publisher","unstructured":"Koeling R, Tate AR, Carroll JA. Automatically estimating the incidence of symptoms recorded in GP free text notes. In: proceedings of the first international workshop on managing interoperability and complexity in health systems, Glasgow, Scotland, UK, 2011 (pp. 43\u201350). New York: Association for Computing Machinery. https:\/\/doi.org\/10.1145\/2064747.2064757.","DOI":"10.1145\/2064747.2064757"},{"issue":"1","key":"214_CR9","doi-asserted-by":"publisher","DOI":"10.1136\/bmjopen-2010-000025","volume":"1","author":"AR Tate","year":"2011","unstructured":"Tate AR, Martin AG, Ali A, Cassell JA. Using free text information to explore how and when GPs code a diagnosis of ovarian cancer: an observational study using primary care records of patients with ovarian cancer. BMJ Open. 2011;1(1):e000025. https:\/\/doi.org\/10.1136\/bmjopen-2010-000025.","journal-title":"BMJ Open"},{"issue":"1","key":"214_CR10","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0030412","volume":"7","author":"Z Wang","year":"2012","unstructured":"Wang Z, Shah AD, Tate AR, Denaxas S, Shawe-Taylor J, Hemingway H. Extracting diagnoses and investigation results from unstructured text in electronic health records by semi-supervised machine learning. PLoS One. 2012;7(1):e30412. https:\/\/doi.org\/10.1371\/journal.pone.0030412.","journal-title":"PLoS One"},{"issue":"6","key":"214_CR11","doi-asserted-by":"publisher","first-page":"1625","DOI":"10.1093\/ije\/dys188","volume":"41","author":"SC Denaxas","year":"2012","unstructured":"Denaxas SC, George J, Herrett E, Shah AD, Kalra D, Hingorani AD, et al. 
Data resource profile: cardiovascular disease research using linked bespoke studies and electronic health records (CALIBER). Int J Epidemiol. 2012;41(6):1625\u201338. https:\/\/doi.org\/10.1093\/ije\/dys188.","journal-title":"Int J Epidemiol"},{"key":"214_CR12","doi-asserted-by":"publisher","first-page":"f2350","DOI":"10.1136\/bmj.f2350","volume":"346","author":"E Herrett","year":"2013","unstructured":"Herrett E, Shah AD, Boggon R, Denaxas S, Smeeth L, van Staa T, et al. Completeness and diagnostic validity of recording acute myocardial infarction events in primary care, hospital care, disease registry, and national mortality records. BMJ. 2013;346:f2350.","journal-title":"BMJ"},{"key":"214_CR13","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L. Random forests. Mach Learn. 2001;45(1):5\u201332.","journal-title":"Mach Learn"},{"key":"214_CR14","unstructured":"World Health Organization: International statistical classification of diseases and related health problems. 10th revision, fifth edition, 2016. ISBN 978 92 4 154916 5."},{"key":"214_CR15","unstructured":"R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria, 2015. https:\/\/www.R-project.org. Accessed 23 May 2018."},{"key":"214_CR16","volume-title":"CALIBER health records research toolkit","author":"AD Shah","year":"2015","unstructured":"Shah AD. CALIBER health records research toolkit. 2015. https:\/\/r-forge.r-project.org\/projects\/caliberanalysis\/."},{"key":"214_CR17","doi-asserted-by":"publisher","first-page":"523","DOI":"10.1007\/s10579-015-9330-7","volume":"50","author":"A Savkov","year":"2016","unstructured":"Savkov A, Carroll J, Koeling R, Cassell J. Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus. Lang Resour Eval. 
2016;50:523\u201348.","journal-title":"Lang Resour Eval"},{"key":"214_CR18","unstructured":"Zheng J, Yarzebski J, Ramesh BP, Goldberg RJ, Yu H. Automatically detecting acute myocardial infarction events from EHR text: a preliminary study. AMIA Annu Symp Proc. 2014;2014:1286\u20131293. eCollection 2014."},{"key":"214_CR19","unstructured":"Provost F. Machine learning from imbalanced data sets 101 (extended abstract). AAAI technical report WS-00-05. In: Papers from the AAAI Workshop, 2000. ISBN 978-1-57735-120-7. https:\/\/www.aaai.org\/Papers\/Workshops\/2000\/WS-00-05\/WS00-05-001.pdf"},{"key":"214_CR20","unstructured":"Liu A, Ziebart B. Robust classification under sample selection bias. In: Advances in Neural Information Processing Systems 27, 2014. https:\/\/papers.nips.cc\/paper\/5458-robust-classification-under-sample-selection-bias.pdf"},{"key":"214_CR21","first-page":"507","volume":"17","author":"G Savova","year":"2010","unstructured":"Savova G, Masanz J, Ogren P, Zheng J, Sohn S, Kipper-Schuler KC, Chute CG. Mayo Clinic clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. JAMIA. 2010;17:507\u201313.","journal-title":"JAMIA"},{"key":"214_CR22","unstructured":"Aronson A. MetaMap. US National Library of Medicine; 2011. http:\/\/metamap.nlm.nih.gov\/. Accessed 23 May 2018."},{"key":"214_CR23","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1186\/1472-6947-6-30","volume":"6","author":"QT Zeng","year":"2006","unstructured":"Zeng QT, Goryachev S, Weiss S, Sordo M, Murphy SN, Lazarus R. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med Inform Decis Mak. 2006;6:30.","journal-title":"BMC Med Inform Decis Mak."},{"key":"214_CR24","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1007\/978-3-319-18818-8_11","volume-title":"The Semantic Web. 
Latest Advances and New Domains","author":"Genevieve Gorrell","year":"2015","unstructured":"Gorrell G, Petrak J, Bontcheva K. Using @Twitter Conventions to Improve #LOD-based Named Entity Disambiguation. In: The Semantic Web. Latest Advances and New Domains. Proceedings of the 12th European Semantic Web Conference 2015, Portoroz, Slovenia (pp. 171\u2013186). Springer International Publishing, 2015. https:\/\/doi.org\/10.1007\/978-3-319-18818-8_11."},{"key":"214_CR25","first-page":"183","volume":"10","author":"S Velupillai","year":"2015","unstructured":"Velupillai S, Mowery D, South BR, Kvist M, Dalianis H. Recent advances in clinical natural language processing in support of semantic analysis. Yearb Med Inform. 2015;10(1):183\u201393.","journal-title":"Yearb Med Inform"},{"key":"214_CR26","unstructured":"Microsoft. Design Guidance: Terminology. NHS Common User Interface Programme 2007. https:\/\/webarchive.nationalarchives.gov.uk\/20160921150545\/http:\/\/systems.digital.nhs.uk\/data\/cui\/uig. Accessed 1 July 2019."},{"key":"214_CR27","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1186\/s12911-017-0418-4","volume":"17","author":"HS Chase","year":"2017","unstructured":"Chase HS, Mitrani LR, Lu GG, Fulgieri DJ. Early recognition of multiple sclerosis using natural language processing of the electronic health record. BMC Med Inform Decis Mak. 2017;17(1):24. https:\/\/doi.org\/10.1186\/s12911-017-0418-4.","journal-title":"BMC Med Inform Decis Mak."},{"key":"214_CR28","doi-asserted-by":"publisher","DOI":"10.1136\/bmjopen-2017-017146","volume":"7","author":"A Dowell","year":"2017","unstructured":"Dowell A, Darlow B, Macrae J, Stubbe M, Turner N, McBain L. Childhood respiratory illness presentation and service utilisation in primary care: a six-year cohort study in Wellington, New Zealand, using natural language processing (NLP) software. BMJ Open. 2017;7(7):e017146. 
https:\/\/doi.org\/10.1136\/bmjopen-2017-017146.","journal-title":"BMJ Open"},{"issue":"8","key":"214_CR29","doi-asserted-by":"publisher","DOI":"10.1136\/bmjopen-2015-008160","volume":"5","author":"J MacRae","year":"2015","unstructured":"MacRae J, Darlow B, McBain L, Jones O, Stubbe M, Turner N, Dowell A. Accessing primary care big data: the development of a software algorithm to explore the rich content of consultation records. BMJ Open. 2015;5(8):e008160. https:\/\/doi.org\/10.1136\/bmjopen-2015-008160.","journal-title":"BMJ Open"},{"issue":"7","key":"214_CR30","doi-asserted-by":"publisher","first-page":"459","DOI":"10.1016\/j.cardfail.2014.03.008","volume":"20","author":"R Vijayakrishnan","year":"2014","unstructured":"Vijayakrishnan R, Steinhubl SR, Ng K, Sun J, Byrd RJ, Daar Z, Williams BA, De Filippi C, Ebadollahi S, Stewart WF. Prevalence of heart failure signs and symptoms in a large primary care population identified through the use of text and data mining of the electronic health record. J Card Fail. 2014;20(7):459\u201364. https:\/\/doi.org\/10.1016\/j.cardfail.2014.03.008.","journal-title":"J Card Fail"},{"key":"214_CR31","doi-asserted-by":"publisher","unstructured":"Roland M, Guthrie B. Quality and Outcomes Framework: what have we learnt? BMJ. 2016;354:i4060. https:\/\/doi.org\/10.1136\/bmj.i4060.","DOI":"10.1136\/bmj.i4060"},{"key":"214_CR32","unstructured":"Cogstack. https:\/\/ctiuk.org\/projects\/cogstack\/. Accessed 23 May 2018."},{"key":"214_CR33","doi-asserted-by":"publisher","DOI":"10.1136\/bmjopen-2015-008721","volume":"6","author":"G Perera","year":"2016","unstructured":"Perera G, Broadbent M, Callard F, Chang CK, Downs J, Dutta R, et al. Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) case register: current status and recent enhancement of an electronic mental health record-derived data resource. BMJ Open. 2016;6:e008721. 
https:\/\/doi.org\/10.1136\/bmjopen-2015-008721.","journal-title":"BMJ Open"}],"container-title":["Journal of Biomedical Semantics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13326-019-0214-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13326-019-0214-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13326-019-0214-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,11]],"date-time":"2020-11-11T00:37:45Z","timestamp":1605055065000},"score":1,"resource":{"primary":{"URL":"https:\/\/jbiomedsem.biomedcentral.com\/articles\/10.1186\/s13326-019-0214-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11]]},"references-count":33,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2019,11]]}},"alternative-id":["214"],"URL":"https:\/\/doi.org\/10.1186\/s13326-019-0214-4","relation":{},"ISSN":["2041-1480"],"issn-type":[{"value":"2041-1480","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11]]},"assertion":[{"value":"12 November 2019","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The CALIBER programme has been approved by an NHS Research Ethics Committee (09\/H0810\/16). This study was approved by the CPRD Independent Scientific Advisory Committee (protocol 12_117). 
Individual patient consent is not required for observational CPRD studies, but patients have the opportunity to opt out of contributing to the database.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"20"}}