{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T14:26:49Z","timestamp":1776868009282,"version":"3.51.2"},"reference-count":44,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T00:00:00Z","timestamp":1686700800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"publisher","award":["104690"],"award-info":[{"award-number":["104690"]}],"id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>Thrombolysis treatment for acute ischaemic stroke can lead to better outcomes if administered early enough. However, contraindications exist which put the patient at greater risk of a bleed (e.g. recent major surgery, anticoagulant medication). Therefore, clinicians must check a patient's past medical history before proceeding with treatment. In this work we present a machine learning approach for accurate automatic detection of this information in unstructured text documents such as discharge letters or referral letters, to support the clinician in making a decision about whether to administer thrombolysis.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>We consulted local and national guidelines for thrombolysis eligibility, identifying 86 entities which are relevant to the thrombolysis decision. A total of 8,067 documents from 2,912 patients were manually annotated with these entities by medical students and clinicians. Using this data, we trained and validated several transformer-based named entity recognition (NER) models, focusing on transformer models which have been pre-trained on a biomedical corpus as these have shown most promise in the biomedical NER literature.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Our best model was a PubMedBERT-based approach, which obtained a lenient micro\/macro F1 score of 0.829\/0.723. Ensembling 5 variants of this model gave a significant boost to precision, obtaining micro\/macro F1 of 0.846\/0.734 which approaches the human annotator performance of 0.847\/0.839. We further propose numeric definitions for the concepts of name regularity (similarity of all spans which refer to an entity) and context regularity (similarity of all context surrounding mentions of an entity), using these to analyse the types of errors made by the system and finding that the name regularity of an entity is a stronger predictor of model performance than raw training set frequency.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>Overall, this work shows the potential of machine learning to provide clinical decision support (CDS) for the time-critical decision of thrombolysis administration in ischaemic stroke by quickly surfacing relevant information, leading to prompt treatment and hence to better patient outcomes.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fdgth.2023.1186516","type":"journal-article","created":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T11:19:56Z","timestamp":1686741596000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Acute stroke CDS: automatic retrieval of thrombolysis contraindications from unstructured clinical letters"],"prefix":"10.3389","volume":"5","author":[{"given":"Murray","family":"Cutforth","sequence":"first","affiliation":[]},{"given":"Hannah","family":"Watson","sequence":"additional","affiliation":[]},{"given":"Cameron","family":"Brown","sequence":"additional","affiliation":[]},{"given":"Chaoyang","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Stuart","family":"Thomson","sequence":"additional","affiliation":[]},{"given":"Dickon","family":"Fell","sequence":"additional","affiliation":[]},{"given":"Vismantas","family":"Dilys","sequence":"additional","affiliation":[]},{"given":"Morag","family":"Scrimgeour","sequence":"additional","affiliation":[]},{"given":"Patrick","family":"Schrempf","sequence":"additional","affiliation":[]},{"given":"James","family":"Lesh","sequence":"additional","affiliation":[]},{"given":"Keith","family":"Muir","sequence":"additional","affiliation":[]},{"given":"Alexander","family":"Weir","sequence":"additional","affiliation":[]},{"given":"Alison Q","family":"O\u2019Neil","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,6,14]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"1247","DOI":"10.1016\/S0140-6736(18)31874-9","article-title":"Current practice, future directions in the diagnosis, acute treatment of ischaemic stroke","volume":"392","author":"Zerna","year":"2018","journal-title":"Lancet"},{"key":"B2","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1161\/STROKEAHA.118.021840","article-title":"Rapid alteplase administration improves functional outcomes in patients with stroke due to large vessel occlusions","volume":"50","author":"Goyal","year":"2019","journal-title":"Stroke"},{"key":"B3","year":""},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.3389\/fcell.2020.00673","article-title":"Named entity recognition and relation detection for biomedical information extraction","volume":"8","author":"Perera","year":"2020","journal-title":"Front Cell Dev Biol"},{"key":"B5","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1016\/j.ijmedinf.2018.02.005","article-title":"Applying natural language processing techniques to develop a task-specific emr interface for timely stroke thrombolysis: a feasibility study","volume":"112","author":"Sung","year":"2018","journal-title":"Int J Med Inform"},{"key":"B6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-015-0229-4","article-title":"A clinical decision support tool to screen health records for contraindications to stroke thrombolysis\u2013a pilot study","volume":"15","author":"Sun","year":"2015","journal-title":"BMC Med Inform Decis Mak"},{"key":"B7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12911-020-1111-6","article-title":"Using predictive process monitoring to assist thrombolytic therapy decision-making for ischemic stroke patients","volume":"20","author":"Xu","year":"2020","journal-title":"BMC Med Inform Decis Mak"},{"key":"B8","doi-asserted-by":"publisher","first-page":"116667","DOI":"10.1016\/j.jns.2020.116667","article-title":"Predicting major neurologic improvement, long-term outcome after thrombolysis using artificial neural networks","volume":"410","author":"Chung","year":"2020","journal-title":"J Neurol Sci"},{"key":"B9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-014-0127-1","article-title":"Development of a computerised decision aid for thrombolysis in acute stroke care","volume":"15","author":"Flynn","year":"2015","journal-title":"BMC Med Inform Decis Mak"},{"key":"B10","doi-asserted-by":"publisher","first-page":"934929","DOI":"10.3389\/fneur.2022.934929","article-title":"The feasibility and accuracy of machine learning in improving safety and efficiency of thrombolysis for patients with stroke: Literature review and proposed improvements","volume":"13","author":"Shao","year":"2022","journal-title":"Front Neurol"},{"key":"B11","doi-asserted-by":"publisher","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The unified medical language system (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"B12","author":"Cariello","year":""},{"key":"B13","author":"Kim","year":""},{"key":"B14","doi-asserted-by":"publisher","first-page":"baw068","DOI":"10.1093\/database\/baw068","article-title":"Biocreative V CDR task corpus: a resource for chemical disease relation extraction","volume":"2016","author":"Li","year":"2016","journal-title":"Database"},{"key":"B15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.jbi.2013.12.006","article-title":"NCBI disease corpus: a resource for disease name recognition and concept normalization","volume":"47","author":"Do\u011fan","year":"2014","journal-title":"J Biomed Inform"},{"key":"B16","doi-asserted-by":"publisher","first-page":"552","DOI":"10.1136\/amiajnl-2011-000203","article-title":"2010 i2b2\/VA challenge on concepts, assertions, and relations in clinical text","volume":"18","author":"Uzuner","year":"2011","journal-title":"J Am Med Inform Assoc"},{"key":"B17","author":"Mohan","year":""},{"key":"B18","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1016\/j.compbiolchem.2008.03.008","article-title":"Exploiting the performance of dictionary-based bio-entity name recognition in biomedical literature","volume":"32","author":"Yang","year":"2008","journal-title":"Comput Biol Chem"},{"key":"B19","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1016\/j.compbiolchem.2009.07.004","article-title":"Two-phase biomedical named entity recognition using CRFs","volume":"33","author":"Li","year":"2009","journal-title":"Comput Biol Chem"},{"key":"B20","doi-asserted-by":"publisher","first-page":"2493","DOI":"10.48550\/arXiv.1103.0398","article-title":"Natural language processing (almost) from scratch","volume":"12","author":"Collobert","year":"2011","journal-title":"J Mach Learn Res"},{"key":"B21","author":"Devlin","year":""},{"key":"B22","author":"Beltagy","year":""},{"key":"B23","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"Biobert: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"B24","author":"Gururangan","year":""},{"key":"B25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3458754","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Gu","year":"2021","journal-title":"ACM Trans Comput Healthc (HEALTH)"},{"key":"B26","author":"Yuan","year":""},{"key":"B27","author":"He","year":""},{"key":"B28","author":"Michalopoulos","year":""},{"key":"B29","author":"Jeong","year":""},{"key":"B30","author":"Shin","year":""},{"key":"B31","author":"Phan","year":""},{"key":"B32","doi-asserted-by":"publisher","DOI":"10.23889\/ijpds.v7i3.2056","article-title":"Introducing a new trusted research environment \u2013 the safe haven artificial platform (SHAIP)","volume":"7","author":"Wilde","year":"2022","journal-title":"Int J Popul Data Sci"},{"key":"B33","year":""},{"key":"B34","year":""},{"key":"B35","author":"Honnibal","year":""},{"key":"B36","author":"Ramshaw","year":""},{"key":"B37","author":"Kingma","year":""},{"key":"B38","author":"Li","year":""},{"key":"B39","author":"Bergstra","year":""},{"key":"B40","volume-title":"Pattern recognition and machine learning","author":"Bishop","year":"2006"},{"key":"B41","author":"Lin","year":""},{"key":"B42","author":"Pennington","year":""},{"key":"B43","author":"Dai","year":""},{"key":"B44","author":"Humeau","year":""}],"container-title":["Frontiers in Digital Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2023.1186516\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T11:20:03Z","timestamp":1686741603000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2023.1186516\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,14]]},"references-count":44,"alternative-id":["10.3389\/fdgth.2023.1186516"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2023.1186516","relation":{},"ISSN":["2673-253X"],"issn-type":[{"value":"2673-253X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,14]]},"article-number":"1186516"}}