{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T16:32:29Z","timestamp":1775233949424,"version":"3.50.1"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2021,6,22]],"date-time":"2021-06-22T00:00:00Z","timestamp":1624320000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,8,13]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>The study sought to develop and evaluate neural natural language processing (NLP) packages for the syntactic analysis and named entity recognition of biomedical and clinical English text.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>We implement and train biomedical and clinical English NLP pipelines by extending the widely used Stanza library originally designed for general NLP tasks. Our models are trained with a mix of public datasets such as the CRAFT treebank as well as with a private corpus of radiology reports annotated with 5 radiology-domain entities. The resulting pipelines are fully based on neural networks, and are able to perform tokenization, part-of-speech tagging, lemmatization, dependency parsing, and named entity recognition for both biomedical and clinical text. We compare our systems against popular open-source NLP libraries such as CoreNLP and scispaCy, state-of-the-art models such as the BioBERT models, and winning systems from the BioNLP CRAFT shared task.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>For syntactic analysis, our systems achieve much better performance compared with the released scispaCy models and CoreNLP models retrained on the same treebanks, and are on par with the winning system from the CRAFT shared task. For NER, our systems substantially outperform scispaCy, and are better or on par with the state-of-the-art performance from BioBERT, while being much more computationally efficient.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>We introduce biomedical and clinical NLP packages built for the Stanza library. These packages offer performance that is similar to the state of the art, and are also optimized for ease of use. To facilitate research, we make all our models publicly available. We also provide an online demonstration (http:\/\/stanza.run\/bio).<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocab090","type":"journal-article","created":{"date-parts":[[2021,5,4]],"date-time":"2021-05-04T03:11:42Z","timestamp":1620097902000},"page":"1892-1899","source":"Crossref","is-referenced-by-count":96,"title":["Biomedical and clinical English model packages for the Stanza Python NLP library"],"prefix":"10.1093","volume":"28","author":[{"given":"Yuhao","family":"Zhang","sequence":"first","affiliation":[{"name":"Biomedical Informatics Training Program, Stanford University, Stanford, California, USA"}]},{"given":"Yuhui","family":"Zhang","sequence":"additional","affiliation":[{"name":"Computer Science Department, Stanford University, Stanford, California, USA"}]},{"given":"Peng","family":"Qi","sequence":"additional","affiliation":[{"name":"Computer Science Department, Stanford University, Stanford, California, USA"}]},{"given":"Christopher D","family":"Manning","sequence":"additional","affiliation":[{"name":"Computer Science and Linguistics Departments, Stanford University, Stanford, California, USA"}]},{"given":"Curtis P","family":"Langlotz","sequence":"additional","affiliation":[{"name":"Department of Radiology, Stanford University, Stanford, California, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,6,22]]},"reference":[{"issue":"5","key":"2021081407013016500_ocab090-B1","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1016\/j.molcel.2006.02.012","article-title":"Biomedical language processing: what\u2019s beyond PubMed?","volume":"21","author":"Hunter","year":"2006","journal-title":"Mol Cell"},{"issue":"16","key":"2021081407013016500_ocab090-B2","doi-asserted-by":"crossref","first-page":"1628","DOI":"10.1056\/NEJMsa0900592","article-title":"Use of electronic health records in U.S. hospitals","volume":"360","author":"Jha","year":"2009","journal-title":"N Engl J Med"},{"issue":"19","key":"2021081407013016500_ocab090-B3","doi-asserted-by":"crossref","first-page":"2840","DOI":"10.1093\/bioinformatics\/btu383","article-title":"Literome: PubMed-scale genomic knowledge base in the cloud","volume":"30","author":"Poon","year":"2014","journal-title":"Bioinformatics"},{"issue":"4","key":"2021081407013016500_ocab090-B4","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"issue":"2","key":"2021081407013016500_ocab090-B5","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1016\/j.jbi.2011.01.004","article-title":"AskHERMES: An online question answering system for complex clinical questions","volume":"44","author":"Cao","year":"2011","journal-title":"J Biomed Inform"},{"key":"2021081407013016500_ocab090-B6","author":"Jin","year":"2019"},{"key":"2021081407013016500_ocab090-B7","author":"Du","year":"2019"},{"key":"2021081407013016500_ocab090-B8","author":"McClosky","year":"2008"},{"key":"2021081407013016500_ocab090-B9","author":"Baumgartner","year":"2019"},{"key":"2021081407013016500_ocab090-B10","author":"Manning","year":"2014"},{"key":"2021081407013016500_ocab090-B11","author":"Neumann","year":"2019"},{"key":"2021081407013016500_ocab090-B12","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The Unified Medical Language System (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res"},{"issue":"5","key":"2021081407013016500_ocab090-B13","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1136\/jamia.2009.001560","article-title":"Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications","volume":"17","author":"Savova","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2021081407013016500_ocab090-B14","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1186\/1472-6947-6-30","article-title":"Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system","volume":"6","author":"Zeng","year":"2006","journal-title":"BMC Med Inform Decis Mak"},{"issue":"3","key":"2021081407013016500_ocab090-B15","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1136\/jamia.2009.002733","article-title":"An overview of MetaMap: historical perspective and recent advances","volume":"17","author":"Aronson","year":"2010","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2021081407013016500_ocab090-B16","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1093\/jamia\/ocx132","article-title":"CLAMP\u2014a toolkit for efficiently building customized clinical natural language processing pipelines","volume":"25","author":"Soysal","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2021081407013016500_ocab090-B17","doi-asserted-by":"crossref","first-page":"29","DOI":"10.5195\/jmla.2020.819","article-title":"Why do biomedical researchers learn to program? An exploratory investigation","volume":"108","author":"Deardorff","year":"2020","journal-title":"J Med Libr Assoc"},{"key":"2021081407013016500_ocab090-B18","author":"Qi","year":"2020"},{"key":"2021081407013016500_ocab090-B19","first-page":"4034","author":"Nivre","year":"2020"},{"key":"2021081407013016500_ocab090-B20","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1186\/1471-2105-13-207","article-title":"A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools","volume":"13","author":"Verspoor","year":"2012","journal-title":"BMC Bioinform"},{"key":"2021081407013016500_ocab090-B21","doi-asserted-by":"crossref","first-page":"160035","DOI":"10.1038\/sdata.2016.35","article-title":"MIMIC-III, a freely accessible critical care database","volume":"3","author":"Johnson","year":"2016","journal-title":"Sci Data"},{"key":"2021081407013016500_ocab090-B22","author":"Dozat","year":"2017"},{"key":"2021081407013016500_ocab090-B23","first-page":"160","author":"Qi","year":"2018"},{"issue":"Suppl 1","key":"2021081407013016500_ocab090-B24","doi-asserted-by":"crossref","first-page":"i180","DOI":"10.1093\/bioinformatics\/btg1023","article-title":"GENIA corpus\u2014a semantically annotated corpus for bio-textmining","volume":"19","author":"Kim","year":"2003","journal-title":"Bioinformatics"},{"key":"2021081407013016500_ocab090-B25","first-page":"2371","author":"Schuster","year":"2016"},{"key":"2021081407013016500_ocab090-B26","first-page":"2897","author":"Silveira","year":"2014"},{"key":"2021081407013016500_ocab090-B27","first-page":"1638","author":"Akbik","year":"2018"},{"issue":"6","key":"2021081407013016500_ocab090-B28","doi-asserted-by":"crossref","first-page":"868","DOI":"10.1093\/bioinformatics\/btt580","article-title":"Anatomical entity mention recognition at literature scale","volume":"30","author":"Pyysalo","year":"2014","journal-title":"Bioinformatics"},{"key":"2021081407013016500_ocab090-B29","doi-asserted-by":"crossref","first-page":"baw068","DOI":"10.1093\/database\/baw068","article-title":"BioCreative V CDR task corpus: a resource for chemical disease relation extraction","volume":"2016","author":"Li","year":"2016","journal-title":"Database (Oxford)"},{"issue":"Suppl 1","key":"2021081407013016500_ocab090-B30","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1758-2946-7-S1-S2","article-title":"The CHEMDNER corpus of chemicals and drugs and its annotation principles","volume":"7","author":"Krallinger","year":"2015","journal-title":"J Cheminform"},{"issue":"S10","key":"2021081407013016500_ocab090-B31","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-16-S10-S2","article-title":"Overview of the cancer genetics and pathway curation tasks of BioNLP shared task 2013","volume":"16","author":"Pyysalo","year":"2015","journal-title":"BMC Bioinform"},{"key":"2021081407013016500_ocab090-B32","first-page":"73","author":"Kim","year":"2004"},{"key":"2021081407013016500_ocab090-B33","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1186\/1471-2105-11-85","article-title":"LINNAEUS: a species name identification system for biomedical literature","volume":"11","author":"Gerner","year":"2010","journal-title":"BMC Bioinform"},{"key":"2021081407013016500_ocab090-B34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jbi.2013.12.006","article-title":"NCBI disease corpus: a resource for disease name recognition and concept normalization","volume":"47","author":"Do\u011fan","year":"2014","journal-title":"J Biomed Inform"},{"issue":"6","key":"2021081407013016500_ocab090-B35","doi-asserted-by":"crossref","first-page":"e65390","DOI":"10.1371\/journal.pone.0065390","article-title":"The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text","volume":"8","author":"Pafilis","year":"2013","journal-title":"PLoS One"},{"issue":"10","key":"2021081407013016500_ocab090-B36","doi-asserted-by":"crossref","first-page":"1745","DOI":"10.1093\/bioinformatics\/bty869","article-title":"Cross-type biomedical named entity recognition with deep multi-task learning","volume":"35","author":"Wang","year":"2019","journal-title":"Bioinformatics"},{"issue":"5","key":"2021081407013016500_ocab090-B37","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1136\/amiajnl-2011-000203","article-title":"2010 i2b2\/VA challenge on concepts, assertions, and relations in clinical text","volume":"18","author":"Uzuner","year":"2011","journal-title":"J Am Med Inform Assoc"},{"key":"2021081407013016500_ocab090-B38","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1016\/j.artmed.2015.09.007","article-title":"Information extraction from multi-institutional radiology reports","volume":"66","author":"Hassanpour","year":"2016","journal-title":"Artif Intell Med"},{"issue":"1","key":"2021081407013016500_ocab090-B39","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1186\/s12859-019-2604-0","article-title":"From POS tagging to dependency parsing for biomedical event extraction","volume":"20","author":"Nguyen","year":"2019","journal-title":"BMC Bioinform"},{"key":"2021081407013016500_ocab090-B40","volume-title":"Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit","author":"Bird","year":"2009"},{"key":"2021081407013016500_ocab090-B41","first-page":"2442","author":"Andor","year":"2016"},{"key":"2021081407013016500_ocab090-B42","first-page":"206","author":"Ngo","year":"2019"},{"key":"2021081407013016500_ocab090-B43","first-page":"3615","author":"Beltagy","year":"2019"},{"key":"2021081407013016500_ocab090-B44","first-page":"72","author":"Alsentzer","year":"2019"},{"key":"2021081407013016500_ocab090-B45","author":"Moen","year":"2013"},{"issue":"1","key":"2021081407013016500_ocab090-B46","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1038\/s41597-019-0055-0","article-title":"BioWordVec, improving biomedical word embeddings with subword information and MeSH","volume":"6","author":"Zhang","year":"2019","journal-title":"Sci Data"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/28\/9\/1892\/39731803\/ocab090.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/28\/9\/1892\/39731803\/ocab090.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T03:47:19Z","timestamp":1724989639000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/28\/9\/1892\/6307885"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,22]]},"references-count":46,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2021,6,22]]},"published-print":{"date-parts":[[2021,8,13]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocab090","relation":{},"ISSN":["1527-974X"],"issn-type":[{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,9,1]]},"published":{"date-parts":[[2021,6,22]]}}}