{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T16:51:44Z","timestamp":1780332704098,"version":"3.54.1"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2023,11,24]],"date-time":"2023-11-24T00:00:00Z","timestamp":1700784000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Shriners Children\u2019s","award":["70904"],"award-info":[{"award-number":["70904"]}]},{"name":"NIH NHGRI","award":["1U24HG011449-01A1"],"award-info":[{"award-number":["1U24HG011449-01A1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT) leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show an increase of 10% in recall on scientific publications and 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https:\/\/github.com\/monarch-initiative\/fenominal under the GNU General Public License v3.0.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad716","type":"journal-article","created":{"date-parts":[[2023,11,25]],"date-time":"2023-11-25T03:19:45Z","timestamp":1700882385000},"source":"Crossref","is-referenced-by-count":3,"title":["Term-BLAST-like alignment tool for concept recognition in noisy clinical texts"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2267-8333","authenticated-orcid":false,"given":"Tudor","family":"Groza","sequence":"first","affiliation":[{"name":"Rare Care Centre, Perth Children\u2019s Hospital , Nedlands, WA 6009, Australia"},{"name":"Genetics and Rare Diseases Program, Telethon Kids Institute , Nedlands, WA 6009, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0213-5668","authenticated-orcid":false,"given":"Honghan","family":"Wu","sequence":"additional","affiliation":[{"name":"Institute of Health Informatics, University College London , London WC1E 6BT, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marcel E","family":"Dinger","sequence":"additional","affiliation":[{"name":"Pryzm Health , Sydney, NSW 2089, Australia"},{"name":"School of Life and Environmental Sciences, Faculty of Science, University of Sydney , NSW 2006, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Daniel","family":"Danis","sequence":"additional","affiliation":[{"name":"The Jackson Laboratory for Genomic Medicine , Farmington, CT 06032, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Coleman","family":"Hilton","sequence":"additional","affiliation":[{"name":"Shriners Children\u2019s Corporate Headquarters , Tampa, FL 33607, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anita","family":"Bagley","sequence":"additional","affiliation":[{"name":"Shriners Children's Northern California , Sacramento, CA 95817, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jon R","family":"Davids","sequence":"additional","affiliation":[{"name":"Shriners Children's Northern California , Sacramento, CA 95817, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5141-0259","authenticated-orcid":false,"given":"Ling","family":"Luo","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health , Bethesda, MD 20894, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9998-916X","authenticated-orcid":false,"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health , Bethesda, MD 20894, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0736-9199","authenticated-orcid":false,"given":"Peter N","family":"Robinson","sequence":"additional","affiliation":[{"name":"The Jackson Laboratory for Genomic Medicine , Farmington, CT 06032, United States"},{"name":"Institute for Systems Genomics, University of Connecticut , Farmington, CT 06032, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2023,11,24]]},"reference":[{"key":"2023120923273980800_btad716-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2023120923273980800_btad716-B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023120923273980800_btad716-B3","doi-asserted-by":"crossref","first-page":"e12596","DOI":"10.2196\/12596","article-title":"Identifying clinical terms in medical text using ontology-guided machine learning","volume":"7","author":"Arbabi","year":"2019","journal-title":"JMIR Med Inform"},{"key":"2023120923273980800_btad716-B4","first-page":"659","article-title":"Seven years since the launch of the matchmaker exchange: the evolution of genomic matchmaking","volume":"43","author":"Boycott","year":"2022","journal-title":"Hum Mutat"},{"key":"2023120923273980800_btad716-B5","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1038\/s41525-018-0053-8","article-title":"Meta-analysis of the diagnostic and clinical utility of genome and exome sequencing and chromosomal microarray in children with suspected genetic diseases","volume":"3","author":"Clark","year":"2018","journal-title":"NPJ Genom Med"},{"key":"2023120923273980800_btad716-B6","doi-asserted-by":"crossref","first-page":"1585","DOI":"10.1038\/s41436-018-0381-1","article-title":"ClinPhen extracts and prioritizes patient phenotypes directly from medical records to expedite genetic disease diagnosis","volume":"21","author":"Deisseroth","year":"2019","journal-title":"Genet Med"},{"key":"2023120923273980800_btad716-B7","author":"Gorinski","year":"2019"},{"key":"2023120923273980800_btad716-B8","doi-asserted-by":"crossref","first-page":"bav005","DOI":"10.1093\/database\/bav005","article-title":"Automatic concept recognition using the human phenotype ontology reference and test suite corpora","volume":"2015","author":"Groza","year":"2015","journal-title":"Database"},{"key":"2023120923273980800_btad716-B9","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1038\/s41587-022-01357-4","article-title":"The GA4GH phenopacket schema defines a computable representation of clinical data","volume":"40","author":"Jacobsen","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2023120923273980800_btad716-B10","first-page":"56","article-title":"The open biomedical annotator","volume":"2009","author":"Jonquet","year":"2009","journal-title":"AMIA Joint Summit Transl Bioinformatics"},{"key":"2023120923273980800_btad716-B11","doi-asserted-by":"crossref","first-page":"D1077","DOI":"10.1093\/nar\/gkr913","article-title":"Gene expression atlas update\u2014a value-added database of microarray and sequencing-based functional genomics experiments","volume":"40","author":"Kapushesky","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023120923273980800_btad716-B12","first-page":"234","article-title":"Context-sensitive spelling correction of clinical text via conditional independence","volume":"174","author":"Kim","year":"2022","journal-title":"Proc Mach Learn Res"},{"key":"2023120923273980800_btad716-B13","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/S0378-1119(00)00431-5","article-title":"Using blast for identifying gene and protein names in journal articles","volume":"259","author":"Krauthammer","year":"2000","journal-title":"Gene"},{"key":"2023120923273980800_btad716-B14","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1016\/j.ajhg.2009.09.003","article-title":"Clinical diagnostics in human genetics with semantic similarity searches in ontologies","volume":"85","author":"K\u00f6hler","year":"2009","journal-title":"Am J Hum Genet"},{"key":"2023120923273980800_btad716-B15","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2019","journal-title":"Bioinformatics"},{"key":"2023120923273980800_btad716-B16","doi-asserted-by":"crossref","first-page":"1435","DOI":"10.1126\/science.2983426","article-title":"Rapid and sensitive protein similarity searches","volume":"227","author":"Lipman","year":"1985","journal-title":"Science"},{"key":"2023120923273980800_btad716-B17","doi-asserted-by":"crossref","first-page":"W566","DOI":"10.1093\/nar\/gkz386","article-title":"Doc2Hpo: a web application for efficient and accurate HPO concept curation","volume":"47","author":"Liu","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023120923273980800_btad716-B18","doi-asserted-by":"crossref","first-page":"8565739","DOI":"10.1155\/2017\/8565739","article-title":"Identifying human phenotype terms by combining machine learning and validation rules","volume":"2017","author":"Lobo","year":"2017","journal-title":"Biomed Res Int"},{"key":"2023120923273980800_btad716-B19","doi-asserted-by":"crossref","first-page":"1884","DOI":"10.1093\/bioinformatics\/btab019","article-title":"PhenoTagger: a hybrid method for phenotype concept recognition using human phenotype ontology","volume":"37","author":"Luo","year":"2021","journal-title":"Bioinformatics"},{"key":"2023120923273980800_btad716-B20","first-page":"3111","author":"Mikolov","year":"2013"},{"key":"2023120923273980800_btad716-B21","doi-asserted-by":"crossref","first-page":"bav089","DOI":"10.1093\/database\/bav089","article-title":"SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data","volume":"2015","author":"Pang","year":"2015","journal-title":"Database"},{"key":"2023120923273980800_btad716-B22","doi-asserted-by":"crossref","first-page":"610","DOI":"10.1016\/j.ajhg.2008.09.017","article-title":"The human phenotype ontology: a tool for annotating and analyzing human hereditary disease","volume":"83","author":"Robinson","year":"2008","journal-title":"Am J Hum Genet"},{"key":"2023120923273980800_btad716-B23","doi-asserted-by":"crossref","first-page":"D704","DOI":"10.1093\/nar\/gkz997","article-title":"The monarch initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species","volume":"48","author":"Shefchek","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023120923273980800_btad716-B24","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1016\/j.ajhg.2016.07.005","article-title":"A whole-genome analysis framework for effective identification of pathogenic regulatory variants in mendelian disease","volume":"99","author":"Smedley","year":"2016","journal-title":"Am J Hum Genet"},{"key":"2023120923273980800_btad716-B25","doi-asserted-by":"crossref","first-page":"1868","DOI":"10.1056\/NEJMoa2035790","article-title":"100,000 Genomes pilot on rare-disease diagnosis in health care \u2013 preliminary report","volume":"385","author":"Smedley","year":"2021","journal-title":"N Engl J Med"},{"key":"2023120923273980800_btad716-B26","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1016\/j.ajhg.2018.05.010","article-title":"Deep phenotyping on electronic health records facilitates genetic diagnosis by clinical exomes","volume":"103","author":"Son","year":"2018","journal-title":"Am J Hum Genet"},{"key":"2023120923273980800_btad716-B27","doi-asserted-by":"crossref","first-page":"bau045","DOI":"10.1093\/database\/bau045","article-title":"Automated semantic annotation of rare disease cases: a case study","volume":"2014","author":"Taboada","year":"2014","journal-title":"Database"},{"key":"2023120923273980800_btad716-B28","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1016\/j.ymgme.2015.11.003","article-title":"Undiagnosed diseases network international (UDNI): white paper for global actions to meet patient needs","volume":"116","author":"Taruscio","year":"2015","journal-title":"Mol Genet Metab"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad716\/53792331\/btad716.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/12\/btad716\/54194405\/btad716.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/12\/btad716\/54194405\/btad716.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,10]],"date-time":"2023-12-10T00:06:49Z","timestamp":1702166809000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad716\/7450067"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2023,11,24]]},"references-count":28,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2023,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad716","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,12,1]]},"published":{"date-parts":[[2023,11,24]]},"article-number":"btad716"}}