{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T15:14:27Z","timestamp":1776784467144,"version":"3.51.2"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: A major goal of biomedical research in personalized medicine is to find relationships between mutations and their corresponding disease phenotypes. However, most of the disease-related mutational data are currently buried in the biomedical literature in textual form and lack the necessary structure to allow easy retrieval and visualization. We introduce a high-throughput computational method for the identification of relevant disease mutations in PubMed abstracts applied to prostate (PCa) and breast cancer (BCa) mutations.<\/jats:p>\n               <jats:p>Results: We developed the extractor of mutations (EMU) tool to identify mutations and their associated genes. We benchmarked EMU against MutationFinder\u2014a tool to extract point mutations from text. Our results show that both methods achieve comparable performance on two manually curated datasets. We also benchmarked EMU's performance for extracting the complete mutational information and phenotype. Remarkably, we show that one of the steps in our approach, a filter based on sequence analysis, increases the precision for that task from 0.34 to 0.59 (PCa) and from 0.39 to 0.61 (BCa). We also show that this high-throughput approach can be extended to other diseases.<\/jats:p>\n               <jats:p>Discussion: Our method improves the current status of disease-mutation databases by significantly increasing the number of annotated mutations. 
We found 51 and 128 mutations manually verified to be related to PCa and BCa, respectively, that are not currently annotated for these cancer types in the OMIM or Swiss-Prot databases. EMU's retrieval performance represents a 2-fold improvement in the number of annotated mutations for PCa and BCa. We further show that our method can benefit from full-text analysis once there is an increase in Open Access availability of full-text articles.<\/jats:p>\n               <jats:p>Availability: Freely available at: http:\/\/bioinf.umbc.edu\/EMU\/ftp.<\/jats:p>\n               <jats:p>Contact: \u00a0mkann@umbc.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq667","type":"journal-article","created":{"date-parts":[[2010,12,8]],"date-time":"2010-12-08T01:46:45Z","timestamp":1291772805000},"page":"408-415","source":"Crossref","is-referenced-by-count":89,"title":["Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature"],"prefix":"10.1093","volume":"27","author":[{"given":"Emily","family":"Doughty","sequence":"first","affiliation":[{"name":"1 University of Maryland, Baltimore County, Baltimore, MD 21250, 2Division of Imaging and Applied Mathematics, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD 20993 and 3National Library of Medicine, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Attila","family":"Kertesz-Farkas","sequence":"additional","affiliation":[{"name":"1 University of Maryland, Baltimore County, Baltimore, MD 
21250, 2Division of Imaging and Applied Mathematics, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD 20993 and 3National Library of Medicine, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Olivier","family":"Bodenreider","sequence":"additional","affiliation":[{"name":"1 University of Maryland, Baltimore County, Baltimore, MD 21250, 2Division of Imaging and Applied Mathematics, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD 20993 and 3National Library of Medicine, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gary","family":"Thompson","sequence":"additional","affiliation":[{"name":"1 University of Maryland, Baltimore County, Baltimore, MD 21250, 2Division of Imaging and Applied Mathematics, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD 20993 and 3National Library of Medicine, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Asa","family":"Adadey","sequence":"additional","affiliation":[{"name":"1 University of Maryland, Baltimore County, Baltimore, MD 21250, 2Division of Imaging and Applied Mathematics, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD 20993 and 3National Library of Medicine, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"Peterson","sequence":"additional","affiliation":[{"name":"1 University of Maryland, Baltimore County, Baltimore, MD 21250, 2Division of Imaging and Applied Mathematics, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD 20993 and 3National Library of Medicine, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maricel 
G.","family":"Kann","sequence":"additional","affiliation":[{"name":"1 University of Maryland, Baltimore County, Baltimore, MD 21250, 2Division of Imaging and Applied Mathematics, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD 20993 and 3National Library of Medicine, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2010,12,7]]},"reference":[{"key":"2023012511540154600_B1","doi-asserted-by":"crossref","first-page":"D793","DOI":"10.1093\/nar\/gkn665","article-title":"McKusick's online Mendelian inheritance in man (OMIM)","volume":"37","author":"Amberger","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012511540154600_B2","first-page":"17","article-title":"Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program","author":"Aronson","year":"2001","journal-title":"Proc. AMIA Symp."},{"key":"2023012511540154600_B3","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1007\/s10796-006-6103-2","article-title":"Mutation mining\u2013a prospector's tale","volume":"8","author":"Baker","year":"2006","journal-title":"Information Systems Frontiers"},{"key":"2023012511540154600_B4","doi-asserted-by":"crossref","first-page":"D26","DOI":"10.1093\/nar\/gkn723","article-title":"GenBank","volume":"37","author":"Benson","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012511540154600_B5","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The unified medical language system (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012511540154600_B6","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1016\/j.jbi.2003.11.002","article-title":"Exploring semantic groups through visual approaches","volume":"36","author":"Bodenreider","year":"2003","journal-title":"J. 
Biomed. Inform."},{"key":"2023012511540154600_B7","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/nar\/gkg095","article-title":"The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003","volume":"31","author":"Boeckmann","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023012511540154600_B8","doi-asserted-by":"crossref","first-page":"2567","DOI":"10.1093\/bioinformatics\/btl421","article-title":"OSIRIS: a tool for retrieving literature about sequence variants","volume":"22","author":"Bonis","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012511540154600_B9","first-page":"330","article-title":"The Cancer Biomedical Informatics Grid (caBIG): infrastructure and applications for a worldwide research community","volume":"129","author":"caBIG","year":"2007","journal-title":"Stud. Health Technol. Inform."},{"key":"2023012511540154600_B10","doi-asserted-by":"crossref","first-page":"1862","DOI":"10.1093\/bioinformatics\/btm235","article-title":"MutationFinder: a high-performance system for extracting point mutation mentions from text","volume":"23","author":"Caporaso","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012511540154600_B11","doi-asserted-by":"crossref","first-page":"680","DOI":"10.1101\/gr.217702","article-title":"Time for a unified system of mutation description and reporting: a review of locus-specific mutation databases","volume":"12","author":"Claustres","year":"2002","journal-title":"Genome Res."},{"key":"2023012511540154600_B12","doi-asserted-by":"crossref","first-page":"1261","DOI":"10.1142\/S021972000700317X","article-title":"Application of automatic mutation-gene pair extraction to diseases","volume":"5","author":"Erdogmus","year":"2007","journal-title":"J. Bioinform. Comput. 
Biol."},{"key":"2023012511540154600_B13","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1002\/(SICI)1097-0045(199603)28:3<162::AID-PROS3>3.0.CO;2-H","article-title":"Low incidence of androgen receptor gene mutations in human prostatic tumors using single strand conformation polymorphism analysis","volume":"28","author":"Evans","year":"1996","journal-title":"Prostate"},{"issue":"Suppl 2","key":"2023012511540154600_B14","doi-asserted-by":"crossref","first-page":"S6","DOI":"10.1186\/1471-2105-10-S2-S6","article-title":"Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text","volume":"10","author":"Garten","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012511540154600_B15","doi-asserted-by":"crossref","first-page":"10489","DOI":"10.1074\/jbc.M109604200","article-title":"Insulin-like growth factor (IGF)-binding protein-3 mutants that do not bind IGF-I or IGF-II stimulate apoptosis in human prostate cancer cells","volume":"277","author":"Hong","year":"2002","journal-title":"J. Biol. Chem."},{"key":"2023012511540154600_B16","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1093\/bioinformatics\/btg449","article-title":"Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors","volume":"20","author":"Horn","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012511540154600_B17","doi-asserted-by":"crossref","first-page":"506","DOI":"10.1097\/01.ju.0000129242.88182.e1","article-title":"Kruppel-like factor 6 germ-line mutations are infrequent in Finnish hereditary prostate cancer","volume":"172","author":"Koivisto","year":"2004","journal-title":"J. 
Urol."},{"key":"2023012511540154600_B18","doi-asserted-by":"crossref","first-page":"512","DOI":"10.1016\/j.jbi.2004.08.004","article-title":"Term identification in the biomedical literature","volume":"37","author":"Krauthammer","year":"2004","journal-title":"J. Biomed. Inform."},{"key":"2023012511540154600_B19","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1002\/humu.21317","article-title":"Novel tools for extraction and validation of disease-related mutations applied to fabry disease","volume":"31","author":"Kuipers","year":"2010","journal-title":"Hum. Mutat."},{"key":"2023012511540154600_B20","first-page":"652","article-title":"BANNER: an executable survey of advances in biomedical named entity recognition","volume":"13","author":"Leaman","year":"2008","journal-title":"Pac. Symp. Biocomput."},{"key":"2023012511540154600_B21","doi-asserted-by":"crossref","first-page":"e16","DOI":"10.1371\/journal.pcbi.0030016","article-title":"Automatic extraction of protein point mutations using a graph bigram association","volume":"3","author":"Lee","year":"2007","journal-title":"PLoS Comput. Biol."},{"key":"2023012511540154600_B22","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1002\/cfg.255","article-title":"An upper-level ontology for the biomedical domain","volume":"4","author":"McCray","year":"2003","journal-title":"Comp. Funct. 
Genomics"},{"key":"2023012511540154600_B23","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1186\/1471-2105-11-157","article-title":"Moara: a Java library for extracting and normalizing gene and protein mentions","volume":"11","author":"Neves","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012511540154600_B24","first-page":"121","article-title":"Named entity recognition","volume-title":"Text Mining for Biology and Biomedicine.","author":"Park","year":"2006"},{"key":"2023012511540154600_B25","doi-asserted-by":"crossref","first-page":"D61","DOI":"10.1093\/nar\/gkl842","article-title":"NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins","volume":"35","author":"Pruitt","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012511540154600_B26","doi-asserted-by":"crossref","first-page":"7289","DOI":"10.1021\/bi00419a017","article-title":"Structure-function studies of murine epidermal growth factor: expression and site-directed mutagenesis of epidermal growth factor gene","volume":"27","author":"Ray","year":"1988","journal-title":"Biochemistry"},{"key":"2023012511540154600_B27","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1093\/nar\/gkh162","article-title":"Automatic extraction of mutations from Medline and cross-validation with OMIM","volume":"32","author":"Rebholz-Schuhmann","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012511540154600_B28","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"Sherry","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012511540154600_B29","doi-asserted-by":"crossref","first-page":"1124","DOI":"10.1093\/bioinformatics\/18.8.1124","article-title":"Tagging gene and protein names in biomedical 
text","volume":"18","author":"Tanabe","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012511540154600_B30","doi-asserted-by":"crossref","first-page":"820","DOI":"10.1016\/S0006-291X(02)02004-1","article-title":"Polymorphisms of the CYP1B1 gene have higher risk for prostate cancer","volume":"296","author":"Tanaka","year":"2002","journal-title":"Biochem. Biophys. Res. Commun."},{"key":"2023012511540154600_B31","doi-asserted-by":"crossref","first-page":"815","DOI":"10.1093\/bioinformatics\/btp071","article-title":"High-performance gene name normalization with GeNo","volume":"25","author":"Wermter","year":"2009","journal-title":"Bioinformatics"},{"issue":"Suppl 8","key":"2023012511540154600_B32","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-10-S8-S2","article-title":"EnzyMiner: automatic identification of protein level mutations and their impact on target enzymes from PubMed abstracts","volume":"10","author":"Yeniterzi","year":"2009","journal-title":"BMC 
Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/3\/408\/48863119\/bioinformatics_27_3_408.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/3\/408\/48863119\/bioinformatics_27_3_408.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T12:02:32Z","timestamp":1674648152000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/3\/408\/321135"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,12,7]]},"references-count":32,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2011,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq667","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,2,1]]},"published":{"date-parts":[[2010,12,7]]}}}