{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,9]],"date-time":"2025-11-09T07:42:48Z","timestamp":1762674168380},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2017,2,13]],"date-time":"2017-02-13T00:00:00Z","timestamp":1486944000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"German Federal Ministry for Education and Research"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The extraction of sequence variants from the literature remains an important task. Existing methods primarily target standard (ST) mutation mentions (e.g. \u2018E6V\u2019), leaving relevant mentions in natural language (NL) largely untapped (e.g. \u2018glutamic acid was substituted by valine at residue 6\u2019).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduced three new corpora suggesting named-entity recognition (NER) to be more challenging than anticipated: 28\u201377% of all articles contained mentions only available in NL. Our new method nala captured NL and ST by combining conditional random fields with word embedding features learned unsupervised from the entire PubMed. In our hands, nala substantially outperformed the state-of-the-art. For instance, we compared all unique mentions in new discoveries correctly detected by any of three methods (SETH, tmVar, or nala). Neither SETH nor tmVar discovered anything missed by nala, while nala uniquely tagged 33% of mentions. 
For NL mentions the corresponding value shot up to 100% nala-only.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>Source code, API and corpora freely available at: http:\/\/tagtog.net\/-corpora\/IDP4+.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx083","type":"journal-article","created":{"date-parts":[[2017,2,15]],"date-time":"2017-02-15T08:58:08Z","timestamp":1487149088000},"page":"1852-1858","source":"Crossref","is-referenced-by-count":14,"title":["<i>nala<\/i>: text mining natural language mutation mentions"],"prefix":"10.1093","volume":"33","author":[{"given":"Juan Miguel","family":"Cejuela","sequence":"first","affiliation":[{"name":"TUM, Department of Informatics, Bioinformatics & Computational Biology \u2013 i12, Garching, Munich, Germany"},{"name":"TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aleksandar","family":"Bojchevski","sequence":"additional","affiliation":[{"name":"TUM, Department of Informatics, Bioinformatics & Computational Biology \u2013 i12, Garching, Munich, Germany"},{"name":"TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carsten","family":"Uhlig","sequence":"additional","affiliation":[{"name":"TUM, Department of Informatics, Bioinformatics & Computational Biology \u2013 i12, Garching, Munich, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rustem","family":"Bekmukhametov","sequence":"additional","affiliation":[{"name":"TUM, Department of Informatics, Bioinformatics & Computational Biology \u2013 i12, Garching, Munich, Germany"},{"name":"Microsoft, WA, Bellevue, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sanjeev","family":"Kumar Karn","sequence":"additional","affiliation":[{"name":"TUM, Department of Informatics, Bioinformatics & Computational Biology \u2013 i12, Garching, Munich, Germany"},{"name":"Ludwig Maximilian University, 80538 Munich & Siemens AG, Corporate Technology, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shpend","family":"Mahmuti","sequence":"additional","affiliation":[{"name":"TUM, Department of Informatics, Bioinformatics & Computational Biology \u2013 i12, Garching, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ashish","family":"Baghudana","sequence":"additional","affiliation":[{"name":"TUM, Department of Informatics, Bioinformatics & Computational Biology \u2013 i12, Garching, Munich, Germany"},{"name":"BITS-Pilani K. K. 
Birla Goa Campus, Goa, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ankit","family":"Dubey","sequence":"additional","affiliation":[{"name":"TUM, Department of Informatics, Bioinformatics & Computational Biology \u2013 i12, Garching, Munich, Germany"},{"name":"Concur (Germany) GmbH, Frankfurt am Main, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Venkata P","family":"Satagopam","sequence":"additional","affiliation":[{"name":"Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Burkhard","family":"Rost","sequence":"additional","affiliation":[{"name":"TUM, Department of Informatics, Bioinformatics & Computational Biology \u2013 i12, Garching, Munich, Germany"},{"name":"Institute of Advanced Study (TUM-IAS) & Institute for Food and Plant Sciences WZW \u2013 Weihenstephan & New York Consortium on Membrane Protein Structure (NYCOMPS) & Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2017,2,13]]},"reference":[{"key":"2023020205315609600_btx083-B1","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/978-1-4939-3167-5_2","article-title":"UniProtKB\/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view","volume":"1374","author":"Boutet","year":"2016","journal-title":"Methods Mol. 
Biol"},{"key":"2023020205315609600_btx083-B2","doi-asserted-by":"crossref","first-page":"1862","DOI":"10.1093\/bioinformatics\/btm235","article-title":"MutationFinder: a high-performance system for extracting point mutation mentions from text","volume":"23","author":"Caporaso","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020205315609600_btx083-B3","author":"Caporaso","year":"2007"},{"key":"2023020205315609600_btx083-B4","doi-asserted-by":"crossref","first-page":"bau033","DOI":"10.1093\/database\/bau033","article-title":"tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles","volume":"2014","author":"Cejuela","year":"2014","journal-title":"Database (Oxford)"},{"key":"2023020205315609600_btx083-B5","doi-asserted-by":"crossref","first-page":"e1003951","DOI":"10.1371\/journal.pcbi.1003951","article-title":"The HIV mutation browser: a resource for human immunodeficiency virus mutagenesis and polymorphism data","volume":"10","author":"Davey","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023020205315609600_btx083-B6","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1002\/humu.22981","article-title":"HGVS recommendations for the description of sequence variants: 2016 update","volume":"37","author":"den Dunnen","year":"2016","journal-title":"Hum. 
Mutat"},{"key":"2023020205315609600_btx083-B7","author":"Guo","year":"2014"},{"key":"2023020205315609600_btx083-B8","doi-asserted-by":"crossref","first-page":"bau003.","DOI":"10.1093\/database\/bau003","article-title":"Literature mining of genetic variants for curation: quantifying the importance of supplementary material","volume":"2014","author":"Jimeno","year":"2014","journal-title":"Database (Oxford)"},{"key":"2023020205315609600_btx083-B9","doi-asserted-by":"crossref","first-page":"18.","DOI":"10.12688\/f1000research.3-18.v2","article-title":"Mutation extraction tools can be combined for robust recognition of genetic variants in the literature","volume":"3","author":"Jimeno","year":"2014","journal-title":"F1000Res"},{"key":"2023020205315609600_btx083-B10","doi-asserted-by":"crossref","first-page":"S8.","DOI":"10.1186\/gb-2008-9-s2-s8","article-title":"Linking genes to literature: text mining, information extraction, and retrieval applications for biology","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome Biol"},{"key":"2023020205315609600_btx083-B11","author":"Lafferty","year":"2001"},{"key":"2023020205315609600_btx083-B12","doi-asserted-by":"crossref","first-page":"e0152725","DOI":"10.1371\/journal.pone.0152725","article-title":"DiMeX: a text mining system for mutation-disease association extraction","volume":"11","author":"Mahmood","year":"2016","journal-title":"PLoS One"},{"key":"2023020205315609600_btx083-B13","author":"Mikolov","year":"2013"},{"key":"2023020205315609600_btx083-B14","doi-asserted-by":"crossref","first-page":"S4.","DOI":"10.1186\/1471-2105-10-S8-S4","article-title":"Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb","volume":"10","author":"Nagel","year":"2009","journal-title":"BMC 
Bioinformatics"},{"key":"2023020205315609600_btx083-B15","author":"Passos","year":"2014"},{"key":"2023020205315609600_btx083-B16","doi-asserted-by":"crossref","first-page":"S2.","DOI":"10.1186\/2041-1480-3-S3-S2","article-title":"Literature mining of protein-residue associations with graph rules learned through distant supervision","volume":"3","author":"Ravikumar","year":"2012","journal-title":"J. Biomed. Seman"},{"key":"2023020205315609600_btx083-B17","doi-asserted-by":"crossref","DOI":"10.1186\/s12859-015-0609-x","article-title":"Text mining facilitates database curation \u2013 extraction of mutation-disease associations from Bio-medical literature","volume":"16","author":"Ravikumar","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023020205315609600_btx083-B18","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1016\/S0076-6879(96)66033-9","article-title":"PHD: predicting one-dimensional protein structure by profile-based neural networks","volume":"266","author":"Rost","year":"1996","journal-title":"Methods Enzymol"},{"key":"2023020205315609600_btx083-B19","doi-asserted-by":"crossref","first-page":"2637","DOI":"10.1007\/s00018-003-3114-8","article-title":"Automatic prediction of protein function","volume":"60","author":"Rost","year":"2003","journal-title":"Cell Mol. Life Sci"},{"key":"2023020205315609600_btx083-B20","doi-asserted-by":"crossref","first-page":"6504","DOI":"10.1073\/pnas.0701572104","article-title":"Prevalence of positive selection among nearly neutral amino acid replacements in Drosophila","volume":"104","author":"Sawyer","year":"2007","journal-title":"Proc. Natl. Acad. Sci"},{"key":"2023020205315609600_btx083-B21","first-page":"93","article-title":"Named entity recognition using word embedding as a feature","volume":"10","author":"Seok","year":"2016","journal-title":"Int. J. Softw. Eng. 
Appl"},{"key":"2023020205315609600_btx083-B22","author":"Settles","year":"2004"},{"key":"2023020205315609600_btx083-B23","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"Sherry","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023020205315609600_btx083-B24","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1002\/humu.10212","article-title":"Human Gene Mutation Database (HGMD\u00ae): 2003 update","volume":"21","author":"Stenson","year":"2003","journal-title":"Hum. Mutat"},{"key":"2023020205315609600_btx083-B25","doi-asserted-by":"crossref","first-page":"240403","DOI":"10.1155\/2014\/240403","article-title":"Evaluating word representation features in biomedical named entity recognition tasks","volume":"2014","author":"Tang","year":"2014","journal-title":"Biomed. Res. Int"},{"key":"2023020205315609600_btx083-B26","doi-asserted-by":"crossref","first-page":"2883","DOI":"10.1093\/bioinformatics\/btw234","article-title":"SETH detects and normalizes genetic variants in text","volume":"32","author":"Thomas","year":"2016","journal-title":"Bioinformatics"},{"key":"2023020205315609600_btx083-B27","doi-asserted-by":"crossref","first-page":"D204","DOI":"10.1093\/nar\/gku989","article-title":"UniProt: a hub for protein information","volume":"43","author":"UniProt","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020205315609600_btx083-B28","doi-asserted-by":"crossref","first-page":"bat019","DOI":"10.1093\/database\/bat019","article-title":"Annotating the biomedical literature for the human variome","volume":"2013","author":"Verspoor","year":"2013","journal-title":"Database"},{"key":"2023020205315609600_btx083-B29","doi-asserted-by":"crossref","first-page":"e71711.","DOI":"10.1371\/journal.pone.0071711","article-title":"Mutationmapper: a tool to aid the mapping of protein mutation 
data","volume":"8","author":"Vohra","year":"2013","journal-title":"PLoS One"},{"key":"2023020205315609600_btx083-B30","doi-asserted-by":"crossref","first-page":"1433","DOI":"10.1093\/bioinformatics\/btt156","article-title":"tmVar: a text mining approach for extracting sequence variants in biomedical literature","volume":"29","author":"Wei","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020205315609600_btx083-B31","doi-asserted-by":"crossref","first-page":"918710.","DOI":"10.1155\/2015\/918710","article-title":"GNormPlus: an integrative approach for tagging genes, gene families, and protein domains","volume":"2015","author":"Wei","year":"2015","journal-title":"Biomed Res. Int"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/12\/1852\/49039878\/bioinformatics_33_12_1852.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/12\/1852\/49039878\/bioinformatics_33_12_1852.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T05:35:48Z","timestamp":1675316148000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/12\/1852\/2991428"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2017,2,13]]},"references-count":31,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2017,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx083","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,6,15
]]},"published":{"date-parts":[[2017,2,13]]}}}