{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T14:52:34Z","timestamp":1774536754126,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The accurate identification of chemicals in text is important for many applications, including computer-assisted reconstruction of metabolic networks or retrieval of information about substances in drug development. But due to the diversity of naming conventions and traditions for such molecules, this task is highly complex and should be supported by computational tools.<\/jats:p>\n               <jats:p>Results: We present ChemSpot, a named entity recognition (NER) tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and International Union of Pure and Applied Chemistry entities. Since the different classes of relevant entities have rather different naming characteristics, ChemSpot uses a hybrid approach combining a Conditional Random Field with a dictionary. It achieves an F1 measure of 68.1% on the SCAI corpus, outperforming the only other freely available chemical NER tool, OSCAR4, by 10.8 percentage points.<\/jats:p>\n               <jats:p>Availability: ChemSpot is freely available at: http:\/\/www.informatik.hu-berlin.de\/wbi\/resources<\/jats:p>\n               <jats:p>Contact: \u00a0leser@informatik.hu-berlin.de<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts183","type":"journal-article","created":{"date-parts":[[2012,4,13]],"date-time":"2012-04-13T02:00:35Z","timestamp":1334282435000},"page":"1633-1640","source":"Crossref","is-referenced-by-count":212,"title":["ChemSpot: a hybrid system for chemical named entity recognition"],"prefix":"10.1093","volume":"28","author":[{"given":"Tim","family":"Rockt\u00e4schel","sequence":"first","affiliation":[{"name":"Department of Computer Science, Humboldt-Universit\u00e4t zu Berlin, Rudower Chaussee 25, 12489 Berlin, Germany"}]},{"given":"Michael","family":"Weidlich","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Humboldt-Universit\u00e4t zu Berlin, Rudower Chaussee 25, 12489 Berlin, Germany"}]},{"given":"Ulf","family":"Leser","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Humboldt-Universit\u00e4t zu Berlin, Rudower Chaussee 25, 12489 Berlin, Germany"}]}],"member":"286","published-online":{"date-parts":[[2012,4,12]]},"reference":[{"key":"2023012512341841500_B1","first-page":"556","article-title":"Assisted curation: does text mining really help","volume-title":"Proc. of the Pacific Symposium on Biocomputing","author":"Alex","year":"2008"},{"key":"2023012512341841500_B2","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1186\/1471-2105-10-28","article-title":"Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy","volume":"10","author":"Alexopoulou","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012512341841500_B3","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1016\/j.tibtech.2006.10.002","article-title":"Text mining and its potential applications in systems biology","volume":"24","author":"Ananiadou","year":"2006","journal-title":"Trends Biotechnol."},{"key":"2023012512341841500_B4","first-page":"17","article-title":"Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program","volume-title":"Proc. of the AMIA Symposium","author":"Aronson","year":"2001"},{"key":"2023012512341841500_B5","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/S1359-6446(05)03682-2","article-title":"Mining chemical structural information from the drug literature","volume":"11","author":"Banville","year":"2006","journal-title":"Drug Discov. Today"},{"key":"2023012512341841500_B6","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1111\/j.1365-2796.2011.02494.x","article-title":"Using the reconstructed genome-scale human metabolic network to study physiology and pathology","volume":"271","author":"Bordbar","year":"2011","journal-title":"J. Intern. Med"},{"key":"2023012512341841500_B7","doi-asserted-by":"crossref","first-page":"943","DOI":"10.1021\/ci990062c","article-title":"Name=struct: a practical approach to the sorry state of real-life chemical nomenclature","volume":"39","author":"Brecher","year":"1999","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"2023012512341841500_B8","first-page":"65","article-title":"Automatically adapting an NLP core engine to the biology domain","volume-title":"Proc. of the Joint BioLINK-Bio-Ontologies Meeting","author":"Buyko","year":"2006"},{"key":"2023012512341841500_B9","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1093\/bib\/6.1.57","article-title":"A survey of current work in biomedical text mining","volume":"6","author":"Cohen","year":"2005","journal-title":"Brief. Bioinformatics."},{"key":"2023012512341841500_B10","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2105-9-S11-S4","article-title":"Cascaded classifiers for confidence-based chemical named entity recognition","volume":"9","author":"Corbett","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012512341841500_B11","first-page":"107","article-title":"High-throughput identification of chemistry in life science texts","volume-title":"Proc. of 2nd International Symposium on Computational Life Science","author":"Corbett","year":"2006"},{"key":"2023012512341841500_B12","doi-asserted-by":"crossref","first-page":"1777","DOI":"10.1073\/pnas.0610772104","article-title":"Global reconstruction of the human metabolic network based on genomic and bibliomic data","volume":"104","author":"Duarte","year":"2007","journal-title":"Proc. of the National Academy of Sciences"},{"key":"2023012512341841500_B13","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.drudis.2006.02.011","article-title":"Status of text-mining techniques applied to biomedical text","volume":"11","author":"Erhardt","year":"2006","journal-title":"Drug Discov. Today"},{"key":"2023012512341841500_B14","first-page":"149","article-title":"Prominer: recognition of human gene and protein names using regularly updated dictionaries","volume-title":"Proc. of the Second BioCreAtIvE Challenge Workshop","author":"Fluck","year":"2007"},{"key":"2023012512341841500_B15","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1186\/1471-2105-11-85","article-title":"LINNAEUS: a species name identification system for biomedical literature","volume":"11","author":"Gerner","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012512341841500_B16","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/msb.2010.62","article-title":"Hepatonet1: a comprehensive metabolic reconstruction of the human hepatocyte for the analysis of liver physiology","volume":"6","author":"Gille","year":"2010","journal-title":"Mol. Syst. Biol."},{"key":"2023012512341841500_B17","doi-asserted-by":"crossref","first-page":"2769","DOI":"10.1093\/bioinformatics\/btr455","article-title":"The GNAT library for local and remote gene mention normalization","volume":"27","author":"Hakenberg","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012512341841500_B18","doi-asserted-by":"crossref","first-page":"2983","DOI":"10.1093\/bioinformatics\/btp535","article-title":"A dictionary to identify small molecules and drugs in free text","volume":"25","author":"Hettne","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512341841500_B19","first-page":"3","article-title":"Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining","volume":"2","author":"Hettne","year":"2010","journal-title":"J. Chem. Inf."},{"key":"2023012512341841500_B20","first-page":"41","article-title":"Oscar4: a flexible architecture for chemical text-mining","volume":"3","author":"Jessop","year":"2011","journal-title":"J. Chem. Inf"},{"key":"2023012512341841500_B21","volume-title":"Classical probabilistic models and conditional random fields.","author":"Klinger","year":"2007"},{"key":"2023012512341841500_B22","doi-asserted-by":"crossref","first-page":"i268","DOI":"10.1093\/bioinformatics\/btn181","article-title":"Detection of IUPAC and IUPAC-like chemical names","volume":"24","author":"Klinger","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012512341841500_B23","first-page":"51","article-title":"Chemical names: terminological resources and corpora annotation","volume-title":"Proc. of the Workshop on Building and Evaluating Resources for Biomedical Text Mining","author":"Kol\u00e1\u0159ik","year":"2008"},{"key":"2023012512341841500_B24","doi-asserted-by":"crossref","first-page":"e20181","DOI":"10.1371\/journal.pone.0020181","article-title":"Using workflows to explore and optimise named entity recognition for chemistry","volume":"6","author":"Kolluru","year":"2011","journal-title":"PLoS ONE"},{"key":"2023012512341841500_B25","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/gb-2008-9-s2-s1","article-title":"Evaluation of text-mining systems for biology: overview of the second biocreative community challenge","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome Biol."},{"key":"2023012512341841500_B26","article-title":"Conditional random fields: probabilistic models for segmenting and labeling sequence data","volume-title":"Proc. of ICML-2001","author":"Lafferty","year":"2001"},{"key":"2023012512341841500_B27","first-page":"652","article-title":"BANNER: an executable survey of advances in biomedical named entity recognition","volume-title":"Proc. of the Pacific Symposium on Biocomputing","author":"Leaman","year":"2008"},{"key":"2023012512341841500_B28","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1038\/msb4100177","article-title":"The edinburgh human metabolic network reconstruction and its functional analysis","volume":"3","author":"Ma","year":"2007","journal-title":"Mol. Syst. Biol."},{"key":"2023012512341841500_B29","author":"McCallum","year":"2002","journal-title":"MALLET: A Machine Learning for Language Toolkit."},{"key":"2023012512341841500_B30","first-page":"403","article-title":"Efficiently inducing features of conditional random fields","volume-title":"Proc. of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI03)","author":"McCallum","year":"2003"},{"key":"2023012512341841500_B31","first-page":"591","article-title":"Maximum entropy Markov models for information extraction and segmentation","volume-title":"Proc. of ICML-2000","author":"McCallum","year":"2000"},{"key":"2023012512341841500_B32","first-page":"131","article-title":"Peregrine: lightweight gene name normalization by dictionary lookup","volume-title":"Proc. of the Second BioCreative Challenge","author":"Schuemie","year":"2007"},{"key":"2023012512341841500_B33","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1016\/j.drudis.2008.06.001","article-title":"Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems","volume":"13","author":"Segura-Bedmar","year":"2008","journal-title":"Drug Discov. Today"},{"issue":"Suppl. 5","key":"2023012512341841500_B34","doi-asserted-by":"crossref","first-page":"P9","DOI":"10.1186\/1471-2105-11-S5-P9","article-title":"Extracting drug-drug interactions from biomedical texts","volume":"11","author":"Segura-Bedmar","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012512341841500_B35","doi-asserted-by":"crossref","first-page":"3191","DOI":"10.1093\/bioinformatics\/bti475","article-title":"ABNER: an open source tool for automatically tagging genes, proteins, and other entity names in text","volume":"21","author":"Settles","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012512341841500_B36","first-page":"137","article-title":"GeneView gene-centric ranking of biomedical text","volume-title":"Proc. of the BioCreative III Workshop","author":"Thomas","year":"2010"},{"key":"2023012512341841500_B37","doi-asserted-by":"crossref","first-page":"e1000837","DOI":"10.1371\/journal.pcbi.1000837","article-title":"A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature","volume":"6","author":"Tikk","year":"2010","journal-title":"PLoS Comput. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/12\/1633\/48879885\/bioinformatics_28_12_1633.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/12\/1633\/48879885\/bioinformatics_28_12_1633.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T16:23:11Z","timestamp":1674663791000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/12\/1633\/266861"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,4,12]]},"references-count":37,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2012,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts183","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,6,15]]},"published":{"date-parts":[[2012,4,12]]}}}