{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T06:09:55Z","timestamp":1775282995571,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":3015,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Chemical compounds like small signal molecules or other biologically active chemical substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals like SMILES, InChI, IUPAC or trivial names exist. Only SMILES and InChI names allow a direct structure search, but in biomedical texts trivial names and IUPAC-like names are used more frequently. While trivial names can be found with a dictionary-based approach and in such a way mapped to their corresponding structures, it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text as well as its evaluation and the conversion rate with available name-to-structure tools.<\/jats:p>\n               <jats:p>Results: We present an IUPAC name recognizer with an F1 measure of 85.6% on a MEDLINE corpus. The evaluation of different CRF orders and offset conjunction orders demonstrates the importance of these parameters. An evaluation of hand-selected patent sections containing large enumerations and terms with mixed nomenclature shows good performance on these cases (F1 measure 81.5%). 
Remaining recognition problems are to detect correct borders of the typically long terms, especially when occurring in parentheses or enumerations. We demonstrate the scalability of our implementation by providing results from a full MEDLINE run.<\/jats:p>\n               <jats:p>Availability: We plan to publish the corpora, annotation guideline as well as the conditional random field model as a UIMA component.<\/jats:p>\n               <jats:p>Contact: \u00a0roman.klinger@scai.fraunhofer.de<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn181","type":"journal-article","created":{"date-parts":[[2008,6,27]],"date-time":"2008-06-27T07:43:13Z","timestamp":1214552593000},"page":"i268-i276","source":"Crossref","is-referenced-by-count":99,"title":["Detection of IUPAC and IUPAC-like chemical names"],"prefix":"10.1093","volume":"24","author":[{"given":"Roman","family":"Klinger","sequence":"first","affiliation":[{"name":"1 Fraunhofer Institute Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, 53574 Sankt Augustin and 2Bonn-Aachen International Center for Information Technology (B-IT), Department of Applied Life Science Informatics, Dahlmannstrasse 2, D-53113 Bonn, Germany"}]},{"given":"Corinna","family":"Kol\u00e1\u0159ik","sequence":"additional","affiliation":[{"name":"1 Fraunhofer Institute Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, 53574 Sankt Augustin and 2Bonn-Aachen International Center for Information Technology (B-IT), Department of Applied Life Science Informatics, Dahlmannstrasse 2, D-53113 Bonn, Germany"},{"name":"1 Fraunhofer Institute Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, 53574 Sankt Augustin and 2Bonn-Aachen International Center for Information Technology (B-IT), Department of Applied Life Science Informatics, Dahlmannstrasse 2, D-53113 Bonn, 
Germany"}]},{"given":"Juliane","family":"Fluck","sequence":"additional","affiliation":[{"name":"1 Fraunhofer Institute Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, 53574 Sankt Augustin and 2Bonn-Aachen International Center for Information Technology (B-IT), Department of Applied Life Science Informatics, Dahlmannstrasse 2, D-53113 Bonn, Germany"}]},{"given":"Martin","family":"Hofmann-Apitius","sequence":"additional","affiliation":[{"name":"1 Fraunhofer Institute Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, 53574 Sankt Augustin and 2Bonn-Aachen International Center for Information Technology (B-IT), Department of Applied Life Science Informatics, Dahlmannstrasse 2, D-53113 Bonn, Germany"},{"name":"1 Fraunhofer Institute Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, 53574 Sankt Augustin and 2Bonn-Aachen International Center for Information Technology (B-IT), Department of Applied Life Science Informatics, Dahlmannstrasse 2, D-53113 Bonn, Germany"}]},{"given":"Christoph M.","family":"Friedrich","sequence":"additional","affiliation":[{"name":"1 Fraunhofer Institute Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, 53574 Sankt Augustin and 2Bonn-Aachen International Center for Information Technology (B-IT), Department of Applied Life Science Informatics, Dahlmannstrasse 2, D-53113 Bonn, Germany"}]}],"member":"286","published-online":{"date-parts":[[2008,7,1]]},"reference":[{"key":"2023020210405814700_B1","unstructured":"ACDLabs\n          ACDName. 
Software\n          2007\n          Available at http:\/\/www.acdlabs.com\/products\/name_lab\/name\/(last accessed date December 18, 2007)"},{"key":"2023020210405814700_B2","first-page":"4609","article-title":"Reconstruction of chemical molecules from images","author":"Algorri","year":"2007"},{"key":"2023020210405814700_B3","first-page":"1095","article-title":"Identifying and classifying terms in the life sciences: the case of chemical terminology","volume-title":"Proceedings of the Fifth Language Resources and Evaluation Conference","author":"Anstein","year":"2006"},{"key":"2023020210405814700_B4","volume-title":"Pattern Recognition and Machine Learning","author":"Bishop","year":"2006"},{"key":"2023020210405814700_B5","unstructured":"CambridgeSoft\n          Name=struct. Software\n          2007\n          Available at http:\/\/www.cambridgesoft.com\/databases\/details\/?db=16(last accessed date December 18, 2007)"},{"key":"2023020210405814700_B6","unstructured":"ChemAxon\n          Marvin. Software\n          2007\n          Available at http:\/\/www.chemaxon.com\/marvin\/(last accessed on January 11, 2008)"},{"key":"2023020210405814700_B7","first-page":"107","article-title":"High-throughput identification of chemistry in life science texts","volume-title":"2nd International Symposium on Computational Life Science (CompLife 2006, LNBI 4216)","author":"Corbett","year":"2006"},{"key":"2023020210405814700_B8","doi-asserted-by":"crossref","first-page":"57","DOI":"10.3115\/1572392.1572403","article-title":"Annotation of chemical named entities","volume-title":"BioNLP 2007: Biological, Translational, and Clinical Language Processing","author":"Corbett","year":"2007"},{"key":"2023020210405814700_B9","unstructured":"Corbett\n              P\n            \n          \n          Oscar3. 
Software\n          2007\n          Available at http:\/\/oscar3-chem.sourceforge.net, (last accessed date December 13, 2007)"},{"key":"2023020210405814700_B10","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4899-4541-9","volume-title":"An Introduction to the Bootstrap","author":"Efron","year":"1993"},{"key":"2023020210405814700_B11","unstructured":"Eigner-Pitto\n              V\n            \n            \u00a0et al.\n          Mining, storage, retrieval: the challenge of integrating chemoinformatics with chemical structure recognition in text and images\n          Talk on 5th Fraunhofer Symposium on Text Mining in the Life Sciences\n          2007\n          Available at http:\/\/www.scai.fhg.de\/tms07.html(last accessed date December 18, 2007)"},{"key":"2023020210405814700_B12","doi-asserted-by":"crossref","first-page":"915","DOI":"10.3390\/11110915","article-title":"Improving the quality of published chemical names with nomenclature software","volume":"11","author":"Eller","year":"2006","journal-title":"Molecules"},{"key":"2023020210405814700_B13","first-page":"85","article-title":"Biomedical and chemical named entity recognition with conditional random fields: the advantage of dictionary features","volume-title":"Proceedings of the Second International Symposium on Semantic Mining in Biomedicine (SMBM 2006)","author":"Friedrich","year":"2006"},{"key":"2023020210405814700_B14","doi-asserted-by":"crossref","first-page":"2424","DOI":"10.1021\/jm960724e","article-title":"Synthesis of racemic 6,7,8,9-tetrahydro-1h-1-benzazepine-2,5-diones as antagonists of n-methyl-d-aspartate (nmda) and \u03b1-amino-3-hydroxy-5-methylisoxazole-4- propionic acid (ampa) receptors","volume":"40","author":"Guzikowski","year":"1997","journal-title":"J. Med. 
Chem"},{"issue":"(S14)","key":"2023020210405814700_B15","article-title":"ProMiner: rule-based protein and gene entity recognition","volume":"6","author":"Hanisch","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020210405814700_B16","volume-title":"Proceedings of the Second BioCreative","author":"Hirschman","year":"2007"},{"key":"2023020210405814700_B17","unstructured":"Humana\n              I\n            \n          \n          Top 50 drugs brand-name prescribed\n          2005\n          Available at http:\/\/apps.humana.com\/prescription_benefits_and_services\/includes\/Top50BrandDrugs.pdf(last accessed date December 14, 2007"},{"key":"2023020210405814700_B18","doi-asserted-by":"crossref","first-page":"544","DOI":"10.1021\/ci980324v","article-title":"The extraction of information from the text of chemical patents. 1. identification of specific chemical names","volume":"38","author":"Kemp","year":"1998","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023020210405814700_B19","article-title":"Classical Probabilistic Models and Conditional Random Fields","volume-title":"Technical Report TR07-2-013","author":"Klinger","year":"2007"},{"key":"2023020210405814700_B20","doi-asserted-by":"crossref","first-page":"1277","DOI":"10.1142\/S0219720007003156","article-title":"Identifying gene specific variations in biomedical text","volume":"5","author":"Klinger","year":"2007","journal-title":"J. Bioinform. Computat. 
Biol"},{"key":"2023020210405814700_B21","first-page":"89","article-title":"Named entity recognition with combinations of conditional random fields","volume-title":"Proceedings of the Second BioCreative Challenge Evaluation Workshop","author":"Klinger","year":"2007"},{"key":"2023020210405814700_B22","doi-asserted-by":"crossref","first-page":"i264","DOI":"10.1093\/bioinformatics\/btm196","article-title":"Identification of new drug classification terms in textual resources","volume":"23","author":"Kol\u00e1\u0159ik","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020210405814700_B23","article-title":"Chemical names: terminological resources and corpora annotation","volume-title":"Workshop on Building and evaluating resources for biomedical text mining (6th edition of the Language Resources and Evaluation Conference)","author":"Kol\u00e1\u0159ik","year":"2008"},{"key":"2023020210405814700_B24","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1109\/18.910572","article-title":"Factor graphs and the sum-product algorithm","volume":"47","author":"Kschischang","year":"2001","journal-title":"IEEE T. Inform. 
Theory"},{"key":"2023020210405814700_B25","first-page":"282","article-title":"Conditional random fields: probabilistic models for segmenting and labeling sequence data","volume-title":"Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001)","author":"Lafferty","year":"2001"},{"key":"2023020210405814700_B26","unstructured":"McCallum\n              AK\n            \n          \n          MALLET: a machine learning for language toolkit\n          2002\n          Available at http:\/\/mallet.cs.umass.edu(last accessed May 5, 2008)"},{"key":"2023020210405814700_B27","doi-asserted-by":"crossref","first-page":"3249","DOI":"10.1093\/bioinformatics\/bth350","article-title":"An entity tagger for recognizing acquired genomic variations in cancer literature","volume":"20","author":"McDonald","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020210405814700_B28","doi-asserted-by":"crossref","first-page":"S6","DOI":"10.1186\/1471-2105-6-S1-S6","article-title":"Identifying gene and protein mentions in text using conditional random fields","volume":"6","author":"McDonald","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020210405814700_B29","volume-title":"Compendium of Chemical Terminology \u2013 the Gold Book","author":"McNaught","year":"1997"},{"key":"2023020210405814700_B30","first-page":"135","article-title":"Chemical markup language: a simple introduction to structured documents","volume":"2","author":"Murray-Rust","year":"1997","journal-title":"World Wide Web J"},{"key":"2023020210405814700_B31","first-page":"427","article-title":"A biological named entity recognizer","volume-title":"Proceedings of the Pacific Symposium on Biocomputing","author":"Narayanaswamy","year":"2003"},{"key":"2023020210405814700_B32","unstructured":"NCBI\n          Pubchem data. 
Online\n          2007\n          Available at ftp:\/\/ftp.ncbi.nlm.nih.gov\/pubchem\/Compound\/CURRENT-Full\/XML\/(last accessed date September 5, 2007)"},{"key":"2023020210405814700_B33","doi-asserted-by":"crossref","first-page":"773","DOI":"10.1090\/S0025-5718-1980-0572855-7","article-title":"Updating Quasi-Newton matrices with limited storage","volume":"35","author":"Nocedal","year":"1980","journal-title":"Math. Comput"},{"key":"2023020210405814700_B34","unstructured":"OpenEye\n          Lexichem. Software\n          2007\n          Available at http:\/\/www.eyesopen.com\/products\/toolkits\/lexichem.html(last accessed date December 18, 2007)"},{"key":"2023020210405814700_B35","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/5.18626","article-title":"A tutorial on hidden Markov models and selected applications in speech recognition","volume":"77","author":"Rabiner","year":"1989","journal-title":"Proc. IEEE"},{"key":"2023020210405814700_B36","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1093\/bioinformatics\/btl302","article-title":"Ebimed \u2013 text crunching to gather facts for proteins from medline","volume":"23","author":"Rebholz-Schuhmann","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020210405814700_B37","first-page":"111","article-title":"Understanding chemical terminology","volume":"12","author":"Reyle","year":"2006","journal-title":"Terminology"},{"key":"2023020210405814700_B38","first-page":"304","article-title":"Mining patents using molecular similarity search","volume":"Vol. 
12","author":"Rhodes","year":"2007","journal-title":"Proceedings of the Pacific Symposium on Biocomputing"},{"key":"2023020210405814700_B39","volume-title":"Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning)","author":"Sch\u00f6lkopf","year":"2002"},{"key":"2023020210405814700_B40","doi-asserted-by":"crossref","first-page":"3191","DOI":"10.1093\/bioinformatics\/bti475","article-title":"ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text","volume":"21","author":"Settles","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020210405814700_B41","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1021\/ci025584y","article-title":"The chemistry development kit (cdk): an open-source java library for chemo- and bioinformatics","volume":"43","author":"Steinbeck","year":"2003","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023020210405814700_B42","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1145\/1242572.1242607","article-title":"Extraction and search of chemical formulae in text documents on the web","volume-title":"Proceedings of the International World Wide Web Conference","author":"Sun","year":"2007"},{"key":"2023020210405814700_B43","unstructured":"Tomanek\n              K\n            \n            \u00a0et al.\n          A reappraisal of sentence and token splitting for life science documents\n          Proceedings of the 12th World Congress on Medical Informatics\n          2007\n          Available at http:\/\/www.julielab.de\/(last accessed date December 16, 2007)"},{"key":"2023020210405814700_B44","unstructured":"U.S. 
National Library of Medicine\n          Medlineplus\n          2007\n          Available at http:\/\/www.nlm.nih.gov\/medlineplus\/druginformation.html(last accessed date September 1, 2008)"},{"key":"2023020210405814700_B45","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules","volume":"28","author":"Weininger","year":"1988","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023020210405814700_B46","first-page":"7","article-title":"Biocreative 2. gene mention task","volume-title":"Proceedings of the Second BioCreative Challenge Evaluation Workshop","author":"Wilbur","year":"2007"},{"key":"2023020210405814700_B47","doi-asserted-by":"crossref","first-page":"D668","DOI":"10.1093\/nar\/gkj067","article-title":"Drugbank: a comprehensive resource for in silico drug discovery and exploration","volume":"34","author":"Wishart","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023020210405814700_B48","article-title":"Combating illiteracy in chemistry: towards computer-based chemical structure reconstruction","volume-title":"Proceedings of the 1st German Conference on 
Chemoinformatics","author":"Zimmermann","year":"2005"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/13\/i268\/49053818\/bioinformatics_24_13_i268.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/13\/i268\/49053818\/bioinformatics_24_13_i268.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T12:26:34Z","timestamp":1675340794000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/13\/i268\/235854"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,7,1]]},"references-count":48,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2008,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn181","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,7,1]]},"published":{"date-parts":[[2008,7,1]]}}}