{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T14:46:36Z","timestamp":1783089996636,"version":"3.54.6"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,8,17]],"date-time":"2021-08-17T00:00:00Z","timestamp":1629158400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,8,17]],"date-time":"2021-08-17T00:00:00Z","timestamp":1629158400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Carl-Zeiss-Foundation"},{"DOI":"10.13039\/100012957","name":"Friedrich-Schiller-Universit\u00e4t Jena","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100012957","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The amount of data available on chemical structures and their properties has increased steadily over the past decades. In particular, articles published before the mid-1990 are available only in printed or scanned form. The extraction and storage of data from those articles in a publicly accessible database are desirable, but doing this manually is a slow and error-prone process. In order to extract chemical structure depictions and convert them into a computer-readable format, Optical Chemical Structure Recognition (OCSR) tools were developed where the best performing OCSR tools are mostly rule-based. The DECIMER (Deep lEarning for Chemical ImagE Recognition) project was launched to address the OCSR problem with the latest computational intelligence methods to provide an automated open-source software solution. Various current deep learning approaches were explored to seek a best-fitting solution to the problem. In a preliminary communication, we outlined the prospect of being able to predict SMILES encodings of chemical structure depictions with about 90% accuracy using a dataset of 50\u2013100\u00a0million molecules. In this article, the new DECIMER model is presented, a transformer-based network, which can predict SMILES with above 96% accuracy from depictions of chemical structures without stereochemical information and above 89% accuracy for depictions with stereochemical information.<\/jats:p>","DOI":"10.1186\/s13321-021-00538-8","type":"journal-article","created":{"date-parts":[[2021,8,17]],"date-time":"2021-08-17T05:02:52Z","timestamp":1629176572000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":65,"title":["DECIMER 1.0: deep learning for chemical image recognition using transformers"],"prefix":"10.1186","volume":"13","author":[{"given":"Kohulan","family":"Rajan","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Achim","family":"Zielesny","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6966-0814","authenticated-orcid":false,"given":"Christoph","family":"Steinbeck","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,8,17]]},"reference":[{"key":"538_CR1","doi-asserted-by":"publisher","first-page":"903","DOI":"10.4155\/fmc.10.191","volume":"2","author":"A Gaulton","year":"2010","unstructured":"Gaulton A, Overington JP (2010) Role of open chemical data in aiding drug discovery and design. Future Med Chem 2:903\u2013907 [cito:cites]","journal-title":"Future Med Chem"},{"key":"538_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1758-2946-3-1","volume":"3","author":"NM O\u2019Boyle","year":"2011","unstructured":"O\u2019Boyle NM, Guha R, Willighagen EL et al (2011) Open data, open source and open standards in chemistry: the blue obelisk five years on. J Cheminform 3:1\u201315 [cito:cites] [cito:agreesWith]","journal-title":"J Cheminform"},{"key":"538_CR3","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1007\/978-1-60761-931-4_2","volume-title":"Chemical library design","author":"JZ Zhou","year":"2011","unstructured":"Zhou JZ (2011) Chemoinformatics and library design. In: Zhou JZ (ed) Chemical library design. Humana Press, Totowa, pp 27\u201352 [cito:cites]"},{"key":"538_CR4","doi-asserted-by":"publisher","first-page":"1894","DOI":"10.1021\/acs.jcim.6b00207","volume":"56","author":"MC Swain","year":"2016","unstructured":"Swain MC, Cole JM (2016) ChemDataExtractor: a toolkit for automated extraction of chemical information from the scientific literature. J Chem Inf Model 56:1894\u20131904 [cito:cites]","journal-title":"J Chem Inf Model"},{"key":"538_CR5","doi-asserted-by":"publisher","first-page":"7673","DOI":"10.1021\/acs.chemrev.6b00851","volume":"117","author":"M Krallinger","year":"2017","unstructured":"Krallinger M, Rabal O, Louren\u00e7o A, Oyarzabal J, Valencia A (2017) Information retrieval and text mining technologies for chemistry. Chem Rev 117:7673\u20137761 [cito:cites]","journal-title":"Chem Rev"},{"key":"538_CR6","doi-asserted-by":"publisher","first-page":"S1","DOI":"10.1186\/1758-2946-7-S1-S1","volume":"7","author":"M Krallinger","year":"2015","unstructured":"Krallinger M, Leitner F, Rabal O, Vazquez M, Oyarzabal J, Valencia A (2015) CHEMDNER: the drugs and chemical names extraction challenge. J Cheminform 7:S1[cito:cites]","journal-title":"J Cheminform"},{"key":"538_CR7","doi-asserted-by":"publisher","first-page":"2059","DOI":"10.1021\/acs.jcim.0c00042","volume":"60","author":"EJ Beard","year":"2020","unstructured":"Beard EJ, Cole JM (2020) ChemSchematicResolver: a toolkit to decode 2D chemical diagrams with labels and R-groups into annotated chemical named entities. J Chem Inf Model 60:2059\u20132072 [cito:cites]","journal-title":"J Chem Inf Model"},{"key":"538_CR8","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1186\/s13321-020-00465-0","volume":"12","author":"K Rajan","year":"2020","unstructured":"Rajan K, Brinkhaus HO, Zielesny A, Steinbeck C (2020) A review of optical chemical structure recognition tools. J Cheminform 12:60 [cito:cites] [cito:agreesWith] [cito:citesAsAuthority]","journal-title":"J Cheminform"},{"key":"538_CR9","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31\u201336 [cito:cites]","journal-title":"J Chem Inf Comput Sci"},{"key":"538_CR10","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1186\/s13321-015-0068-4","volume":"7","author":"SR Heller","year":"2015","unstructured":"Heller SR, McNaught A, Pletnev I, Stein S, Tchekhovskoi D (2015) InChI, the IUPAC international chemical identifier. J Cheminform 7:23 [cito:cites]","journal-title":"J Cheminform"},{"key":"538_CR11","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1021\/ci800067r","volume":"49","author":"IV Filippov","year":"2009","unstructured":"Filippov IV, Nicklaus MC (2009) Optical structure recognition software to recover chemical information: OSRA, an open source solution. J Chem Inf Model 49:740\u2013743 [cito:cites] [cito:citesAsAuthority]","journal-title":"J Chem Inf Model"},{"key":"538_CR12","unstructured":"Peryea T, Katzel D, Zhao T, Southall N, Nguyen D-T (2019) MOLVEC: Open source library for chemical structure recognition. In: Abstracts of papers of the American Chemical Society, vol 258 [cito:cites] [cito:citesAsAuthority]"},{"key":"538_CR13","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1021\/acs.jcim.8b00669","volume":"59","author":"J Staker","year":"2019","unstructured":"Staker J, Marshall K, Abel R, McQuaw CM (2019) Molecular structure extraction from documents using deep learning. J Chem Inf Model 59:1017\u20131029 [cito:cites] [cito:citesAsAuthority]","journal-title":"J Chem Inf Model"},{"key":"538_CR14","doi-asserted-by":"publisher","first-page":"4506","DOI":"10.1021\/acs.jcim.0c00459","volume":"60","author":"M Oldenhof","year":"2020","unstructured":"Oldenhof M, Arany A, Moreau Y, Simm J (2020) ChemGrapher: optical graph recognition of chemical compounds by deep learning. J Chem Inf Model 60:4506\u20134517 [cito:cites] [cito:citesAsAuthority]","journal-title":"J Chem Inf Model"},{"key":"538_CR15","doi-asserted-by":"publisher","DOI":"10.26434\/chemrxiv.14156957.v1","author":"H Weir","year":"2021","unstructured":"Weir H, Thompson K, Choi B, Woodward A, Braun A, Mart\u00ednez TJ (2021) ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning. ChemRxiv. https:\/\/doi.org\/10.26434\/chemrxiv.14156957.v1[cito:cites] [cito:citesAsAuthority]","journal-title":"ChemRxiv"},{"key":"538_CR16","doi-asserted-by":"publisher","DOI":"10.26434\/chemrxiv.14320907.v1","author":"D-A Clevert","year":"2021","unstructured":"Clevert D-A, Le T, Winter R, Montanari F (2021) Img2Mol\u2014accurate SMILES recognition from molecular graphical depictions. ChemRxiv. https:\/\/doi.org\/10.26434\/chemrxiv.14320907.v1[cito:cites] [cito:citesAsAuthority]","journal-title":"ChemRxiv"},{"key":"538_CR17","doi-asserted-by":"publisher","first-page":"10378","DOI":"10.1039\/D0SC03115A","volume":"11","author":"T Le","year":"2020","unstructured":"Le T, Winter R, No\u00e9 F, Clevert D-A (2020) Neuraldecipher\u2014reverse-engineering extended-connectivity fingerprints (ECFPs) to their molecular structures. Chem Sci 11:10378\u201310389 [cito:cites] [cito:citesAsAuthority]","journal-title":"Chem Sci"},{"key":"538_CR18","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1186\/s13321-020-00469-w","volume":"12","author":"K Rajan","year":"2020","unstructured":"Rajan K, Zielesny A, Steinbeck C (2020) DECIMER: towards deep learning for chemical image recognition. J Cheminform 12:65 [cito:usesMethodIn] [cito:citesAsAuthority] [cito:extends]","journal-title":"J Cheminform"},{"key":"538_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-021-00496-1","volume":"13","author":"K Rajan","year":"2021","unstructured":"Rajan K, Brinkhaus HO, Sorokina M, Zielesny A, Steinbeck C (2021) DECIMER-segmentation: automated extraction of chemical structure depictions from scientific literature. J Cheminform 13:1\u20139. https:\/\/doi.org\/10.1186\/s13321-021-00496-1[cito:cites] [cito:extends] [cito:citesAsAuthority]","journal-title":"J Cheminform"},{"key":"538_CR20","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver D, Schrittwieser J, Simonyan K et al (2017) Mastering the game of Go without human knowledge. Nature 550:354\u2013359 [cito:cites] [cito:agreesWith]","journal-title":"Nature"},{"key":"538_CR21","doi-asserted-by":"publisher","first-page":"D1102","DOI":"10.1093\/nar\/gky1033","volume":"47","author":"S Kim","year":"2019","unstructured":"Kim S, Chen J, Cheng T et al (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res 47:D1102\u2013D1109 [cito:citesAsDataSource]","journal-title":"Nucleic Acids Res"},{"key":"538_CR22","doi-asserted-by":"publisher","first-page":"6065","DOI":"10.1021\/acs.jcim.0c00675","volume":"60","author":"JJ Irwin","year":"2020","unstructured":"Irwin JJ, Tang KG, Young J, Dandarchuluun C, Wong BR, Khurelbaatar M, Moroz YS, Mayfield J, Sayle RA (2020) ZINC20\u2014a free ultralarge-scale chemical database for ligand discovery. J Chem Inf Model 60:6065\u20136073 [cito:cites]","journal-title":"J Chem Inf Model"},{"key":"538_CR23","doi-asserted-by":"publisher","first-page":"2864","DOI":"10.1021\/ci300415d","volume":"52","author":"L Ruddigkeit","year":"2012","unstructured":"Ruddigkeit L, van Deursen R, Blum LC, Reymond J-L (2012) Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J Chem Inf Model 52:2864\u20132875 [cito:cites]","journal-title":"J Chem Inf Model"},{"key":"538_CR24","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1021\/ci025584y","volume":"43","author":"C Steinbeck","year":"2003","unstructured":"Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) The chemistry development kit (CDK): an open-source Java library for Chemo- and bioinformatics. J Chem Inf Comput Sci 43:493\u2013500 [cito:usesMethodIn]","journal-title":"J Chem Inf Comput Sci"},{"key":"538_CR25","unstructured":"Jung AB, Wada K, Crall J et al (2020) Imgaug. GitHub: San Francisco, CA, USA [cito:usesMethodIn]"},{"key":"538_CR26","doi-asserted-by":"publisher","DOI":"10.26434\/chemrxiv.7097960.v1","author":"N O\u2019Boyle","year":"2018","unstructured":"O\u2019Boyle N, Dalke A (2018) DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures. ChemRxiv. https:\/\/doi.org\/10.26434\/chemrxiv.7097960.v1[cito:usesMethodIn]","journal-title":"ChemRxiv"},{"key":"538_CR27","doi-asserted-by":"publisher","first-page":"045024","DOI":"10.1088\/2632-2153\/aba947","volume":"1","author":"M Krenn","year":"2020","unstructured":"Krenn M, H\u00e4se F, Nigam A, Friederich P, Aspuru-Guzik A (2020) Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach Learn Sci Technol 1:045024 [cito:usesMethodIn]","journal-title":"Mach Learn Sci Technol"},{"key":"538_CR28","doi-asserted-by":"crossref","unstructured":"Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2818\u20132826 [cito:cites]","DOI":"10.1109\/CVPR.2016.308"},{"key":"538_CR29","unstructured":"Tan M, Le Q (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning. PMLR, pp 6105\u20136114 [cito:cites]"},{"key":"538_CR30","doi-asserted-by":"crossref","unstructured":"Deng J, Dong W, Socher R, Li L, Kai Li, Li Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. pp 248\u2013255 [cito:cites]","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"538_CR31","doi-asserted-by":"crossref","unstructured":"Xie Q, Luong M-T, Hovy E, Le QV (2020) Self-training with noisy student improves imagenet classification. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 10687\u201310698 [cito:cites] [cito:usesMethodIn]","DOI":"10.1109\/CVPR42600.2020.01070"},{"key":"538_CR32","unstructured":"Chollet F et al (2015) Keras. https:\/\/keras.io. [cito:usesMethodIn]"},{"key":"538_CR33","unstructured":"Abadi M, Agarwal A, Barham P et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. [cito:usesMethodIn]"},{"key":"538_CR34","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1109\/MM.2021.3058217","volume":"41","author":"T Norrie","year":"2021","unstructured":"Norrie T, Patil N, Yoon DH, Kurian G, Li S, Laudon J, Young C, Jouppi N, Patterson D (2021) The design process for Google\u2019s training chips: TPUv2 and TPUv3. IEEE Micro 41:56\u201363 [cito:cites]","journal-title":"IEEE Micro"},{"key":"538_CR35","unstructured":"Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. In: Bach F, Blei D (eds) Proceedings of the 32nd international conference on machine learning. PMLR, Lille, France, pp 2048\u20132057 [cito:usesMethodIn]"},{"key":"538_CR36","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv [cs.CL] [cito:usesMethodIn]"},{"key":"538_CR37","unstructured":"Image captioning with visual attention. https:\/\/www.tensorflow.org\/tutorials\/text\/image_captioning. Accessed 17 Mar 2021 [cito:usesMethodIn]"},{"key":"538_CR38","unstructured":"Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv [cs.CL] [cito:usesMethodIn]"},{"key":"538_CR39","unstructured":"Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv [cs.LG] [cito:usesMethodIn]"},{"key":"538_CR40","unstructured":"Landrum G et al (2016) RDKit: open-source cheminformatics software (2016). http:\/\/www.rdkit.org\/, https:\/\/github.com\/rdkit\/rdkit[cito:usesMethodIn]"},{"key":"538_CR41","unstructured":"dtype support\u2014imgaug 0.4.0 documentation. https:\/\/imgaug.readthedocs.io\/en\/latest\/source\/dtype_support.html. Accessed 15 Apr 2021 [cito:usesMethodIn]"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-021-00538-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-021-00538-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-021-00538-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,22]],"date-time":"2021-11-22T14:03:56Z","timestamp":1637589836000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-021-00538-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,17]]},"references-count":41,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["538"],"URL":"https:\/\/doi.org\/10.1186\/s13321-021-00538-8","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2021-9j7wg-v2","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv.14479287.v1","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,17]]},"assertion":[{"value":"29 April 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 July 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 August 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 November 2021","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"In the original publication, Ref.19 lead to the incorrect article. The article has been updated and the reference link has been corrected.","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"AZ is co-founder of GNWI\u2014Gesellschaft f\u00fcr naturwissenschaftliche Informatik mbH, Dortmund, Germany.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"61"}}