{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T18:09:02Z","timestamp":1783102142256,"version":"3.54.6"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,10,27]],"date-time":"2020-10-27T00:00:00Z","timestamp":1603756800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,10,27]],"date-time":"2020-10-27T00:00:00Z","timestamp":1603756800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Projekt DEAL"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical ImagE Recognition (DECIMER), a deep learning method based on existing show-and-tell deep neural networks, which makes very few assumptions about the structure of the underlying problem. It translates a bitmap image of a molecule, as found in publications, into a SMILES. The training state reported here does not yet rival the performance of existing traditional approaches, but we present evidence that our method will reach a comparable detection power with sufficient training time. Training success of DECIMER depends on the input data representation: DeepSMILES are superior over SMILES and we have a preliminary indication that the recently reported SELFIES outperform DeepSMILES. An extrapolation of our results towards larger training data sizes suggests that we might be able to achieve near-accurate prediction with 50 to 100\u00a0million training structures. This work is entirely based on open-source software and open data and is available to the general public for any purpose.<\/jats:p>","DOI":"10.1186\/s13321-020-00469-w","type":"journal-article","created":{"date-parts":[[2020,10,27]],"date-time":"2020-10-27T06:04:01Z","timestamp":1603778641000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":75,"title":["DECIMER: towards deep learning for chemical image recognition"],"prefix":"10.1186","volume":"12","author":[{"given":"Kohulan","family":"Rajan","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Achim","family":"Zielesny","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6966-0814","authenticated-orcid":false,"given":"Christoph","family":"Steinbeck","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2020,10,27]]},"reference":[{"issue":"4","key":"469_CR1","doi-asserted-by":"publisher","first-page":"373","DOI":"10.1021\/ci00008a018","volume":"32","author":"JR McDaniel","year":"1992","unstructured":"McDaniel JR, Balmuth JR (1992) Kekule: OCR-optical chemical (structure) recognition. J Chem Inf Model 32(4):373\u2013378. https:\/\/doi.org\/10.1021\/ci00008a018","journal-title":"J Chem Inf Model"},{"issue":"12","key":"469_CR2","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1021\/cen-v070n012.p017","volume":"70","author":"S Borman","year":"1992","unstructured":"Borman S (1992) New computer program reads, interprets chemical structures. Chem Eng News 70(12):17\u201319. https:\/\/doi.org\/10.1021\/cen-v070n012.p017","journal-title":"Chem Eng News"},{"issue":"3","key":"469_CR3","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1021\/ci00067a014","volume":"30","author":"ML Contreras","year":"1990","unstructured":"Contreras ML, Allendes C, Alvarez LT, Rozas R (1990) Computational perception and recognition of digitized molecular structures. J Chem Inf Model 30(3):302\u2013307. https:\/\/doi.org\/10.1021\/ci00067a014","journal-title":"J Chem Inf Model"},{"key":"469_CR4","unstructured":"Casey R, Boyer S, Healey P, Miller A, Oudot B, Zilles K (1993) Optical recognition of chemical graphics. In: Proceedings of 2nd international conference on document analysis and recognition (ICDAR \u201993). IEEE Computer Society Press, Washington, DC, pp 627\u2013631. https:\/\/ieeexplore.ieee.org\/document\/395658\/"},{"issue":"3","key":"469_CR5","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1021\/ci00013a010","volume":"33","author":"P Ibison","year":"1993","unstructured":"Ibison P, Jacquot M, Kam F, Neville AG, Simpson RW, Tonnelier C et al (1993) Chemical literature data extraction: the CLiDE Project. J Chem Inf Model 33(3):338\u2013344. https:\/\/doi.org\/10.1021\/ci00013a010","journal-title":"J Chem Inf Model"},{"key":"469_CR6","unstructured":"Zimmermann M, Bui Thi LT, Hofmann M (2005) Combating illiteracy in chemistry: towards computer-based chemical structure reconstruction. ERCIM News 60(60):40\u201341. https:\/\/www.ercim.eu\/publication\/Ercim_News\/enw60\/zimmermann.html, https:\/\/www.researchgate.net\/publication\/228766116_Combating_illiteracy_in_chemistry_towards_computer-based_chemical_structure_reconstruction"},{"key":"469_CR7","doi-asserted-by":"crossref","unstructured":"Algorri M-E, Zimmermann M, Friedrich CM, Akle S, Hofmann-Apitius M (2007) Reconstruction of chemical molecules from images. In: 2007 29th annual international conference of the IEEE engineering in medicine and biology society. IEEE, New York, pp 4609\u20134612. https:\/\/ieeexplore.ieee.org\/document\/4353366\/","DOI":"10.1109\/IEMBS.2007.4353366"},{"key":"469_CR8","doi-asserted-by":"crossref","unstructured":"Algorri M-E, Zimmermann M, Hofmann-Apitius M (2007) Automatic recognition of chemical images. In: Eighth Mexican international conference on current trends in computer science (ENC 2007). IEEE, New York, pp 41\u201346. https:\/\/ieeexplore.ieee.org\/document\/4351423\/","DOI":"10.1109\/ENC.2007.25"},{"issue":"1","key":"469_CR9","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1186\/1752-153X-3-4","volume":"3","author":"J Park","year":"2009","unstructured":"Park J, Rosania GR, Shedden KA, Nguyen M, Lyu N, Saitou K (2009) Automated extraction of chemical structure information from digital raster images. Chem Cent J 3(1):4. https:\/\/doi.org\/10.1186\/1752-153X-3-4","journal-title":"Chem Cent J"},{"issue":"3","key":"469_CR10","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1021\/ci800067r","volume":"49","author":"IV Filippov","year":"2009","unstructured":"Filippov IV, Nicklaus MC (2009) Optical structure recognition software to recover chemical information: OSRA, an open source solution. J Chem Inf Model 49(3):740\u2013743. https:\/\/doi.org\/10.1021\/ci800067r","journal-title":"J Chem Inf Model"},{"key":"469_CR11","unstructured":"Karthikeyan M (2017) Chemical structure recognition tool. US Patent 9,558,403 B2"},{"issue":"5","key":"469_CR12","doi-asserted-by":"publisher","first-page":"253","DOI":"10.3390\/md18050253","volume":"18","author":"O-S Kwon","year":"2020","unstructured":"Kwon O-S, Kim D, Kim C-K, Sun J, Sim CJ, Oh D-C et al (2020) Cytotoxic scalarane sesterterpenes from the sponge Hyrtios erectus. Mar Drugs 18(5):253","journal-title":"Mar Drugs"},{"issue":"7676","key":"469_CR13","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354\u2013359. https:\/\/doi.org\/10.1038\/nature24270","journal-title":"Nature"},{"issue":"3","key":"469_CR14","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1021\/acs.jcim.8b00669","volume":"59","author":"J Staker","year":"2019","unstructured":"Staker J, Marshall K, Abel R, McQuaw CM (2019) Molecular structure extraction from documents using deep learning. J Chem Inf Model 59(3):1017\u20131029","journal-title":"J Chem Inf Model"},{"key":"469_CR15","doi-asserted-by":"crossref","unstructured":"Oldenhof M, Arany A, Moreau Y, Simm J (2020) ChemGrapher: optical graph recognition of chemical compounds by deep learning. https:\/\/arxiv.org\/abs\/2002.09914","DOI":"10.1021\/acs.jcim.0c00459"},{"issue":"1","key":"469_CR16","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/s13321-017-0220-4","volume":"9","author":"EL Willighagen","year":"2017","unstructured":"Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N et al (2017) The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform 9(1):33. https:\/\/doi.org\/10.1186\/s13321-017-0220-4","journal-title":"J Cheminform"},{"issue":"D1","key":"469_CR17","doi-asserted-by":"publisher","first-page":"D1102","DOI":"10.1093\/nar\/gky1033","volume":"47","author":"S Kim","year":"2019","unstructured":"Kim S, Chen J, Cheng T, Gindulyte A, He J, He S et al (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res 47(D1):D1102\u2013D1109","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"469_CR18","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system: 1: introduction to methodology and encoding rules. J Chem Inf Comput Sci 28(1):31\u201336","journal-title":"J Chem Inf Comput Sci"},{"key":"469_CR19","author":"N O\u2019Boyle","year":"2018","unstructured":"O\u2019Boyle N, Dalke A (2018) DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures. chemRxiv: 1026434, pp 1\u20139. https:\/\/github.com\/nextmovesoftware\/deepsmiles"},{"key":"469_CR20","doi-asserted-by":"crossref","unstructured":"Krenn M, H\u00e4se F, Nigam A, Friederich P, Aspuru-Guzik A (2020) Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. https:\/\/github.com\/aspuru-guzik-group\/selfies. Accessed 2 June 2020","DOI":"10.1088\/2632-2153\/aba947"},{"key":"469_CR21","unstructured":"Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C et al (2016) TensorFlow: large-scale machine learning on heterogeneous distributed systems. https:\/\/arxiv.org\/abs\/1603.04467"},{"key":"469_CR22","unstructured":"Xu K, Ba JL, Kiros R, Cho K, Courville A, Salakhutdinov R et al (2015) Show, attend and tell: neural image caption generation with visual attention. In: 32nd International conference on machine learning, ICML 2015, vol 3, pp 2048\u20132057"},{"key":"469_CR23","unstructured":"tensorflow. tensorflow\/docs. https:\/\/github.com\/tensorflow\/docs\/blob\/master\/site\/en\/tutorials\/text\/image_captioning.ipynb. Accessed 18 Aug 2020"},{"key":"469_CR24","unstructured":"Bahdanau D, Cho KH, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd international conference on learning representations, ICLR 2015\u2014Conf Track Proc, pp 1\u201315"},{"key":"469_CR25","doi-asserted-by":"crossref","unstructured":"Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 2818\u20132826","DOI":"10.1109\/CVPR.2016.308"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-020-00469-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-020-00469-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-020-00469-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,10,26]],"date-time":"2021-10-26T19:31:32Z","timestamp":1635276692000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-020-00469-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,27]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["469"],"URL":"https:\/\/doi.org\/10.1186\/s13321-020-00469-w","relation":{"references":[{"id-type":"uri","id":"https:\/\/github.com\/nextmovesoftware\/deepsmiles","asserted-by":"subject"}],"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv.12464420","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv.12464420.v1","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv.12464420.v2","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,10,27]]},"assertion":[{"value":"11 June 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 October 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 October 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"65"}}