{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T14:46:37Z","timestamp":1783089997311,"version":"3.54.6"},"reference-count":59,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,10,7]],"date-time":"2020-10-07T00:00:00Z","timestamp":1602028800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,10,7]],"date-time":"2020-10-07T00:00:00Z","timestamp":1602028800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100007569","name":"Carl-Zeiss-Stiftung","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007569","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Projekt DEAL"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Structural information about chemical compounds is typically conveyed as 2D images of molecular structures in scientific documents. Unfortunately, these depictions are not a machine-readable representation of the molecules. With a backlog of decades of chemical literature in printed form not properly represented in open-access databases, there is a high demand for the translation of graphical molecular depictions into machine-readable formats. This translation process is known as Optical Chemical Structure Recognition (OCSR). Today, we are looking back on nearly three decades of development in this demanding research field. Most OCSR methods follow a rule-based approach where the key step of vectorization of the depiction is followed by the interpretation of vectors and nodes as bonds and atoms. Opposed to that, some of the latest approaches are based on deep neural networks (DNN). This review provides an overview of all methods and tools that have been published in the field of OCSR. Additionally, a small benchmark study was performed with the available open-source OCSR tools in order to examine their performance.<\/jats:p>","DOI":"10.1186\/s13321-020-00465-0","type":"journal-article","created":{"date-parts":[[2020,10,7]],"date-time":"2020-10-07T10:03:16Z","timestamp":1602064996000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":51,"title":["A review of optical chemical structure recognition tools"],"prefix":"10.1186","volume":"12","author":[{"given":"Kohulan","family":"Rajan","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Henning Otto","family":"Brinkhaus","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Achim","family":"Zielesny","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6966-0814","authenticated-orcid":false,"given":"Christoph","family":"Steinbeck","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2020,10,7]]},"reference":[{"key":"465_CR1","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1186\/1758-2946-6-17","volume":"6","author":"S Eltyeb","year":"2014","unstructured":"Eltyeb S, Salim N (2014) Chemical named entities recognition: a review on approaches and applications. J Cheminform 6:17","journal-title":"J Cheminform"},{"key":"465_CR2","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1021\/ci00067a014","volume":"30","author":"ML Contreras","year":"1990","unstructured":"Contreras ML, Leonor Contreras M, Allendes C, Tomas Alvarez L, Rozas R (1990) Computational perception and recognition of digitized molecular structures. J Chem Inf Model 30:302\u2013307","journal-title":"J Chem Inf Model"},{"key":"465_CR3","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1021\/ci00065a003","volume":"30","author":"R Rozas","year":"1990","unstructured":"Rozas R, Fernandez H (1990) Automatic processing of graphics for image databases in science. J Chem Inf Model 30:7\u201312","journal-title":"J Chem Inf Model"},{"key":"465_CR4","doi-asserted-by":"publisher","first-page":"373","DOI":"10.1021\/ci00008a018","volume":"32","author":"JR McDaniel","year":"1992","unstructured":"McDaniel JR, Balmuth JR (1992) Kekule: OCR-optical chemical (structure) recognition. J Chem Inf Model 32:373\u2013378","journal-title":"J Chem Inf Model"},{"key":"465_CR5","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1021\/ci800067r","volume":"49","author":"IV Filippov","year":"2009","unstructured":"Filippov IV, Nicklaus MC (2009) Optical structure recognition software to recover chemical information: OSRA, an open source solution. J Chem Inf Model 49:740\u2013743","journal-title":"J Chem Inf Model"},{"key":"465_CR6","doi-asserted-by":"crossref","unstructured":"Smolov V, Zentsev F, Rybalkin M (2011) Imago: open-source toolkit for 2D chemical structure image recognition. In: The Twentieth Text REtrieval Conference (TREC 2011) Proceedings","DOI":"10.6028\/NIST.SP.500-296.chemical-GGA"},{"key":"465_CR7","unstructured":"Peryea T, Katzel D, Zhao T, Southall N, Nguyen D-T (2019) MOLVEC: Open source library for chemical structure recognition. Abstracts of papers of the american chemical society 258"},{"key":"465_CR8","doi-asserted-by":"crossref","unstructured":"Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. Thirty-first AAAI conference on artificial intelligence","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"465_CR9","doi-asserted-by":"publisher","unstructured":"Abadi M (2016) TensorFlow: learning functions at scale. In: Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming\u2014ICFP 2016. https:\/\/doi.org\/https:\/\/doi.org\/10.1145\/2951913.2976746","DOI":"10.1145\/2951913.2976746"},{"key":"465_CR10","unstructured":"Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in PyTorch."},{"key":"465_CR11","doi-asserted-by":"crossref","unstructured":"Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on Multimedia. Association for Computing Machinery, New York, NY, USA, pp 675\u2013678","DOI":"10.1145\/2647868.2654889"},{"key":"465_CR12","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Model 28:31\u201336","journal-title":"J Chem Inf Model"},{"key":"465_CR13","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1021\/ci00007a012","volume":"32","author":"A Dalby","year":"1992","unstructured":"Dalby A, Nourse JG, Hounshell WD, Gushurst AKI, Grier DL, Leland BA, Laufer J (1992) Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited. J Chem Inf Comput Sci 32:244\u2013255","journal-title":"J Chem Inf Comput Sci"},{"key":"465_CR14","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1145\/361237.361242","volume":"15","author":"RO Duda","year":"1972","unstructured":"Duda RO, Hart PE (1972) Use of the Hough transformation to detect lines and curves in pictures. Commun ACM 15:11\u201315","journal-title":"Commun ACM"},{"key":"465_CR15","doi-asserted-by":"publisher","unstructured":"Casey R, Boyer S, Healey P, Miller A, Oudot B, Zilles K Optical recognition of chemical graphics. In: Proceedings of 2nd international conference on document analysis and recognition (ICDAR \u201993). https:\/\/doi.org\/https:\/\/doi.org\/10.1109\/icdar.1993.395658","DOI":"10.1109\/icdar.1993.395658"},{"key":"465_CR16","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1021\/ci00013a010","volume":"33","author":"P Ibison","year":"1993","unstructured":"Ibison P, Jacquot M, Kam F, Neville AG, Simpson RW, Tonnelier C, Venczel T, Johnson AP (1993) Chemical literature data extraction: the CLiDE Project. J Chem Inf Model 33:338\u2013344","journal-title":"J Chem Inf Model"},{"key":"465_CR17","doi-asserted-by":"publisher","first-page":"780","DOI":"10.1021\/ci800449t","volume":"49","author":"AT Valko","year":"2009","unstructured":"Valko AT, Johnson AP (2009) CLiDE Pro: the latest generation of CLiDE, a tool for optical chemical structure recognition. J Chem Inf Model 49:780\u2013787","journal-title":"J Chem Inf Model"},{"key":"465_CR18","unstructured":"Filippov I OSRAChangelog. https:\/\/sourceforge.net\/p\/osra\/wiki\/Download\/. Accessed 23 June 2020"},{"issue":"Suppl 17","key":"465_CR19","doi-asserted-by":"publisher","first-page":"S9","DOI":"10.1186\/1471-2105-13-S17-S9","volume":"13","author":"A Tharatipyakul","year":"2012","unstructured":"Tharatipyakul A, Numnark S, Wichadakul D, Ingsriswang S (2012) ChemEx: information extraction system for chemical data curation. BMC Bioinformatics 13(Suppl 17):S9","journal-title":"BMC Bioinformatics"},{"key":"465_CR20","doi-asserted-by":"publisher","first-page":"1894","DOI":"10.1021\/acs.jcim.6b00207","volume":"56","author":"MC Swain","year":"2016","unstructured":"Swain MC, Cole JM (2016) ChemDataExtractor: a toolkit for automated extraction of chemical information from the scientific literature. J Chem Inf Model 56:1894\u20131904","journal-title":"J Chem Inf Model"},{"key":"465_CR21","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1186\/1752-153X-3-4","volume":"3","author":"J Park","year":"2009","unstructured":"Park J, Rosania GR, Shedden KA, Nguyen M, Lyu N, Saitou K (2009) Automated extraction of chemical structure information from digital raster images. Chem Cent J 3:4","journal-title":"Chem Cent J"},{"key":"465_CR22","unstructured":"Sadawi N (2009) Recognising chemical formulas from molecule depictions. In: Pre-proceedings of the 8th IAPR international workshop on graphics recognition (GREC 2009). pp 167\u2013175"},{"key":"465_CR23","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1109\/TSMC.1979.4310076","volume":"9","author":"N Otsu","year":"1979","unstructured":"Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9:62\u201366","journal-title":"IEEE Trans Syst Man Cybern"},{"key":"465_CR24","volume-title":"Digital image processing algorithms and applications","author":"I Pitas","year":"2000","unstructured":"Pitas I (2000) Digital image processing algorithms and applications. Wiley, Hoboken"},{"key":"465_CR25","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1016\/0031-3203(86)90026-9","volume":"19","author":"R Stefanelli","year":"1986","unstructured":"Stefanelli R (1986) A comment on an investigation into the skeletonization approach of Hilditch. Pattern Recognit 19:13\u201314","journal-title":"Pattern Recognit"},{"issue":"1117\/12","key":"465_CR26","first-page":"912185","volume":"10","author":"NM Sadawi","year":"2012","unstructured":"Sadawi NM, Sexton AP, Sorge V (2012) Chemical structure recognition: a rule-based approach. Doc Recogn Retrieval XIX 10(1117\/12):912185","journal-title":"Doc Recogn Retrieval XIX"},{"key":"465_CR27","doi-asserted-by":"publisher","first-page":"112","DOI":"10.3138\/FM57-6770-U75U-7727","volume":"10","author":"DH Douglas","year":"1973","unstructured":"Douglas DH, Peucker TK (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica 10:112\u2013122","journal-title":"Cartographica"},{"key":"465_CR28","doi-asserted-by":"crossref","unstructured":"Zimmermann M (2011) Chemical structure reconstruction with chemoCR. In: The Twentieth Text REtrieval conference (TREC 2011) Proceedings","DOI":"10.6028\/NIST.SP.500-296.chemical-chemoCR"},{"key":"465_CR29","first-page":"4609","volume":"2007","author":"M-E Algorri","year":"2007","unstructured":"Algorri M-E, Zimmermann M, Friedrich CM, Akle S, Hofmann-Apitius M (2007) Reconstruction of chemical molecules from images. ConfProc IEEE Eng Med Biol Soc 2007:4609\u20134612","journal-title":"ConfProc IEEE Eng Med Biol Soc"},{"key":"465_CR30","doi-asserted-by":"crossref","unstructured":"Algorri M, Zimmermann M, Hofmann-Apitius M (2007) Automatic recognition of chemical images. In: Eighth Mexican International Conference on Current Trends in Computer Science (ENC 2007). pp 41\u201346","DOI":"10.1109\/ENC.2007.4351423"},{"key":"465_CR31","unstructured":"Fujiyoshi A, Nakagawa K, Suzuki M (2011) Robust method of segmentation and recognition of chemical structure images in cheminfty. In: Pre-proceedings of the 9th IAPR international workshop on graphics recognition, GREC"},{"key":"465_CR32","unstructured":"Ratnayaka L, De Silva PSU, WijesiriHNM, Samaradiwakara AM, Ranpatabendi N, Rajapaksha U (2012) E-learning based chemical information extracting tool (eChem)"},{"key":"465_CR33","doi-asserted-by":"publisher","first-page":"2380","DOI":"10.1021\/ci5002197","volume":"54","author":"P Frasconi","year":"2014","unstructured":"Frasconi P, Gabbrielli F, Lippi M, Marinai S (2014) Markov logic networks for optical chemical structure recognition. J Chem Inf Model 54:2380\u20132390","journal-title":"J Chem Inf Model"},{"key":"465_CR34","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1007\/s10994-006-5833-1","volume":"62","author":"M Richardson","year":"2006","unstructured":"Richardson M, Domingos P (2006) Markov logic networks. Mach Learn 62:107\u2013136","journal-title":"Mach Learn"},{"key":"465_CR35","doi-asserted-by":"crossref","unstructured":"Raedt LD, De Raedt L, Kersting K (2008) Probabilistic inductive logic programming. Probabilistic Inductive Logic Programming, pp 1\u201327","DOI":"10.1007\/978-3-540-78652-8_1"},{"key":"465_CR36","doi-asserted-by":"crossref","unstructured":"Chen Hong XD (2015) Research on chemical expression images recognition. In: 2015 Joint International Mechanical, Electronic and Information Technology Conference (JIMET-15). Atlantis Press, pp 267\u2013271","DOI":"10.2991\/jimet-15.2015.50"},{"key":"465_CR37","unstructured":"Karthikeyan M (2017) Chemical structure recognition tool. US Patent"},{"key":"465_CR38","doi-asserted-by":"publisher","first-page":"1342","DOI":"10.1021\/ci034017n","volume":"43","author":"GV Gkoutos","year":"2003","unstructured":"Gkoutos GV, Rzepa H, Clark RM, Adjei O, Johal H (2003) Chemical machine vision: automated extraction of chemical metadata from raster images. J Chem Inf Comput Sci 43:1342\u20131355","journal-title":"J Chem Inf Comput Sci"},{"key":"465_CR39","doi-asserted-by":"publisher","first-page":"1568","DOI":"10.4249\/scholarpedia.1568","volume":"2","author":"T Kohonen","year":"2007","unstructured":"Kohonen T, Honkela T (2007) Kohonen network. Scholarpedia J 2:1568","journal-title":"Scholarpedia J"},{"key":"465_CR40","first-page":"273","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273\u2013297","journal-title":"Mach Learn"},{"key":"465_CR41","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1021\/acs.jcim.8b00669","volume":"59","author":"J Staker","year":"2019","unstructured":"Staker J, Marshall K, Abel R, McQuaw CM (2019) Molecular structure extraction from documents using deep learning. J Chem Inf Model 59:1017\u20131029","journal-title":"J Chem Inf Model"},{"key":"465_CR42","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention\u2014MICCAI 2015. Springer International Publishing, pp 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"465_CR43","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25. Curran Associates, Inc., pp 1097\u20131105"},{"key":"465_CR44","first-page":"3104","volume-title":"Advances in neural information processing systems 27","author":"I Sutskever","year":"2014","unstructured":"Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems 27. Curran Associates Inc, New York, pp 3104\u20133112"},{"key":"465_CR45","unstructured":"Indigo Toolkit. https:\/\/lifescience.opensource.epam.com\/indigo\/. Accessed 25 June 2020"},{"key":"465_CR46","doi-asserted-by":"publisher","first-page":"D1102","DOI":"10.1093\/nar\/gky1033","volume":"47","author":"S Kim","year":"2019","unstructured":"Kim S, Chen J, Cheng T et al (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res 47:D1102\u2013D1109","journal-title":"Nucleic Acids Res"},{"key":"465_CR47","unstructured":"Lowe D Chemical reactions from US patents (1976\u2013Sep 2016) (2017). https:\/\/figshare.com\/articles\/Chemical_reactions_from_US_patents_1976-Sep2016_\/5104873"},{"key":"465_CR48","doi-asserted-by":"publisher","unstructured":"Lowe DM (2012) Extraction of chemical structures and reactions from the literature. https:\/\/doi.org\/https:\/\/doi.org\/10.17863\/CAM.16293","DOI":"10.17863\/CAM.16293"},{"key":"465_CR49","doi-asserted-by":"crossref","unstructured":"Oldenhof M, Arany A, Moreau Y, Simm J (2020) ChemGrapher: optical graph recognition of chemical compounds by deep learning. arXiv [stat.ML]","DOI":"10.1021\/acs.jcim.0c00459"},{"key":"465_CR50","unstructured":"Yu F, Koltun V (2015) Multi-scale context aggregation by dilated convolutions. arXiv [cs.CV]"},{"key":"465_CR51","unstructured":"Website. RDKit: open-source cheminformatics. https:\/\/www.rdkit.org. Accessed 15 Sept 2020"},{"key":"465_CR52","doi-asserted-by":"publisher","first-page":"D945","DOI":"10.1093\/nar\/gkw1074","volume":"45","author":"A Gaulton","year":"2017","unstructured":"Gaulton A, Hersey A, Nowotka M et al (2017) The ChEMBL database in 2017. Nucleic Acids Res 45:D945\u2013D954","journal-title":"Nucleic Acids Res"},{"key":"465_CR53","unstructured":"OSRA validation datasets. https:\/\/sourceforge.net\/p\/osra\/wiki\/Validation\/. Accessed 24 June 2020"},{"key":"465_CR54","unstructured":"MolrecUOB Benchmark dataset. https:\/\/www.cs.bham.ac.uk\/research\/groupings\/reasoning\/sdag\/chemical.php. Accessed 29 June 2020"},{"key":"465_CR55","unstructured":"CLEF-IP 2012 Structure Recognition Test Set. https:\/\/www.ifs.tuwien.ac.at\/~clef-ip\/download\/2012\/qrels\/clef-ip-2012-chem-recognition-qrels.tgz. Accessed 29 June 2020"},{"key":"465_CR56","unstructured":"Imago Download. https:\/\/lifescience.opensource.epam.com\/download\/imago.html. Accessed 24 June 2020"},{"key":"465_CR57","unstructured":"Beard E PyosraConda Recipe. https:\/\/github.com\/edbeard\/conda_recipes\/tree\/master\/pyosra. Accessed 24 June 2020"},{"key":"465_CR58","unstructured":"ChemSchematicResolver Documentation. https:\/\/www.chemschematicresolver.org\/docs\/install. Accessed 24 June 2020"},{"key":"465_CR59","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1186\/1758-2946-5-7","volume":"5","author":"S Heller","year":"2013","unstructured":"Heller S, McNaught A, Stein S, Tchekhovskoi D, Pletnev I (2013) InChI - the worldwide chemical structure identifier standard. J Cheminform 5:7","journal-title":"J Cheminform"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-020-00465-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-020-00465-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-020-00465-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,15]],"date-time":"2024-08-15T16:43:31Z","timestamp":1723740211000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-020-00465-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,7]]},"references-count":59,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["465"],"URL":"https:\/\/doi.org\/10.1186\/s13321-020-00465-0","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,10,7]]},"assertion":[{"value":"7 July 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 September 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 October 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"60"}}