{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T19:16:05Z","timestamp":1777403765614,"version":"3.51.4"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T00:00:00Z","timestamp":1702944000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T00:00:00Z","timestamp":1702944000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004504","name":"Research Council of Lithuania","doi-asserted-by":"crossref","award":["MIP-23-87"],"award-info":[{"award-number":["MIP-23-87"]}],"id":[{"id":"10.13039\/501100004504","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004504","name":"Research Council of Lithuania","doi-asserted-by":"crossref","award":["MIP-23-87"],"award-info":[{"award-number":["MIP-23-87"]}],"id":[{"id":"10.13039\/501100004504","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004504","name":"Research Council of Lithuania","doi-asserted-by":"crossref","award":["MIP-23-87"],"award-info":[{"award-number":["MIP-23-87"]}],"id":[{"id":"10.13039\/501100004504","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Center for Biotechnology Information of the National Library of Medicine (NLM), National Institutes of Health"},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Knowledge about the 3-dimensional structure, 
orientation and interaction of chemical compounds is important in many areas of science and technology. X-ray crystallography is one of the experimental techniques capable of providing a large amount of structural information for a given compound, and it is widely used for characterisation of organic and metal-organic molecules. The method provides precise 3D coordinates of atoms inside crystals; however, it does not directly deliver information about certain chemical characteristics such as bond orders, delocalization, charges, lone electron pairs or lone electrons. These aspects of a molecular model have to be derived from crystallographic data using refined information about interatomic distances and atom types as well as employing general chemical knowledge. This publication describes a curated automatic pipeline for the derivation of chemical attributes of molecules from crystallographic models. The method is applied to build a catalogue of chemical entities in an open-access crystallographic database, the Crystallography Open Database (COD). The catalogue of such chemical entities is provided openly as a derived database. 
The content of this catalogue and the problems arising in the fully automated pipeline are discussed, along with the possibilities to introduce manual data curation into the process.<\/jats:p>","DOI":"10.1186\/s13321-023-00780-2","type":"journal-article","created":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T09:02:29Z","timestamp":1702976549000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":117,"title":["A workflow for deriving chemical entities from crystallographic data and its application to the Crystallography Open Database"],"prefix":"10.1186","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5944-1391","authenticated-orcid":false,"given":"Antanas","family":"Vaitkus","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7731-6236","authenticated-orcid":false,"given":"Andrius","family":"Merkys","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4019-1959","authenticated-orcid":false,"given":"Thomas","family":"Sander","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1583-4468","authenticated-orcid":false,"given":"Miguel","family":"Quir\u00f3s","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1992-2086","authenticated-orcid":false,"given":"Paul A.","family":"Thiessen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5959-6190","authenticated-orcid":false,"given":"Evan E.","family":"Bolton","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7928-5218","authenticated-orcid":false,"given":"Saulius","family":"Gra\u017eulis","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,12,19]]},"reference":[{"issue":"36","key":"780_CR1","doi-asserted-by":"publisher","first-page":"15665","DOI":"10.1002\/anie.202004239","volume":"59","author":"S 
Spicher","year":"2020","unstructured":"Spicher S, Grimme S (2020) Robust atomistic modeling of materials, organometallic, and biochemical systems. Angewandte Chemie International Edition 59(36):15665\u201315673. https:\/\/doi.org\/10.1002\/anie.202004239","journal-title":"Angewandte Chemie International Edition"},{"issue":"5","key":"780_CR2","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1021\/ci00009a001","volume":"32","author":"JC Baber","year":"1992","unstructured":"Baber JC, Hodgkin EE (1992) Automatic assignment of chemical connectivity to organic molecules in the Cambridge Structural Database. J Chem Inform Model 32(5):401\u2013406. https:\/\/doi.org\/10.1021\/ci00009a001","journal-title":"J Chem Inform Model"},{"issue":"4","key":"780_CR3","doi-asserted-by":"publisher","first-page":"774","DOI":"10.1021\/ci9603487","volume":"37","author":"M Hendlich","year":"1997","unstructured":"Hendlich M, Rippmann F, Barnickel G (1997) BALI: Automatic assignment of bond and atom types for protein ligands in the Brookhaven Protein Databank. J Chem Inform Comput Sci 37(4):774\u2013778. https:\/\/doi.org\/10.1021\/ci9603487","journal-title":"J Chem Inform Comput Sci"},{"key":"780_CR4","unstructured":"Sayle RA. PDB: Cruft to Content (perception of Molecular Connectivity from 3D Coordinates). https:\/\/www.daylight.com\/meetings\/mug01\/Sayle\/m4xbondage.html Accessed 2023-08-21"},{"issue":"2","key":"780_CR5","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1021\/ci049915d","volume":"45","author":"P Labute","year":"2005","unstructured":"Labute P (2005) On the perception of molecules from 3D atomic coordinates. J Chem Inform Model 45(2):215\u2013221. 
https:\/\/doi.org\/10.1021\/ci049915d","journal-title":"J Chem Inform Model"},{"issue":"5","key":"780_CR6","doi-asserted-by":"publisher","first-page":"1267","DOI":"10.1021\/ci049645z","volume":"45","author":"M Froeyen","year":"2005","unstructured":"Froeyen M, Herdewijn P (2005) Correct bond order assignment in a molecular framework using integer linear programming with application to molecules where only non-hydrogen atom coordinates are available. J Chem Inform Model 45(5):1267\u20131274. https:\/\/doi.org\/10.1021\/ci049645z","journal-title":"J Chem Inform Model"},{"issue":"6","key":"780_CR7","doi-asserted-by":"publisher","first-page":"1649","DOI":"10.1016\/j.febslet.2006.02.003","volume":"580","author":"HJ Feldman","year":"2006","unstructured":"Feldman HJ, Snyder KA, Ticoll A, Pintilie G, Hogue CWV (2006) A complete small molecule dataset from the Protein Data Bank. FEBS Lett 580(6):1649\u20131653. https:\/\/doi.org\/10.1016\/j.febslet.2006.02.003","journal-title":"FEBS Lett"},{"issue":"4","key":"780_CR8","doi-asserted-by":"publisher","first-page":"1379","DOI":"10.1021\/ci700028w","volume":"47","author":"Y Zhao","year":"2007","unstructured":"Zhao Y, Cheng T, Wang R (2007) Automatic perception of organic molecules based on essential structural information. J Chem Inform Model 47(4):1379\u20131385. https:\/\/doi.org\/10.1021\/ci700028w","journal-title":"J Chem Inform Model"},{"issue":"8","key":"780_CR9","doi-asserted-by":"publisher","first-page":"1410","DOI":"10.1021\/acs.jcim.5b00512","volume":"56","author":"M Kadukova","year":"2016","unstructured":"Kadukova M, Grudinin S (2016) Knodle: A support vector machines-based automatic perception of organic molecules from 3D coordinates. J Chem Inform Model 56(8):1410\u20131419. 
https:\/\/doi.org\/10.1021\/acs.jcim.5b00512","journal-title":"J Chem Inform Model"},{"key":"780_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-019-0340-0","volume":"11","author":"ID Welsh","year":"2019","unstructured":"Welsh ID, Allison JR (2019) Automated simultaneous assignment of bond orders and formal charges. J Cheminform 11:1. https:\/\/doi.org\/10.1186\/s13321-019-0340-0","journal-title":"J Cheminform"},{"issue":"6","key":"780_CR11","doi-asserted-by":"publisher","first-page":"2668","DOI":"10.1021\/acs.jcim.0c00076","volume":"60","author":"F Lazzari","year":"2020","unstructured":"Lazzari F, Salvadori A, Mancini G, Barone V (2020) Molecular perception for visualization and computation: The Proxima library. J Chem Inform Model 60(6):2668\u20132672. https:\/\/doi.org\/10.1021\/acs.jcim.0c00076","journal-title":"J Chem Inform Model"},{"issue":"4","key":"780_CR12","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1107\/s0108768111024608","volume":"67","author":"IJ Bruno","year":"2011","unstructured":"Bruno IJ, Shields GP, Taylor R (2011) Deducing chemical structure from crystallographically determined atomic coordinates. Acta Crystallographica B 67(4):333\u2013349. https:\/\/doi.org\/10.1107\/s0108768111024608","journal-title":"Acta Crystallographica B"},{"key":"780_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-018-0279-6","volume":"10","author":"M Quir\u00f3s","year":"2018","unstructured":"Quir\u00f3s M, Gra\u017eulis S, Girdzijauskait\u0117 S, Merkys A, Vaitkus A (2018) Using SMILES strings for the description of chemical connectivity in the Crystallography Open Database. J Cheminform 10:1. 
https:\/\/doi.org\/10.1186\/s13321-018-0279-6","journal-title":"J Cheminform"},{"issue":"12","key":"780_CR14","doi-asserted-by":"publisher","first-page":"3149","DOI":"10.1021\/ci200488k","volume":"51","author":"AM Clark","year":"2011","unstructured":"Clark AM (2011) Accurate specification of molecular structures: the case for zero-order bonds and explicit hydrogen counting. J Chem Inform Model 51(12):3149\u20133157. https:\/\/doi.org\/10.1021\/ci200488k","journal-title":"J Chem Inform Model"},{"key":"780_CR15","unstructured":"Apodaca RL. Of Zero-Order Bonds and Bonding Systems. https:\/\/depth-first.com\/articles\/2021\/05\/04\/of-zero-order-bonds-and-bonding-systems\/ Accessed 21 Mar 2023"},{"key":"780_CR16","unstructured":"Vaitkus A. cif-perceive-chemistry, Version 0.4.0. svn:\/\/www.crystallography.net\/cif-perceive-chemistry\/tags\/v0.4.0 Accessed 21 Aug 2023"},{"issue":"4","key":"780_CR17","doi-asserted-by":"publisher","first-page":"726","DOI":"10.1107\/S0021889809016690","volume":"42","author":"S Gra\u017eulis","year":"2009","unstructured":"Gra\u017eulis S, Chateigner D, Downs RT, Yokochi AFT, Quir\u00f3s M, Lutterotti L, Manakova E, Butkus J, Moeck P, Le Bail A (2009) Crystallography Open Database\u2014an open-access collection of crystal structures. J Appl Crystallogr 42(4):726\u2013729. https:\/\/doi.org\/10.1107\/S0021889809016690","journal-title":"J Appl Crystallogr"},{"issue":"D1","key":"780_CR18","doi-asserted-by":"publisher","first-page":"420","DOI":"10.1093\/nar\/gkr900","volume":"40","author":"S Gra\u017eulis","year":"2012","unstructured":"Gra\u017eulis S, Da\u0161kevi\u010d A, Merkys A, Chateigner D, Lutterotti L, Quir\u00f3s M, Serebryanaya NR, Moeck P, Downs RT, Le Bail A (2012) Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Res 40(D1):420\u2013427. 
https:\/\/doi.org\/10.1093\/nar\/gkr900","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"780_CR19","doi-asserted-by":"publisher","first-page":"655","DOI":"10.1107\/S010876739101067X","volume":"47","author":"SR Hall","year":"1991","unstructured":"Hall SR, Allen FH, Brown ID (1991) The crystallographic information file (CIF): a new standard archive file for crystallography. Acta Crystallographica A 47(6):655\u2013685. https:\/\/doi.org\/10.1107\/S010876739101067X","journal-title":"Acta Crystallographica A"},{"issue":"1","key":"780_CR20","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1107\/s1600576715021871","volume":"49","author":"HJ Bernstein","year":"2016","unstructured":"Bernstein HJ, Bollinger JC, Brown ID, Gra\u017eulis S, Hester JR, McMahon B, Spadaccini N, Westbrook JD, Westrip SP (2016) Specification of the crystallographic information file format, version 2.0. J Appl Crystallogr 49(1):277\u2013284. https:\/\/doi.org\/10.1107\/s1600576715021871","journal-title":"J Appl Crystallogr"},{"issue":"1","key":"780_CR21","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1107\/s1600576714025904","volume":"48","author":"S Gra\u017eulis","year":"2015","unstructured":"Gra\u017eulis S, Merkys A, Vaitkus A, Okuli\u010d-Kazarinas M (2015) Computing stoichiometric molecular composition from crystal structures. J Appl Crystallogr 48(1):85\u201391. https:\/\/doi.org\/10.1107\/s1600576714025904","journal-title":"J Appl Crystallogr"},{"issue":"3","key":"780_CR22","doi-asserted-by":"publisher","first-page":"515","DOI":"10.1107\/s1600576722003107","volume":"55","author":"K Petrauskas","year":"2022","unstructured":"Petrauskas K, Merkys A, Vaitkus A, Laibinis L, Gra\u017eulis S (2022) Proving the correctness of the algorithm for building a crystallographic space group. J Appl Crystallogr 55(3):515\u2013525. 
https:\/\/doi.org\/10.1107\/s1600576722003107","journal-title":"J Appl Crystallogr"},{"key":"780_CR23","unstructured":"Vaitkus A, Merkys A, Gra\u017eulis S. cod-tools, Version 3.6.0. svn:\/\/www.crystallography.net\/cod-tools\/tags\/v3.6.0 Accessed 21 Aug 2023"},{"issue":"6","key":"780_CR24","doi-asserted-by":"publisher","first-page":"1594","DOI":"10.1107\/S1600576721009109","volume":"54","author":"M Nespolo","year":"2021","unstructured":"Nespolo M, Benahsene AH (2021) Symmetry and chirality in crystals. J Appl Crystallogr 54(6):1594\u20131599. https:\/\/doi.org\/10.1107\/S1600576721009109","journal-title":"J Appl Crystallogr"},{"key":"780_CR25","unstructured":"CTFile formats. Technical report, BIOVIA (2020). https:\/\/discover.3ds.com\/sites\/default\/files\/2020-08\/biovia_ctfileformats_2020.pdf Accessed 21 Aug 2023"},{"key":"780_CR26","unstructured":"Lindner P. IANA, Text Media Types, Definition of Tab-separated-values (tsv). U of MN Internet Gopher Team. https:\/\/www.iana.org\/assignments\/media-types\/text\/tab-separated-values Accessed 21 Aug 2023"},{"key":"780_CR27","unstructured":"TSV, TAB-separated Values. Library of Congress. https:\/\/www.loc.gov\/preservation\/digital\/formats\/fdd\/fdd000533.shtml Accessed 21 Aug 2023"},{"key":"780_CR28","unstructured":"Sander T, Rufener C, B\u00e4r R, Korff M. OpenChemLib - Open Source Java-based Chemistry Library. https:\/\/github.com\/Actelion\/openchemlib. Accessed 21 Aug 2023"},{"issue":"2","key":"780_CR29","doi-asserted-by":"publisher","first-page":"460","DOI":"10.1021\/ci500588j","volume":"55","author":"T Sander","year":"2015","unstructured":"Sander T, Freyss J, Korff M, Rufener C (2015) DataWarrior: an open-source program for chemistry aware data visualization and analysis. J Chem Inform Model 55(2):460\u2013473. https:\/\/doi.org\/10.1021\/ci500588j","journal-title":"J Chem Inform Model"},{"key":"780_CR30","unstructured":"Sander T. The .dwar File Format. https:\/\/openmolecules.org\/help\/fileformats.html#dwar. 
Accessed 28 Aug 2023"},{"issue":"12","key":"780_CR31","doi-asserted-by":"publisher","first-page":"2369","DOI":"10.1021\/om020069a","volume":"21","author":"DA Ortmann","year":"2002","unstructured":"Ortmann DA, Webernd\u00f6rfer B, Ilg K, Laubender M, Werner H (2002) Carbene iridium(I) and iridium(III) complexes containing the metal center in different stereochemical environments. Organometallics 21(12):2369\u20132381. https:\/\/doi.org\/10.1021\/om020069a","journal-title":"Organometallics"},{"issue":"5","key":"780_CR32","doi-asserted-by":"publisher","first-page":"1250","DOI":"10.1107\/S0021889810030256","volume":"43","author":"RM Hanson","year":"2010","unstructured":"Hanson RM (2010) Jmol\u2014a paradigm shift in crystallographic visualization. J Appl Crystallogr 43(5):1250\u20131260. https:\/\/doi.org\/10.1107\/S0021889810030256","journal-title":"J Appl Crystallogr"},{"key":"780_CR33","unstructured":"Sander T, Rufener C, B\u00e4r R, Korff M. Molecule.java Class from the OpenChemLib Framework, Version 2022-11-1. https:\/\/raw.githubusercontent.com\/Actelion\/openchemlib\/2de8ed734271d2d0ff1cdd54c1e8267c628e0e74\/src\/main\/java\/com\/actelion\/research\/chem\/Molecule.java. Accessed 21 Aug 2023"},{"key":"780_CR34","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/1758-2946-3-33","volume":"3","author":"NM O\u2019Boyle","year":"2011","unstructured":"O\u2019Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open Babel: an open chemical toolbox. J Cheminform 3:33. https:\/\/doi.org\/10.1186\/1758-2946-3-33","journal-title":"J Cheminform"},{"key":"780_CR35","unstructured":"Gra\u017eulis S. cml-tools, Version 0.2.0. svn:\/\/saulius-grazulis.lt\/cml-tools\/tags\/v0.2.0. 
Accessed 21 Aug 2023"},{"key":"780_CR36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-015-0068-4","volume":"7","author":"SR Heller","year":"2015","unstructured":"Heller SR, McNaught A, Pletnev I, Stein S, Tchekhovskoi D (2015) InChI, the IUPAC international chemical identifier. J Cheminform 7:1. https:\/\/doi.org\/10.1186\/s13321-015-0068-4","journal-title":"J Cheminform"},{"key":"780_CR37","unstructured":"Crystallography Open Database - PubChem Data Source. PubChem. https:\/\/pubchem.ncbi.nlm.nih.gov\/source\/849. Accessed 21 Aug 2023"},{"key":"780_CR38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-023-00692-1","volume":"15","author":"A Merkys","year":"2023","unstructured":"Merkys A, Vaitkus A, Grybauskas A, Konovalovas A, Quir\u00f3s M, Gra\u017eulis S (2023) Graph isomorphism-based algorithm for cross-checking chemical and crystallographic descriptions. J Cheminform 15:1. https:\/\/doi.org\/10.1186\/s13321-023-00692-1","journal-title":"J Cheminform"},{"key":"780_CR39","unstructured":"Vaitkus A. Feature #1166: Add Means to Select a Specific Disorder Group Combination. COD. https:\/\/projects.ibt.lt\/repositories\/issues\/1166. Accessed 21 Aug 2023"},{"key":"780_CR40","unstructured":"Crystal Structure Information from COD in PubChem for CID 700843. PubChem. https:\/\/pubchem.ncbi.nlm.nih.gov\/compound\/700843#section=Crystal-Structures&fullscreen=true. Accessed 21 Aug 2023"},{"key":"780_CR41","unstructured":"Crystal Structure Information from COD in PubChem for SID 385842820. PubChem. https:\/\/pubchem.ncbi.nlm.nih.gov\/substance?source=Crystallography+Open+Database&sourceid=1100299#section=Crystal-Structures&fullscreen=true. 
Accessed 21 Aug 2023"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00780-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00780-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00780-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T09:05:42Z","timestamp":1702976742000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00780-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,19]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["780"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00780-2","relation":{"is-referenced-by":[{"id-type":"doi","id":"10.1007\/s10973-025-15056-0","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,19]]},"assertion":[{"value":"28 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 November 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 December 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing 
interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"123"}}