{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,26]],"date-time":"2026-04-26T07:48:06Z","timestamp":1777189686481,"version":"3.51.4"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T00:00:00Z","timestamp":1677110400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T00:00:00Z","timestamp":1677110400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004504","name":"Research Council of Lithuania","doi-asserted-by":"crossref","award":["MIP-20-21"],"award-info":[{"award-number":["MIP-20-21"]}],"id":[{"id":"10.13039\/501100004504","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Published reports of chemical compounds often contain multiple machine-readable descriptions which may supplement each other in order to yield coherent and complete chemical representations. This publication presents a method to cross-check such descriptions using a canonical representation and isomorphism of molecular graphs. If immediate agreement between compound descriptions is not found, the algorithm derives the minimal set of simplifications required for both descriptions to arrive to a matching form (if any). The proposed algorithm is used to cross-check chemical descriptions from the Crystallography Open Database to identify coherently described entries as well as those requiring further curation.<\/jats:p>","DOI":"10.1186\/s13321-023-00692-1","type":"journal-article","created":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T03:03:36Z","timestamp":1677121416000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":90,"title":["Graph isomorphism-based algorithm for cross-checking chemical and crystallographic descriptions"],"prefix":"10.1186","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7731-6236","authenticated-orcid":false,"given":"Andrius","family":"Merkys","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5944-1391","authenticated-orcid":false,"given":"Antanas","family":"Vaitkus","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3391-9016","authenticated-orcid":false,"given":"Algirdas","family":"Grybauskas","sequence":"additional","affiliation":[]},{"given":"Aleksandras","family":"Konovalovas","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1583-4468","authenticated-orcid":false,"given":"Miguel","family":"Quir\u00f3s","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7928-5218","authenticated-orcid":false,"given":"Saulius","family":"Gra\u017eulis","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,23]]},"reference":[{"issue":"D1","key":"692_CR1","doi-asserted-by":"publisher","first-page":"930","DOI":"10.1093\/nar\/gky1075","volume":"47","author":"D Mendez","year":"2018","unstructured":"Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, F\u00e9lix E, Magari\u00f1os MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Mara\u00f1\u00f3n M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR (2018) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47(D1):930\u2013940. https:\/\/doi.org\/10.1093\/nar\/gky1075","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"692_CR2","doi-asserted-by":"publisher","first-page":"420","DOI":"10.1093\/nar\/gkr900","volume":"40","author":"S Gra\u017eulis","year":"2012","unstructured":"Gra\u017eulis S, Da\u0161kevi\u010d A, Merkys A, Chateigner D, Lutterotti L, Quir\u00f3s M, Serebryanaya NR, Moeck P, Downs RT, Le Bail A (2012) Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Res 40(D1):420\u2013427. https:\/\/doi.org\/10.1093\/nar\/gkr900","journal-title":"Nucleic Acids Res"},{"key":"692_CR3","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1186\/1758-2946-3-44","volume":"3","author":"P Murray-Rust","year":"2011","unstructured":"Murray-Rust P, Rzepa H (2011) CML: evolution and design. J Cheminformatics 3:44. https:\/\/doi.org\/10.1186\/1758-2946-3-44","journal-title":"J Cheminformatics"},{"key":"692_CR4","unstructured":"Anderson E, Veith GD, Weininger D (1987) SMILES: a line notation and computerized interpreter for chemical structures. Technical report, Environmental Research Laboratory-Duluth"},{"issue":"1","key":"692_CR5","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1186\/s13321-015-0068-4","volume":"7","author":"SR Heller","year":"2015","unstructured":"Heller SR, McNaught A, Pletnev I, Stein S, Tchekhovskoi D (2015) InChI, the IUPAC International Chemical Identifier. J Cheminformatics 7(1):23. https:\/\/doi.org\/10.1186\/s13321-015-0068-4","journal-title":"J Cheminformatics"},{"key":"692_CR6","unstructured":"Connelly NG, Damhus T, Hartshorn RM, Hutton AT (2005) Nomenclature of Inorganic Chemistry: IUPAC Recommendations 2005. Royal Society of Chemistry"},{"key":"692_CR7","doi-asserted-by":"publisher","DOI":"10.1039\/9781849733069","author":"HA Favre","year":"2013","unstructured":"Favre HA, Powell WH (2013) Nomenclature of organic chemistry: IUPAC recommendations and preferred names 2013. Royal Soc Chem. https:\/\/doi.org\/10.1039\/9781849733069","journal-title":"Royal Soc Chem"},{"issue":"6","key":"692_CR8","doi-asserted-by":"publisher","first-page":"655","DOI":"10.1107\/S010876739101067X","volume":"47","author":"SR Hall","year":"1991","unstructured":"Hall SR, Allen FH, Brown ID (1991) The crystallographic information file (CIF): a new standard archive file for crystallography. Acta Crystallogr A 47(6):655\u2013685. https:\/\/doi.org\/10.1107\/S010876739101067X","journal-title":"Acta Crystallogr A"},{"issue":"1","key":"692_CR9","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1107\/s1600576715021871","volume":"49","author":"HJ Bernstein","year":"2016","unstructured":"Bernstein HJ, Bollinger JC, Brown ID, Gra\u017eulis S, Hester JR, McMahon B, Spadaccini N, Westbrook JD, Westrip SP (2016) Specification of the crystallographic information file format, version 2.0. J Appl Crystallogr 49(1):277\u2013284. https:\/\/doi.org\/10.1107\/s1600576715021871","journal-title":"J Appl Crystallogr"},{"key":"692_CR10","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1021\/ci100384d","volume":"51","author":"DM Lowe","year":"2011","unstructured":"Lowe DM, Corbett PT, Murray-Rust P, Glen RC (2011) Chemical name to structure: OPSIN, an open source solution. J Chem Inf Model 51:739. https:\/\/doi.org\/10.1021\/ci100384d","journal-title":"J Chem Inf Model"},{"key":"692_CR11","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-018-0279-6","author":"M Quir\u00f3s","year":"2018","unstructured":"Quir\u00f3s M, Gra\u017eulis S, Girdzijauskait\u0117 S, Merkys A, Vaitkus A (2018) Using SMILES strings for the description of chemical connectivity in the Crystallography Open Database. J Cheminformatics. https:\/\/doi.org\/10.1186\/s13321-018-0279-6","journal-title":"J Cheminformatics"},{"key":"692_CR12","doi-asserted-by":"publisher","unstructured":"McNaught AD, Wilkinson A (2014) IUPAC\u2014molecular entity. The IUPAC Compendium of Chemical Terminology. https:\/\/doi.org\/10.1351\/goldbook.m03986","DOI":"10.1351\/goldbook.m03986"},{"key":"692_CR13","doi-asserted-by":"publisher","unstructured":"McNaught AD, Wilkinson A (2014) IUPAC\u2014molecular graph. The IUPAC Compendium of Chemical Terminology. https:\/\/doi.org\/10.1351\/goldbook.MT07069","DOI":"10.1351\/goldbook.MT07069"},{"issue":"3","key":"692_CR14","doi-asserted-by":"publisher","first-page":"432","DOI":"10.1021\/ci9702914","volume":"38","author":"J-L Faulon","year":"1998","unstructured":"Faulon J-L (1998) Isomorphism, automorphism partitioning, and canonical labeling can be solved in polynomial-time for molecular graphs. J Chem Inf Comput Sci 38(3):432\u2013444. https:\/\/doi.org\/10.1021\/ci9702914","journal-title":"J Chem Inf Comput Sci"},{"key":"692_CR15","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1016\/j.jsc.2013.09.003","volume":"60","author":"BD McKay","year":"2014","unstructured":"McKay BD, Piperno A (2014) Practical graph isomorphism, II. J Symb Comput 60:94\u2013112. https:\/\/doi.org\/10.1016\/j.jsc.2013.09.003","journal-title":"J Symb Comput"},{"issue":"1","key":"692_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-00456-1","volume":"12","author":"AP Bento","year":"2020","unstructured":"Bento AP, Hersey A, F\u00e9lix E, Landrum G, Gaulton A, Atkinson F, Bellis LJ, Veij MD, Leach AR (2020) An open source chemical structure curation pipeline using RDKit. J Cheminformatics 12(1):1\u201316. https:\/\/doi.org\/10.1186\/s13321-020-00456-1","journal-title":"J Cheminformatics"},{"issue":"1","key":"692_CR17","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1186\/1758-2946-4-22","volume":"4","author":"NM O\u2019Boyle","year":"2012","unstructured":"O\u2019Boyle NM (2012) Towards a Universal SMILES representation\u2014a standard method to generate canonical SMILES based on the InChI. J Cheminformatics 4(1):22. https:\/\/doi.org\/10.1186\/1758-2946-4-22","journal-title":"J Cheminformatics"},{"key":"692_CR18","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1021\/c160017a018","volume":"5","author":"HL Morgan","year":"1965","unstructured":"Morgan HL (1965) The generation of a unique machine description for chemical structures-a technique developed at Chemical Abstracts Service. J Chem Doc 5:107\u2013113. https:\/\/doi.org\/10.1021\/c160017a018","journal-title":"J Chem Doc"},{"issue":"2","key":"692_CR19","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1021\/ci00062a008","volume":"29","author":"D Weininger","year":"1989","unstructured":"Weininger D, Weininger A, Weininger JL (1989) SMILES. 2. Algorithm for generation of unique SMILES notation. J Chem Inf Comput Sci 29(2):97\u2013101. https:\/\/doi.org\/10.1021\/ci00062a008","journal-title":"J Chem Inf Comput Sci"},{"issue":"8","key":"692_CR20","doi-asserted-by":"publisher","first-page":"681","DOI":"10.1007\/s10822-015-9854-3","volume":"29","author":"WA Warr","year":"2015","unstructured":"Warr WA (2015) Many InChIs and quite some feat. J Comput Aided Mol Des 29(8):681\u2013694. https:\/\/doi.org\/10.1007\/s10822-015-9854-3","journal-title":"J Comput Aided Mol Des"},{"key":"692_CR21","unstructured":"Merkys A. Graph::Nauty\u2014Perl Bindings for Nauty, Version 0.5.0. Accessed 18 Jul 2022. https:\/\/metacpan.org\/pod\/Graph::Nauty"},{"key":"692_CR22","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/1758-2946-3-33","volume":"3","author":"NM O\u2019Boyle","year":"2011","unstructured":"O\u2019Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open Babel: an open chemical toolbox. J Cheminformatics 3:33. https:\/\/doi.org\/10.1186\/1758-2946-3-33","journal-title":"J Cheminformatics"},{"key":"692_CR23","unstructured":"Pipeline Pilot. Accessed 4 Jul 2022. https:\/\/www.3ds.com\/products-services\/biovia\/products\/data-science\/pipeline-pilot\/"},{"key":"692_CR24","unstructured":"Mayfield J. Re: [BlueObelisk-SMILES] Lone Pairs in Tetrahedral Chiral Centers in SMILES. Accessed 22 Jun 2022. https:\/\/sourceforge.net\/p\/blueobelisk\/mailman\/blueobelisk-smiles\/thread\/9FD799B6-4FEC-481C-8EB5-D185F9B801E7@gmail.com"},{"key":"692_CR25","unstructured":"Apodaca RL. A Comprehensive Treatment of Aromaticity in the SMILES Language. Accessed 1 Jul 2022. https:\/\/depth-first.com\/articles\/2020\/02\/10\/a-comprehensive-treatment-of-aromaticity-in-the-smiles-language\/"},{"key":"692_CR26","unstructured":"Apodaca, RL. Writing Aromatic SMILES. Accessed 1 Jul 2022. https:\/\/depth-first.com\/articles\/2021\/06\/30\/writing-aromatic-smiles\/"},{"key":"692_CR27","doi-asserted-by":"publisher","unstructured":"Vaitkus A. cif-perceive-chemistry, Version 0.1.0. Accessed 16 Feb 2023. https:\/\/doi.org\/10.5281\/zenodo.7490273","DOI":"10.5281\/zenodo.7490273"},{"key":"692_CR28","unstructured":"Vaitkus et al., in preparation"},{"key":"692_CR29","unstructured":"Sander T, Rufener C, B\u00e4r R, von Korff M. OpenChemLib\u2014Open Source Java-based Chemistry Library. Accessed 22 Jun 2022. https:\/\/github.com\/Actelion\/openchemlib"},{"key":"692_CR30","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.1c01041","author":"J Wahl","year":"2022","unstructured":"Wahl J, Sander T (2022) Fully automated creation of virtual chemical fragment spaces using the open-source library OpenChemLib. J Chem Inf Model. https:\/\/doi.org\/10.1021\/acs.jcim.1c01041","journal-title":"J Chem Inf Model"},{"key":"692_CR31","unstructured":"Sayle R. PDB: Cruft to Content. Accessed 16 Feb 2023. https:\/\/www.daylight.com\/meetings\/mug01\/Sayle\/m4xbondage.html"},{"issue":"1","key":"692_CR32","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1107\/S1600576714025904","volume":"48","author":"S Gra\u017eulis","year":"2015","unstructured":"Gra\u017eulis S, Merkys A, Vaitkus A, Okuli\u010d-Kazarinas M (2015) Computing stoichiometric molecular composition from crystal structures. J Appl Crystallogr 48(1):85\u201391. https:\/\/doi.org\/10.1107\/S1600576714025904","journal-title":"J Appl Crystallogr"},{"key":"692_CR33","unstructured":"James CA. OpenSMILES Specification, Version 1.0. Accessed 6 Feb 2022. http:\/\/opensmiles.org\/opensmiles.html"},{"key":"692_CR34","unstructured":"Scalfani VF, Bolton E, Cooke H, Grulke C, Irwin J, Koepler O, Landrum G, Lenci E, Medina-Franco JL, Quir\u00f3s M, Richardson S, Yamada I. IUPAC SMILES+ Specification\u2014Project Details. Accessed 10 Jan 2022. https:\/\/iupac.org\/project\/2019-002-2-024"},{"key":"692_CR35","unstructured":"Apodaca RL. Beyond SMILES. Accessed 6 Dec 2021. https:\/\/depth-first.com\/articles\/2021\/09\/22\/beyond-smiles\/"},{"key":"692_CR36","unstructured":"Merkys A, Gra\u017eulis S, Vaitkus A, Grybauskas A, Quir\u00f3s M. smiles-scripts, Version 0.2.0. Accessed 17 Aug 2022. https:\/\/www.crystallography.net\/smiles-scripts"},{"issue":"12","key":"692_CR37","doi-asserted-by":"publisher","first-page":"3149","DOI":"10.1021\/ci200488k","volume":"51","author":"AM Clark","year":"2011","unstructured":"Clark AM (2011) Accurate specification of molecular structures: the case for zero-order bonds and explicit hydrogen counting. J Chem Inf Model 51(12):3149\u20133157. https:\/\/doi.org\/10.1021\/ci200488k","journal-title":"J Chem Inf Model"},{"key":"692_CR38","unstructured":"Apodaca RL. Of Zero-Order Bonds and Bonding Systems. Accessed 2022-01-10. https:\/\/depth-first.com\/articles\/2021\/05\/04\/of-zero-order-bonds-and-bonding-systems\/"},{"issue":"1","key":"692_CR39","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1186\/1758-2946-3-41","volume":"3","author":"DM Jessop","year":"2011","unstructured":"Jessop DM, Adams SE, Willighagen EL, Hawizy L, Murray-Rust P (2011) OSCAR4: a flexible architecture for chemical text-mining. J Cheminformatics 3(1):41. https:\/\/doi.org\/10.1186\/1758-2946-3-41","journal-title":"J Cheminformatics"},{"issue":"6","key":"692_CR40","doi-asserted-by":"publisher","first-page":"1594","DOI":"10.1107\/S1600576721009109","volume":"54","author":"M Nespolo","year":"2021","unstructured":"Nespolo M, Benahsene AH (2021) Symmetry and chirality in crystals. J Appl Crystallogr 54(6):1594\u20131599. https:\/\/doi.org\/10.1107\/S1600576721009109","journal-title":"J Appl Crystallogr"},{"issue":"7","key":"692_CR41","doi-asserted-by":"publisher","DOI":"10.1107\/S2414314618009628","volume":"3","author":"A Mahfoud","year":"2018","unstructured":"Mahfoud A, Al Houari G, El Yazidi M, Saadi M, El Ammari L (2018) 2-methyl-3$$^\\prime$$-(4-methylphenyl)-4$$^\\prime$$-(2-nitrophenyl)-4$$^\\prime$$ h-spiro[chroman-3,5$$^\\prime$$-isoxazol]-4-one. IUCrData 3(7):180962. https:\/\/doi.org\/10.1107\/S2414314618009628","journal-title":"IUCrData"},{"issue":"6","key":"692_CR42","doi-asserted-by":"publisher","first-page":"623","DOI":"10.1515\/pac-2021-2009","volume":"94","author":"RM Hanson","year":"2022","unstructured":"Hanson RM, Jeannerat D, Archibald M, Bruno IJ, Chalk SJ, Davies AN, Lancashire RJ, Lang J, Rzepa HS (2022) IUPAC specification for the FAIR management of spectroscopic data in chemistry (IUPAC FAIRSpec)\u2014guiding principles. Pure Appl Chem 94(6):623\u2013636. https:\/\/doi.org\/10.1515\/pac-2021-2009","journal-title":"Pure Appl Chem"},{"issue":"16","key":"692_CR43","doi-asserted-by":"publisher","first-page":"3331","DOI":"10.1021\/jm020891g","volume":"45","author":"BA \u0160olaja","year":"2002","unstructured":"\u0160olaja BA, Terzi\u0107 N, Pocsfalvi G, Gerena L, Tinant B, Opsenica D, Milhous WK (2002) Mixed steroidal 1,2,4,5-tetraoxanes: antimalarial and antimycobacterial activity. J Med Chem 45(16):3331\u20133336. https:\/\/doi.org\/10.1021\/jm020891g","journal-title":"J Med Chem"},{"key":"692_CR44","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-017-0220-4","author":"EL Willighagen","year":"2017","unstructured":"Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Chert\u00f3 M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C (2017) The chemistry development kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminformatics. https:\/\/doi.org\/10.1186\/s13321-017-0220-4","journal-title":"J Cheminformatics"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00692-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00692-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00692-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T03:07:03Z","timestamp":1677121623000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00692-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,23]]},"references-count":44,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["692"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00692-1","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,23]]},"assertion":[{"value":"17 August 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 February 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"25"}}