{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:30:03Z","timestamp":1772166603878,"version":"3.50.1"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,7,29]],"date-time":"2024-07-29T00:00:00Z","timestamp":1722211200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,29]],"date-time":"2024-07-29T00:00:00Z","timestamp":1722211200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Intramural research program of the NCATS, NIH"},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Chemical space embedding methods are widely utilized in various research settings for dimensional reduction, clustering and effective visualization. The maps generated by the embedding process can provide valuable insight to medicinal chemists in terms of the relationships between structural, physicochemical and biological properties of compounds. However, these maps are known to be difficult to interpret, and the \u2018\u2018landscape\u2019\u2019 on the map is prone to \u2018\u2018rearrangement\u2019\u2019 when embedding different sets of compounds.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this study we present the Hilbert-Curve Assisted Space Embedding (HCASE) method which was designed to create maps by organizing structures according to a logic familiar to medicinal chemists. First, a chemical space is created with the help of a set of \u2018\u2018reference scaffolds\u2019\u2019. These scaffolds are sorted according to the medicinal chemistry inspired Scaffold-Key algorithm found in prior art. Next, the ordered scaffolds are mapped to a line which is folded into a higher dimensional (here: 2D) space. The intricately folded line is referred to as a pseudo-Hilbert-Curve. The embedding of a compound happens by locating its most similar reference scaffold in the pseudo-Hilbert-Curve and assuming the respective position. Through a series of experiments, we demonstrate the properties of the maps generated by the HCASE method. Subjects of embeddings were compounds of the DrugBank and CANVASS libraries, and the chemical spaces were defined by scaffolds extracted from the ChEMBL database.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Scientific contribution<\/jats:title>\n                    <jats:p>The novelty of HCASE method lies in generating robust and intuitive chemical space embeddings that are reflective of a medicinal chemist\u2019s reasoning, and the precedential use of space filling (Hilbert) curve in the process.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability<\/jats:title>\n                    <jats:p>\n                      <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/ncats\/hcase\">https:\/\/github.com\/ncats\/hcase<\/jats:ext-link>\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Graphical Abstract<\/jats:title>\n                  <\/jats:sec>","DOI":"10.1186\/s13321-024-00850-z","type":"journal-article","created":{"date-parts":[[2024,7,29]],"date-time":"2024-07-29T06:03:44Z","timestamp":1722233024000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Hilbert-curve assisted structure embedding method"],"prefix":"10.1186","volume":"16","author":[{"given":"Gergely","family":"Zahor\u00e1nszky-K\u0151halmi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kanny K.","family":"Wan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexander G.","family":"Godfrey","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,7,29]]},"reference":[{"key":"850_CR1","doi-asserted-by":"publisher","DOI":"10.1037\/h0071325","author":"H Hotelling","year":"1933","unstructured":"Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ. https:\/\/doi.org\/10.1037\/h0071325","journal-title":"J Educ"},{"key":"850_CR2","first-page":"399","volume":"5","author":"M Quist","year":"2004","unstructured":"Quist M, Yona G (2004) Distributional scaling: an algorithm for structure-preserving embedding of metric and nonmetric spaces. J Mach Learn Res 5:399\u2013420","journal-title":"J Mach Learn Res"},{"key":"850_CR3","unstructured":"L. van der Maaten, \u201cLearning a Parametric Embedding by Preserving Local Structure,\u201d in Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, D. van Dyk and M. Welling, Eds., in Proceedings of Machine Learning Research, vol. 5. Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA: PMLR, 2009, pp. 384\u2013391."},{"key":"850_CR4","unstructured":"J. M. Leland McInnes, John Healy. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction."},{"key":"850_CR5","volume-title":"Artificial Neural Networks","author":"T Kohonen","year":"1991","unstructured":"Kohonen T (1991) Self-organizing maps ophmization approaches. In: Kohonen T, M\u00e4kisara K, Simula O, Kangas J (eds) Artificial Neural Networks. North-Holland, Amsterdam"},{"issue":"5500","key":"850_CR6","doi-asserted-by":"publisher","first-page":"2319","DOI":"10.1126\/science.290.5500.2319","volume":"290","author":"JB Tenenbaum","year":"2000","unstructured":"Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319\u20132323. https:\/\/doi.org\/10.1126\/science.290.5500.2319","journal-title":"Science"},{"key":"850_CR7","unstructured":"Distill: How to Use t-SNE Effectively. https:\/\/distill.pub\/2016\/misread-tsne\/ (Accessed 03 Mar, 2022)."},{"issue":"9","key":"850_CR8","doi-asserted-by":"publisher","first-page":"959","DOI":"10.1517\/17460441.2015.1060216","volume":"10","author":"DI Osolodkin","year":"2015","unstructured":"Osolodkin DI, Radchenko EV, Orlov AA, Voronkov AE, Palyulin VA, Zefirov NS (2015) Progress in visual representations of chemical space. Expert Opin Drug Discov 10(9):959\u2013973. https:\/\/doi.org\/10.1517\/17460441.2015.1060216","journal-title":"Expert Opin Drug Discov"},{"issue":"2","key":"850_CR9","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1021\/cc0000388","volume":"3","author":"TI Oprea","year":"2001","unstructured":"Oprea TI, Gottfries J (2001) Chemography: the art of navigating in chemical space. J Comb Chem 3(2):157\u2013166. https:\/\/doi.org\/10.1021\/cc0000388","journal-title":"J Comb Chem"},{"issue":"11","key":"850_CR10","doi-asserted-by":"publisher","first-page":"1803","DOI":"10.1002\/cmdc.200900317","volume":"4","author":"KT Nguyen","year":"2009","unstructured":"Nguyen KT, Blum LC, van Deursen R, Reymond J-L (2009) Classification of organic molecules by molecular quantum numbers. ChemMedChem 4(11):1803\u20131805. https:\/\/doi.org\/10.1002\/cmdc.200900317","journal-title":"ChemMedChem"},{"key":"850_CR11","unstructured":"J. Velkoborsk\u00fd. Hierarchical visualization of the chemical space Master\u2019s. Charles University. Prague, Czech Republic."},{"issue":"6","key":"850_CR12","doi-asserted-by":"publisher","first-page":"540","DOI":"10.1002\/cmdc.201700561","volume":"13","author":"A Lin","year":"2018","unstructured":"Lin A, Horvath D, Afonina V, Marcou G, Reymond J-L, Varnek A (2018) Mapping of the available chemical space versus the chemical universe of lead-like compounds. ChemMedChem 13(6):540\u2013554. https:\/\/doi.org\/10.1002\/cmdc.201700561","journal-title":"ChemMedChem"},{"key":"850_CR13","doi-asserted-by":"publisher","first-page":"510","DOI":"10.3389\/fchem.2019.00510","volume":"7","author":"JJ Naveja","year":"2019","unstructured":"Naveja JJ, Medina-Franco JL (2019) Finding constellations in chemical space through core analysis. Front Chem 7:510. https:\/\/doi.org\/10.3389\/fchem.2019.00510","journal-title":"Front Chem"},{"issue":"1","key":"850_CR14","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1186\/s13321-020-0416-x","volume":"12","author":"D Probst","year":"2020","unstructured":"Probst D, Reymond J-L (2020) Visualization of very large high-dimensional data sets as minimum spanning trees. J Cheminform 12(1):12. https:\/\/doi.org\/10.1186\/s13321-020-0416-x","journal-title":"J Cheminform"},{"issue":"15","key":"850_CR15","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/jm9602928","volume":"39","author":"GW Bemis","year":"1996","unstructured":"Bemis GW, Murcko MA (1996) The properties of known drugs. 1. molecular frameworks. J Med Chem 39(15):2887\u20132893. https:\/\/doi.org\/10.1021\/jm9602928","journal-title":"J Med Chem"},{"key":"850_CR16","doi-asserted-by":"crossref","unstructured":"D. Hilbert. (1935). \u00dcber die stetige Abbildung einer Linie auf ein Fl\u00e4chenst\u00fcck in Dritter Band: Analysis\u00b7Grundlagen der Mathematik\u00b7Physik Verschiedenes. Springer. Berlin","DOI":"10.1007\/978-3-662-38452-7_1"},{"key":"850_CR17","unstructured":"G. Sanderson. Hilbert\u2019s Curve: Is infinite math useful?\u201d https:\/\/www.youtube.com\/watch?v=3s7h2MHQtxc&t=798s"},{"issue":"1","key":"850_CR18","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1109\/69.908985","volume":"13","author":"B Moon","year":"2001","unstructured":"Moon B, Jagadish HV, Faloutsos C, Saltz JH (2001) Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Trans Knowl Data Eng 13(1):124\u2013141. https:\/\/doi.org\/10.1109\/69.908985","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"6","key":"850_CR19","doi-asserted-by":"publisher","first-page":"1617","DOI":"10.1021\/ci5001983","volume":"54","author":"P Ertl","year":"2014","unstructured":"Ertl P (2014) Intuitive ordering of scaffolds and Scaffold Similarity Searching Using Scaffold Keys. J Chem Inf Model 54(6):1617\u20131622. https:\/\/doi.org\/10.1021\/ci5001983","journal-title":"J Chem Inf Model"},{"key":"850_CR20","unstructured":"Python Library: Hilbert-Curve. https:\/\/pypi.org\/project\/hilbertcurve\/"},{"key":"850_CR21","unstructured":"Hilbert-Curve Implementation Details. https:\/\/stackoverflow.com\/questions\/499166\/mapping-n-dimensional-value-to-a-point-on-hilbert-curve"},{"issue":"1","key":"850_CR22","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1186\/1758-2946-5-7","volume":"5","author":"S Heller","year":"2013","unstructured":"Heller S, McNaught A, Stein S, Tchekhovskoi D, Pletnev I (2013) InChI\u2014the worldwide chemical structure identifier standard. J Cheminform 5(1):7. https:\/\/doi.org\/10.1186\/1758-2946-5-7","journal-title":"J Cheminform"},{"key":"850_CR23","unstructured":"\u201cHilbert-Curve Assisted Space Embedding (HCASE) Method Source Code Repository.\u201d https:\/\/github.com\/ncats\/hcase"},{"key":"850_CR24","unstructured":"Michael R. Fabian Dill and Thomas R. 2007 Gabriel and Tobias K\\\"{o}tter and Thorsten Meinl and Peter Ohl and Christoph Sieb and Kilian Thiel and Bernd Wiswedel, Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007) Springer. Berlin"},{"issue":"2","key":"850_CR25","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1021\/ci025584y","volume":"43","author":"C Steinbeck","year":"2003","unstructured":"Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) The chemistry development kit (CDK): an open-source Java library for Chemo\u2014and Bioinformatics. J Chem Inf Comput Sci 43(2):493\u2013500. https:\/\/doi.org\/10.1021\/ci025584y","journal-title":"J Chem Inf Comput Sci"},{"key":"850_CR26","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-017-0220-4","author":"EL Willighagen","year":"2017","unstructured":"Willighagen EL et al (2017) The chemistry development kit (CDK) v20: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform. https:\/\/doi.org\/10.1186\/s13321-017-0220-4","journal-title":"J Cheminform"},{"key":"850_CR27","unstructured":"The Chemistry Development Kit (CDK). https:\/\/github.com\/cdk\/cdk"},{"key":"850_CR28","unstructured":"CDK Nodes for KNIME. https:\/\/www.knime.com\/community\/cdk"},{"key":"850_CR29","unstructured":"Greg Landrum. RDKit: Open-source cheminformatics.\u201d http:\/\/www.rdkit.org\/ (Accessed 24 Feb 2018)."},{"key":"850_CR30","unstructured":"\u201cRDKit Nodes for KNIME.\u201d https:\/\/www.knime.com\/nodeguide\/community\/rdkit"},{"key":"850_CR31","unstructured":"\u201cChemAxon Ltd., Marvin Suite. Molecules were depicted with ChemAxon\u2019s MarvinSketch 16.12.12.\u201d http:\/\/www.chemaxon.com"},{"issue":"2","key":"850_CR32","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1021\/c160017a018","volume":"5","author":"HL Morgan","year":"1965","unstructured":"Morgan HL (1965) The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. J Chem Doc 5(2):107\u2013113. https:\/\/doi.org\/10.1021\/c160017a018","journal-title":"J Chem Doc"},{"key":"850_CR33","unstructured":"T. T. Tanimoto. (1957) BM Internal Report."},{"key":"850_CR34","first-page":"547","volume":"37","author":"P Jaccard","year":"1901","unstructured":"Jaccard P (1901) \u00c9tude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Soci\u00e9t\u00e9 Vaudoise des Sciences Naturelles 37:547\u2013579","journal-title":"Bulletin de la Soci\u00e9t\u00e9 Vaudoise des Sciences Naturelles"},{"key":"850_CR35","doi-asserted-by":"publisher","DOI":"10.1017\/9780511811487","volume-title":"Modern mathematical methods for physicists and engineers","author":"CD Cantrell","year":"2000","unstructured":"Cantrell CD (2000) Modern mathematical methods for physicists and engineers. Cambridge University Press, Cambridge"},{"key":"850_CR36","unstructured":"J. R. Hurst and T. W. Heritage. (1996) Molecular Hologram QSAR."},{"issue":"6","key":"850_CR37","doi-asserted-by":"publisher","first-page":"983","DOI":"10.1021\/ci9800211","volume":"38","author":"P Willett","year":"1998","unstructured":"Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. J Chem Inf Model 38(6):983\u2013996. https:\/\/doi.org\/10.1021\/ci9800211","journal-title":"J Chem Inf Model"},{"issue":"1","key":"850_CR38","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s13321-015-0069-3","volume":"7","author":"D Bajusz","year":"2015","unstructured":"Bajusz D, R\u00e1cz A, H\u00e9berger K (2015) Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminform 7(1):20. https:\/\/doi.org\/10.1186\/s13321-015-0069-3","journal-title":"J Cheminform"},{"issue":"1","key":"850_CR39","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1186\/s13321-016-0127-5","volume":"8","author":"G Zahor\u00e1nszky-K\u0151halmi","year":"2016","unstructured":"Zahor\u00e1nszky-K\u0151halmi G, Bologa CG, Oprea TI (2016) Impact of similarity threshold on the topology of molecular similarity networks and clustering outcomes. J Cheminform 8(1):16. https:\/\/doi.org\/10.1186\/s13321-016-0127-5","journal-title":"J Cheminform"},{"issue":"D1","key":"850_CR40","doi-asserted-by":"publisher","first-page":"D1074","DOI":"10.1093\/nar\/gkx1037","volume":"46","author":"DS Wishart","year":"2018","unstructured":"Wishart DS et al (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 46(D1):D1074\u2013D1082. https:\/\/doi.org\/10.1093\/nar\/gkx1037","journal-title":"Nucleic Acids Res"},{"issue":"12","key":"850_CR41","doi-asserted-by":"publisher","first-page":"1727","DOI":"10.1021\/acscentsci.8b00747","volume":"4","author":"SE Kearney","year":"2018","unstructured":"Kearney SE et al (2018) Canvass: a crowd-sourced, natural-product screening library for exploring biological space. ACS Cent Sci 4(12):1727\u20131741. https:\/\/doi.org\/10.1021\/acscentsci.8b00747","journal-title":"ACS Cent Sci"},{"key":"850_CR42","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkt1031","author":"AP Bento","year":"2014","unstructured":"Bento AP et al (2014) The ChEMBL bioactivity database: an update. Nucl Acids Res. https:\/\/doi.org\/10.1093\/nar\/gkt1031"},{"key":"850_CR43","unstructured":"SmartGraph Backend Source Code Repository. [https:\/\/github.com\/ncats\/smartgraph_backend\/tree\/master\/knime_workflow]"},{"issue":"1","key":"850_CR44","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/s13321-020-0409-9","volume":"12","author":"G Zahor\u00e1nszky-K\u0151halmi","year":"2020","unstructured":"Zahor\u00e1nszky-K\u0151halmi G, Sheils T, Oprea TI (2020) SmartGraph: a network pharmacology investigation platform. J Cheminform 12(1):5. https:\/\/doi.org\/10.1186\/s13321-020-0409-9","journal-title":"J Cheminform"},{"key":"850_CR45","unstructured":"L. van der Maaten. Source code repository of t-SNE.\u201d https:\/\/lvdmaaten.github.io\/tsne\/ (Accessed 03 Mar 2022)."},{"key":"850_CR46","unstructured":"Suggestion by Reviewer 2."},{"key":"850_CR47","volume-title":"Statistics (international student edition). Pisani, R. Purves","author":"D Freedman","year":"2007","unstructured":"Freedman D, Pisani R, Purves R (2007) Statistics (international student edition). Pisani, R. Purves, 4th edn. WW Norton & Company, New York","edition":"4"},{"key":"850_CR48","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1093\/biomet\/30.1-2.81","volume":"30","author":"M Kendall","year":"1938","unstructured":"Kendall M (1938) A new measure of rank correlation. Biometrika 30:81\u201389. https:\/\/doi.org\/10.1093\/biomet\/30.1-2.81","journal-title":"Biometrika"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00850-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-024-00850-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00850-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,29]],"date-time":"2024-07-29T06:08:03Z","timestamp":1722233283000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-024-00850-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,29]]},"references-count":48,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["850"],"URL":"https:\/\/doi.org\/10.1186\/s13321-024-00850-z","relation":{"references":[{"id-type":"uri","id":"","asserted-by":"subject"}],"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv.11911296.v1","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,29]]},"assertion":[{"value":"3 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 July 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The source code of the HCASE method, Jupyter notebooks of the experiments, input and output files can be found at\n                      \n                      .","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Availability of data and materials"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}],"article-number":"87"}}