{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T11:31:37Z","timestamp":1780486297880,"version":"3.54.1"},"reference-count":62,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T00:00:00Z","timestamp":1733184000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T00:00:00Z","timestamp":1733184000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"UK EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling","award":["EP\/L015803\/1"],"award-info":[{"award-number":["EP\/L015803\/1"]}]},{"name":"Lhasa Limited"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Extended-connectivity fingerprints (ECFPs) are a ubiquitous tool in current cheminformatics and molecular machine learning, and one of the most prevalent molecular feature extraction techniques used for chemical prediction. Atom features learned by graph neural networks can be aggregated to compound-level representations using a large spectrum of graph pooling methods. In contrast, sets of detected ECFP substructures are by default transformed into bit vectors using only a simple hash-based folding procedure. We introduce a general mathematical framework for the vectorisation of structural fingerprints via a formal operation called substructure pooling that encompasses hash-based folding, algorithmic substructure selection, and a wide variety of other potential techniques. 
We go on to describe <jats:italic>Sort &amp; Slice<\/jats:italic>, an easy-to-implement and bit-collision-free alternative to hash-based folding for the pooling of ECFP substructures. Sort &amp; Slice first sorts ECFP substructures according to their relative prevalence in a given set of training compounds and then slices away all but the <jats:italic>L<\/jats:italic> most frequent substructures which are subsequently used to generate a binary fingerprint of desired length, <jats:italic>L<\/jats:italic>. We computationally compare the performance of hash-based folding, Sort &amp; Slice, and two advanced supervised substructure-selection schemes (filtering and mutual-information maximisation) for ECFP-based molecular property prediction. Our results indicate that, despite its technical simplicity, Sort &amp; Slice robustly (and at times substantially) outperforms traditional hash-based folding as well as the other investigated substructure-pooling methods across distinct prediction tasks, data splitting techniques, machine-learning models and ECFP hyperparameters. We thus recommend that Sort &amp; Slice canonically replace hash-based folding as the default substructure-pooling technique to vectorise ECFPs for supervised molecular machine learning. 
<\/jats:p><jats:p><jats:bold>Scientific contribution<\/jats:bold><\/jats:p><jats:p>A general mathematical framework for the vectorisation of structural fingerprints called <jats:italic>substructure pooling<\/jats:italic>; and the technical description and computational evaluation of <jats:italic>Sort &amp; Slice<\/jats:italic>, a conceptually simple and bit-collision-free method for the pooling of ECFP substructures that robustly and markedly outperforms classical hash-based folding at molecular property prediction.<\/jats:p>","DOI":"10.1186\/s13321-024-00932-y","type":"journal-article","created":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T12:17:09Z","timestamp":1733228229000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Sort &amp; Slice: a simple and superior alternative to hash-based folding for extended-connectivity fingerprints"],"prefix":"10.1186","volume":"16","author":[{"given":"Markus","family":"Dablander","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Thierry","family":"Hanser","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Renaud","family":"Lambiotte","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Garrett M.","family":"Morris","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,12,3]]},"reference":[{"issue":"5","key":"932_CR1","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1021\/ci100050t","volume":"50","author":"D Rogers","year":"2010","unstructured":"Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
J Chem Inf Model 50(5):742\u2013754","journal-title":"J Chem Inf Model"},{"issue":"2","key":"932_CR2","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1021\/c160017a018","volume":"5","author":"HL Morgan","year":"1965","unstructured":"Morgan HL (1965) The generation of a unique machine description for chemical structures\u2014a technique developed at chemical abstracts service. J Chem Doc 5(2):107\u2013113","journal-title":"J Chem Doc"},{"issue":"1","key":"932_CR3","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1186\/1758-2946-5-26","volume":"5","author":"S Riniker","year":"2013","unstructured":"Riniker S, Landrum G (2013) Open-source platform to benchmark fingerprints for ligand-based virtual screening. J Cheminf 5(1):26","journal-title":"J Cheminf"},{"key":"932_CR4","first-page":"2224","volume":"28","author":"DK Duvenaud","year":"2015","unstructured":"Duvenaud DK, Maclaurin D, Iparraguirre J, G\u00f3mez-Bombarelli R, Hirzel T, Aspuru-Guzik A, Adams RP (2015) Convolutional networks on graphs for learning molecular fingerprints. Adv Neural Inf Process Syst 28:2224\u20132232","journal-title":"Adv Neural Inf Process Syst"},{"issue":"7","key":"932_CR5","doi-asserted-by":"publisher","first-page":"731","DOI":"10.1007\/s10822-020-00310-4","volume":"34","author":"HE Webel","year":"2020","unstructured":"Webel HE, Kimber TB, Radetzki S, Neuenschwander M, Nazar\u00e9 M, Volkamer A (2020) Revealing cytotoxic substructures in molecules using deep learning. J Comput Aided Mol Des 34(7):731\u2013746","journal-title":"J Comput Aided Mol Des"},{"issue":"10","key":"932_CR6","doi-asserted-by":"publisher","first-page":"2647","DOI":"10.1021\/ci500361u","volume":"54","author":"J Alvarsson","year":"2014","unstructured":"Alvarsson J, Eklund M, Engkvist O, Spjuth O, Carlsson L, Wikberg JES, Noeske T (2014) Ligand-based target prediction with signature fingerprints. 
J Chem Inf Model 54(10):2647\u20132653","journal-title":"J Chem Inf Model"},{"key":"932_CR7","unstructured":"Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. In: International conference on machine learning, PMLR, pp 1263\u20131272"},{"key":"932_CR8","first-page":"104197","volume":"130","author":"T Stepi\u0161nik","year":"2021","unstructured":"Stepi\u0161nik T, \u0160krlj B, Wicker J, Kocev D (2021) A comprehensive comparison of molecular feature representations for use in predictive modeling. Comput Biol Med 130:104197","journal-title":"Comput Biol Med"},{"issue":"24","key":"932_CR9","doi-asserted-by":"publisher","first-page":"5441","DOI":"10.1039\/C8SC00148K","volume":"9","author":"A Mayr","year":"2018","unstructured":"Mayr A, Klambauer G, Unterthiner T, Steijaert M, Wegner JK, Ceulemans H, Clevert DA, Hochreiter S (2018) Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci 9(24):5441\u20135451","journal-title":"Chem Sci"},{"issue":"2","key":"932_CR10","doi-asserted-by":"publisher","first-page":"664","DOI":"10.1021\/acs.jcim.0c01208","volume":"61","author":"J Menke","year":"2021","unstructured":"Menke J, Koch O (2021) Using domain-specific fingerprints generated through neural networks to enhance ligand-based virtual screening. J Chem Inf Model 61(2):664\u2013675","journal-title":"J Chem Inf Model"},{"key":"932_CR11","unstructured":"Chithrananda S, Grand G, Ramsundar B (2020) ChemBERTa: large-scale self-supervised pretraining for molecular property prediction. arXiv preprint arXiv:2010.09885"},{"issue":"6","key":"932_CR12","doi-asserted-by":"publisher","first-page":"1692","DOI":"10.1039\/C8SC04175J","volume":"10","author":"R Winter","year":"2019","unstructured":"Winter R, Montanari F, No\u00e9 F, Clevert DA (2019) Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. 
Chem Sci 10(6):1692\u20131701","journal-title":"Chem Sci"},{"issue":"1","key":"932_CR13","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1186\/s13321-023-00708-w","volume":"15","author":"M Dablander","year":"2023","unstructured":"Dablander M, Hanser T, Lambiotte R, Morris GM (2023) Exploring QSAR models for activity-cliff prediction. J Cheminf 15(1):47","journal-title":"J Cheminf"},{"key":"932_CR14","doi-asserted-by":"publisher","unstructured":"Dablander M, Hanser T, Lambiotte R, Morris GM (2021) Siamese neural networks work for activity cliff prediction [Poster presentation]. In: 4th RSC-BMCS\/RSC-CICAG artificial intelligence in chemistry symposium, virtual. https:\/\/doi.org\/10.13140\/RG.2.2.18137.60000. Accessed 28 Jan 2024","DOI":"10.13140\/RG.2.2.18137.60000"},{"issue":"1","key":"932_CR15","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system. J Chem Inf Comput Sci 28(1):31\u201336","journal-title":"J Chem Inf Comput Sci"},{"issue":"1","key":"932_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-016-0173-z","volume":"8","author":"M G\u00fctlein","year":"2016","unstructured":"G\u00fctlein M, Kramer S (2016) Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability. J Cheminf 8(1):1\u201316","journal-title":"J Cheminf"},{"key":"932_CR17","unstructured":"Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? arXiv preprint arXiv:1810.00826"},{"key":"932_CR18","unstructured":"Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. 
arXiv preprint arXiv:1609.02907"},{"issue":"8","key":"932_CR19","doi-asserted-by":"publisher","first-page":"3370","DOI":"10.1021\/acs.jcim.9b00237","volume":"59","author":"K Yang","year":"2019","unstructured":"Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M et al (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59(8):3370\u20133388","journal-title":"J Chem Inf Model"},{"issue":"1","key":"932_CR20","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1109\/TNNLS.2020.2978386","volume":"32","author":"Z Wu","year":"2020","unstructured":"Wu Z, Pan S, Chen F, Long G, Zhang C, Philip SY (2020) A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst 32(1):4\u201324","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"932_CR21","doi-asserted-by":"publisher","DOI":"10.1016\/j.ddtec.2020.11.009","author":"O Wieder","year":"2020","unstructured":"Wieder O, Kohlbacher S, Kuenemann M, Garon A, Ducrot P, Seidel T, Langer T (2020) A compact review of molecular property prediction with graph neural networks. Drug Discov Today Technol. https:\/\/doi.org\/10.1016\/j.ddtec.2020.11.009","journal-title":"Drug Discov Today Technol"},{"issue":"14","key":"932_CR22","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.3390\/ijms20143389","volume":"20","author":"K Liu","year":"2019","unstructured":"Liu K, Sun X, Jia L, Ma J, Xing H, Wu J, Gao H, Sun Y, Boulnois F, Fan J (2019) Chemi-Net: a molecular graph convolutional network for accurate drug property prediction. Int J Mol Sci 20(14):3389","journal-title":"Int J Mol Sci"},{"key":"932_CR23","doi-asserted-by":"crossref","unstructured":"Navarin N, Van Tran D, Sperduti A (2019) Universal readout for graph convolutional neural networks. 
In: Proceedings of international joint conference on neural networks (IJCNN), pp 1\u20137","DOI":"10.1109\/IJCNN.2019.8852103"},{"key":"932_CR24","unstructured":"Cangea C, Veli\u010dkovi\u0107 P, Jovanovi\u0107 N, Kipf T, Li\u00f2 P (2018) Towards sparse hierarchical graph classifiers. arXiv preprint arXiv:1811.01287"},{"key":"932_CR25","unstructured":"Lee J, Lee I, Kang J (2019) Self-attention graph pooling. In: International conference on machine learning, PMLR, pp 3734\u20133743"},{"key":"932_CR26","doi-asserted-by":"crossref","unstructured":"Ranjan E, Sanyal S, Talukdar P (2020) Asap: Adaptive structure aware pooling for learning hierarchical graph representations. In: Proceedings of the AAAI conference on artificial intelligence, vol 34. pp 5470\u20135477","DOI":"10.1609\/aaai.v34i04.5997"},{"key":"932_CR27","first-page":"16,421","volume":"33","author":"Z Ma","year":"2020","unstructured":"Ma Z, Xuan J, Wang YG, Li M, Li\u00f2 P (2020) Path integral based convolution and pooling for graph neural networks. Adv Neural Inf Process Syst 33:16,421-16,433","journal-title":"Adv Neural Inf Process Syst"},{"issue":"46","key":"932_CR28","doi-asserted-by":"publisher","first-page":"18,193","DOI":"10.1021\/acs.est.3c02198","volume":"57","author":"S Zhong","year":"2023","unstructured":"Zhong S, Guan X (2023) Count-based Morgan fingerprint: A more efficient and interpretable molecular representation in developing machine learning-based predictive regression models for water contaminants\u2019 activities and properties. Environ Sci Technol 57(46):18,193-18,202","journal-title":"Environ Sci Technol"},{"key":"932_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-020-3378-0","volume":"21","author":"S Harada","year":"2020","unstructured":"Harada S, Akita H, Tsubaki M, Baba Y, Takigawa I, Yamanishi Y, Kashima H (2020) Dual graph convolutional neural network for predicting chemical networks. 
BMC Bioinf 21:1\u201313","journal-title":"BMC Bioinf"},{"issue":"1","key":"932_CR30","doi-asserted-by":"publisher","first-page":"1186","DOI":"10.1038\/s41467-022-28857-w","volume":"13","author":"UV Ucak","year":"2022","unstructured":"Ucak UV, Ashyrmamatov I, Ko J, Lee J (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nat Commun 13(1):1186","journal-title":"Nat Commun"},{"issue":"1","key":"932_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-00445-4","volume":"12","author":"A Capecchi","year":"2020","unstructured":"Capecchi A, Probst D, Reymond JL (2020) One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome. J Cheminf 12(1):1\u201315","journal-title":"J Cheminf"},{"issue":"38","key":"932_CR32","doi-asserted-by":"publisher","first-page":"10,378","DOI":"10.1039\/D0SC03115A","volume":"11","author":"T Le","year":"2020","unstructured":"Le T, Winter R, No\u00e9 F, Clevert DA (2020) Neuraldecipher\u2013reverse-engineering extended-connectivity fingerprints (ECFPs) to their molecular structures. Chem Sci 11(38):10,378-10,389","journal-title":"Chem Sci"},{"key":"932_CR33","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1016\/j.ddtec.2020.05.001","volume":"32","author":"J Shen","year":"2019","unstructured":"Shen J, Nicolaou CA (2019) Molecular property prediction: recent trends in the era of artificial intelligence. Drug Discov Today Technol 32:29\u201336","journal-title":"Drug Discov Today Technol"},{"key":"932_CR34","doi-asserted-by":"publisher","unstructured":"Tripp A, Bacallado S, Singh S, Hern\u00e1ndez-Lobato JM (2024) Tanimoto random features for scalable molecular machine learning. Adv Neural Inf Process Syst (NeurIPS 2023) 36:33656\u201333686. 
https:\/\/doi.org\/10.48550\/arXiv.2306.14809","DOI":"10.48550\/arXiv.2306.14809"},{"key":"932_CR35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-018-0321-8","volume":"10","author":"D Probst","year":"2018","unstructured":"Probst D, Reymond JL (2018) A probabilistic molecular fingerprint for big data settings. J Cheminf 10:1\u201312","journal-title":"J Cheminf"},{"issue":"3","key":"932_CR36","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","volume":"27","author":"CE Shannon","year":"1948","unstructured":"Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379\u2013423","journal-title":"Bell Syst Tech J"},{"issue":"1","key":"932_CR37","first-page":"12","volume":"2","author":"TM Cover","year":"1991","unstructured":"Cover TM, Thomas JA et al (1991) Entropy, relative entropy and mutual information. Elem Inf Theory 2(1):12\u201313","journal-title":"Elem Inf Theory"},{"key":"932_CR38","unstructured":"MacDougall T (2022) Reduced collision fingerprints and pairwise molecular comparisons for explainable property prediction using deep learning. M.Sc. thesis, Universit\u00e9 de Montr\u00e9al, https:\/\/hdl.handle.net\/1866\/26533, accessed on 05.10.2023"},{"key":"932_CR39","unstructured":"Sayle R (1997) 1st-class SMARTS patterns. In: EuroMUG 97, https:\/\/www.daylight.com\/meetings\/emug97\/Sayle, accessed on 28.01.2024"},{"issue":"6","key":"932_CR40","doi-asserted-by":"publisher","first-page":"1273","DOI":"10.1021\/ci010132r","volume":"42","author":"JL Durant","year":"2002","unstructured":"Durant JL, Leland BA, Henry DR, Nourse JG (2002) Reoptimization of MDL keys for use in drug discovery. 
J Chem Inf Comput Sci 42(6):1273\u20131280","journal-title":"J Chem Inf Comput Sci"},{"issue":"2","key":"932_CR41","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1039\/C7SC02664A","volume":"9","author":"Z Wu","year":"2018","unstructured":"Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9(2):513\u2013530","journal-title":"Chem Sci"},{"issue":"7825","key":"932_CR42","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1038\/s41586-020-2649-2","volume":"585","author":"CR Harris","year":"2020","unstructured":"Harris CR, Millman KJ, Van Der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ et al (2020) Array programming with NumPy. Nature 585(7825):357\u2013362","journal-title":"Nature"},{"key":"932_CR43","unstructured":"Landrum G (2006) RDKit: open-source cheminformatics. http:\/\/www.rdkit.org. Accessed on 05 October 2023"},{"issue":"302","key":"932_CR44","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1080\/14786440009463897","volume":"50","author":"K Pearson","year":"1900","unstructured":"Pearson K (1900) On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond Edinburgh Dublin Philos Mag J Sci 50(302):157\u2013175","journal-title":"Lond Edinburgh Dublin Philos Mag J Sci"},{"key":"932_CR45","unstructured":"Zaheer M, Kottur S, Ravanbakhsh S, Poczos B, Salakhutdinov RR, Smola AJ (2017) Deep sets. 
In: Advances in neural information processing systems, vol 30"},{"issue":"1","key":"932_CR46","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1038\/s41597-019-0151-1","volume":"6","author":"MC Sorkun","year":"2019","unstructured":"Sorkun MC, Khetan A, Er S (2019) AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds. Sci Data 6(1):143","journal-title":"Sci Data"},{"issue":"9","key":"932_CR47","doi-asserted-by":"publisher","first-page":"2077","DOI":"10.1021\/ci900161g","volume":"49","author":"K Hansen","year":"2009","unstructured":"Hansen K, Mika S, Schroeter T, Sutter A, Ter Laak A, Steger-Hartmann T, Heinrich N, Muller KR (2009) Benchmark data set for in silico prediction of Ames mutagenicity. J Chem Inf Model 49(9):2077\u20132081","journal-title":"J Chem Inf Model"},{"key":"932_CR48","unstructured":"COVID Moonshot Consortium, Achdout H, Aimon A, Alonzi DS, Arbon R, Bar-David E, Barr H, Ben-Shmuel A, Bennett J, Bilenko VA et al (2020) Open science discovery of potent non-covalent SARS-CoV-2 main protease inhibitors. BioRxiv pp 2020\u20132010"},{"issue":"9","key":"932_CR49","doi-asserted-by":"publisher","first-page":"4263","DOI":"10.1021\/acs.jcim.0c00155","volume":"60","author":"VK Tran-Nguyen","year":"2020","unstructured":"Tran-Nguyen VK, Jacquemard C, Rognan D (2020) LIT-PCBA: an unbiased data set for machine learning and virtual screening. J Chem Inf Model 60(9):4263\u20134273","journal-title":"J Chem Inf Model"},{"issue":"1","key":"932_CR50","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-00456-1","volume":"12","author":"AP Bento","year":"2020","unstructured":"Bento AP, Hersey A, F\u00e9lix E, Landrum G, Gaulton A, Atkinson F, Bellis LJ, de Veij M, Leach AR (2020) An open source chemical structure curation pipeline using RDKit. 
J Cheminf 12(1):1\u201316","journal-title":"J Cheminf"},{"issue":"15","key":"932_CR51","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/jm9602928","volume":"39","author":"GW Bemis","year":"1996","unstructured":"Bemis GW, Murcko MA (1996) The properties of known drugs: molecular frameworks. J Med Chem 39(15):2887\u20132893","journal-title":"J Med Chem"},{"key":"932_CR52","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825\u20132830","journal-title":"J Mach Learn Res"},{"key":"932_CR53","doi-asserted-by":"publisher","unstructured":"Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) PyTorch: an imperative style, high-performance deep learning library. Adv Neural Inf Process Syst (NeurIPS 2019) 32:8026\u20138037. https:\/\/doi.org\/10.48550\/arXiv.1912.01703","DOI":"10.48550\/arXiv.1912.01703"},{"key":"932_CR54","unstructured":"Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of machine learning research, pp 448\u2013456"},{"issue":"1","key":"932_CR55","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929\u20131958","journal-title":"J Mach Learn Res"},{"key":"932_CR56","unstructured":"Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. 
arXiv preprint arXiv:1711.05101"},{"issue":"1","key":"932_CR57","doi-asserted-by":"publisher","first-page":"17131","DOI":"10.1038\/s41598-018-35457-6","volume":"8","author":"E Tokunaga","year":"2018","unstructured":"Tokunaga E, Yamamoto T, Ito E, Shibata N (2018) Understanding the thalidomide chirality in biological processes by the self-disproportionation of enantiomers. Sci Rep 8(1):17131","journal-title":"Sci Rep"},{"key":"932_CR58","doi-asserted-by":"publisher","DOI":"10.1002\/9783527628766","volume-title":"Molecular descriptors for chemoinformatics","author":"R Todeschini","year":"2009","unstructured":"Todeschini R, Consonni V (2009) Molecular descriptors for chemoinformatics, vol 41. Wiley, New York"},{"issue":"5","key":"932_CR59","doi-asserted-by":"publisher","first-page":"363","DOI":"10.2174\/1386207003331454","volume":"3","author":"L Xue","year":"2000","unstructured":"Xue L, Bajorath J (2000) Molecular descriptors in chemoinformatics, computational combinatorial chemistry, and virtual screening. Comb Chem High Throughput Screen 3(5):363\u2013372","journal-title":"Comb Chem High Throughput Screen"},{"issue":"7","key":"932_CR60","doi-asserted-by":"publisher","first-page":"1337","DOI":"10.1021\/ci800038f","volume":"48","author":"H Hong","year":"2008","unstructured":"Hong H, Xie Q, Ge W, Qian F, Fang H, Shi L, Su Z, Perkins R, Tong W (2008) Mold2, molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics. J Chem Inf Model 48(7):1337\u20131344","journal-title":"J Chem Inf Model"},{"issue":"5","key":"932_CR61","doi-asserted-by":"publisher","first-page":"2745","DOI":"10.1109\/TIT.2011.2179702","volume":"58","author":"Z Zhang","year":"2011","unstructured":"Zhang Z, Zhang X (2011) A normal law for the plug-in estimator of entropy. 
IEEE Trans Inf Theory 58(5):2745\u20132747","journal-title":"IEEE Trans Inf Theory"},{"issue":"9","key":"932_CR62","first-page":"1531","volume":"5","author":"F Fleuret","year":"2004","unstructured":"Fleuret F (2004) Fast binary feature selection with conditional mutual information. J Mach Learn Research 5(9):1531\u20131555","journal-title":"J Mach Learn Research"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00932-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-024-00932-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00932-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T13:04:34Z","timestamp":1733231074000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-024-00932-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,3]]},"references-count":62,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["932"],"URL":"https:\/\/doi.org\/10.1186\/s13321-024-00932-y","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,3]]},"assertion":[{"value":"4 March 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 November 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 December 2024","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"135"}}