{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T10:17:21Z","timestamp":1777889841479,"version":"3.51.4"},"reference-count":57,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T00:00:00Z","timestamp":1693180800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T00:00:00Z","timestamp":1693180800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005713","name":"Technische Universit\u00e4t M\u00fcnchen","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005713","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Decision tree ensembles are among the most robust, high-performing and computationally efficient machine learning approaches for quantitative structure\u2013activity relationship (QSAR) modeling. Among them, gradient boosting has recently garnered particular attention, for its performance in data science competitions, virtual screening campaigns, and bioactivity prediction. However, different variants of gradient boosting exist, the most popular being XGBoost, LightGBM and CatBoost. Our study provides the first comprehensive comparison of these approaches for QSAR. To this end, we trained 157,590 gradient boosting models, which were evaluated on 16 datasets and 94 endpoints, comprising 1.4 million compounds in total. Our results show that XGBoost generally achieves the best predictive performance, while LightGBM requires the least training time, especially for larger datasets. 
In terms of feature importance, the models surprisingly rank molecular features differently, reflecting differences in regularization techniques and decision tree structures. Thus, expert knowledge must always be employed when evaluating data-driven explanations of bioactivity. Furthermore, our results show that the relevance of each hyperparameter varies greatly across datasets and that it is crucial to optimize as many hyperparameters as possible to maximize the predictive performance. In conclusion, our study provides the first set of guidelines for cheminformatics practitioners to effectively train, optimize and evaluate gradient boosting models for virtual screening and QSAR applications.<\/jats:p>","DOI":"10.1186\/s13321-023-00743-7","type":"journal-article","created":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T05:01:41Z","timestamp":1693198901000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":114,"title":["Practical guidelines for the use of gradient boosting for molecular property prediction"],"prefix":"10.1186","volume":"15","author":[{"given":"Davide","family":"Boldini","sequence":"first","affiliation":[]},{"given":"Francesca","family":"Grisoni","sequence":"additional","affiliation":[]},{"given":"Daniel","family":"Kuhn","sequence":"additional","affiliation":[]},{"given":"Lukas","family":"Friedrich","sequence":"additional","affiliation":[]},{"given":"Stephan A.","family":"Sieber","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,8,28]]},"reference":[{"issue":"1","key":"743_CR1","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1186\/s13321-022-00590-y","volume":"14","author":"A Keshavarzi Arshadi","year":"2022","unstructured":"Keshavarzi Arshadi A, Salem M, Firouzbakht A, Yuan JS (2022) MolData, a molecular benchmark for disease and target based 
machine learning. J Cheminf 14(1):10. https:\/\/doi.org\/10.1186\/s13321-022-00590-y","journal-title":"J Cheminf"},{"issue":"8","key":"743_CR2","doi-asserted-by":"publisher","first-page":"3370","DOI":"10.1021\/acs.jcim.9b00237","volume":"59","author":"K Yang","year":"2019","unstructured":"Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M, Palmer A, Settels V, Jaakkola T, Jensen K, Barzilay R (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59(8):3370\u20133388. https:\/\/doi.org\/10.1021\/acs.jcim.9b00237","journal-title":"J Chem Inf Model"},{"key":"743_CR3","doi-asserted-by":"publisher","DOI":"10.1002\/minf.202100113","author":"S Aleksi\u0107","year":"2021","unstructured":"Aleksi\u0107 S, Seeliger D, Brown JB (2021) ADMET Predictability at Boehringer Ingelheim: state-of-the-art, and do bigger datasets or algorithms make a difference? Mol Inform. https:\/\/doi.org\/10.1002\/minf.202100113","journal-title":"Mol Inform"},{"issue":"24","key":"743_CR4","doi-asserted-by":"publisher","first-page":"5441","DOI":"10.1039\/C8SC00148K","volume":"9","author":"A Mayr","year":"2018","unstructured":"Mayr A, Klambauer G, Unterthiner T, Steijaert M, Wegner JK, Ceulemans H, Clevert D-A, Hochreiter S (2018) Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci 9(24):5441\u20135451. https:\/\/doi.org\/10.1039\/C8SC00148K","journal-title":"Chem Sci"},{"issue":"9\u201310","key":"743_CR5","doi-asserted-by":"publisher","first-page":"1800041","DOI":"10.1002\/minf.201800041","volume":"37","author":"H Chen","year":"2018","unstructured":"Chen H, Kogej T, Engkvist O (2018) Cheminformatics in drug discovery, an industrial perspective. Mol Inform 37(9\u201310):1800041. 
https:\/\/doi.org\/10.1002\/minf.201800041","journal-title":"Mol Inform"},{"issue":"1","key":"743_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-019-0407-y","volume":"12","author":"M Withnall","year":"2020","unstructured":"Withnall M, Lindel\u00f6f E, Engkvist O, Chen H (2020) Building attention and edge message passing neural networks for bioactivity and physical-chemical property prediction. J Cheminf 12(1):1. https:\/\/doi.org\/10.1186\/s13321-019-0407-y","journal-title":"J Cheminf"},{"issue":"1","key":"743_CR7","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1186\/s13065-021-00737-2","volume":"15","author":"MVS Santana","year":"2021","unstructured":"Santana MVS, Silva-Jr FP (2021) De novo design and bioactivity prediction of SARS-CoV-2 main protease inhibitors using recurrent neural network-based transfer learning. BMC Chem 15(1):8. https:\/\/doi.org\/10.1186\/s13065-021-00737-2","journal-title":"BMC Chem"},{"key":"743_CR8","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.1c00683","author":"VO Gawriljuk","year":"2021","unstructured":"Gawriljuk VO, Zin PPK, Puhl AC, Zorn KM, Foil DH, Lane TR, Hurst B, Tavella TA, Costa FTM, Lakshmanane P, Bernatchez J, Godoy AS, Oliva G, Siqueira-Neto JL, Madrid PB, Ekins S (2021) Machine learning models identify inhibitors of SARS-CoV-2. J Chem Inf Model. https:\/\/doi.org\/10.1021\/acs.jcim.1c00683","journal-title":"J Chem Inf Model"},{"issue":"4","key":"743_CR9","doi-asserted-by":"publisher","first-page":"688","DOI":"10.1016\/j.cell.2020.01.021","volume":"180","author":"JM Stokes","year":"2020","unstructured":"Stokes JM, Yang K, Swanson K, Jin W, Cubillos-Ruiz A, Donghia NM, MacNair CR, French S, Carfrae LA, Bloom-Ackermann Z, Tran VM, Chiappino-Pepe A, Badran AH, Andrews IW, Chory EJ, Church GM, Brown ED, Jaakkola TS, Barzilay R, Collins JJ (2020) A deep learning approach to antibiotic discovery. Cell 180(4):688-702.e13. 
https:\/\/doi.org\/10.1016\/j.cell.2020.01.021","journal-title":"Cell"},{"issue":"2","key":"743_CR10","doi-asserted-by":"publisher","first-page":"653","DOI":"10.1021\/acs.jcim.0c01164","volume":"61","author":"S Jain","year":"2021","unstructured":"Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV (2021) Large-scale modeling of multispecies acute toxicity end points using consensus of multitask deep learning methods. J Chem Inf Model 61(2):653\u2013663. https:\/\/doi.org\/10.1021\/acs.jcim.0c01164","journal-title":"J Chem Inf Model"},{"issue":"1","key":"743_CR11","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1186\/s13321-022-00611-w","volume":"14","author":"M Walter","year":"2022","unstructured":"Walter M, Allen LN, de la Vega de Le\u00f3n A, Webb SJ, Gillet VJ (2022) Analysis of the benefits of imputation models over traditional QSAR models for toxicity prediction. J Cheminf 14(1):32. https:\/\/doi.org\/10.1186\/s13321-022-00611-w","journal-title":"J Cheminf"},{"key":"743_CR12","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.9b00633","author":"J Zhang","year":"2019","unstructured":"Zhang J, Mucs D, Norinder U, Svensson F (2019) LightGBM: an effective and scalable algorithm for prediction of chemical toxicity-application to the tox21 and mutagenicity data sets. J Chem Inf Model. https:\/\/doi.org\/10.1021\/acs.jcim.9b00633","journal-title":"J Chem Inf Model"},{"issue":"5","key":"743_CR13","doi-asserted-by":"publisher","first-page":"1839","DOI":"10.1021\/acs.jcim.8b00794","volume":"59","author":"F Grisoni","year":"2019","unstructured":"Grisoni F, Consonni V, Ballabio D (2019) Machine learning consensus to predict the binding to the androgen receptor within the CoMPARA project. J Chem Inf Model 59(5):1839\u20131848. 
https:\/\/doi.org\/10.1021\/acs.jcim.8b00794","journal-title":"J Chem Inf Model"},{"key":"743_CR14","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkab255","author":"G Xiong","year":"2021","unstructured":"Xiong G, Wu Z, Yi J, Fu L, Yang Z, Hsieh C, Yin M, Zeng X, Wu C, Lu A, Chen X, Hou T, Cao D (2021) ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res. https:\/\/doi.org\/10.1093\/nar\/gkab255","journal-title":"Nucleic Acids Res"},{"issue":"16","key":"743_CR15","doi-asserted-by":"publisher","first-page":"8705","DOI":"10.1021\/acs.jmedchem.0c00385","volume":"63","author":"KV Chuang","year":"2020","unstructured":"Chuang KV, Gunsalus LM, Keiser MJ (2020) Learning molecular representations for medicinal chemistry: miniperspective. J Med Chem 63(16):8705\u20138722. https:\/\/doi.org\/10.1021\/acs.jmedchem.0c00385","journal-title":"J Med Chem"},{"issue":"1","key":"743_CR16","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1186\/s13321-020-00479-8","volume":"13","author":"D Jiang","year":"2021","unstructured":"Jiang D, Wu Z, Hsieh C-Y, Chen G, Liao B, Wang Z, Shen C, Cao D, Wu J, Hou T (2021) Could Graph neural networks learn better molecular representation for drug discovery? a comparison study of descriptor-based and graph-based models. J Cheminf 13(1):12. https:\/\/doi.org\/10.1186\/s13321-020-00479-8","journal-title":"J Cheminf"},{"issue":"6","key":"743_CR17","doi-asserted-by":"publisher","first-page":"1692","DOI":"10.1039\/C8SC04175J","volume":"10","author":"R Winter","year":"2019","unstructured":"Winter R, Montanari F, No\u00e9 F, Clevert D-A (2019) Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem Sci 10(6):1692\u20131701. 
https:\/\/doi.org\/10.1039\/C8SC04175J","journal-title":"Chem Sci"},{"issue":"2","key":"743_CR18","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1007\/s11749-016-0481-7","volume":"25","author":"G Biau","year":"2016","unstructured":"Biau G, Scornet E (2016) A random forest guided tour. TEST 25(2):197\u2013227. https:\/\/doi.org\/10.1007\/s11749-016-0481-7","journal-title":"TEST"},{"issue":"1","key":"743_CR19","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1038\/s42256-019-0138-9","volume":"2","author":"SM Lundberg","year":"2020","unstructured":"Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee S-I (2020) From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2(1):56\u201367. https:\/\/doi.org\/10.1038\/s42256-019-0138-9","journal-title":"Nat Mach Intell"},{"issue":"3","key":"743_CR20","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/BF00994018","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273\u2013297. https:\/\/doi.org\/10.1007\/BF00994018","journal-title":"Mach Learn"},{"key":"743_CR21","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1016\/j.neucom.2019.10.118","volume":"408","author":"J Cervantes","year":"2020","unstructured":"Cervantes J, Garcia-Lamont F, Rodr\u00edguez-Mazahua L, Lopez A (2020) A comprehensive survey on support vector machine classification: applications, challenges and trends. Neurocomputing 408:189\u2013215. https:\/\/doi.org\/10.1016\/j.neucom.2019.10.118","journal-title":"Neurocomputing"},{"key":"743_CR22","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1016\/j.inffus.2021.11.011","volume":"81","author":"R Shwartz-Ziv","year":"2022","unstructured":"Shwartz-Ziv R, Armon A (2022) Tabular data: deep learning is not all you need. Inf Fusion 81:84\u201390. 
https:\/\/doi.org\/10.1016\/j.inffus.2021.11.011","journal-title":"Inf Fusion"},{"issue":"3","key":"743_CR23","doi-asserted-by":"publisher","first-page":"1937","DOI":"10.1007\/s10462-020-09896-5","volume":"54","author":"C Bent\u00e9jac","year":"2021","unstructured":"Bent\u00e9jac C, Cs\u00f6rg\u0151 A, Mart\u00ednez-Mu\u00f1oz G (2021) A comparative analysis of gradient boosting algorithms. Artif Intell Rev 54(3):1937\u20131967. https:\/\/doi.org\/10.1007\/s10462-020-09896-5","journal-title":"Artif Intell Rev"},{"issue":"W1","key":"743_CR24","doi-asserted-by":"publisher","first-page":"W174","DOI":"10.1093\/nar\/gkab438","volume":"49","author":"S Zheng","year":"2021","unstructured":"Zheng S, Aldahdooh J, Shadbahr T, Wang Y, Aldahdooh D, Bao J, Wang W, Tang J (2021) Drugcomb update: a more comprehensive drug sensitivity data repository and analysis portal. Nucleic Acids Res 49(W1):W174\u2013W184. https:\/\/doi.org\/10.1093\/nar\/gkab438","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"743_CR25","doi-asserted-by":"publisher","first-page":"18040","DOI":"10.1038\/s41598-020-74921-0","volume":"10","author":"Y Zhu","year":"2020","unstructured":"Zhu Y, Brettin T, Evrard YA, Partin A, Xia F, Shukla M, Yoo H, Doroshow JH, Stevens RL (2020) Ensemble transfer learning for the prediction of anti-cancer drug response. Sci Rep 10(1):18040. https:\/\/doi.org\/10.1038\/s41598-020-74921-0","journal-title":"Sci Rep"},{"issue":"2","key":"743_CR26","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1007\/s12539-021-00488-7","volume":"14","author":"Y Zhang","year":"2022","unstructured":"Zhang Y, Jiang Z, Chen C, Wei Q, Gu H, Yu B (2022) Deepstack-DTIs: predicting drug-target interactions using LightGBM feature selection and deep-stacked ensemble classifier. Interdiscip Sci Comput Life Sci 14(2):311\u2013330. 
https:\/\/doi.org\/10.1007\/s12539-021-00488-7","journal-title":"Interdiscip Sci Comput Life Sci"},{"issue":"2","key":"743_CR27","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1039\/C7SC02664A","volume":"9","author":"Z Wu","year":"2018","unstructured":"Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9(2):513\u2013530. https:\/\/doi.org\/10.1039\/C7SC02664A","journal-title":"Chem Sci"},{"issue":"12","key":"743_CR28","doi-asserted-by":"publisher","first-page":"6007","DOI":"10.1021\/acs.jcim.0c00884","volume":"60","author":"VB Siramshetty","year":"2020","unstructured":"Siramshetty VB, Nguyen D-T, Martinez NJ, Southall NT, Simeonov A, Zakharov AV (2020) Critical analysis. J Chem Inf Model 60(12):6007\u20136019. https:\/\/doi.org\/10.1021\/acs.jcim.0c00884","journal-title":"J Chem Inf Model"},{"issue":"1","key":"743_CR29","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1186\/s13321-022-00657-w","volume":"14","author":"D Boldini","year":"2022","unstructured":"Boldini D, Friedrich L, Kuhn D, Sieber SA (2022) Tuning gradient boosting for imbalanced bioassay modelling with custom loss functions. J Cheminf 14(1):80. https:\/\/doi.org\/10.1186\/s13321-022-00657-w","journal-title":"J Cheminf"},{"issue":"23","key":"743_CR30","doi-asserted-by":"publisher","first-page":"5938","DOI":"10.1021\/acs.jcim.2c01073","volume":"62","author":"D van Tilborg","year":"2022","unstructured":"van Tilborg D, Alenicheva A, Grisoni F (2022) Exposing the limitations of molecular machine learning with activity cliffs. J Chem Inf Model 62(23):5938\u20135951. https:\/\/doi.org\/10.1021\/acs.jcim.2c01073","journal-title":"J Chem Inf Model"},{"key":"743_CR31","doi-asserted-by":"publisher","unstructured":"Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 
ACM San Francisco California USA. 2016. https:\/\/doi.org\/10.1145\/2939672.2939785","DOI":"10.1145\/2939672.2939785"},{"key":"743_CR32","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1706.09516","author":"G Ke","year":"2017","unstructured":"Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) LightGBM: a highly efficient gradient boosting decision tree in advances in neural information processing systems. Curran Assoc. https:\/\/doi.org\/10.48550\/arXiv.1706.09516","journal-title":"Curran Assoc."},{"key":"743_CR33","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1706.09516","author":"L Prokhorenkova","year":"2018","unstructured":"Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) CatBoost: unbiased boosting with categorical features. Adv Neural Inf Process Sys. https:\/\/doi.org\/10.48550\/arXiv.1706.09516","journal-title":"Adv Neural Inf Process Sys"},{"issue":"6","key":"743_CR34","doi-asserted-by":"publisher","first-page":"2623","DOI":"10.1021\/acs.jcim.1c00160","volume":"61","author":"C Esposito","year":"2021","unstructured":"Esposito C, Landrum GA, Schneider N, Stiefl N, Riniker S (2021) GHOST: adjusting the decision threshold to handle imbalanced data in machine learning. J Chem Inf Model 61(6):2623\u20132640. https:\/\/doi.org\/10.1021\/acs.jcim.1c00160","journal-title":"J Chem Inf Model"},{"issue":"5","key":"743_CR35","doi-asserted-by":"publisher","first-page":"2091","DOI":"10.1021\/jm5019093","volume":"58","author":"JL Dahlin","year":"2015","unstructured":"Dahlin JL, Nissink JWM, Strasser JM, Francis S, Higgins L, Zhou H, Zhang Z, Walters MA (2015) PAINS in the assay: chemical mechanisms of assay interference and promiscuous enzymatic inhibition observed during a sulfhydryl-scavenging HTS. J Med Chem 58(5):2091\u20132113. 
https:\/\/doi.org\/10.1021\/jm5019093","journal-title":"J Med Chem"},{"key":"743_CR36","doi-asserted-by":"publisher","DOI":"10.1201\/9781315139470","volume-title":"Classification and regression trees","author":"L Breiman","year":"2017","unstructured":"Breiman L (2017) Classification and regression trees. Routledge, New York"},{"issue":"5","key":"743_CR37","doi-asserted-by":"publisher","first-page":"1189","DOI":"10.1214\/aos\/1013203451","volume":"29","author":"JH Friedman","year":"2001","unstructured":"Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189\u20131232. https:\/\/doi.org\/10.1214\/aos\/1013203451","journal-title":"Ann Stat"},{"key":"743_CR38","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1201.0490","author":"F Pedregosa","year":"2012","unstructured":"Pedregosa F (2012) Scikit-learn: machine learning in python. Mach Learn. https:\/\/doi.org\/10.48550\/arXiv.1201.0490","journal-title":"Mach Learn"},{"key":"743_CR39","unstructured":"XGBoost Documentation\u2014xgboost 1.6.2 documentation. https:\/\/xgboost.readthedocs.io\/en\/stable\/. Accessed 31 Aug 2022"},{"key":"743_CR40","unstructured":"Welcome to LightGBM\u2019s documentation!\u2014LightGBM 3.3.2 documentation. https:\/\/lightgbm.readthedocs.io\/en\/v3.3.2\/. Accessed 31 Aug 2022"},{"key":"743_CR41","doi-asserted-by":"publisher","DOI":"10.1002\/9783527613106","author":"R Todeschini","year":"2000","unstructured":"Todeschini R, Consonni V (2000) Handbook of molecular descriptors. Methods Princ Med Chem. https:\/\/doi.org\/10.1002\/9783527613106","journal-title":"Methods Princ Med Chem"},{"key":"743_CR42","unstructured":"CatBoost - state-of-the-art open-source gradient boosting library with categorical features support. https:\/\/catboost.ai. 
Accessed 31 Aug 2022"},{"key":"743_CR43","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2206.05608","author":"A Ustimenko","year":"2022","unstructured":"Ustimenko A, Beliakov A, Prokhorenkova L (2022) Gradient boosting performs gaussian process inference. ArXiv. https:\/\/doi.org\/10.48550\/arXiv.2206.05608","journal-title":"ArXiv"},{"key":"743_CR44","unstructured":"Ustimenko, A.; Prokhorenkova, L. SGLB: Stochastic Gradient Langevin Boosting. http:\/\/arxiv.org\/abs\/2001.07248. Accessed 20 May 2022."},{"key":"743_CR45","unstructured":"Sharchilev, B.; Ustinovsky, Y.; Serdyukov, P.; de Rijke, M. Finding Influential Training Samples for Gradient Boosted Decision Trees. arXiv March 12, 2018. http:\/\/arxiv.org\/abs\/1802.06640 Accessed 29 Jul 2022"},{"issue":"3","key":"743_CR46","doi-asserted-by":"publisher","first-page":"1269","DOI":"10.1021\/acs.jcim.8b00542","volume":"59","author":"I Cort\u00e9s-Ciriano","year":"2019","unstructured":"Cort\u00e9s-Ciriano I, Bender A (2019) Deep confidence: a computationally efficient framework for calculating reliable prediction errors for deep neural networks. J Chem Inf Model 59(3):1269\u20131281. https:\/\/doi.org\/10.1021\/acs.jcim.8b00542","journal-title":"J Chem Inf Model"},{"issue":"3","key":"743_CR47","doi-asserted-by":"publisher","first-page":"652","DOI":"10.1002\/bimj.201800148","volume":"61","author":"G Fu","year":"2019","unstructured":"Fu G, Yi L, Pan J (2019) Tuning model parameters in class-imbalanced learning with precision-recall curve. Biom J 61(3):652\u2013664. https:\/\/doi.org\/10.1002\/bimj.201800148","journal-title":"Biom J"},{"key":"743_CR48","unstructured":"Feng Y, Zhou M, Tong X Imbalanced classification: a paradigm-based review. http:\/\/arxiv.org\/abs\/2002.04592. Accessed 10 Oct 2022"},{"issue":"293","key":"743_CR49","doi-asserted-by":"publisher","first-page":"52","DOI":"10.2307\/2282330","volume":"56","author":"OJ Dunn","year":"1961","unstructured":"Dunn OJ (1961) Multiple comparisons among means. 
J Am Stat Assoc 56(293):52\u201364. https:\/\/doi.org\/10.2307\/2282330","journal-title":"J Am Stat Assoc"},{"key":"743_CR50","unstructured":"RDKit. https:\/\/www.rdkit.org\/. Accessed 09 May 2021"},{"issue":"1","key":"743_CR51","doi-asserted-by":"publisher","first-page":"014008","DOI":"10.1088\/1749-4699\/8\/1\/014008","volume":"8","author":"J Bergstra","year":"2015","unstructured":"Bergstra J, Komer B, Eliasmith C, Yamins D, Cox DD (2015) Hyperopt: a python library for model selection and hyperparameter optimization. Comput Sci Discov 8(1):014008. https:\/\/doi.org\/10.1088\/1749-4699\/8\/1\/014008","journal-title":"Comput Sci Discov"},{"issue":"10","key":"743_CR52","doi-asserted-by":"publisher","first-page":"573","DOI":"10.1038\/s42256-020-00236-4","volume":"2","author":"J Jim\u00e9nez-Luna","year":"2020","unstructured":"Jim\u00e9nez-Luna J, Grisoni F, Schneider G (2020) Drug discovery with explainable artificial intelligence. Nat Mach Intell 2(10):573\u2013584. https:\/\/doi.org\/10.1038\/s42256-020-00236-4","journal-title":"Nat Mach Intell"},{"key":"743_CR53","volume-title":"Contributions to the theory of games (AM-28)","author":"L Shapley","year":"1953","unstructured":"Shapley L (1953) A value for n-person games. In: Kuhn HW, Tucker A (eds) Contributions to the theory of games (AM-28). Princeton University Press, Princeton"},{"issue":"4","key":"743_CR54","doi-asserted-by":"publisher","first-page":"1324","DOI":"10.1021\/acs.jcim.8b00825","volume":"59","author":"RP Sheridan","year":"2019","unstructured":"Sheridan RP (2019) Interpretation of QSAR models by coloring atoms according to changes in predicted activity: how robust is it? J Chem Inf Model 59(4):1324\u20131337. https:\/\/doi.org\/10.1021\/acs.jcim.8b00825","journal-title":"J Chem Inf Model"},{"key":"743_CR55","unstructured":"Hutter F, Hoos H, Leyton-Brown K (2014) An Efficient Approach for Assessing Hyperparameter Importance. 
In Proceedings of the 31st International Conference on International Conference on Machine Learning. ICML\u201914; JMLR.org: Beijing, China. 32:I-754\u2013I-762. https:\/\/dl.acm.org\/doi\/10.5555\/3044805.3044891"},{"key":"743_CR56","doi-asserted-by":"publisher","DOI":"10.1021\/ci010132r","author":"JL Durant","year":"2002","unstructured":"Durant JL, Leland BA, Henry DR, Nourse JG (2002) Reoptimization of MDL keys for use in drug discovery. J Chem Inf Model. https:\/\/doi.org\/10.1021\/ci010132r","journal-title":"J Chem Inf Model"},{"issue":"9","key":"743_CR57","doi-asserted-by":"publisher","first-page":"1702","DOI":"10.1016\/j.drudis.2020.07.001","volume":"25","author":"AH G\u00f6ller","year":"2020","unstructured":"G\u00f6ller AH, Kuhnke L, Montanari F, Bonin A, Schneckener S, ter Laak A, Wichard J, Lobell M, Hillisch A (2020) Bayer\u2019s in silico ADMET platform: a journey of machine learning over the past two decades. Drug Discov Today 25(9):1702\u20131709. https:\/\/doi.org\/10.1016\/j.drudis.2020.07.001","journal-title":"Drug Discov Today"}],"container-title":["Journal of 
Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00743-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00743-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00743-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T16:18:59Z","timestamp":1700497139000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00743-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,28]]},"references-count":57,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["743"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00743-7","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,28]]},"assertion":[{"value":"31 March 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 August 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 August 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"73"}}