{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T13:55:04Z","timestamp":1773928504925,"version":"3.50.1"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,7,21]],"date-time":"2025-07-21T00:00:00Z","timestamp":1753056000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,21]],"date-time":"2025-07-21T00:00:00Z","timestamp":1753056000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>This study, focusing on predicting Absorption, Distribution, Metabolism, Excretion, and Toxicology (ADMET) properties, addresses the key challenges of ML models trained using ligand-based representations. We propose a structured approach to data feature selection, taking a step beyond the conventional practice of combining different representations without systematic reasoning. Additionally, we enhance model evaluation methods by integrating cross-validation with statistical hypothesis testing, adding a layer of reliability to the model assessments. Our final evaluations include a practical scenario, where models trained on one source of data are evaluated on a different one. This approach aims to bolster the reliability of ADMET predictions, providing more dependable and informative model evaluations.<\/jats:p>\n                  <jats:p>\n                    <jats:bold>Scientific contribution<\/jats:bold>\n                  <\/jats:p>\n                  <jats:p>This study provided a structured approach to feature selection. 
We improve model evaluation by combining cross-validation with statistical hypothesis testing, making results more reliable. The methodology used in our study can be generalized beyond feature selection, boosting the confidence in selected models which is crucial in a noisy domain such as the ADMET prediction tasks. Additionally, we assess how well models trained on one dataset perform on another, offering practical insights for using external data in drug discovery.<\/jats:p>","DOI":"10.1186\/s13321-025-01041-0","type":"journal-article","created":{"date-parts":[[2025,7,21]],"date-time":"2025-07-21T16:37:50Z","timestamp":1753115870000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Benchmarking ML in ADMET predictions: the practical impact of feature representations in ligand-based models"],"prefix":"10.1186","volume":"17","author":[{"given":"Gintautas","family":"Kamuntavi\u010dius","sequence":"first","affiliation":[]},{"given":"Tanya","family":"Paquet","sequence":"additional","affiliation":[]},{"given":"Orestis","family":"Bastas","sequence":"additional","affiliation":[]},{"given":"Dainius","family":"\u0160alkauskas","sequence":"additional","affiliation":[]},{"given":"Alvaro","family":"Prat","sequence":"additional","affiliation":[]},{"given":"Hisham Abdel","family":"Aty","sequence":"additional","affiliation":[]},{"given":"Aurimas","family":"Pabrinkis","sequence":"additional","affiliation":[]},{"given":"Povilas","family":"Norvai\u0161as","sequence":"additional","affiliation":[]},{"given":"Roy","family":"Tal","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,21]]},"reference":[{"issue":"10","key":"1041_CR1","doi-asserted-by":"publisher","first-page":"1033","DOI":"10.1038\/s41589-022-01131-2","volume":"18","author":"K Huang","year":"2022","unstructured":"Huang K, Tianfan F, Gao W, Zhao Y, Roohani Y, Leskovec J, Coley CW, Xiao C, Sun J, 
Zitnik M (2022) Artificial intelligence foundation for therapeutic science. Nat Chem Biol 18(10):1033\u20131036","journal-title":"Nat Chem Biol"},{"key":"1041_CR2","unstructured":"Notwell J (2023) Maplight tdc: Source code for maplight\u2019s therapeutics data commons (tdc) admet benchmark group submission. https:\/\/github.com\/maplightrx\/MapLight-TDC"},{"key":"1041_CR3","unstructured":"Turon G, Duran-Frigola M (2022) Zairachem: automated ml modelling for chemistry datasets, 11"},{"key":"1041_CR4","doi-asserted-by":"publisher","DOI":"10.26434\/chemrxiv-2022-zz776","author":"D Huang","year":"2022","unstructured":"Huang D, Chowdhuri S, Li A, Agrawal A, Gano K, Zhu A (2022) A unified system for molecular property predictions: Oloren chemengine and its applications. ChemRxiv Preprint. https:\/\/doi.org\/10.26434\/chemrxiv-2022-zz776","journal-title":"ChemRxiv Preprint"},{"key":"1041_CR5","unstructured":"Chilingaryan G, Tamoyan H, Tevosyan A, Babayan N, Khondkaryan L, Hambardzumyan K, Navoyan Z, Khachatrian H, Aghajanyan A (2022) Bartsmiles: Generative masked language models for molecular representations. arXiv preprint. arXiv:2211.16349"},{"issue":"12","key":"1041_CR6","doi-asserted-by":"publisher","first-page":"1256","DOI":"10.1038\/s42256-022-00580-7","volume":"4","author":"J Ross","year":"2022","unstructured":"Ross J, Belgodere B, Chenthamarakshan V, Padhi I, Mroueh Y, Das P (2022) Large-scale chemical language representations capture molecular structure and properties. Nat Mach Intell 4(12):1256\u20131264","journal-title":"Nat Mach Intell"},{"key":"1041_CR7","first-page":"12559","volume":"33","author":"Yu Rong","year":"2020","unstructured":"Rong Yu, Bian Y, Tingyang X, Xie W, Wei Y, Huang W, Huang J (2020) Self-supervised graph transformer on large-scale molecular data. 
Adv Neural Inf Process Syst 33:12559\u201312571","journal-title":"Adv Neural Inf Process Syst"},{"issue":"1","key":"1041_CR8","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1021\/acs.jcim.7b00616","volume":"58","author":"S Jaeger","year":"2018","unstructured":"Jaeger S, Fulle S, Turk S (2018) Mol2vec: unsupervised machine learning approach with chemical intuition. J Chem Inf Model 58(1):27\u201335","journal-title":"J Chem Inf Model"},{"issue":"11","key":"1041_CR9","doi-asserted-by":"publisher","first-page":"1549","DOI":"10.1021\/acs.jcim.3c00160","volume":"63","author":"C Fang","year":"2023","unstructured":"Fang C, Wang Y, Grater R, Kapadnis S, Black C, Trapa P, Sciabola S (2023) Prospective validation of machine learning algorithms for absorption, distribution, metabolism, and excretion prediction: an industrial perspective. J Chem Inf Model 63(11):1549\u20139596","journal-title":"J Chem Inf Model"},{"key":"1041_CR10","unstructured":"Green J, Diaz CC, Jakobs MAH, Dimitracopoulos A, van\u00a0der Wilk M, Greenhalgh RD (2023) Current methods for drug property prediction in the real world. arXiv preprint. arXiv:2309.17161"},{"issue":"1","key":"1041_CR11","doi-asserted-by":"publisher","first-page":"6395","DOI":"10.1038\/s41467-023-41948-6","volume":"14","author":"J Deng","year":"2023","unstructured":"Deng J, Yang Z, Wang H, Ojima I, Samaras D, Wang F (2023) A systematic study of key elements underlying molecular property prediction. Nat Commun 14(1):6395","journal-title":"Nat Commun"},{"issue":"1","key":"1041_CR12","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1021\/acs.molpharmaceut.0c01013","volume":"18","author":"TR Lane","year":"2020","unstructured":"Lane TR, Foil DH, Minerali E, Urbina F, Zorn KM, Ekins S (2020) Bioactivity comparison across multiple machine learning algorithms using over 5000 datasets for drug discovery. 
Mol Pharm 18(1):403\u2013415","journal-title":"Mol Pharm"},{"issue":"13","key":"1041_CR13","doi-asserted-by":"publisher","first-page":"4127","DOI":"10.1016\/j.bmc.2011.05.005","volume":"19","author":"R Guha","year":"2011","unstructured":"Guha R, Dexheimer TS, Kestranek AN, Jadhav A, Chervenak AM, Ford MG, Simeonov A, Roth GP, Thomas CJ (2011) Exploratory analysis of kinetic solubility measurements of a small molecule library. Bioorg Med Chem 19(13):4127\u20134134","journal-title":"Bioorg Med Chem"},{"issue":"1","key":"1041_CR14","first-page":"1758","volume":"12","author":"A Patr\u00edcia Bento","year":"2020","unstructured":"Patr\u00edcia Bento A, Hersey A, F\u00e9lix E, Landrum G, Gaulton A, Atkinson F, Bellis LJ, De Veij M, Leach AR (2020) An open source chemical structure curation pipeline using RDKit. J Cheminform 12(1):1758\u20132946","journal-title":"J Cheminform"},{"issue":"2","key":"1041_CR15","doi-asserted-by":"publisher","first-page":"460","DOI":"10.1021\/ci500588j","volume":"55","author":"T Sander","year":"2015","unstructured":"Sander T, Freyss J, von Korff M, Rufener C (2015) Datawarrior: an open-source program for chemistry aware data visualization and analysis. J Chem Inf Model 55(2):460\u2013473","journal-title":"J Chem Inf Model"},{"issue":"1\u20132","key":"1041_CR16","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1081\/DMR-120001392","volume":"34","author":"S Rendic","year":"2002","unstructured":"Rendic S (2002) Summary of information on human CYP enzymes: human p450 metabolism data. Drug Metab Rev 34(1\u20132):83\u2013448","journal-title":"Drug Metab Rev"},{"issue":"15","key":"1041_CR17","doi-asserted-by":"publisher","first-page":"697","DOI":"10.1016\/j.drudis.2011.05.007","volume":"16","author":"M Chen","year":"2011","unstructured":"Chen M, Vijay V, Shi Q, Liu Z, Fang H, Tong W (2011) FDA-approved drug labeling for the study of drug-induced liver injury. 
Drug Discov Today 16(15):697\u2013703","journal-title":"Drug Discov Today"},{"issue":"3","key":"1041_CR18","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1023\/A:1022627411411","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273\u2013297","journal-title":"Mach Learn"},{"key":"1041_CR19","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45:5\u201332","journal-title":"Mach Learn"},{"key":"1041_CR20","first-page":"3146","volume":"30","author":"G Ke","year":"2017","unstructured":"Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) Lightgbm: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30:3146\u20133154","journal-title":"Adv Neural Inf Process Syst"},{"key":"1041_CR21","unstructured":"Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) Catboost: unbiased boosting with categorical features. Adv Neural Inform Process Syst, 31"},{"issue":"1","key":"1041_CR22","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1021\/acs.jcim.3c01250","volume":"64","author":"E Heid","year":"2023","unstructured":"Heid E, Greenman KP, Chung Y, Li S-C, Graff DE, Vermeire FH, Wu H, Green WH, McGill CJ (2023) Chemprop: a machine learning package for chemical property prediction. J Chem Inf Model 64(1):9\u201317","journal-title":"J Chem Inf Model"},{"key":"1041_CR23","unstructured":"Rdkit: Open-source cheminformatics. https:\/\/www.rdkit.org"},{"issue":"5","key":"1041_CR24","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1021\/ci100050t","volume":"50","author":"D Rogers","year":"2010","unstructured":"Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
J Chem Inf Model 50(5):742\u2013754","journal-title":"J Chem Inf Model"},{"key":"1041_CR25","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1021\/ci00046a002","volume":"25","author":"RE Carhart","year":"1985","unstructured":"Carhart RE, Smith DH, Venkataraghavan R (1985) Atom pairs as molecular features in structure-activity studies: definition and applications. J Chem Inf Comput Sci 25:64\u201373","journal-title":"J Chem Inf Comput Sci"},{"issue":"5","key":"1041_CR26","doi-asserted-by":"publisher","first-page":"1924","DOI":"10.1021\/ci050413p","volume":"46","author":"P Gedeck","year":"2006","unstructured":"Gedeck P, Rohde B, Bartels C (2006) Qsar- how good is it in practice? Comparison of descriptor sets on an unbiased cross section of corporate data sets. J Chem Inf Model 46(5):1924\u20131936","journal-title":"J Chem Inf Model"},{"issue":"1","key":"1041_CR27","doi-asserted-by":"publisher","first-page":"208","DOI":"10.1021\/ci050457y","volume":"46","author":"N Stiefl","year":"2006","unstructured":"Stiefl N, Watson IA, Baumann K, Zaliani A (2006) ErG: 2D pharmacophore descriptions for scaffold hopping. J Chem Inf Model 46(1):208\u2013220","journal-title":"J Chem Inf Model"},{"issue":"4","key":"1041_CR28","first-page":"1758","volume":"10","author":"H Moriwaki","year":"2018","unstructured":"Moriwaki H, Tian Y-S, Kawashita N, Takagi T (2018) Mordred: a molecular descriptor calculator. J Cheminform 10(4):1758\u20132946","journal-title":"J Cheminform"},{"key":"1041_CR29","unstructured":"St. John P, Lin D, Binder P, Greaves M, Shah V, St. 
John J, Lange A, Hsu P, Illango R, Ramanathan A, Anandkumar A, Brookes DH, Busia A, Mahajan A, Malina S, Prasad N, Sinai S, Edwards L, Gaudelet T, Regepy C, Steinegger M, Rost B, Brace A, Hippe K, Naef L, Kamata K, Armstrong G, Boyd K, Cao Z, Chou H-Y, Chu S, dos Santos\u00a0Costa A, Darabi S, Dawson E, Didi K, C Fu, Geiger M, Gill M, Hsu D, Kaushik G, Korshunova M, S Kothen-Hill, Lee Y, Liu M, Livne M, Z McClure, Mitchell J, Moradzadeh A, Mosafi O, Nashed Y, Paliwal S, Peng Y, Rabhi S, Ramezanghorbani F, Reidenbach D, Ricketts C, Roland B, Shah K, Shimko T, Sirelkhatim H, Srinivasan S, Stern CA, Toczydlowska D, Veccham SP, Venanzi NAE, Vorontsov A, Wilber J, Wilkinson I, Wong WJ, Xue E, C Ye, X Yu, Zhang Y, Zhou G, Zandstein B, Dallago C, Trentini B, Kucukbenli E, Paliwal S, Rvachov T, Calleja E, Israeli J, Clifford H, Haukioja R, Haemel N, Tretina K, Tadimeti N, Costa AB (2024) Bionemo framework: a modular, high-performance library for ai model development in drug discovery. arXiv preprint. arXiv:2411.10548"},{"key":"1041_CR30","doi-asserted-by":"publisher","DOI":"10.1016\/j.jhazmat.2025.137575","volume":"489","author":"N Li","year":"2025","unstructured":"Li N, Chen Z, Zhang W, Li Y, Huang X, Li X (2025) Web server-based deep learning-driven predictive models for respiratory toxicity of environmental chemicals: mechanistic insights and interpretability. J Hazard Mater 489:137575","journal-title":"J Hazard Mater"},{"issue":"3","key":"1041_CR31","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0118432","volume":"10","author":"T Saito","year":"2015","unstructured":"Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):e0118432","journal-title":"PLoS ONE"},{"key":"1041_CR32","doi-asserted-by":"crossref","unstructured":"Davis J, Goadrich M (2006) The relationship between precision-recall and roc curves. 
In: Proceedings of the 23rd international conference on Machine learning, pp 233\u2013240","DOI":"10.1145\/1143844.1143874"},{"key":"1041_CR33","doi-asserted-by":"publisher","DOI":"10.1016\/j.jhazmat.2024.134724","volume":"474","author":"Z Chen","year":"2024","unstructured":"Chen Z, Li N, Zhang P, Li Y, Li X (2024) Cardiodpi: an explainable deep-learning model for identifying cardiotoxic chemicals targeting herg, cav1.2, and nav1.5 channels. J Hazard Mater 474:134724","journal-title":"J Hazard Mater"},{"issue":"200","key":"1041_CR34","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1080\/01621459.1937.10503522","volume":"32","author":"M Friedman","year":"1937","unstructured":"Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675\u2013701","journal-title":"J Am Stat Assoc"},{"key":"1041_CR35","unstructured":"Nemenyi PB (1963) Distribution-free Multiple Comparisons. PhD thesis, Princeton University"},{"key":"1041_CR36","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1007\/978-1-4612-4380-9_16","volume-title":"Breakthroughs in statistics: methodology and distribution","author":"F Wilcoxon","year":"1992","unstructured":"Wilcoxon F (1992) Individual comparisons by ranking methods. In: Wilcoxon F (ed) Breakthroughs in statistics: methodology and distribution. Springer, New York, pp 196\u2013202"},{"issue":"1\/2","key":"1041_CR37","doi-asserted-by":"publisher","first-page":"151","DOI":"10.2307\/2332181","volume":"19","author":"Student","year":"1927","unstructured":"Student (1927) Errors of routine analysis. 
Biometrika 19(1\/2):151\u2013164","journal-title":"Biometrika"},{"issue":"11","key":"1041_CR38","doi-asserted-by":"publisher","first-page":"2200072","DOI":"10.1002\/minf.202200072","volume":"41","author":"P Kir\u00e1ly","year":"2022","unstructured":"Kir\u00e1ly P, Kiss R, Kov\u00e1cs D, Ballaj A, T\u00f3th G (2022) The relevance of goodness-of-fit, robustness and prediction validation categories of OECD-QSAR principles with respect to sample size and model type. Mol Inf 41(11):2200072","journal-title":"Mol Inf"},{"issue":"5","key":"1041_CR39","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0282924","volume":"18","author":"SJ Belfield","year":"2023","unstructured":"Belfield SJ, Cronin MTD, Enoch SJ, Firman JW (2023) Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS ONE 18(5):e0282924","journal-title":"PLoS ONE"},{"issue":"22","key":"1041_CR40","doi-asserted-by":"publisher","first-page":"8440","DOI":"10.1021\/acs.jcim.4c01186","volume":"64","author":"X Yuzhi","year":"2024","unstructured":"Yuzhi X, Liu X, Xia W, Ge J, Cheng-Wei J, Zhang H, Zhang JZH (2024) Chemxtree: a feature-enhanced graph neural network-neural decision tree framework for Admet prediction. J Chem Inf Model 64(22):8440\u20138452","journal-title":"J Chem Inf Model"},{"issue":"10","key":"1041_CR41","first-page":"281","volume":"13","author":"J Bergstra","year":"2012","unstructured":"Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(10):281\u2013305","journal-title":"J Mach Learn Res"},{"key":"1041_CR42","unstructured":"Chithrananda S, Grand G, Ramsundar B (2020) Chemberta: large-scale self-supervised pretraining for molecular property prediction. arXiv preprint. 
arXiv:2010.09885"},{"key":"1041_CR43","doi-asserted-by":"publisher","DOI":"10.1016\/j.egyai.2022.100201","volume":"10","author":"X Han","year":"2022","unstructured":"Han X, Jia M, Chang Y, Li Y, Shaohua W (2022) Directed message passing neural network (D-MPNN) with graph edge attention (GEA) for property prediction of biofuel-relevant species. Energy AI 10:100201","journal-title":"Energy AI"},{"key":"1041_CR44","volume-title":"Advances in Neural Information Processing Systems","author":"SM Lundberg","year":"2017","unstructured":"Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. In: Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems, vol 30. Curran Associates, Inc., Red Hook"},{"key":"1041_CR45","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1016\/j.vascn.2016.10.006","volume":"84","author":"B Williamson","year":"2017","unstructured":"Williamson B, Wilson C, Dagnell G, Riley RJ (2017) Harmonised high throughput microsomal stability assay. 
J Pharmacol Toxicol Methods 84:31\u201336","journal-title":"J Pharmacol Toxicol Methods"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01041-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-01041-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01041-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,7]],"date-time":"2025-09-07T16:30:13Z","timestamp":1757262613000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-01041-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,21]]},"references-count":45,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1041"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-01041-0","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2023-x6tjr-v4","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv-2023-x6tjr-v5","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv-2023-x6tjr-v6","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv-2023-x6tjr-v7","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,21]]},"assertion":[{"value":"27 January 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 June 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 July 
2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"108"}}