{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T03:58:06Z","timestamp":1773719886611,"version":"3.50.1"},"reference-count":49,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T00:00:00Z","timestamp":1761782400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T00:00:00Z","timestamp":1761782400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["KO 4689\/6-1"],"award-info":[{"award-number":["KO 4689\/6-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["KO 4689\/5-1"],"award-info":[{"award-number":["KO 4689\/5-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Programa de Apoyo a Proyectos de Investigaci\u00f3n e Innovaci\u00f3n Tecnol\u00f3gica","award":["IA207225 and IA210023"],"award-info":[{"award-number":["IA207225 and IA210023"]}]},{"name":"Secretar\u00eda de Ciencia, Humanidades, Tecnolog\u00eda e Innovaci\u00f3n de M\u00e9xico","award":["grant CBF-2025-I-3433"],"award-info":[{"award-number":["grant CBF-2025-I-3433"]}]},{"DOI":"10.13039\/501100004869","name":"Universit\u00e4t M\u00fcnster","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004869","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Machine learning models using 
protein-ligand interaction fingerprints show promise as target-specific scoring functions in drug discovery, but their performance critically depends on the underlying decoy selection strategies. Given this dependence, we analyzed various decoy selection strategies to enhance machine learning models based on the Protein per Atom Score Contributions Derived Interaction Fingerprint (PADIF). We explored three distinct workflows for decoy selection: (1) random selection from extensive databases like ZINC15, (2) leveraging recurrent non-binders from high-throughput screening (HTS) assays stored as dark chemical matter, and (3) data augmentation by utilizing diverse conformations from docking results. Active molecules from ChEMBL, combined with these decoy approaches, were used to train and test different machine learning models based on PADIF. Final validation was performed against experimentally determined inactive compounds from the LIT-PCBA dataset. Our findings reveal that models trained with random selections from ZINC15 and compounds from dark chemical matter closely mimic the performance of those trained with actual non-binders, presenting viable alternatives for creating accurate models in the absence of specific inactivity data. Furthermore, all models showed an enhanced ability to explore new chemical spaces for their specific target and improved the selection of top-ranked active compounds over classical scoring functions, thereby boosting the screening power in molecular docking. 
These findings demonstrate that appropriate decoy selection strategies can maintain model accuracy while expanding applicability to targets even when lacking extensive experimental data.<\/jats:p>","DOI":"10.1186\/s13321-025-01107-z","type":"journal-article","created":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T13:04:24Z","timestamp":1761829464000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Efficient decoy selection to improve virtual screening using machine learning models"],"prefix":"10.1186","volume":"17","author":[{"given":"Felipe","family":"Victoria-Mu\u00f1oz","sequence":"first","affiliation":[]},{"given":"Janosch","family":"Menke","sequence":"additional","affiliation":[]},{"given":"Norberto","family":"Sanchez-Cruz","sequence":"additional","affiliation":[]},{"given":"Oliver","family":"Koch","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,10,30]]},"reference":[{"key":"1107_CR1","doi-asserted-by":"publisher","first-page":"507","DOI":"10.2174\/1568026620666201207095626","volume":"21","author":"VB Sulimov","year":"2020","unstructured":"Sulimov VB, Kutov DC, Taschilova AS, Ilin IS, Tyrtyshnikov EE, Sulimov AV (2020) Docking paradigm in drug design. Curr Top Med Chem 21:507\u2013546. https:\/\/doi.org\/10.2174\/1568026620666201207095626","journal-title":"Curr Top Med Chem"},{"key":"1107_CR2","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1007\/s12551-016-0247-1","volume":"9","author":"NS Pagadala","year":"2017","unstructured":"Pagadala NS, Syed K, Tuszynski J (2017) Software for molecular docking: a review. Biophys Rev 9:91\u2013102. 
https:\/\/doi.org\/10.1007\/s12551-016-0247-1","journal-title":"Biophys Rev"},{"key":"1107_CR3","doi-asserted-by":"publisher","DOI":"10.1016\/j.drudis.2022.103439","author":"T Danel","year":"2023","unstructured":"Danel T, Leski J, Podlewska S, Podolak IT (2023) Docking-based generative approaches in the search for new drug candidates. Drug Discov Today. https:\/\/doi.org\/10.1016\/j.drudis.2022.103439","journal-title":"Drug Discov Today"},{"key":"1107_CR4","doi-asserted-by":"publisher","first-page":"6789","DOI":"10.1021\/jm0608356","volume":"49","author":"N Huang","year":"2006","unstructured":"Huang N, Shoichet BK, Irwin JJ (2006) Benchmarking sets for molecular docking. J Med Chem 49:6789\u20136801. https:\/\/doi.org\/10.1021\/jm0608356","journal-title":"J Med Chem"},{"key":"1107_CR5","doi-asserted-by":"publisher","DOI":"10.1016\/j.tips.2014.12.001","volume-title":"Beware of docking","author":"YC Chen","year":"2015","unstructured":"Chen YC (2015) Beware of docking. Elsevier Ltd, Amsterdam. https:\/\/doi.org\/10.1016\/j.tips.2014.12.001"},{"key":"1107_CR6","doi-asserted-by":"publisher","unstructured":"Cole J, Davis E, Jones G, Sage CR (2017) In: Chackalamannil, S., Rotella, D., Ward, S.E. (eds.) 3.12 - Molecular Docking\u2014A Solved Problem?, pp. 297\u2013318. Elsevier, Oxford https:\/\/doi.org\/10.1016\/B978-0-12-409547-2.12352-2","DOI":"10.1016\/B978-0-12-409547-2.12352-2"},{"key":"1107_CR7","doi-asserted-by":"publisher","first-page":"2489","DOI":"10.1021\/acs.jmedchem.0c02227","volume":"64","author":"A Fischer","year":"2021","unstructured":"Fischer A, Smie\u0161ko M, Sellner M, Lill MA (2021) Decision making in structure-based drug discovery: Visual inspection of docking results. J Med Chem 64:2489\u20132500. 
https:\/\/doi.org\/10.1021\/acs.jmedchem.0c02227","journal-title":"J Med Chem"},{"key":"1107_CR8","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1021\/jm030331x","volume":"47","author":"Z Deng","year":"2004","unstructured":"Deng Z, Chuaqui C, Singh J (2004) Structural interaction fingerprint (sift): A novel method for analyzing three-dimensional protein-ligand binding interactions. J Med Chem 47:337\u2013344. https:\/\/doi.org\/10.1021\/jm030331x","journal-title":"J Med Chem"},{"key":"1107_CR9","doi-asserted-by":"publisher","first-page":"1717","DOI":"10.1021\/ci500081m","volume":"54","author":"Y Li","year":"2014","unstructured":"Li Y, Han L, Liu Z, Wang R (2014) Comparative assessment of scoring functions on an updated benchmark: 2. evaluation methods and general results. J Chem Inf Model 54:1717\u20131736. https:\/\/doi.org\/10.1021\/ci500081m","journal-title":"J Chem Inf Model"},{"key":"1107_CR10","doi-asserted-by":"publisher","DOI":"10.3389\/fbinf.2022.885983","author":"R Meli","year":"2022","unstructured":"Meli R, Morris GM, Biggin PC (2022) Scoring functions for protein-ligand binding affinity prediction using structure-based deep learning: A review. Front Bioinform. https:\/\/doi.org\/10.3389\/fbinf.2022.885983","journal-title":"Front Bioinform"},{"key":"1107_CR11","doi-asserted-by":"publisher","first-page":"1169","DOI":"10.1093\/bioinformatics\/btq112","volume":"26","author":"PJ Ballester","year":"2010","unstructured":"Ballester PJ, Mitchell JBO (2010) A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics 26:1169\u20131175. 
https:\/\/doi.org\/10.1093\/bioinformatics\/btq112","journal-title":"Bioinformatics"},{"key":"1107_CR12","doi-asserted-by":"publisher","first-page":"1376","DOI":"10.1093\/bioinformatics\/btaa982","volume":"37","author":"N S\u00e1nchez-Cruz","year":"2021","unstructured":"S\u00e1nchez-Cruz N, Medina-Franco JL, Mestres J, Barril X (2021) Extended connectivity interaction features: Improving binding affinity prediction through chemical description. Bioinformatics 37:1376\u20131382. https:\/\/doi.org\/10.1093\/bioinformatics\/btaa982","journal-title":"Bioinformatics"},{"key":"1107_CR13","doi-asserted-by":"publisher","first-page":"1334","DOI":"10.1093\/bioinformatics\/bty757","volume":"35","author":"M W\u00f3jcikowski","year":"2019","unstructured":"W\u00f3jcikowski M, Kukielka M, Stepniewska-Dziubinska MM, Siedlecki P (2019) Development of a protein-ligand extended connectivity (plec) fingerprint and its application for binding affinity predictions. Bioinformatics 35:1334\u20131341. https:\/\/doi.org\/10.1093\/bioinformatics\/bty757","journal-title":"Bioinformatics"},{"key":"1107_CR14","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-021-00507-1","author":"S Kumar","year":"2021","unstructured":"Kumar S, Kim M (2021) Smplip-score: predicting ligand binding affinity from simple and interpretable on-the-fly interaction fingerprint pattern descriptors. J Cheminform. https:\/\/doi.org\/10.1186\/s13321-021-00507-1","journal-title":"J Cheminform"},{"key":"1107_CR15","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1021\/acs.jcim.8b00545","volume":"59","author":"M Su","year":"2019","unstructured":"Su M, Yang Q, Du Y, Feng G, Liu Z, Li Y, Wang R (2019) Comparative assessment of scoring functions: The casf-2016 update. J Chem Inf Model 59:895\u2013913. 
https:\/\/doi.org\/10.1021\/acs.jcim.8b00545","journal-title":"J Chem Inf Model"},{"key":"1107_CR16","doi-asserted-by":"publisher","DOI":"10.1002\/wcms.1465","volume-title":"Machine-learning scoring functions for structure-based drug lead optimization","author":"H Li","year":"2020","unstructured":"Li H, Sze KH, Lu G, Ballester PJ (2020) Machine-learning scoring functions for structure-based drug lead optimization. Blackwell Publishing Inc, Oxford"},{"key":"1107_CR17","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-018-0264-0","author":"JB Jasper","year":"2018","unstructured":"Jasper JB, Humbeck L, Brinkjost T, Koch O (2018) A novel interaction fingerprint derived from per atom score contributions: exhaustive evaluation of interaction fingerprint performance in docking based virtual screening. J Cheminform. https:\/\/doi.org\/10.1186\/s13321-018-0264-0","journal-title":"J Cheminform"},{"key":"1107_CR18","doi-asserted-by":"publisher","first-page":"1238","DOI":"10.1021\/acs.jcim.8b00773","volume":"59","author":"MS Nogueira","year":"2019","unstructured":"Nogueira MS, Koch O (2019) The development of target-specific machine learning models as scoring functions for docking-based target prediction. J Chem Inf Model 59:1238\u20131252. https:\/\/doi.org\/10.1021\/acs.jcim.8b00773","journal-title":"J Chem Inf Model"},{"key":"1107_CR19","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1007\/s10822-007-9167-2","volume":"22","author":"AC Good","year":"2008","unstructured":"Good AC, Oprea TI (2008) Optimization of camd techniques 3. virtual screening enrichment studies: A help or hindrance in tool selection? J Comput Aided Mol Des 22:169\u2013178. 
https:\/\/doi.org\/10.1007\/s10822-007-9167-2","journal-title":"J Comput Aided Mol Des"},{"key":"1107_CR20","doi-asserted-by":"publisher","first-page":"5957","DOI":"10.1021\/acs.jcim.0c00565","volume":"60","author":"EL C\u00e1ceres","year":"2020","unstructured":"C\u00e1ceres EL, Mew NC, Keiser MJ (2020) Adding stochastic negative examples into machine learning improves molecular bioactivity prediction. J Chem Inf Model 60:5957\u20135970. https:\/\/doi.org\/10.1021\/acs.jcim.0c00565","journal-title":"J Chem Inf Model"},{"key":"1107_CR21","doi-asserted-by":"publisher","unstructured":"Dealing with a data dilemma. Nature Reviews Drug Discovery 7, 632\u2013633 (2008) https:\/\/doi.org\/10.1038\/nrd2649","DOI":"10.1038\/nrd2649"},{"key":"1107_CR22","doi-asserted-by":"publisher","first-page":"1447","DOI":"10.1021\/ci400115b","volume":"53","author":"MR Bauer","year":"2013","unstructured":"Bauer MR, Ibrahim TM, Vogel SM, Boeckler FM (2013) Evaluation and optimization of virtual screening workflows with dekois 2.0 - a public library of challenging docking benchmark sets. J Chem Inf Model 53:1447\u20131462. https:\/\/doi.org\/10.1021\/ci400115b","journal-title":"J Chem Inf Model"},{"key":"1107_CR23","doi-asserted-by":"publisher","first-page":"1595","DOI":"10.1021\/ci4002712","volume":"53","author":"K Heikamp","year":"2013","unstructured":"Heikamp K, Bajorath J (2013) Comparison of confirmed inactive and randomly selected compounds as negative training examples in support vector machine-based virtual screening. J Chem Inf Model 53:1595\u20131601. https:\/\/doi.org\/10.1021\/ci4002712","journal-title":"J Chem Inf Model"},{"key":"1107_CR24","doi-asserted-by":"publisher","DOI":"10.1016\/j.drudis.2022.05.005","author":"E L\u00f3pez-L\u00f3pez","year":"2022","unstructured":"L\u00f3pez-L\u00f3pez E, Gortari, E.F.-d., Medina-Franco, J.L. (2022) Yes sir! on the structure-inactivity relationships in drug discovery. Drug Discov Today. 
https:\/\/doi.org\/10.1016\/j.drudis.2022.05.005","journal-title":"Drug Discov Today"},{"key":"1107_CR25","doi-asserted-by":"publisher","first-page":"861","DOI":"10.21105\/joss.00861","volume":"3","author":"L McInnes","year":"2018","unstructured":"McInnes L, Healy J, Saul N, Gro\u00dfberger L (2018) Umap: Uniform manifold approximation and projection. J Open Source Softw 3:861","journal-title":"J Open Source Softw"},{"key":"1107_CR26","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0175410","author":"R Kurczab","year":"2017","unstructured":"Kurczab R, Bojarski AJ (2017) The influence of the negative-positive ratio and screening database size on the performance of machine learning-based virtual screening. PLoS ONE. https:\/\/doi.org\/10.1371\/journal.pone.0175410","journal-title":"PLoS ONE"},{"key":"1107_CR27","doi-asserted-by":"publisher","DOI":"10.1186\/1758-2946-6-32","author":"R Kurczab","year":"2014","unstructured":"Kurczab R, Smusz S, Bojarski AJ (2014) The influence of negative training set size on machine learning-based virtual screening. J Cheminform. https:\/\/doi.org\/10.1186\/1758-2946-6-32","journal-title":"J Cheminform"},{"key":"1107_CR28","doi-asserted-by":"publisher","first-page":"4263","DOI":"10.1021\/acs.jcim.0c00155","volume":"60","author":"VK Tran-Nguyen","year":"2020","unstructured":"Tran-Nguyen VK, Jacquemard C, Rognan D (2020) Lit-pcba: An unbiased data set for machine learning and virtual screening. J Chem Inf Model 60:4263\u20134273. https:\/\/doi.org\/10.1021\/acs.jcim.0c00155","journal-title":"J Chem Inf Model"},{"key":"1107_CR29","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1021\/ci0503255","volume":"46","author":"H Chen","year":"2006","unstructured":"Chen H, Lyne PD, Giordanetto F, Lovell T, Li J (2006) On evaluating molecular-docking methods for pose prediction and enrichment factors. J Chem Inf Model 46:401\u2013415. 
https:\/\/doi.org\/10.1021\/ci0503255","journal-title":"J Chem Inf Model"},{"key":"1107_CR30","doi-asserted-by":"publisher","first-page":"4593","DOI":"10.1016\/j.csbj.2021.07.032","volume":"19","author":"J Menke","year":"2021","unstructured":"Menke J, Massa J, Koch O (2021) Natural product scores and fingerprints extracted from artificial neural networks. Comput Struct Biotechnol J 19:4593\u20134602. https:\/\/doi.org\/10.1016\/j.csbj.2021.07.032","journal-title":"Comput Struct Biotechnol J"},{"key":"1107_CR31","doi-asserted-by":"publisher","first-page":"1014","DOI":"10.1021\/ci800023x","volume":"48","author":"E Kellenberger","year":"2008","unstructured":"Kellenberger E, Foata N, Rognan D (2008) Ranking targets in structure-based virtual screening of three-dimensional protein libraries: Methods and problems. J Chem Inf Model 48:1014\u20131025. https:\/\/doi.org\/10.1021\/ci800023x","journal-title":"J Chem Inf Model"},{"key":"1107_CR32","doi-asserted-by":"publisher","first-page":"612","DOI":"10.1093\/nar\/gkv352","volume":"43","author":"M Davies","year":"2015","unstructured":"Davies M, Nowotka M, Papadatos G, Dedman N, Gaulton A, Atkinson F, Bellis L, Overington JP (2015) Chembl web services: Streamlining access to drug discovery data and utilities. Nucleic Acids Res 43:612\u2013620. https:\/\/doi.org\/10.1093\/nar\/gkv352","journal-title":"Nucleic Acids Res"},{"key":"1107_CR33","doi-asserted-by":"publisher","unstructured":"Huang A, Knight IS, Naprienko S (2025) Data leakage and redundancy in the lit-pcba benchmark https:\/\/doi.org\/10.48550\/arXiv.2507.21404","DOI":"10.48550\/arXiv.2507.21404"},{"key":"1107_CR34","doi-asserted-by":"publisher","first-page":"2324","DOI":"10.1021\/acs.jcim.5b00559","volume":"55","author":"T Sterling","year":"2015","unstructured":"Sterling T, Irwin JJ (2015) Zinc 15 - ligand discovery for everyone. J Chem Inf Model 55:2324\u20132337. 
https:\/\/doi.org\/10.1021\/acs.jcim.5b00559","journal-title":"J Chem Inf Model"},{"key":"1107_CR35","doi-asserted-by":"publisher","first-page":"958","DOI":"10.1038\/nchembio.1936","volume":"11","author":"AM Wassermann","year":"2015","unstructured":"Wassermann AM, Lounkine E, Hoepfner D, Goff GL, King FJ, Studer C, Peltier JM, Grippo ML, Prindle V, Tao J, Schuffenhauer A, Wallace IM, Chen S, Krastel P, Cobos-Correa A, Parker CN, Davies JW, Glick M (2015) Dark chemical matter as a promising starting point for drug lead discovery. Nat Chem Biol 11:958\u2013966. https:\/\/doi.org\/10.1038\/nchembio.1936","journal-title":"Nat Chem Biol"},{"key":"1107_CR36","unstructured":"Ali M (2020) Pycaret: An open source, low-code machine learning library in python PyCaret version 2.1"},{"issue":"85","key":"1107_CR37","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay, (2011) Scikit-learn: Machine learning in python. J Mach Learn Res 12(85):2825\u20132830","journal-title":"J Mach Learn Res"},{"key":"1107_CR38","unstructured":"Ramsundar B, Eastman P, Walters P, Pande V (2019) Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More. O\u2019Reilly"},{"key":"1107_CR39","doi-asserted-by":"publisher","unstructured":"Ho TK (1995) Random decision forests. In: Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 1, pp. 278\u20132821 https:\/\/doi.org\/10.1109\/ICDAR.1995.598994","DOI":"10.1109\/ICDAR.1995.598994"},{"key":"1107_CR40","doi-asserted-by":"publisher","unstructured":"Cortes C, Vapnik V, Saitta L (1995) Support-vector networks. Machine Learning 20:273\u2013297. 
https:\/\/doi.org\/10.1007\/BF00994018","DOI":"10.1007\/BF00994018"},{"key":"1107_CR41","doi-asserted-by":"publisher","unstructured":"Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 13, pp. 785\u2013794. Association for Computing Machinery, https:\/\/doi.org\/10.1145\/2939672.2939785","DOI":"10.1145\/2939672.2939785"},{"key":"1107_CR42","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1037\/h0042519","volume":"65","author":"F Rosenblatt","year":"1958","unstructured":"Rosenblatt F (1958) The perceptron: A probabilistic model for information storage and organization in the brain. Psychol Rev 65:19\u201327. https:\/\/doi.org\/10.1037\/h0042519","journal-title":"Psychol Rev"},{"key":"1107_CR43","unstructured":"Ali M (2020) PyCaret: An open source, low-code machine learning library in Python. PyCaret version 1.0 https:\/\/www.pycaret.org"},{"key":"1107_CR44","doi-asserted-by":"publisher","unstructured":"Prathyusha KS, Reddy BE (2021) Normalization methods for multiple sources of data. Institute of Electrical and Electronics Engineers Inc. https:\/\/doi.org\/10.1109\/ICICCS51141.2021.9432142","DOI":"10.1109\/ICICCS51141.2021.9432142"},{"key":"1107_CR45","doi-asserted-by":"publisher","first-page":"2839","DOI":"10.1016\/j.patcog.2015.03.009","volume":"48","author":"TT Wong","year":"2015","unstructured":"Wong TT (2015) Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. Pattern Recogn 48:2839\u20132846. https:\/\/doi.org\/10.1016\/j.patcog.2015.03.009","journal-title":"Pattern Recogn"},{"key":"1107_CR46","doi-asserted-by":"publisher","unstructured":"He H, Bai Y, Garcia EA, Li S (2008) Adasyn: Adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 
1322\u20131328 https:\/\/doi.org\/10.1109\/IJCNN.2008.4633969","DOI":"10.1109\/IJCNN.2008.4633969"},{"key":"1107_CR47","doi-asserted-by":"publisher","unstructured":"Jeni LA, Cohn JF, Torre FDL (2013) Facing imbalanced data - recommendations for the use of performance metrics. In: Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013, pp. 245\u2013251 https:\/\/doi.org\/10.1109\/ACII.2013.47","DOI":"10.1109\/ACII.2013.47"},{"key":"1107_CR48","doi-asserted-by":"publisher","DOI":"10.1016\/j.comtox.2021.100178","author":"SY Bae","year":"2021","unstructured":"Bae SY, Lee J, Jeong J, Lim C, Choi J (2021) Effective data-balancing methods for class-imbalanced genotoxicity datasets using machine learning algorithms and molecular fingerprints. Comput Toxicol. https:\/\/doi.org\/10.1016\/j.comtox.2021.100178","journal-title":"Comput Toxicol"},{"key":"1107_CR49","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.8b00363","author":"S Liu","year":"2019","unstructured":"Liu S, Alnammi M, Ericksen SS, Voter AF, Ananiev GE, Keck JL, Hoffmann FM, Wildman SA, Gitter A (2019) Practical Model Selection for Prospective Virtual Screening. Am Chem Soc. 
https:\/\/doi.org\/10.1021\/acs.jcim.8b00363","journal-title":"Am Chem Soc"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01107-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-01107-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01107-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T13:04:27Z","timestamp":1761829467000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-01107-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,30]]},"references-count":49,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1107"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-01107-z","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2025-2hh8z","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv-2025-2hh8z-v2","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,30]]},"assertion":[{"value":"3 May 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 October 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 October 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval and consent to participate"}},{"value":"OK is Scientific Advisor at NUVISAN ICB GmbH and Prosion GmbH.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"165"}}