{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T12:39:12Z","timestamp":1767962352135,"version":"3.49.0"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2013,4,5]],"date-time":"2013-04-05T00:00:00Z","timestamp":1365120000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>A growing popularity of machine learning methods application in virtual screening, in both classification and regression tasks, can be observed in the past few years. However, their effectiveness is strongly dependent on many different factors.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>In this study, the influence of the way of forming the set of inactives on the classification process was examined: random and diverse selection from the ZINC database, MDDR database and libraries generated according to the DUD methodology. All learning methods were tested in two modes: using one test set, the same for each method of inactive molecules generation and using test sets with inactives prepared in an analogous way as for training. The experiments were carried out for 5 different protein targets, 3 fingerprints for molecules representation and 7 classification algorithms with varying parameters. It appeared that the process of inactive set formation had a substantial impact on the machine learning methods performance.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>The level of chemical space limitation determined the ability of tested classifiers to select potentially active molecules in virtual screening tasks, as for example DUDs (widely applied in docking experiments) did not provide proper selection of active molecules from databases with diverse structures. The study clearly showed that inactive compounds forming training set should be representative to the highest possible extent for libraries that undergo screening.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1758-2946-5-17","type":"journal-article","created":{"date-parts":[[2013,4,5]],"date-time":"2013-04-05T19:32:28Z","timestamp":1365190348000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":32,"title":["The influence of the inactives subset generation on the performance of machine learning methods"],"prefix":"10.1186","volume":"5","author":[{"given":"Sabina","family":"Smusz","sequence":"first","affiliation":[]},{"given":"Rafa\u0142","family":"Kurczab","sequence":"additional","affiliation":[]},{"given":"Andrzej J","family":"Bojarski","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2013,4,5]]},"reference":[{"key":"454_CR1","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1021\/ci900419k","volume":"50","author":"H Geppert","year":"2010","unstructured":"Geppert H, Vogt M, Bajorath J: Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J Chem Inf Model. 2010, 50: 205-216. 10.1021\/ci900419k.","journal-title":"J Chem Inf Model"},{"key":"454_CR2","doi-asserted-by":"publisher","first-page":"332","DOI":"10.2174\/138620709788167980","volume":"12","author":"JL Melville","year":"2009","unstructured":"Melville JL, Burke EK, Hirst JD: Machine learning in virtual screening. Comb Chem High Throughput Screen. 2009, 12: 332-343. 10.2174\/138620709788167980.","journal-title":"Comb Chem High Throughput Screen"},{"key":"454_CR3","doi-asserted-by":"publisher","first-page":"453","DOI":"10.2174\/138620709788489064","volume":"12","author":"A Schwaighofer","year":"2009","unstructured":"Schwaighofer A, Schroeter T, Mika S, Blanchard G: How wrong can we get? A review of machine learning approaches and error bars. Comb Chem High Throughput Screen. 2009, 12: 453-468. 10.2174\/138620709788489064.","journal-title":"Comb Chem High Throughput Screen"},{"key":"454_CR4","doi-asserted-by":"publisher","first-page":"3256","DOI":"10.1039\/b409865j","volume":"2","author":"J Hert","year":"2004","unstructured":"Hert J, Willett P, Wilton DJ, Acklin P, Azzaoui K, Jacoby E, Schuffenhauer A: Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures. Org Biomol Chem. 2004, 2: 3256-3266. 10.1039\/b409865j.","journal-title":"Org Biomol Chem"},{"key":"454_CR5","doi-asserted-by":"publisher","first-page":"2101","DOI":"10.1021\/ci900135u","volume":"49","author":"XH Liu","year":"2009","unstructured":"Liu XH, Ma XH, Tan CY, Jiang YY, Go ML, Low BC, Chen YZ: Virtual screening of Abl inhibitors from large compound libraries by support vector machines. J Chem Inf Model. 2009, 49: 2101-2110. 10.1021\/ci900135u.","journal-title":"J Chem Inf Model"},{"key":"454_CR6","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1021\/ci600332j","volume":"47","author":"CL Bruce","year":"2007","unstructured":"Bruce CL, Melville JL, Pickett SD, Hirst JD: Contemporary QSAR classifiers compared. J Chem Inf Model. 2007, 47: 219-227. 10.1021\/ci600332j.","journal-title":"J Chem Inf Model"},{"key":"454_CR7","doi-asserted-by":"publisher","first-page":"1098","DOI":"10.1021\/ci050519k","volume":"46","author":"D Plewczynski","year":"2006","unstructured":"Plewczynski D, Spieser SAH, Koch U: Assessing different classification methods for virtual screening. J Chem Inf Model. 2006, 46: 1098-1106. 10.1021\/ci050519k.","journal-title":"J Chem Inf Model"},{"key":"454_CR8","first-page":"796","volume":"33","author":"F Hammann","year":"2009","unstructured":"Hammann F, Gutmann H, Baumann U, Helma C, Drewe J: Classification of Cytochrome P 450 Activities Using Machine Learning Methods. Mol Pharmaceutics. 2009, 33: 796-801.","journal-title":"Mol Pharmaceutics"},{"key":"454_CR9","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1007\/s10822-006-9096-5","volume":"21","author":"B Chen","year":"2007","unstructured":"Chen B, Harrison RF, Papadatos G, Willett P, Wood DJ, Lewell XQ, Greenidge P, Stiefl N: Evaluation of machine-learning methods for ligand-based virtual screening. J Comput Aided Mol Des. 2007, 21: 53-62. 10.1007\/s10822-006-9096-5.","journal-title":"J Comput Aided Mol Des"},{"key":"454_CR10","doi-asserted-by":"publisher","first-page":"1276","DOI":"10.1016\/j.jmgm.2007.12.002","volume":"26","author":"LY Han","year":"2008","unstructured":"Han LY, Ma XH, Lin HH, Jia J, Zhu F, Xue Y, Li ZR, Cao ZW, Ji ZL, Chen YZ: A support vector machines approach for virtual screening of active compounds of single and multiple mechanisms from large libraries at an improved hit-rate and enrichment factor. J Mol Graph Model. 2008, 26: 1276-1286. 10.1016\/j.jmgm.2007.12.002.","journal-title":"J Mol Graph Model"},{"key":"454_CR11","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1016\/j.jmgm.2006.01.007","volume":"25","author":"H Li","year":"2006","unstructured":"Li H, Ung CY, Yap CW, Xue Y, Li ZR, Chen YZ: Prediction of estrogen receptor agonists and characterization of associated molecular descriptors by statistical learning methods. J Mol Graph Model. 2006, 25: 313-323. 10.1016\/j.jmgm.2006.01.007.","journal-title":"J Mol Graph Model"},{"key":"454_CR12","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1021\/ci049714+","volume":"45","author":"JJ Irwin","year":"2005","unstructured":"Irwin JJ, Shoichet BK ZINC: A Free Database of Commercially Available Compounds for Virtual Screening. J Chem Inf Model. 2005, 45: 177-182. 10.1021\/ci049714+.","journal-title":"J Chem Inf Model"},{"key":"454_CR13","unstructured":"MDDR licensed by Accelrys, Inc. USA. http:\/\/www.accelrys.com,"},{"key":"454_CR14","doi-asserted-by":"publisher","first-page":"6789","DOI":"10.1021\/jm0608356","volume":"49","author":"N Huang","year":"2006","unstructured":"Huang N, Shoichet BK, Irwin JJ: Benchmarking sets for molecular docking. J Med Chem. 2006, 49: 6789-6801. 10.1021\/jm0608356.","journal-title":"J Med Chem"},{"key":"454_CR15","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1007\/s10822-008-9170-2","volume":"22","author":"A Nicholls","year":"2008","unstructured":"Nicholls A: What do we know and when do we know it?. J Comput Aided Mol Des. 2008, 22: 239-255. 10.1007\/s10822-008-9170-2.","journal-title":"J Comput Aided Mol Des"},{"key":"454_CR16","doi-asserted-by":"publisher","first-page":"344","DOI":"10.2174\/138620709788167944","volume":"12","author":"XH Ma","year":"2009","unstructured":"Ma XH, Jia J, Zhu F, Xue Y, Li ZR, Chen YZ: Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries. Comb Chem High Throughput Screen. 2009, 12: 344-357. 10.2174\/138620709788167944.","journal-title":"Comb Chem High Throughput Screen"},{"key":"454_CR17","doi-asserted-by":"publisher","first-page":"2133","DOI":"10.1007\/s00894-010-0854-x","volume":"17","author":"D Plewczynski","year":"2011","unstructured":"Plewczynski D: Brainstorming: weighted voting prediction of inhibitors for protein targets. J Mol Model. 2011, 17: 2133-2141. 10.1007\/s00894-010-0854-x.","journal-title":"J Mol Model"},{"key":"454_CR18","doi-asserted-by":"publisher","first-page":"189","DOI":"10.2174\/138620707780126705","volume":"10","author":"D Plewczynski","year":"2007","unstructured":"Plewczynski D, von Grotthuss M, Spieser SAH, Rychlewski L, Wyrwicz LS, Ginalski K, Koch U: Virtual high throughput screening using combined random forest and flexible docking. Comb Chem High Throughput Screen. 2007, 10: 189-196. 10.2174\/138620707780126705.","journal-title":"Comb Chem High Throughput Screen"},{"key":"454_CR19","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1002\/sam.10037","volume":"2","author":"EJ Gardiner","year":"2009","unstructured":"Gardiner EJ, Gillet VJ, Haranczyk M, Hert J, Holliday JD, Malim N, Patel Y, Willet P: Turbo Similarity Searching: Effect of Fingerprint and Dataset on Virtual-Screening Performance. Stat Anal Data Min. 2009, 2: 103-114. 10.1002\/sam.10037.","journal-title":"Stat Anal Data Min"},{"key":"454_CR20","volume-title":"ChemAxon","author":"InstantJChem","year":"2011","unstructured":"InstantJChem: ChemAxon. 2011, http:\/\/www.chemaxon.com,"},{"key":"454_CR21","unstructured":"RDKit: Open-source cheminformatics. http:\/\/www.rdkit.org,"},{"key":"454_CR22","unstructured":"Discovery Studio, provided by Accelrys, Inc USA. http:\/\/www.accelrys.com,"},{"key":"454_CR23","doi-asserted-by":"publisher","first-page":"D400","DOI":"10.1093\/nar\/gkr1132","volume":"40","author":"Y Wang","year":"2012","unstructured":"Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, Han L, Karapetyan K, Dracheva S, Shoemaker BA, Bolton E, Gindulyte A, Bryant SH: PubChem's BioAssay Database. Nucleic Acids Res. 2012, 40: D400-412. 10.1093\/nar\/gkr1132.","journal-title":"Nucleic Acids Res"},{"key":"454_CR24","doi-asserted-by":"publisher","first-page":"1466","DOI":"10.1002\/jcc.21707","volume":"32","author":"CWEI Yap","year":"2010","unstructured":"Yap CWEI: PaDEL-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints.J Comput Chem. 2010, 32: 1466-1474.","journal-title":"J Comput Chem"},{"key":"454_CR25","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1021\/ci025584y","volume":"43","author":"C Steinbeck","year":"2003","unstructured":"Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E: The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. J Chem Inf Comput Sci. 2003, 43: 493-500. 10.1021\/ci025584y.","journal-title":"J Chem Inf Comput Sci"},{"key":"454_CR26","doi-asserted-by":"publisher","first-page":"2423","DOI":"10.1021\/ci060155b","volume":"46","author":"T Ewing","year":"2006","unstructured":"Ewing T, Baber JC, Feher M: Novel 2D fingerprints for ligand-based virtual screening. J Chem Inf Model. 2006, 46: 2423-2431. 10.1021\/ci060155b.","journal-title":"J Chem Inf Model"},{"key":"454_CR27","doi-asserted-by":"publisher","first-page":"2518","DOI":"10.1093\/bioinformatics\/btn479","volume":"24","author":"J Klekota","year":"2008","unstructured":"Klekota J, Roth FP: Chemical substructures that enrich for biological activity. Bioinformatics. 2008, 24: 2518-2525. 10.1093\/bioinformatics\/btn479.","journal-title":"Bioinformatics"},{"key":"454_CR28","first-page":"185","volume-title":"Advances in Kernel Methods \u2013 Support Vector Learning","author":"JC Platt","year":"1999","unstructured":"Platt JC: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Advances in Kernel Methods \u2013 Support Vector Learning. Edited by: Scholkopf B, Burges C, Smola AJ. 1999, Cambridge: MIT Press, 185-208."},{"key":"454_CR29","first-page":"505","volume-title":"Mooney RJ Constructing Diverse Classifier Ensembles using Artificial Training Examples","author":"P Melville","year":"2003","unstructured":"Melville P: Mooney RJ Constructing Diverse Classifier Ensembles using Artificial Training Examples. 2003, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence: Morgan Kaufmann Publishers Inc, 505-510."},{"key":"454_CR30","first-page":"457","volume-title":"Recent Advances in Intelligent Information Systems","author":"J Stefanowski","year":"2009","unstructured":"Stefanowski J, Pachocki M: Comparing Performance of Committee Based Approaches to Active Learning. Recent Advances in Intelligent Information Systems. Edited by: Klopotek M, Przepiorkowski A, Wierzchon S, Trojanowski K. 2009, Warsaw: EXIT, 457-470."},{"key":"454_CR31","volume-title":"Randomized Decimation HyperPipes","author":"ZA Deeb","year":"2010","unstructured":"Deeb ZA, Devine T: Randomized Decimation HyperPipes. 2010, http:\/\/www.csee.wvu.edu\/~timm\/tmp\/r7.pdf,"},{"key":"454_CR32","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L: Random Forests. Mach Learn. 2001, 45: 5-32. 10.1023\/A:1010933404324.","journal-title":"Mach Learn"},{"key":"454_CR33","doi-asserted-by":"publisher","first-page":"1947","DOI":"10.1021\/ci034160g","volume":"43","author":"V Svetnik","year":"2003","unstructured":"Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP: Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci. 2003, 43: 1947-1958. 10.1021\/ci034160g.","journal-title":"J Chem Inf Comput Sci"},{"key":"454_CR34","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1145\/1656274.1656278","volume":"11","author":"M Hall","year":"2009","unstructured":"Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update. SIGKDD Explorations. 2009, 11: 10-18. 10.1145\/1656274.1656278.","journal-title":"SIGKDD Explorations"},{"key":"454_CR35","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1007\/978-3-642-21946-7_8","volume-title":"Computational Intelligence Methods for Bioinformatics and Biostatistics 7th International Meeting","author":"C Savojardo","year":"2011","unstructured":"Savojardo C, Fariselli P, Martelli PL, Shukla P, Casadio R: Prediction of the Bonding State of Cysteine Residues in Proteins with Machine-Learning Methods. Computational Intelligence Methods for Bioinformatics and Biostatistics 7th International Meeting. Edited by: Rizzo R, Lisboa PJG. 2011, Berlin Heidelberg: Springer-Verlag, 98-111. 6665","edition":"6665"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-5-17.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1758-2946-5-17\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-5-17.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T22:59:23Z","timestamp":1630537163000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/1758-2946-5-17"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,4,5]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["454"],"URL":"https:\/\/doi.org\/10.1186\/1758-2946-5-17","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,4,5]]},"assertion":[{"value":"10 January 2013","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 March 2013","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 April 2013","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"17"}}