{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T18:01:12Z","timestamp":1771956072086,"version":"3.50.1"},"reference-count":110,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2019,4,30]],"date-time":"2019-04-30T00:00:00Z","timestamp":1556582400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["PTDC\/EEI-ESS\/4923\/2014"],"award-info":[{"award-number":["PTDC\/EEI-ESS\/4923\/2014"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Molecules"],"abstract":"<jats:p>The performance of quantitative structure\u2013activity relationship (QSAR) models largely depends on the relevance of the selected molecular representation used as input data matrices. This work presents a thorough comparative analysis of two main categories of molecular representations (vector space and metric space) for fitting robust machine learning models in QSAR problems. For the assessment of these methods, seven different molecular representations that included RDKit descriptors, five different fingerprints types (MACCS, PubChem, FP2-based, Atom Pair, and ECFP4), and a graph matching approach (non-contiguous atom matching structure similarity; NAMS) in both vector space and metric space, were subjected to state-of-art machine learning methods that included different dimensionality reduction methods (feature selection and linear dimensionality reduction). Five distinct QSAR data sets were used for direct assessment and analysis. 
Results show that, in general, metric-space and vector-space representations are able to produce equivalent models, but there are significant differences between individual approaches. The NAMS-based similarity approach consistently outperformed most fingerprint representations in model quality, closely followed by Atom Pair fingerprints. To further verify these findings, the metric space-based models were fitted to the same data sets with the closest neighbors removed. These latter results further strengthened the above conclusions. The metric space graph-based approach appeared significantly superior to the other representations, albeit at a significant computational cost.<\/jats:p>","DOI":"10.3390\/molecules24091698","type":"journal-article","created":{"date-parts":[[2019,5,2]],"date-time":"2019-05-02T03:15:22Z","timestamp":1556766922000},"page":"1698","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["Analysis and Comparison of Vector Space and Metric Space Representations in QSAR Modeling"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5207-7136","authenticated-orcid":false,"given":"Samina","family":"Kausar","sequence":"first","affiliation":[{"name":"LASIGE, Faculdade de Ciencias, Universidade de Lisboa, 1749-016 Lisboa, Portugal"},{"name":"BioISI\u2014Biosystems & Integrative Sciences Institute, Faculdade de Ciencias, Universidade de Lisboa, 1749-016 Lisboa, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3588-8746","authenticated-orcid":false,"given":"Andre O.","family":"Falcao","sequence":"additional","affiliation":[{"name":"LASIGE, Faculdade de Ciencias, Universidade de Lisboa, 1749-016 Lisboa, Portugal"},{"name":"BioISI\u2014Biosystems & Integrative Sciences Institute, Faculdade de Ciencias, Universidade de Lisboa, 1749-016 Lisboa, 
Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2019,4,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"4977","DOI":"10.1021\/jm4004285","article-title":"QSAR Modeling: Where Have You Been? Where Are You Going To?","volume":"57","author":"Cherkasov","year":"2014","journal-title":"J. Med. Chem."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"213","DOI":"10.2174\/138620706776055539","article-title":"Computational methods in developing quantitative structure-activity relationships (QSAR): A review","volume":"9","author":"Dudek","year":"2006","journal-title":"Comb. Chem. High Throughput Screen."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1038\/194178b0","article-title":"Correlation of Biological Activity of Phenoxyacetic Acids with Hammett Substituent Constants and Partition Coefficients","volume":"194","author":"Hansch","year":"1962","journal-title":"Nature"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Yoo, C., and Shahlaei, M. (2017). The applications of PCA in QSAR studies: A case study on CCR5 antagonists. Chem. Biol. Drug Des.","DOI":"10.1111\/cbdd.13064"},{"key":"ref_5","unstructured":"Todeschini, R., and Consonni, V. (2008). Handbook of Molecular Descriptors, Volume 11, Wiley-VCH Verlag GmbH."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1145\/502807.502808","article-title":"Searching in Metric Spaces","volume":"33","author":"Navarro","year":"2001","journal-title":"ACM Comput. Surv."},{"key":"ref_7","unstructured":"Gasteiger, J. (2008). Handbook of Chemoinformatics: From Data to Knowledge, Volumes 1\u20134, Wiley-VCH."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1186\/s13321-016-0148-0","article-title":"Comparing structural fingerprints using a literature-based similarity benchmark","volume":"8","author":"Sayle","year":"2016","journal-title":"J. 
Cheminform."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1218","DOI":"10.1021\/ci010291a","article-title":"Toward an Optimal Procedure for Variable Selection and QSAR Model Building","volume":"41","author":"Yasri","year":"2001","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Puzyn, T., Leszczynski, J., and Cronin, M.T. (2009). Recent Advances in QSAR Studies: Methods and Applications (Challenges and Advances in Computational Chemistry and Physics), Springer.","DOI":"10.1007\/978-1-4020-9783-6"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1080\/10629360902949567","article-title":"How not to develop a quantitative structure-activity or structure-property relationship (QSAR\/QSPR)","volume":"20","author":"Dearden","year":"2009","journal-title":"SAR QSAR Environ. Res."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3494","DOI":"10.2174\/138161207782794257","article-title":"Predictive QSAR modeling workflow, model applicability domains, and virtual screening","volume":"13","author":"Tropsha","year":"2007","journal-title":"Curr. Pharm. Des."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1002\/minf.201000061","article-title":"Best practices for QSAR model development, validation, and exploitation","volume":"29","author":"Tropsha","year":"2010","journal-title":"Mol. Inform."},{"key":"ref_14","unstructured":"Lesk, A.M. (2014). Introduction to Bioinformatics, Oxford University Press. [4th ed.]."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Orengo, C.A., and Bateman, A. (2013). 
Protein Families: Relating Protein Sequence, Structure, and Function, John Wiley & Sons, Inc.","DOI":"10.1002\/9781118743089"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1833","DOI":"10.1021\/ci500110v","article-title":"Structural similarity based kriging for quantitative structure activity and property relationship modeling","volume":"54","author":"Teixeira","year":"2014","journal-title":"J. Chem. Inf. Model."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"4350","DOI":"10.1021\/jm020155c","article-title":"Do Structurally Similar Molecules Have Similar Biological Activity?","volume":"45","author":"Martin","year":"2002","journal-title":"J. Med. Chem."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1006","DOI":"10.1002\/qsar.200330831","article-title":"Approaches to Measure Chemical Similarity\u2014A Review","volume":"22","author":"Nikolova","year":"2003","journal-title":"QSAR Comb. Sci."},{"key":"ref_19","unstructured":"Johnson, M.A., and Maggiora, G.M. (1990). Concepts and Applications of Molecular Similarity, John Wiley & Sons."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1021\/ci9800211","article-title":"Chemical Similarity Searching","volume":"38","author":"Willett","year":"1998","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"3204","DOI":"10.1039\/b409813g","article-title":"Molecular similarity: A key technique in molecular informatics","volume":"2","author":"Bender","year":"2004","journal-title":"Org. Biomol. Chem."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3186","DOI":"10.1021\/jm401411z","article-title":"Molecular Similarity in Medicinal Chemistry","volume":"57","author":"Maggiora","year":"2014","journal-title":"J. Med. 
Chem."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/j.drudis.2007.01.011","article-title":"Molecular similarity analysis in virtual screening: Foundations, limitations and novel approaches","volume":"12","author":"Eckert","year":"2007","journal-title":"Drug Discov. Today"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1002\/wcms.23","article-title":"Similarity searching","volume":"1","author":"Stumpfe","year":"2011","journal-title":"Wiley Interdiscip. Rev. Comput. Mol. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Maggiora, G.M., and Shanmugasundaram, V. (2004). Molecular Similarity Measures. Methods in Molecular Biology, Springer.","DOI":"10.1385\/1-59259-802-1:001"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Keith, J.M. (2017). Molecular Similarity Concepts for Informatics Applications. Bioinformatics: Volume II: Structure, Function, and Applications, Springer.","DOI":"10.1007\/978-1-4939-6613-4"},{"key":"ref_27","unstructured":"James, C., Weininger, D., and Delaney, J. (2011). Daylight Theory Manual Version 4.9, Daylight Chemical Information Systems, Inc."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2511","DOI":"10.1021\/ci400324u","article-title":"Noncontiguous atom matching structural similarity function","volume":"53","author":"Teixeira","year":"2013","journal-title":"J. Chem. Inf. Model."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1002\/wcms.5","article-title":"Maximum common subgraph isomorphism algorithms and their applications in molecular science: A review","volume":"1","author":"Ehrlich","year":"2011","journal-title":"Wiley Interdiscip. Rev. Comput. Mol. 
Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1023\/A:1021271615909","article-title":"Maximum common subgraph isomorphism algorithms for the matching of chemical structures","volume":"16","author":"Raymond","year":"2002","journal-title":"J. Comput.-Aided Mol. Des."},{"key":"ref_31","first-page":"532","article-title":"Substructure searching methods: Old and new","volume":"33","author":"Barnard","year":"1993","journal-title":"J. Chem. Inf. Model."},{"key":"ref_32","first-page":"379","article-title":"On the Properties of Bit String-Based Measures of Chemical Similarity","volume":"38","author":"Flower","year":"1998","journal-title":"J. Chem. Inf. Model."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-015-0069-3","article-title":"Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?","volume":"7","author":"Bajusz","year":"2015","journal-title":"J. Cheminform."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1037\/0033-295X.84.4.327","article-title":"Features of similarity","volume":"84","author":"Tversky","year":"1977","journal-title":"Psychol. Rev."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Leskovec, J., Rajaraman, A., and Ullman, J.D. (2014). Mining of Massive Datasets, Cambridge University Press. [2nd ed.].","DOI":"10.1017\/CBO9781139924801"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1021\/jm00004a009","article-title":"Molecular similarity matrices and quantitative structure-activity relationships: A case study with methodological implications","volume":"38","author":"Benigni","year":"1995","journal-title":"J. Med. Chem."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"4360","DOI":"10.1021\/jm970488n","article-title":"Three-dimensional quantitative structure-activity relationships from molecular similarity matrices and genetic neural networks. 2. 
Applications","volume":"40","author":"So","year":"1997","journal-title":"J. Med. Chem."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1002\/1097-461X(2000)80:3<265::AID-QUA1>3.0.CO;2-K","article-title":"Quantum similarity QSAR: Study of inhibitors binding to thrombin, trypsin, and factor Xa, including a comparison with CoMFA and CoMSIA methods","volume":"80","author":"Robert","year":"2000","journal-title":"Int. J. Quantum Chem."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1021\/ci0202842","article-title":"Molecular quantum similarity-based QSARs for binding affinities of several steroid sets","volume":"42","year":"2002","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1021\/ar010048x","article-title":"Molecular quantum similarity and the fundamentals of QSAR","volume":"35","author":"Amat","year":"2002","journal-title":"Acc. Chem. Res."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1080\/10629360701304113","article-title":"About the prediction of molecular properties using the fundamental Quantum QSPR (QQSPR) equation \u2020","volume":"18","year":"2007","journal-title":"SAR QSAR Environ. Res."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Carb\u00f3-Dorca, R., and Mezey, P.G. (1999). Advances in Molecular Similarity, Elsevier Science. Number v. 2 in Advances in Molecular Similarity.","DOI":"10.1016\/S1873-9776(98)80007-2"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1678","DOI":"10.1021\/ci0600511","article-title":"A Steroids QSAR Approach Based on Approximate Similarity Measurements","volume":"46","year":"2006","journal-title":"J. Chem. Inf. Model."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Girschick, T., Almeida, P.R., Kramer, S., and St\u00e5lring, J. (2013). 
Similarity boosted quantitative structure-activity relationship\u2014A systematic study of enhancing structural descriptors by molecular similarity. J. Chem. Inf. Model.","DOI":"10.1021\/ci300182p"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1080\/1062936X.2018.1442879","article-title":"QSAR classification and regression models for \u03b2-secretase inhibitors using relative distance matrices","volume":"29","year":"2018","journal-title":"SAR QSAR Environ. Res."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"D945","DOI":"10.1093\/nar\/gkw1074","article-title":"The ChEMBL database in 2017","volume":"45","author":"Gaulton","year":"2017","journal-title":"Nucleic Acids Res."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-017-0256-5","article-title":"An automated framework for QSAR model building","volume":"10","author":"Kausar","year":"2018","journal-title":"J. Cheminform."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Todeschini, R., and Consonni, V. (2009). Molecular Descriptors for Chemoinformatics, Wiley-VCH Verlag GmbH & Co. KGaA. Methods and Principles in Medicinal Chemistry.","DOI":"10.1002\/9783527628766"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1039\/cs9952400279","article-title":"QSPR: The correlation and quantitative prediction of chemical and physical properties from structure","volume":"24","author":"Katritzky","year":"1995","journal-title":"Chem. Soc. Rev."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Gasteiger, J. (2003). Handbook of Chemoinformatics, Wiley-VCH Verlag GmbH. Volumes 1\u20134.","DOI":"10.1002\/3527601643.ch1"},{"key":"ref_51","unstructured":"Bajorath, J. (2004). Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery, Volume 275, Humana Press."},{"key":"ref_52","unstructured":"Roy, K., Kar, S., and Das, R.N. (2015). 
Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment, Elsevier."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1002\/minf.201000100","article-title":"Chemoinformatics as a theoretical chemistry discipline","volume":"30","author":"Varnek","year":"2011","journal-title":"Mol. Inform."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1016\/j.ymeth.2014.08.005","article-title":"Molecular fingerprint similarity search in virtual screening","volume":"71","author":"Ojeda","year":"2015","journal-title":"Methods"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"1504","DOI":"10.1021\/ci700052x","article-title":"Comparison of Topological, Shape, and Docking Methods in Virtual Screening","volume":"47","author":"McGaughey","year":"2007","journal-title":"J. Chem. Inf. Model."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"927","DOI":"10.2174\/138955708785132792","article-title":"Synergies of Virtual Screening Approaches","volume":"8","author":"Muegge","year":"2008","journal-title":"Mini-Rev. Med. Chem."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"903","DOI":"10.1016\/S1359-6446(02)02411-X","article-title":"Why do we need so many chemical similarity search methods?","volume":"7","author":"Sheridan","year":"2002","journal-title":"Drug Discov. Today"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1536","DOI":"10.1021\/jm050468i","article-title":"Scaffold Hopping through Virtual Screening Using 2D and 3D Similarity Descriptors: Ranking, Voting, and Consensus Scoring","volume":"49","author":"Zhang","year":"2006","journal-title":"J. Med. Chem."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1517\/17460441.2016.1117070","article-title":"An overview of molecular fingerprint similarity search in virtual screening","volume":"11","author":"Muegge","year":"2016","journal-title":"Expert Opin. 
Drug Discov."},{"key":"ref_60","first-page":"1","article-title":"RDKit Documentation","volume":"1","author":"Landrum","year":"2018","journal-title":"Release"},{"key":"ref_61","first-page":"64","article-title":"Atom pairs as molecular features in structure-activity studies: Definition and applications","volume":"25","author":"Carhart","year":"1985","journal-title":"J. Chem. Inf. Model."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-Connectivity Fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J. Chem. Inf. Model."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"1273","DOI":"10.1021\/ci010132r","article-title":"Reoptimization of MDL Keys for Use in Drug Discovery","volume":"42","author":"Durant","year":"2002","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_64","unstructured":"U.S. National Library of Medicine (2009). PubChem Substructure Fingerprint."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1186\/1758-2946-3-33","article-title":"Open Babel: An open chemical toolbox","volume":"3","author":"Banck","year":"2011","journal-title":"J. Cheminform."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1002\/minf.201400024","article-title":"The Calculation of Molecular Structural Similarity: Principles and Practice","volume":"33","author":"Willett","year":"2014","journal-title":"Mol. Inform."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"591","DOI":"10.12688\/f1000research.8357.1","article-title":"Activity-relevant similarity values for fingerprints and implications for similarity searching","volume":"5","author":"Jasial","year":"2016","journal-title":"F1000Research"},{"key":"ref_68","unstructured":"Han, J., Kamber, M., and Pei, J. (2012). 
Data Mining: Concepts and Techniques, Elsevier."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer. [2nd ed.].","DOI":"10.1007\/978-0-387-84858-7"},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1016\/j.drudis.2006.10.005","article-title":"Similarity-based virtual screening using 2D fingerprints","volume":"11","author":"Willett","year":"2006","journal-title":"Drug Discov. Today"},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"5707","DOI":"10.1021\/jm100492z","article-title":"Scaffold Hopping Using Two-Dimensional Fingerprints: True Potential, Black Magic, or a Hopeless Endeavor? Guidelines for Virtual Screening","volume":"53","author":"Vogt","year":"2010","journal-title":"J. Med. Chem."},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1042\/bst0310603","article-title":"Similarity-based approaches to virtual screening","volume":"31","author":"Willett","year":"2003","journal-title":"Biochem. Soc. Trans."},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"1978","DOI":"10.3390\/ijms10051978","article-title":"Current mathematical methods used in QSAR\/QSPR studies","volume":"10","author":"Liu","year":"2009","journal-title":"Int. J. Mol. Sci."},{"key":"ref_74","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1517\/17460441.2016.1146250","article-title":"Use of machine learning approaches for novel drug discovery","volume":"11","author":"Lima","year":"2016","journal-title":"Expert Opin. Drug Discov."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. 
Learn."},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-Vector Networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1758-2946-5-9","article-title":"Random forests for feature selection in QSPR models\u2014An application for predicting standard enthalpy of formation of hydrocarbons","volume":"5","author":"Teixeira","year":"2013","journal-title":"J. Cheminform."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Statnikov, A., Wang, L., and Aliferis, C. (2008). A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinform., 9.","DOI":"10.1186\/1471-2105-9-319"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Yee, L.C., and Wei, Y.C. (2012). Current Modeling Methods Used in QSAR\/QSPR. Statistical Modelling of Molecular Descriptors in QSAR\/QSPR, Wiley-VCH Verlag GmbH & Co. KGaA.","DOI":"10.1002\/9783527645121.ch1"},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"1413","DOI":"10.1021\/ci200409x","article-title":"Machine Learning Methods for Property Prediction in Chemoinformatics","volume":"52","author":"Varnek","year":"2012","journal-title":"J. Chem. Inf. Model."},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"4289","DOI":"10.2174\/092986712802884259","article-title":"Machine learning techniques and drug design","volume":"19","author":"Gertrudes","year":"2012","journal-title":"Curr. Med. Chem."},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"1913","DOI":"10.2174\/1568026614666140929124203","article-title":"In silico machine learning methods in drug development","volume":"14","author":"Dobchev","year":"2014","journal-title":"Curr. Top. Med. 
Chem."},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"1606","DOI":"10.2174\/156802608786786552","article-title":"Variable selection methods in QSAR: An overview","volume":"8","author":"Teijeira","year":"2008","journal-title":"Curr. Top. Med. Chem."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Dehmer, M., Varmuza, K., Bonchev, D., and Emmert-Streib, F. (2012). Statistical Modelling of Molecular Descriptors in QSAR\/QSPR, Wiley-VCH Verlag GmbH.","DOI":"10.1002\/9783527645121"},{"key":"ref_85","doi-asserted-by":"crossref","first-page":"2225","DOI":"10.1016\/j.patrec.2010.03.014","article-title":"Variable selection using Random Forests","volume":"31","author":"Genuer","year":"2012","journal-title":"Pattern Recognit. Lett."},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Zaki, J.M., and Meira, W. (2014). Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press.","DOI":"10.1017\/CBO9780511810114"},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Lee, J.A., and Verleysen, M. (2007). Nonlinear Dimensionality Reduction, Springer. Information Science and Statistics.","DOI":"10.1007\/978-0-387-39351-3"},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1007\/s11030-006-9024-6","article-title":"Megavariate analysis of environmental QSAR data. Part I\u2014A basic framework founded on principal component analysis (PCA), partial least squares (PLS), and statistical molecular design (SMD)","volume":"10","author":"Eriksson","year":"2006","journal-title":"Mol. Divers."},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1002\/qsar.200610151","article-title":"Principles of QSAR models validation: Internal and external","volume":"26","author":"Gramatica","year":"2007","journal-title":"QSAR Comb. 
Sci."},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1021\/ci000134w","article-title":"Interpretation of Quantitative Structure-Property and -Activity Relationships","volume":"41","author":"Katritzky","year":"2001","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_91","first-page":"32","article-title":"Random Forests: Some methodological insights","volume":"6729","author":"Genuer","year":"2008","journal-title":"Inria"},{"key":"ref_92","first-page":"1063","article-title":"Analysis of a Random Forests Model","volume":"13","author":"Biau","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Spiess, A.N., and Neumeyer, N. (2010). An evaluation of R2 as an inadequate measure for nonlinear models in pharmacological and biochemical research: A Monte Carlo approach. BMC Pharmacol., 10.","DOI":"10.1186\/1471-2210-10-6"},{"key":"ref_94","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1021\/ci025584y","article-title":"The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics","volume":"43","author":"Steinbeck","year":"2003","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_95","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1145\/1656274.1656280","article-title":"KNIME\u2014The Konstanz Information Miner","volume":"11","author":"Berthold","year":"2009","journal-title":"SIGKDD Explor."},{"key":"ref_96","unstructured":"R Development Core Team (2011). R: A Language and Environment for Statistical Computing, R Development Core Team."},{"key":"ref_97","unstructured":"Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., and Leisch, F. (2014). 
Misc Functions of the Department of Statistics (e1071), TU Wien, R Development Core Team."},{"key":"ref_98","first-page":"18","article-title":"Classification and Regression by randomForest","volume":"2","author":"Liaw","year":"2002","journal-title":"R News"},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Kassambara, A., and Mundt, F. (2017). Package \u2018Factoextra\u2019 for R: Extract and Visualize the Results of Multivariate Data Analyses, R Development Core Team.","DOI":"10.32614\/CRAN.package.factoextra"},{"key":"ref_100","doi-asserted-by":"crossref","first-page":"2310","DOI":"10.1021\/ci050314b","article-title":"Modeling robust QSAR","volume":"46","author":"Polanski","year":"2006","journal-title":"J. Chem. Inf. Model."},{"key":"ref_101","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1021\/ci100176x","article-title":"Trust but verify: On the importance of chemical structure curation in chemoinformatics and QSAR modeling research","volume":"50","author":"Fourches","year":"2010","journal-title":"J. Chem. Inf. Model."},{"key":"ref_102","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1002\/minf.201300076","article-title":"Using graph indices for the analysis and comparison of chemical datasets","volume":"32","author":"Fourches","year":"2013","journal-title":"Mol. Inform."},{"key":"ref_103","doi-asserted-by":"crossref","first-page":"1337","DOI":"10.1002\/qsar.200810084","article-title":"Are the chemical structures in your QSAR correct?","volume":"27","author":"Young","year":"2008","journal-title":"QSAR Comb. Sci."},{"key":"ref_104","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1021\/ci400572x","article-title":"Data set modelability by QSAR","volume":"54","author":"Golbraikh","year":"2014","journal-title":"J. Chem. Inf. Model."},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Golbraikh, A., Fourches, D., Sedykh, A., Muratov, E., Liepina, I., and Tropsha, A. (2014). 
Modelability Criteria: Statistical Characteristics Estimating Feasibility to Build Predictive QSAR Models for a Dataset, Springer.","DOI":"10.1007\/978-1-4899-7445-7_7"},{"key":"ref_106","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1021\/acs.jcim.5b00539","article-title":"Kernel Target Alignment Parameter: A New Modelability Measure for Regression Tasks","volume":"56","author":"Marcou","year":"2016","journal-title":"J. Chem. Inf. Model."},{"key":"ref_107","doi-asserted-by":"crossref","unstructured":"Hollander, M., Wolfe, D., and Chicken, E. (2015). Nonparametric Statistical Methods, Wiley. [3rd ed.].","DOI":"10.1002\/9781119196037"},{"key":"ref_108","unstructured":"Mendiburu, F.D. (2017). Agricolae: Statistical Procedures for Agricultural Research, R Package Team. R Package Version 1.2-8."},{"key":"ref_109","doi-asserted-by":"crossref","first-page":"1733","DOI":"10.1021\/ci800151m","article-title":"Critical assessment of QSAR models of environmental toxicity against tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection","volume":"48","author":"Tetko","year":"2008","journal-title":"J. Chem. Inf. Model."},{"key":"ref_110","doi-asserted-by":"crossref","first-page":"766","DOI":"10.1021\/ci700443v","article-title":"Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis","volume":"48","author":"Zhu","year":"2008","journal-title":"J. Chem. Inf. 
Model."}],"container-title":["Molecules"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1420-3049\/24\/9\/1698\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:48:24Z","timestamp":1760186904000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1420-3049\/24\/9\/1698"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,30]]},"references-count":110,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2019,5]]}},"alternative-id":["molecules24091698"],"URL":"https:\/\/doi.org\/10.3390\/molecules24091698","relation":{},"ISSN":["1420-3049"],"issn-type":[{"value":"1420-3049","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,4,30]]}}}