{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T12:21:31Z","timestamp":1767961291117,"version":"3.49.0"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"S6","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Activities of drug molecules can be predicted by QSAR (quantitative structure activity relationship) models, which overcomes the disadvantages of high cost and long cycle by employing the traditional experimental method. With the fact that the number of drug molecules with positive activity is rather fewer than that of negatives, it is important to predict molecular activities considering such an unbalanced situation.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Here, asymmetric bagging and feature selection are introduced into the problem and asymmetric bagging of support vector machines (asBagging) is proposed on predicting drug activities to treat the unbalanced problem. At the same time, the features extracted from the structures of drug molecules affect prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. 
Numerical experimental results on a data set of molecular activities show that asBagging improve the AUC and sensitivity values of molecular activities and PRIFEAB with feature selection further helps to improve the prediction ability.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Asymmetric bagging can help to improve prediction accuracy of activities of drug molecules, which can be furthermore improved by performing feature selection to select relevant features from the drug molecules data sets.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-s6-s7","type":"journal-article","created":{"date-parts":[[2008,5,28]],"date-time":"2008-05-28T18:15:46Z","timestamp":1211998546000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":32,"title":["Asymmetric bagging and feature selection for activities prediction of drug molecules"],"prefix":"10.1186","volume":"9","author":[{"given":"Guo-Zheng","family":"Li","sequence":"first","affiliation":[]},{"given":"Hao-Hua","family":"Meng","sequence":"additional","affiliation":[]},{"given":"Wen-Cong","family":"Lu","sequence":"additional","affiliation":[]},{"given":"Jack Y","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Mary Qu","family":"Yang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,5,28]]},"reference":[{"key":"2618_CR1","volume-title":"10th Online World Conference on Soft Computing in Industrial Applications","author":"SJ Barrett","year":"2005","unstructured":"Barrett SJ, Langdon WB: Advances in the Application of Machine Learning Techniques in Drug Discovery, Design and Development. 10th Online World Conference on Soft Computing in Industrial Applications. 
2005, Springer"},{"issue":"1","key":"2618_CR2","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/S0169-7439(99)00034-9","volume":"49","author":"Y Tominaga","year":"1999","unstructured":"Tominaga Y: Comparative Study of Class Data Analysis with PCA-LDA, SIMCA, PLS, ANNs, and K-NN. Chemometrics and Intelligent Laboratory Systems. 1999, 49 (1): 105-115.","journal-title":"Chemometrics and Intelligent Laboratory Systems"},{"key":"2618_CR3","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1016\/S0169-7439(02)00050-3","volume":"64","author":"K Tang","year":"2002","unstructured":"Tang K, Li T: Combining PLS with GA-GP for QSAR. Chemometrics and Intelligent Laboratory Systems. 2002, 64: 55-64.","journal-title":"Chemometrics and Intelligent Laboratory Systems"},{"key":"2618_CR4","doi-asserted-by":"publisher","first-page":"2106","DOI":"10.1021\/ci049798m","volume":"44","author":"KT Fang","year":"2004","unstructured":"Fang KT, Yin H, Liang YZ: New Approach by Kriging Models to Problems in QSAR. Journal of Chemical Information and Computer Science. 2004, 44: 2106-2113.","journal-title":"Journal of Chemical Information and Computer Science"},{"issue":"6","key":"2618_CR5","doi-asserted-by":"publisher","first-page":"2047","DOI":"10.1021\/ci049941b","volume":"44","author":"GZ Li","year":"2004","unstructured":"Li GZ, Yang J, Song HF, Yang SS, Lu WC, Chen NY: Semiempirical Quantum Chemical Method and Artificial Neural Networks Applied for Max Computation of Some Azo Dyes. Journal of Chemical Information and Computer Science. 
2004, 44 (6): 2047-2050.","journal-title":"Journal of Chemical Information and Computer Science"},{"issue":"5","key":"2618_CR6","doi-asserted-by":"publisher","first-page":"1630","DOI":"10.1021\/ci049869h","volume":"44","author":"Y Xue","year":"2004","unstructured":"Xue Y, Li ZR, Yap CW, Sun LZ, Chen X, Chen YZ: Effect of Molecular Descriptor Feature Selection in Support Vector Machine Classification of Pharmacokinetic and Toxicological Properties of Chemical Agents. Journal of Chemical Information & Computer Science. 2004, 44 (5): 1630-1638.","journal-title":"Journal of Chemical Information & Computer Science"},{"key":"2618_CR7","doi-asserted-by":"publisher","DOI":"10.1142\/9789812794710","volume-title":"Support Vector Machines in Chemistry","author":"NY Chen","year":"2004","unstructured":"Chen NY, Lu WC, Yang J, Li GZ: Support Vector Machines in Chemistry. 2004, Singapore: World Scientific Publishing Company"},{"issue":"6","key":"2618_CR8","doi-asserted-by":"publisher","first-page":"2478","DOI":"10.1021\/ci060128l","volume":"46","author":"S Bhavani","year":"2006","unstructured":"Bhavani S, Nagargadde A, Thawani A, Sridhar V, Chandra N: Substructure-Based Support Vector Machine Classifiers for Prediction of Adverse Effects in Diverse Classes of Drugs. Journal of Chemical Information and Modeling. 2006, 46 (6): 2478-2486.","journal-title":"Journal of Chemical Information and Modeling"},{"issue":"4","key":"2618_CR9","first-page":"97","volume":"18","author":"T Dietterich","year":"1998","unstructured":"Dietterich T: Machine-learning research: Four current directions. The AI Magazine. 1998, 18 (4): 97-136.","journal-title":"The AI Magazine"},{"issue":"2","key":"2618_CR10","first-page":"197","volume":"5","author":"R Schapire","year":"1990","unstructured":"Schapire R: The strength of weak learnability. Machine learning. 
1990, 5 (2): 197-227.","journal-title":"Machine learning"},{"issue":"2","key":"2618_CR11","first-page":"123","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman L: Bagging predictors. Machine learning. 1996, 24 (2): 123-140.","journal-title":"Machine learning"},{"issue":"1\u20132","key":"2618_CR12","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1023\/A:1007515423169","volume":"36","author":"E Bauer","year":"1999","unstructured":"Bauer E, Kohavi R: An empirical comparison of voting classification algorithms: Bagging, Boosting, and variants. Machine learning. 1999, 36 (1\u20132): 105-139.","journal-title":"Machine learning"},{"key":"2618_CR13","doi-asserted-by":"publisher","first-page":"903","DOI":"10.1021\/ci0203702","volume":"42","author":"DK Agrafiotis","year":"2002","unstructured":"Agrafiotis DK, no WC, Lobanov VS: On the Use of Neural Network Ensembles in QSAR and QSPR. J Chem Inf Comput Sci. 2002, 42: 903-911.","journal-title":"J Chem Inf Comput Sci"},{"key":"2618_CR14","doi-asserted-by":"publisher","first-page":"2163","DOI":"10.1021\/ci034129e","volume":"43","author":"JK Lanctot","year":"2003","unstructured":"Lanctot JK, Putta S, Lemmen C, Greene J: Using Ensembles to Classify Compounds for Drug Discovery. J Chem Inf Comput Sci. 2003, 43: 2163-2169.","journal-title":"J Chem Inf Comput Sci"},{"key":"2618_CR15","doi-asserted-by":"publisher","first-page":"2179","DOI":"10.1021\/ci049849f","volume":"44","author":"R Guha","year":"2004","unstructured":"Guha R, Jurs PC: Development of Linear, Ensemble, and Nonlinear Models for the Prediction and Interpretation of the Biological Activity of a Set of PDGFR Inhibitors. J Chem Inf Comput Sci. 
2004, 44: 2179-2189.","journal-title":"J Chem Inf Comput Sci"},{"issue":"3","key":"2618_CR16","doi-asserted-by":"publisher","first-page":"989","DOI":"10.1021\/ci600563w","volume":"47","author":"D Dutta","year":"2007","unstructured":"Dutta D, Guha R, Wild D, Chen T: Ensemble Feature Selection: Consistent Descriptor Subsets for Multiple QSAR Models. Journal of Chemical Information and Modeling. 2007, 47 (3): 989-997.","journal-title":"Journal of Chemical Information and Modeling"},{"key":"2618_CR17","doi-asserted-by":"publisher","first-page":"2408","DOI":"10.1021\/ci7002076","volume":"47","author":"T Hou","year":"2007","unstructured":"Hou T, Wang J, Li Y: ADME Evaluation in Drug Discovery. 8. The Prediction of Human Intestinal Absorption by a Support Vector Machine. J Chem Inf Model. 2007, 47: 2408-2415.","journal-title":"J Chem Inf Model"},{"issue":"7","key":"2618_CR18","doi-asserted-by":"publisher","first-page":"1088","DOI":"10.1109\/TPAMI.2006.134","volume":"28","author":"D Tao","year":"2006","unstructured":"Tao D, Tang X, Li X, Wu X: Asymmetric Bagging and Random Subspace for Support Vector Machines-Based Relevance Feedback in Image Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006, 28 (7): 1088-1099.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2618_CR19","volume-title":"Construction and Assessment of Classification Rules","author":"DJ Hand","year":"1997","unstructured":"Hand DJ: Construction and Assessment of Classification Rules. 1997, Chichester: John Wiley and Sons"},{"issue":"Oct","key":"2618_CR20","first-page":"1205","volume":"5","author":"L Yu","year":"2004","unstructured":"Yu L, Liu H: Efficient Feature Selection Via Analysis of Relevance and Redundancy. Journal of Machine Learning Research. 
2004, 5 (Oct): 1205-1224.","journal-title":"Journal of Machine Learning Research"},{"key":"2618_CR21","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1016\/S0004-3702(97)00043-X","volume":"97","author":"R Kohavi","year":"1997","unstructured":"Kohavi R, George JH: Wrappers for Feature Subset Selection. Artificial Intelligence. 1997, 97: 273-324.","journal-title":"Artificial Intelligence"},{"key":"2618_CR22","first-page":"1157","volume":"3","author":"I Guyon","year":"2003","unstructured":"Guyon I, Elisseeff A: An Introduction to Variable and Feature Selection. Journal of machine learning research. 2003, 3: 1157-1182.","journal-title":"Journal of machine learning research"},{"issue":"5","key":"2618_CR23","doi-asserted-by":"publisher","first-page":"1823","DOI":"10.1021\/ci049875d","volume":"44","author":"Y Liu","year":"2004","unstructured":"Liu Y: A Comparative Study on Feature Selection Methods for Drug Discovery. J Chem Inf Comput Sci. 2004, 44 (5): 1823-1828.","journal-title":"J Chem Inf Comput Sci"},{"issue":"5","key":"2618_CR24","doi-asserted-by":"publisher","first-page":"1376","DOI":"10.1021\/ci050135u","volume":"45","author":"H Li","year":"2005","unstructured":"Li H, Yap CW, Ung CY, Xue Y, Cao ZW, Chen YZ: Effect of Selection of Molecular Descriptors on the Prediction of Blood-Brain Barrier Penetrating and Nonpenetrating Agents by Statistical Learning Methods. Journal of Chemical Information and Modeling. 2005, 45 (5): 1376-1384.","journal-title":"Journal of Chemical Information and Modeling"},{"issue":"1","key":"2618_CR25","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1021\/ci6002619","volume":"47","author":"T Eitrich","year":"2007","unstructured":"Eitrich T, Kless A, Druska C, Meye W, Grotendorst J: Classification of Highly Unbalanced CYP450 Data of Drugs Using Cost Sensitive Machine Learning Techniques. Journal of Chemical Information and Modeling. 
2007, 47 (1): 97-103.","journal-title":"Journal of Chemical Information and Modeling"},{"key":"2618_CR26","first-page":"292","volume-title":"Lecture Notes on Artificial Intelligence 3173 (PRICAI2004)","author":"GZ Li","year":"2004","unstructured":"Li GZ, Yang J, Liu GP, Xue L: Feature selection for multi-class problems using support vector machines. Lecture Notes on Artificial Intelligence 3173 (PRICAI2004). 2004, Springer, 292-300."},{"key":"2618_CR27","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1145\/130385.130401","volume-title":"Proceedings of the Fifth Annual Workshop on Computational Learning Theory","author":"B Boser","year":"1992","unstructured":"Boser B, Guyon L, Vapnik V: A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory. 1992, Pittsburgh: ACM, 144-152."},{"key":"2618_CR28","volume-title":"An Introduction to Support Vector Machines","author":"N Cristianini","year":"2000","unstructured":"Cristianini N, Shawe-Taylor J: An Introduction to Support Vector Machines. 2000, Cambridge: Cambridge University Press"},{"key":"2618_CR29","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1023\/A:1012487302797","volume":"46","author":"I Guyon","year":"2002","unstructured":"Guyon I, Weston J, Barnhill S, Vapnik V: Gene Selection for Cancer Classification Using Support Vector Machines. Machine Learning. 2002, 46: 389-422.","journal-title":"Machine Learning"},{"key":"2618_CR30","volume-title":"Statistical Learning Theory","author":"V Vapnik","year":"1998","unstructured":"Vapnik V: Statistical Learning Theory. 1998, New York: Wiley"},{"key":"2618_CR31","volume-title":"Master's thesis","author":"W Karush","year":"1939","unstructured":"Karush W: Minima of Functions of Several Variables with Inequalities as Side Constraints. Master's thesis. 
1939, Department of Mathematics, University of Chicago"},{"key":"2618_CR32","first-page":"481","volume-title":"Proceeding of the 2nd Berkeley Symposium on Mathematical Statistics and Probabilistic","author":"HW Kuhn","year":"1951","unstructured":"Kuhn HW, Tucker AW: Nonlinear Programming. Proceeding of the 2nd Berkeley Symposium on Mathematical Statistics and Probabilistic. 1951, Berkeley: University of California Press, 481-492."},{"key":"2618_CR33","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1098\/rsta.1909.0016","volume":"A 209","author":"J Mercer","year":"1909","unstructured":"Mercer J: Functions of Positive and Negative Type and their Connection with the Theory of Integral Equations. Philosophy Transactions on Royal Society in London. 1909, A 209: 415-446.","journal-title":"Philosophy Transactions on Royal Society in London"},{"key":"2618_CR34","volume-title":"Tech rep","author":"CW Hsu","year":"2003","unstructured":"Hsu CW, Chang CC, Lin CJ: A Practical Guide to Support Vector Classification. Tech rep. 2003, Department of Computer Science and Information Engineering of National Taiwan University, [14 August 2003], [http:\/\/www.csie.ntu.edu.tw\/~cjlin\/papers\/guide\/guide.pdf]"},{"key":"2618_CR35","volume-title":"LIBSVM \u2013 A Library for Support Vector Machines Version 2.85","author":"CC Chang","year":"2007","unstructured":"Chang CC, Lin CJ: LIBSVM \u2013 A Library for Support Vector Machines Version 2.85. 2007, [http:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvm\/index.html]"},{"key":"2618_CR36","first-page":"271","volume-title":"PRICAI2006 Lecture Notes in Computer Science 4099","author":"GZ Li","year":"2006","unstructured":"Li GZ, Liu TY: Feature Selection for Bagging of Support Vector Machines. PRICAI2006 Lecture Notes in Computer Science 4099. 
2006, Springer, 271-277."},{"key":"2618_CR37","first-page":"683","volume-title":"Advances in Neural Information Processing Systems","author":"J Moody","year":"1992","unstructured":"Moody J, Utans J: Principled Architecture Selection for Neural Networks: Application to Corporate Bond Rating Prediction. Advances in Neural Information Processing Systems. Edited by: Moody JE, Hanson SJ, Lippmann RP. 1992, Morgan Kaufmann Publishers, Inc, 683-690."},{"key":"2618_CR38","volume-title":"Pattern Classification","author":"RO Duda","year":"2000","unstructured":"Duda RO, Hart PE, Stork DG: Pattern Classification. 2000, Wiley Interscience, 2","edition":"2"},{"key":"2618_CR39","doi-asserted-by":"publisher","DOI":"10.1002\/9783527613106","volume-title":"Handbook of Molecular Descriptors","author":"R Todeschini","year":"2000","unstructured":"Todeschini R, Consonni V: Handbook of Molecular Descriptors. 2000, Weinheim, Germany: Wiley-VCH"},{"key":"2618_CR40","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1016\/S0169-7439(01)00181-2","volume":"60","author":"SS Young","year":"2002","unstructured":"Young SS, Gombar VK, Emptage MR, Cariello NF, Lambert C: Mixture Deconvolution and Analysis of Ames Mutagenicity Data. Chemometrics and Intelligent Laboratory Systems. 2002, 60: 5-11.","journal-title":"Chemometrics and Intelligent Laboratory Systems"},{"key":"2618_CR41","doi-asserted-by":"publisher","first-page":"1463","DOI":"10.1021\/ci034032s","volume":"43","author":"J Feng","year":"2003","unstructured":"Feng J, Lurati L, Ouyang H, Robinson T, Wang Y, Yuan S, Young SS: Predictive Toxicology: Benchmarking Molecular Descriptors and Statistical Methods. Journal of Chemical Information and Computer Science. 2003, 43: 1463-1470.","journal-title":"Journal of Chemical Information and Computer Science"},{"key":"2618_CR42","doi-asserted-by":"crossref","unstructured":"Levner I: Feature Selection and Nearest Centroid Classification for Protein Mass Spectrometry. BMC Bioinformatics. 
2005, 6 (68):","DOI":"10.1186\/1471-2105-6-68"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-S6-S7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T08:26:51Z","timestamp":1630484811000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-S6-S7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,5]]},"references-count":42,"journal-issue":{"issue":"S6","published-print":{"date-parts":[[2008,5]]}},"alternative-id":["2618"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-s6-s7","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,5]]},"assertion":[{"value":"28 May 2008","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S7"}}