{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T14:27:48Z","timestamp":1780583268072,"version":"3.54.1"},"reference-count":31,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2021,2,6]],"date-time":"2021-02-06T00:00:00Z","timestamp":1612569600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002724","name":"American University of Sharjah","doi-asserted-by":"publisher","award":["000000000"],"award-info":[{"award-number":["000000000"]}],"id":[{"id":"10.13039\/501100002724","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>In the past decade, big data has become increasingly prevalent in a large number of applications. As a result, datasets suffering from noise and redundancy issues have necessitated the use of feature selection across multiple domains. However, a common concern in feature selection is that different approaches can give very different results when applied to similar datasets. Aggregating the results of different selection methods helps to resolve this concern and control the diversity of selected feature subsets. In this work, we implemented a general framework for the ensemble of multiple feature selection methods. Based on diversified datasets generated from the original set of observations, we aggregated the importance scores generated by multiple feature selection techniques using two methods: the Within Aggregation Method (WAM), which refers to aggregating importance scores within a single feature selection; and the Between Aggregation Method (BAM), which refers to aggregating importance scores between multiple feature selection methods. We applied the proposed framework on 13 real datasets with diverse performances and characteristics. The experimental evaluation showed that WAM provides an effective tool for determining the best feature selection method for a given dataset. WAM has also shown greater stability than BAM in terms of identifying important features. The computational demands of the two methods appeared to be comparable. The results of this work suggest that by applying both WAM and BAM, practitioners can gain a deeper understanding of the feature selection process.<\/jats:p>","DOI":"10.3390\/e23020200","type":"journal-article","created":{"date-parts":[[2021,2,7]],"date-time":"2021-02-07T14:07:19Z","timestamp":1612706839000},"page":"200","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":32,"title":["A Bootstrap Framework for Aggregating within and between Feature Selection Methods"],"prefix":"10.3390","volume":"23","author":[{"given":"Reem","family":"Salman","sequence":"first","affiliation":[{"name":"Department of Mathematics and Statistics, American University of Sharjah, Sharjah P.O. Box 26666, United Arab Emirates"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ayman","family":"Alzaatreh","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, American University of Sharjah, Sharjah P.O. Box 26666, United Arab Emirates"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4952-8298","authenticated-orcid":false,"given":"Hana","family":"Sulieman","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, American University of Sharjah, Sharjah P.O. Box 26666, United Arab Emirates"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shaimaa","family":"Faisal","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, American University of Sharjah, Sharjah P.O. Box 26666, United Arab Emirates"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,6]]},"reference":[{"key":"ref_1","first-page":"3","article-title":"A Supervised Feature Selection Approach Based on Global Sensitivity","volume":"5","author":"Sulieman","year":"2018","journal-title":"Arch. Data Sci. Ser. A (Online First)"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1016\/j.ejor.2015.09.051","article-title":"Integer programming models for feature selection: New extensions and a randomized solution algorithm","volume":"250","author":"Bertolazzi","year":"2016","journal-title":"Eur. J. Oper. Res."},{"key":"ref_3","first-page":"2320","article-title":"Review and evaluation of feature selection algorithms in synthetic problems","volume":"1101","year":"2011","journal-title":"CORR"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1099","DOI":"10.1057\/palgrave.jors.2601976","article-title":"Data mining feature selection for credit scoring models","volume":"56","author":"Liu","year":"2005","journal-title":"J. Oper. Res. Soc."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1007\/s10462-013-9406-y","article-title":"Metalearning: A survey of trends and technologies","volume":"44","author":"Lemke","year":"2015","journal-title":"Artif. Intell. Rev."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.eswa.2017.01.013","article-title":"Metalearning for choosing feature selection algorithms in data mining: Proposal of a new framework","volume":"75","author":"Parmezan","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Dietterich, T.G. (2000). Ensemble methods in machine learning. International Workshop on Multiple Classifier Systems, Springer.","DOI":"10.1007\/3-540-45014-9_1"},{"key":"ref_8","unstructured":"Khaire, U.M., and Dhanalakshmi, R. (2019). Stability of feature selection algorithm: A review. J. King Saud Univ. Comput. Inf. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/j.spl.2018.07.020","article-title":"The scale enhanced wild bootstrap method for evaluating climate models using wavelets","volume":"144","author":"Chatterjee","year":"2019","journal-title":"Stat. Probab. Lett."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"392","DOI":"10.1093\/bioinformatics\/btp630","article-title":"Robust biomarker identification for cancer diagnosis with ensemble feature selection methods","volume":"26","author":"Abeel","year":"2010","journal-title":"Bioinformatics"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhou, Q., Ding, J., Ning, Y., Luo, L., and Li, T. (2014, January 19\u201321). Stable feature selection with ensembles of multi-relieff. Proceedings of the 2014 10th International Conference on Natural Computation (ICNC), Xiamen, China.","DOI":"10.1109\/ICNC.2014.6975929"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Diren, D.D., Boran, S., Selvi, I.H., and Hatipoglu, T. (2019). Root cause detection with an ensemble machine learning approach in the multivariate manufacturing process. Industrial Engineering in the Big Data Era, Springer.","DOI":"10.1007\/978-3-030-03317-0_14"},{"key":"ref_13","first-page":"289","article-title":"Feature Selection Ensemble","volume":"10","author":"Shen","year":"2012","journal-title":"Turing-100"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wald, R., Khoshgoftaar, T.M., and Dittman, D. (2012, January 12\u201315). Mean aggregation versus robust rank aggregation for ensemble gene selection. Proceedings of the 2012 11th International Conference on Machine Learning and Applications, Boca Raton, FL, USA.","DOI":"10.1109\/ICMLA.2012.20"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/bioinformatics\/btr709","article-title":"Robust rank aggregation for gene list integration and meta-analysis","volume":"28","author":"Kolde","year":"2012","journal-title":"Bioinformatics"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"880","DOI":"10.1109\/TNNLS.2014.2320415","article-title":"A bootstrap based neyman-pearson test for identifying variable importance","volume":"26","author":"Ditzler","year":"2014","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1650029","DOI":"10.1142\/S0219720016500293","article-title":"Evaluating feature-selection stability in next-generation proteomics","volume":"14","author":"Goh","year":"2016","journal-title":"J. Bioinform. Comput. Biol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1007\/s10115-006-0040-8","article-title":"Stability of feature selection algorithms: A study on high-dimensional spaces","volume":"12","author":"Kalousis","year":"2007","journal-title":"Knowl. Inf. Syst."},{"key":"ref_19","unstructured":"Jurman, G., Riccadonna, S., Visintainer, R., and Furlanello, C. (2009, January 11). Canberra distance on ranked lists. Proceedings of the Advances in Ranking NIPS 09 Workshop, Citeseer, Whistler, BC, Canada."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Shen, Z., Chen, X., and Garibaldi, J.M. (2019, January 23\u201326). A Novel Weighted Combination Method for Feature Selection using Fuzzy Sets. Proceedings of the 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), New Orleans, LA, USA.","DOI":"10.1109\/FUZZ-IEEE.2019.8858890"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/j.inffus.2018.02.007","article-title":"On developing an automatic threshold applied to feature selection ensembles","volume":"45","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"857","DOI":"10.1007\/s11063-017-9619-1","article-title":"Testing different ensemble configurations for feature selection","volume":"46","year":"2017","journal-title":"Neural Process. Lett."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Khoshgoftaar, T.M., Golawala, M., and Van Hulse, J. (2007, January 29\u201331). An empirical study of learning from imbalanced data using random forest. Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007), Patras, Greece.","DOI":"10.1109\/ICTAI.2007.46"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1007\/s10115-012-0487-8","article-title":"A review of feature selection methods on synthetic data","volume":"34","year":"2013","journal-title":"Knowl. Inf. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1509","DOI":"10.1093\/bioinformatics\/bti171","article-title":"Optimal number of features as a function of sample size for various classification rules","volume":"21","author":"Hua","year":"2005","journal-title":"Bioinformatics"},{"key":"ref_26","unstructured":"S\u00e1nchez-Marono, N., Alonso-Betanzos, A., and Tombilla-Sanrom\u00e1n, M. (2007, January 16\u201319). Filter methods for feature selection\u2013a comparative study. Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning, Birmingham, UK."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1080\/21642583.2019.1620658","article-title":"An ensemble feature selection method for high-dimensional data based on sort aggregation","volume":"7","author":"Wang","year":"2019","journal-title":"Syst. Sci. Control Eng."},{"key":"ref_28","unstructured":"John, G.H., and Langley, P. (2013). Estimating continuous distributions in Bayesian classifiers. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"27:1","DOI":"10.1145\/1961189.1961199","article-title":"LIBSVM: A library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"3940","DOI":"10.1093\/bioinformatics\/bti623","article-title":"ROCR: Visualizing classifier performance in R","volume":"21","author":"Sing","year":"2005","journal-title":"Bioinformatics"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/2\/200\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:20:21Z","timestamp":1760160021000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/2\/200"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,6]]},"references-count":31,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["e23020200"],"URL":"https:\/\/doi.org\/10.3390\/e23020200","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,6]]}}}