{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T20:51:53Z","timestamp":1761598313401,"version":"build-2065373602"},"reference-count":21,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2021,3,9]],"date-time":"2021-03-09T00:00:00Z","timestamp":1615248000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],
"abstract":"<jats:p>Regression models provide prediction frameworks for multivariate mutual information analysis that uses information concepts when choosing covariates (also called features) that are important for analysis and prediction. We consider a high dimensional regression framework where the number of covariates (p) exceeds the sample size (n). Recent work in high dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of covariates with fewer than p covariates, doing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation; and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods for several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed average version of ensemble Lasso is often the best predictor.<\/jats:p>",
"DOI":"10.3390\/e23030324","type":"journal-article","created":{"date-parts":[[2021,3,9]],"date-time":"2021-03-09T12:08:01Z","timestamp":1615291681000},"page":"324","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Ensemble Linear Subspace Analysis of High-Dimensional Data"],"prefix":"10.3390","volume":"23",
"author":[{"given":"S. Ejaz","family":"Ahmed","sequence":"first","affiliation":[{"name":"Department of Mathematics and Statistics, Brock University, St. Catharines, ON L2S 3A1, Canada"}]},{"given":"Saeid","family":"Amiri","sequence":"additional","affiliation":[{"name":"Department of Civil, Geologic and Mining Engineering, Polytechnique Montr\u00e9al, Montr\u00e9al, QC H3T 1J4, Canada"}]},{"given":"Kjell","family":"Doksum","sequence":"additional","affiliation":[{"name":"Department of Statistics, University of Wisconsin, Madison, WI 53706, USA"}]}],"member":"1968","published-online":{"date-parts":[[2021,3,9]]},
"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Guo, H., Yu, Z., An, J., Han, G., Ma, Y., and Tang, R. (2020). A two-stage mutual information based Bayesian Lasso algorithm for multi-locus genome-wide association studies. Entropy, 22.","DOI":"10.3390\/e22030329"},
{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B, 267\u2013288.","DOI":"10.1111\/j.2517-6161.1996.tb02080.x"},
{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1418","DOI":"10.1198\/016214506000000735","article-title":"The adaptive lasso and its oracle properties","volume":"101","author":"Zou","year":"2006","journal-title":"J. Am. Stat. Assoc."},
{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1198\/016214501753382273","article-title":"Variable selection via nonconcave penalized likelihood and its oracle properties","volume":"96","author":"Fan","year":"2001","journal-title":"J. Am. Stat. Assoc."},
{"key":"ref_5","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1214\/009053604000000067","article-title":"Least angle regression","volume":"32","author":"Efron","year":"2004","journal-title":"Ann. Stat."},
{"key":"ref_6","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc."},
{"key":"ref_7","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1214\/07-AOS582","article-title":"Lasso-type recovery of sparse representations for high-dimensional data","volume":"37","author":"Meinshausen","year":"2009","journal-title":"Ann. Stat."},
{"key":"ref_8","first-page":"2541","article-title":"On model selection consistency of Lasso","volume":"7","author":"Zhao","year":"2006","journal-title":"J. Mach. Learn. Res."},
{"key":"ref_9","doi-asserted-by":"crossref","first-page":"874","DOI":"10.1016\/j.csda.2011.09.021","article-title":"Absolute penalty and shrinkage estimation in partially linear models","volume":"56","author":"Raheem","year":"2012","journal-title":"Comput. Stat. Data Anal."},
{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2183","DOI":"10.1109\/TIT.2009.2016018","article-title":"Sharp thresholds for high-dimensional and noisy sparsity recovery using \u21131-constrained quadratic programming (Lasso)","volume":"55","author":"Wainwright","year":"2009","journal-title":"IEEE Trans. Inf. Theory"},
{"key":"ref_11","doi-asserted-by":"crossref","first-page":"460","DOI":"10.1080\/10618600.2013.773239","article-title":"GlmmLasso: An algorithm for high-dimensional generalized linear mixed models using \u21131-penalization","volume":"23","author":"Schelldorfer","year":"2014","journal-title":"J. Comput. Graph. Stat."},
{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ranganai, E., and Mudhombo, I. (2021). Variable Selection and Regularization in Quantile Regression via Minimum Covariance Determinant Based Weights. Entropy, 23.","DOI":"10.3390\/e23010033"},
{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ahmed, S.E. (2014). Penalty, Shrinkage and Pretest Strategies: Variable Selection and Estimation, Springer.","DOI":"10.1007\/978-3-319-03149-1"},
{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"B\u00fchlmann, P., and Van De Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer Science & Business Media.","DOI":"10.1007\/978-3-642-20192-9"},
{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations, CRC Press.","DOI":"10.1201\/b18401"},
{"key":"ref_16","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. Learn."},
{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},
{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.inffus.2018.11.008","article-title":"Ensembles for feature selection: A review and future trends","volume":"52","year":"2019","journal-title":"Inf. Fusion"},
{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tu, W., Yang, D., Kong, L., Che, M., Shi, Q., Li, G., and Tian, G. (2019, January 10\u201316). Ensemble-based Ultrahigh-dimensional Variable Screening. Proceedings of the International Joint Conferences on Artificial Intelligence Organization, Macao, China.","DOI":"10.24963\/ijcai.2019\/501"},
{"key":"ref_20","doi-asserted-by":"crossref","first-page":"14429","DOI":"10.1073\/pnas.0602562103","article-title":"Regulation of gene expression in the mammalian eye and its relevance to eye disease","volume":"103","author":"Scheetz","year":"2006","journal-title":"Proc. Natl. Acad. Sci. USA"},
{"key":"ref_21","doi-asserted-by":"crossref","first-page":"3498","DOI":"10.1214\/09-AOS683","article-title":"A unified approach to model selection and sparse recovery using regularized least squares","volume":"37","author":"Lv","year":"2009","journal-title":"Ann. Stat."}],
"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/3\/324\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:35:33Z","timestamp":1760160933000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/3\/324"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,9]]},"references-count":21,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["e23030324"],"URL":"https:\/\/doi.org\/10.3390\/e23030324","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2021,3,9]]}}}