{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T18:37:27Z","timestamp":1776278247331,"version":"3.50.1"},"reference-count":59,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2024,6,21]],"date-time":"2024-06-21T00:00:00Z","timestamp":1718928000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Axioms"],"abstract":"<jats:p>This article assesses the predictive accuracy of factor models based on Partial Least Squares (PLS) and Principal Component Analysis (PCA) in comparison with Autometrics and penalization techniques. The simulation exercise examines three types of scenarios, introducing multicollinearity, heteroscedasticity, and autocorrelation in turn. The number of predictors and the sample size are varied to observe their effects. Forecast accuracy is evaluated using the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). Under severe multicollinearity, the PLS factor approach outperforms the competing methods. Autometrics achieves the lowest RMSE and MAE at all levels of heteroscedasticity, and it also provides better forecasts under low and moderate autocorrelation, whereas Elastic Smoothly Clipped Absolute Deviation (E-SCAD) performs better under severe autocorrelation. In addition to the simulation, we use a popular Pakistani macroeconomic dataset for empirical analysis. The dataset contains 79 monthly variables from January 2013 to December 2020. The relative performance of the competing approaches differs from that in the simulations, but the PLS factor approach outperforms its competitors in forecasting, with lower RMSE and MAE. 
This suggests that the real dataset exhibits a high degree of multicollinearity.<\/jats:p>","DOI":"10.3390\/axioms13070418","type":"journal-article","created":{"date-parts":[[2024,6,21]],"date-time":"2024-06-21T08:50:08Z","timestamp":1718959808000},"page":"418","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Analysis of Fat Big Data Using Factor Models and Penalization Techniques: A Monte Carlo Simulation and Application"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4820-8911","authenticated-orcid":false,"given":"Faridoon","family":"Khan","sequence":"first","affiliation":[{"name":"Department of Creative Technology, Faculty of Computing and AI, Air University, Islamabad 44000, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7772-0386","authenticated-orcid":false,"given":"Olayan","family":"Albalawi","sequence":"additional","affiliation":[{"name":"Department of Statistics, Faculty of Science, University of Tabuk, Tabuk 47512, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,6,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"e1524","DOI":"10.1002\/wics.1524","article-title":"Robust linear regression for high-dimensional data: An overview","volume":"13","author":"Filzmoser","year":"2021","journal-title":"Wiley Interdiscip. Rev. Comput. Stat."},{"key":"ref_2","unstructured":"Gujarati, D.N., Porter, D.C., and Gunasekar, S. (2012). Basic Econometrics, Tata McGraw-Hill Education."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Kim, H.H., and Swanson, N.R. (2013). Mining Big Data Using Parsimonious Factor and Shrinkage Methods, Rutgers University. 
Working paper.","DOI":"10.2139\/ssrn.2294110"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1198\/073500102317351921","article-title":"Macroeconomic forecasting using diffusion indexes","volume":"20","author":"Stock","year":"2002","journal-title":"J. Bus. Econ. Stat."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1080\/07350015.2012.715956","article-title":"Generalized shrinkage methods for forecasting using many predictors","volume":"30","author":"Stock","year":"2012","journal-title":"J. Bus. Econ. Stat."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1017\/S0266466618000245","article-title":"The factor-lasso and k-step bootstrap approach for inference in high-dimensional economic applications","volume":"35","author":"Hansen","year":"2019","journal-title":"Econom. Theory"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jeconom.2015.10.003","article-title":"Efficient estimation of approximate factor models via penalized maximum likelihood","volume":"191","author":"Bai","year":"2016","journal-title":"J. Econom."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Fan, J., Ke, Y., and Liao, Y. (2016). Robust factor models with explanatory proxies. arXiv.","DOI":"10.2139\/ssrn.2753404"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1214\/15-AOS1364","article-title":"Projected principal component analysis in factor models","volume":"44","author":"Fan","year":"2016","journal-title":"Ann. Stat."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1016\/j.jeconom.2017.08.009","article-title":"Sufficient forecasting using factor models","volume":"201","author":"Fan","year":"2017","journal-title":"J. 
Econom."},{"key":"ref_11","first-page":"387","article-title":"Measuring the effects of monetary policy: A factor-augmented vector autoregressive (FAVAR) approach","volume":"120","author":"Bernanke","year":"2005","journal-title":"Q. J. Econ."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1077","DOI":"10.1080\/00036846.2020.1826399","article-title":"Macroeconomic forecasting for Pakistan in a data-rich environment","volume":"53","author":"Syed","year":"2021","journal-title":"Appl. Econ."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1198\/016214501753382273","article-title":"Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties","volume":"96","author":"Fan","year":"2001","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1418","DOI":"10.1198\/016214506000000735","article-title":"The adaptive lasso and its oracle properties","volume":"101","author":"Zou","year":"2006","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1733","DOI":"10.1214\/08-AOS625","article-title":"On the adaptive elastic-net with a diverging number of parameters","volume":"37","author":"Zou","year":"2009","journal-title":"Ann. 
Stat."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"894","DOI":"10.1214\/09-AOS729","article-title":"Nearly unbiased variable selection under minimax concave penalty","volume":"38","author":"Zhang","year":"2010","journal-title":"Ann. Stat."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1080\/02331888.2012.719513","article-title":"Group variable selection via SCAD-L2","volume":"48","author":"Zeng","year":"2014","journal-title":"Statistics"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1016\/j.jeconom.2008.08.010","article-title":"Forecasting economic time series using targeted predictors","volume":"146","author":"Bai","year":"2008","journal-title":"J. Econom."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1016\/j.jeconom.2008.08.011","article-title":"Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components?","volume":"146","author":"Giannone","year":"2008","journal-title":"J. Econom."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1016\/j.jeconom.2013.04.015","article-title":"Forecasting by factors, by variables, by both or neither?","volume":"177","author":"Castle","year":"2013","journal-title":"J. Econom."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.ijforecast.2013.05.001","article-title":"Forecasting with approximate dynamic factor models: The role of non-pervasive shocks","volume":"30","author":"Luciani","year":"2014","journal-title":"Int. J. Forecast."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1045216","DOI":"10.1080\/23322039.2015.1045216","article-title":"Statistical model selection with big data","volume":"3","author":"Doornik","year":"2015","journal-title":"Cogent Econ. 
Financ."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1080\/07350015.2015.1084308","article-title":"Diffusion indexes with sparse loadings","volume":"35","author":"Kristensen","year":"2017","journal-title":"J. Bus. Econ. Stat."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1016\/j.ijforecast.2014.03.016","article-title":"Forecasting macroeconomic time series: LASSO-based approaches and their forecast combinations with dynamic factor models","volume":"30","author":"Li","year":"2014","journal-title":"Int. J. Forecast."},{"key":"ref_27","unstructured":"Marsilli, C. (2024, June 17). Variable Selection in Predictive MIDAS Models. Banque de France Working Paper No. 520. Available online: https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=2531339."},{"key":"ref_28","unstructured":"Nicholson, W., Matteson, D., and Bien, J. (2017). BigVAR: Tools for modeling sparse high-dimensional multivariate time series. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"352","DOI":"10.1016\/j.jeconom.2013.08.033","article-title":"Forecasting financial and macroeconomic variables using data reduction methods: New empirical evidence","volume":"178","author":"Kim","year":"2014","journal-title":"J. Econom."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1016\/j.ijforecast.2016.02.012","article-title":"Mining big data using parsimonious factor, machine learning, variable selection and shrinkage methods","volume":"34","author":"Kim","year":"2018","journal-title":"Int. J. Forecast."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"695","DOI":"10.1111\/caje.12336","article-title":"Big data analytics in economics: What have we learned so far, and where should we go from here?","volume":"51","author":"Swanson","year":"2018","journal-title":"Can. J. 
Econ."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1002\/jae.2768","article-title":"Predicting interest rates using shrinkage methods, real-time diffusion indexes, and model combinations","volume":"35","author":"Swanson","year":"2020","journal-title":"J. Appl. Econom."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"408","DOI":"10.1016\/j.ijforecast.2018.01.001","article-title":"Macroeconomic forecasting using penalized regression methods","volume":"34","author":"Smeekes","year":"2018","journal-title":"Int. J. Forecast."},{"key":"ref_34","first-page":"12","article-title":"Forecasting using supervised factor models","volume":"4","author":"Tu","year":"2019","journal-title":"J. Manag. Sci. Eng."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.econmod.2019.09.046","article-title":"Improving forecast accuracy of financial vulnerability: PLS factor model approach","volume":"88","author":"Kim","year":"2020","journal-title":"Econ. Model."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"101104","DOI":"10.1016\/j.jjie.2020.101104","article-title":"Macroeconomic forecasting using factor models and machine learning: An application to Japan","volume":"58","author":"Maehashi","year":"2020","journal-title":"J. Jpn. Int. Econ."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"10","DOI":"10.2478\/crebss-2020-0002","article-title":"Modelling and forecasting GDP using factor model: An empirical study from Bosnia and Herzegovina","volume":"6","year":"2020","journal-title":"Croat. Rev. Econ. Bus. Soc. Stat."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"2859","DOI":"10.1007\/s00181-019-01744-y","article-title":"Forecasting financial stress indices in Korea: A factor model approach","volume":"59","author":"Kim","year":"2020","journal-title":"Empir. 
Econ."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1002\/for.2724","article-title":"Forecasting financial vulnerability in the USA: A factor model approach","volume":"40","author":"Kim","year":"2021","journal-title":"J. Forecast."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"6117513","DOI":"10.1155\/2021\/6117513","article-title":"Comparing the Forecast Performance of Advanced Statistical and Machine Learning Techniques Using Huge Big Data: Evidence from Monte Carlo Experiments","volume":"2021","author":"Khan","year":"2021","journal-title":"Complexity"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Kelly, B.T., Kuznetsov, B., Malamud, S., and Xu, T.A. (2023). Deep Learning from Implied Volatility Surfaces, Swiss Finance Institute. Swiss Finance Institute Research Paper.","DOI":"10.2139\/ssrn.4531181"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Kelly, B., Kuznetsov, B., Malamud, S., and Xu, T.A. (2024). Large (and Deep) Factor Models. arXiv.","DOI":"10.2139\/ssrn.4679269"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Kozak, S., and Nagel, S. (2023). When Do Cross-Sectional Asset Pricing Factors Span the Stochastic Discount Factor? (No. w31275), National Bureau of Economic Research.","DOI":"10.3386\/w31275"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Didisheim, A., Ke, S.B., Kelly, B.T., and Malamud, S. (2023). Complexity in Factor Pricing Models (No. w31689), National Bureau of Economic Research.","DOI":"10.3386\/w31689"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"714","DOI":"10.1287\/mnsc.2023.4695","article-title":"Deep learning in asset pricing","volume":"70","author":"Chen","year":"2024","journal-title":"Manag. Sci."},{"key":"ref_46","unstructured":"Fan, J., Ke, Z.T., Liao, Y., and Neuhierl, A. (2024, June 17). Structural Deep Learning in Conditional Asset Pricing. Available at SSRN 4117882. 
Available online: https:\/\/static1.squarespace.com\/static\/5d6417169b0edd0001903770\/t\/655524542cbf566e3801a2ed\/1700078678513\/guilherme+piancetino.pdf."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1016\/S0304-3932(99)00027-6","article-title":"Forecasting inflation","volume":"44","author":"Stock","year":"1999","journal-title":"J. Monet. Econ."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"1556","DOI":"10.1016\/j.ijforecast.2020.08.002","article-title":"Modelling non-stationary \u2018Big Data\u2019","volume":"37","author":"Castle","year":"2021","journal-title":"Int. J. Forecast."},{"key":"ref_49","first-page":"6607330","article-title":"Evaluating the performance of feature selection methods using huge big data: A Monte Carlo simulation approach","volume":"2022","author":"Khan","year":"2022","journal-title":"Math. Probl. Eng."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1167","DOI":"10.1198\/016214502388618960","article-title":"Forecasting using principal components from a large number of predictors","volume":"97","author":"Stock","year":"2002","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1133","DOI":"10.1111\/j.1468-0262.2006.00696.x","article-title":"Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions","volume":"74","author":"Bai","year":"2006","journal-title":"Econometrica"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1111\/1468-0262.00273","article-title":"Determining the number of factors in approximate factor models","volume":"70","author":"Bai","year":"2002","journal-title":"Econometrica"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1016\/j.jeconom.2005.01.015","article-title":"Evaluating latent and observed factors in macroeconomics and finance","volume":"131","author":"Bai","year":"2006","journal-title":"J. 
Econom."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1016\/j.jeconom.2005.01.027","article-title":"Are more data always better for factor analysis?","volume":"132","author":"Boivin","year":"2006","journal-title":"J. Econom."},{"key":"ref_55","unstructured":"Wold, H. (1982). Soft Modelling: The Basic Design and Some Extensions, Vol. 1 of Systems under Indirect Observation, Part II, North-Holland."},{"key":"ref_56","unstructured":"Pascual Herrero, H. (2020). Least Squares Regression Principal Component Analysis. [Bachelor\u2019s Thesis, Universitat Polit\u00e8cnica de Catalunya]."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1007\/s10463-016-0588-3","article-title":"Variable selection and estimation using a continuous approximation to the L0 penalty","volume":"70","author":"Wang","year":"2018","journal-title":"Ann. Inst. Stat. Math."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1007\/s00362-019-01107-w","article-title":"Nonnegative estimation and variable selection under minimax concave penalty for sparse high-dimensional linear regression models","volume":"62","author":"Li","year":"2021","journal-title":"Stat. 
Pap."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"9223763","DOI":"10.1155\/2021\/9223763","article-title":"A Comparison of Autometrics and Penalization Techniques under Various Error Distributions: Evidence from Monte Carlo Simulation","volume":"2021","author":"Khan","year":"2021","journal-title":"Complexity"}],"container-title":["Axioms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2075-1680\/13\/7\/418\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:02:17Z","timestamp":1760108537000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2075-1680\/13\/7\/418"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,21]]},"references-count":59,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2024,7]]}},"alternative-id":["axioms13070418"],"URL":"https:\/\/doi.org\/10.3390\/axioms13070418","relation":{},"ISSN":["2075-1680"],"issn-type":[{"value":"2075-1680","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,21]]}}}