{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T20:13:42Z","timestamp":1772309622090,"version":"3.50.1"},"reference-count":20,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T00:00:00Z","timestamp":1760572800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Center for Research and Development in Mathematics and Applications (CIDMA) under the Portuguese Foundation for Science and Technology","award":["UID\/4106\/2025"],"award-info":[{"award-number":["UID\/4106\/2025"]}]},{"name":"Center for Research and Development in Mathematics and Applications (CIDMA) under the Portuguese Foundation for Science and Technology","award":["UID\/PRR\/4106\/2025"],"award-info":[{"award-number":["UID\/PRR\/4106\/2025"]}]},{"name":"FCT","award":["CEECIND\/04697\/2017"],"award-info":[{"award-number":["CEECIND\/04697\/2017"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Despite the advances on data analysis methodologies in the last decades, most of the traditional regression methods cannot be directly applied to large-scale data. Although aggregation methods are especially designed to deal with large-scale data, their performance may be strongly reduced in ill-conditioned problems (due to collinearity issues). This work compares the performance of a recent approach based on normalized entropy, a concept from information theory and info-metrics, with bagging and magging, two well-established aggregation methods in the literature, providing valuable insights for applications in regression analysis with large-scale data. While the results reveal a similar performance between methods in terms of prediction accuracy, the approach based on normalized entropy largely outperforms the other methods in terms of precision accuracy, even considering a smaller number of groups and observations per group, which represents an important advantage in inference problems with large-scale data. This work also alerts for the risk of using the OLS estimator, particularly under collinearity scenarios, knowing that data scientists frequently use linear models as a simplified view of the reality in big data analysis, and the OLS estimator is routinely used in practice. Beyond the promising findings of the simulation study, our estimation and aggregation strategies show strong potential for real-world applications in fields such as econometrics, genomics, environmental sciences, and machine learning, where data challenges such as noise and ill-conditioning are persistent.<\/jats:p>","DOI":"10.3390\/e27101075","type":"journal-article","created":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T12:17:02Z","timestamp":1760617022000},"page":"1075","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Aggregation in Ill-Conditioned Regression Models: A Comparison with Entropy-Based Methods"],"prefix":"10.3390","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4632-3561","authenticated-orcid":false,"given":"Ana Helena","family":"Tavares","sequence":"first","affiliation":[{"name":"Center for Research and Development in Mathematics and Applications (CIDMA), 3810-193 Aveiro, Portugal"},{"name":"\u00c1gueda School of Technology and Management, University of Aveiro, 3750-127 \u00c1gueda, Portugal"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ana","family":"Silva","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tiago","family":"Freitas","sequence":"additional","affiliation":[{"name":"Department of Physics, University of Aveiro, 3810-193 Aveiro, Portugal"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4776-6375","authenticated-orcid":false,"given":"Maria","family":"Costa","sequence":"additional","affiliation":[{"name":"Center for Research and Development in Mathematics and Applications (CIDMA), 3810-193 Aveiro, Portugal"},{"name":"Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4371-8069","authenticated-orcid":false,"given":"Pedro","family":"Macedo","sequence":"additional","affiliation":[{"name":"Center for Research and Development in Mathematics and Applications (CIDMA), 3810-193 Aveiro, Portugal"},{"name":"Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9102-1362","authenticated-orcid":false,"given":"Rui A.","family":"da Costa","sequence":"additional","affiliation":[{"name":"Department of Physics, University of Aveiro, 3810-193 Aveiro, Portugal"},{"name":"Institute for Nanostructures, Nanomodelling and Nanofabrication (i3N), 3810-193 Aveiro, Portugal"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Valenzuela, O., Rojas, F., Pomares, H., and Rojas, I. (2018, January 19\u201321). Normalized Entropy Aggregation for Inhomogeneous Large-Scale Data. Proceedings of the Theory and Applications of Time Series Analysis, Granada, Spain.","DOI":"10.1007\/978-3-030-26036-1_2"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"620","DOI":"10.1103\/PhysRev.106.620","article-title":"Information theory and statistical mechanics","volume":"106","author":"Jaynes","year":"1957","journal-title":"Phys. Rev."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1103\/PhysRev.108.171","article-title":"Information theory and statistical mechanics. II","volume":"108","author":"Jaynes","year":"1957","journal-title":"Phys. Rev."},{"key":"ref_4","unstructured":"Golan, A., Judge, G., and Miller, D. (1996). Maximum Entropy Econometrics: Robust Estimation with Limited Data, Wiley."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Golan, A. (2018). Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information, Oxford University Press.","DOI":"10.1093\/oso\/9780199349524.001.0001"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1023\/A:1018054314350","article-title":"Bagging Predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. Learn."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1109\/JPROC.2015.2494161","article-title":"Magging: Maximin Aggregation for Inhomogeneous Large-Scale Data","volume":"104","author":"Meinshausen","year":"2016","journal-title":"Proc. IEEE"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1080\/00401706.1970.10488634","article-title":"Ridge Regression: Biased Estimation for Nonorthogonal Problems","volume":"12","author":"Hoerl","year":"1970","journal-title":"Technometrics"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1080\/00401706.1970.10488635","article-title":"Ridge Regression: Applications to Nonorthogonal Problems","volume":"12","author":"Hoerl","year":"1970","journal-title":"Technometrics"},{"key":"ref_11","unstructured":"Belsley, D.A., Kuh, E., and Welsch, R.E. (2004). Regression Diagnostics\u2013Identifying Influential Data and Sources of Collinearity, Wiley."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1080\/00031305.1994.10476030","article-title":"The Three Sigma Rule","volume":"48","author":"Pukelsheim","year":"1994","journal-title":"Am. Stat."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1080\/03610918.2022.2057540","article-title":"A two-stage maximum entropy approach for time series regression","volume":"53","author":"Macedo","year":"2024","journal-title":"Commun. Stat.-Simul. Comput."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"917","DOI":"10.3390\/e11040917","article-title":"A Weighted Generalized Maximum Entropy Estimator with a Data-driven Weight","volume":"11","author":"Wu","year":"2009","journal-title":"Entropy"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"190003","DOI":"10.1063\/5.0082228","article-title":"Neagging: An Aggregation Procedure Based On Normalized Entropy","volume":"2425","author":"Costa","year":"2022","journal-title":"AIP Conf. Proc."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1756","DOI":"10.3390\/e15051756","article-title":"The Data-Constrained Generalized Maximum Entropy Estimator of the GLM: Asymptotic Theory and Inference","volume":"15","author":"Mittelhammer","year":"2013","journal-title":"Entropy"},{"key":"ref_17","unstructured":"The MathWorks Inc. (2025, July 23). MATLAB Version: 9.7.0 (R2019b), Available online: https:\/\/www.mathworks.com."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"108551","DOI":"10.1016\/j.engappai.2024.108551","article-title":"A simple rapid sample-based clustering for large-scale data","volume":"133","author":"Chen","year":"2024","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_19","unstructured":"Khine, M.S. (2024). Penalized Regression in Large-Scale Data Analysis. Machine Learning in Educational Sciences: Approaches, Applications and Advances, Springer Nature."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1023\/A:1018046112532","article-title":"Stacked Regressions","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. Learn."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/10\/1075\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T12:38:28Z","timestamp":1760963908000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/10\/1075"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,16]]},"references-count":20,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["e27101075"],"URL":"https:\/\/doi.org\/10.3390\/e27101075","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,16]]}}}