{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T10:17:14Z","timestamp":1767867434412,"version":"3.49.0"},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T00:00:00Z","timestamp":1705881600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T00:00:00Z","timestamp":1705881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"The National Science and Technology Council","award":["112-2118-M-A49-003"],"award-info":[{"award-number":["112-2118-M-A49-003"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Rapid development in data science enables machine learning and artificial intelligence to be the most popular research tools across various disciplines. While numerous articles have shown decent predictive ability, little research has examined the impact of complex correlated data. We aim to develop a more accurate model under repeated measures or hierarchical data structures. Therefore, this study proposes a novel algorithm, the Generalized Estimating Equations Boosting (GEEB) machine, to integrate the gradient boosting technique into the benchmark statistical approach that deals with the correlated data, the generalized Estimating Equations (GEE). Unlike the previous gradient boosting utilizing all input features, we randomly select some input features when building the model to reduce predictive errors. The simulation study evaluates the predictive performance of the GEEB, GEE, eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) across several hierarchical structures with different sample sizes. Results suggest that the new strategy GEEB outperforms the GEE and demonstrates superior predictive accuracy than the SVM and XGBoost in most situations. An application to a real-world dataset, the Forest Fire Data, also revealed that the GEEB reduced mean squared errors by 4.5% to 25% compared to GEE, XGBoost, and SVM. This research also provides a freely available R function that could implement the GEEB machine effortlessly for longitudinal or hierarchical data.<\/jats:p>","DOI":"10.1186\/s40537-023-00875-5","type":"journal-article","created":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T06:02:34Z","timestamp":1705903354000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Generalized Estimating Equations Boosting (GEEB) machine for correlated data"],"prefix":"10.1186","volume":"11","author":[{"given":"Yuan-Wey","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hsin-Chou","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi-Hau","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chao-Yu","family":"Guo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,1,22]]},"reference":[{"key":"875_CR1","unstructured":"Bruin J (2012). R advanced: simulating the hospital doctor patient dataset. https:\/\/stats.idre.ucla.edu\/r\/codefragments\/mesimulation\/. Accessed 27 Oct 2022."},{"key":"875_CR2","doi-asserted-by":"crossref","unstructured":"Caruana, R. and A. Niculescu-Mizil. An empirical comparison of supervised learning algorithms. Proceedings of the 23rd international conference on Machine learning. 2006.","DOI":"10.1145\/1143844.1143865"},{"key":"875_CR3","doi-asserted-by":"crossref","unstructured":"Chen T, Guestrin C. Xgboost: A scalable tree boosting system. Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. 2016.","DOI":"10.1145\/2939672.2939785"},{"key":"875_CR4","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/BF00994018","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273\u201397.","journal-title":"Mach Learn"},{"key":"875_CR5","unstructured":"Cortez P, Morais AJR. A data mining approach to predict forest fires using meteorological data. 2007."},{"key":"875_CR6","doi-asserted-by":"publisher","DOI":"10.1093\/oso\/9780198524847.001.0001","volume-title":"Analysis of longitudinal data","author":"P Diggle","year":"2002","unstructured":"Diggle P, Diggle PJ, Heagerty P, Liang K-Y, Zeger S. Analysis of longitudinal data. Oxford: Oxford University Press; 2002."},{"key":"875_CR7","doi-asserted-by":"publisher","DOI":"10.1201\/b13880","volume-title":"Generalized estimating equations","author":"JW Hardin","year":"2012","unstructured":"Hardin JW, Hilbe JM. Generalized estimating equations. Boca Raton: CRC Press; 2012."},{"key":"875_CR8","unstructured":"Ho TK. Random decision forests. Proceedings of 3rd international conference on document analysis and recognition, IEEE. 1995."},{"issue":"6245","key":"875_CR9","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1126\/science.aaa8415","volume":"349","author":"MI Jordan","year":"2015","unstructured":"Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science. 2015;349(6245):255\u201360.","journal-title":"Science"},{"key":"875_CR10","doi-asserted-by":"publisher","first-page":"963","DOI":"10.2307\/2529876","volume":"38","author":"NM Laird","year":"1982","unstructured":"Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963\u201374.","journal-title":"Biometrics"},{"issue":"1","key":"875_CR11","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1093\/biomet\/73.1.13","volume":"73","author":"K-Y Liang","year":"1986","unstructured":"Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13\u201322.","journal-title":"Biometrika"},{"key":"875_CR12","doi-asserted-by":"publisher","first-page":"270","DOI":"10.2307\/2533218","volume":"50","author":"SR Lipsitz","year":"1994","unstructured":"Lipsitz SR, Fitzmaurice GM, Orav EJ, Laird NM. Performance of generalized estimating equations in practical situations. Biometrics. 1994;50:270\u20138.","journal-title":"Biometrics"},{"issue":"11","key":"875_CR13","doi-asserted-by":"publisher","first-page":"1149","DOI":"10.1002\/sim.4780131106","volume":"13","author":"SR Lipsitz","year":"1994","unstructured":"Lipsitz SR, Kim K, Zhao L. Analysis of repeated categorical data using generalized estimating equations. Stat Med. 1994;13(11):1149\u201363.","journal-title":"Stat Med"},{"key":"875_CR14","unstructured":"Louppe G, Wehenkel L, Sutera A, Geurts P. Understanding variable importances in forests of randomized trees. Adv Neural Inf Process Syst. 2013;26."},{"key":"875_CR15","volume-title":"Generalized, linear, and mixed models","author":"CE McCulloch","year":"2004","unstructured":"McCulloch CE, Searle SR. Generalized, linear, and mixed models. Hoboken: Wiley; 2004."},{"key":"875_CR16","unstructured":"OpenAI. ChatGPT (Mar 14 version) [Large language model]. 2023. https:\/\/chat.openai.com\/chat."},{"key":"875_CR17","volume-title":"Correlated data analysis: modeling, analytics, and applications","author":"X-KS Peter","year":"2007","unstructured":"Peter X-KS, Song K. Correlated data analysis: modeling, analytics, and applications. Berlin: Springer; 2007."},{"key":"875_CR18","unstructured":"R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, V., Austria. 2014. http:\/\/www.R-project.org\/."},{"issue":"1","key":"875_CR19","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1038\/s41746-018-0029-1","volume":"1","author":"A Rajkomar","year":"2018","unstructured":"Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1(1):18.","journal-title":"NPJ Digit Med"},{"key":"875_CR20","unstructured":"SAS Institute Inc 2013. SAS\/ACCESS\u00ae 9.4 Interface to ADABAS: Reference. Cary, N. S. I. I."},{"key":"875_CR21","volume-title":"Categorical data analysis using SAS","author":"ME Stokes","year":"2012","unstructured":"Stokes ME, Davis CS, Koch GG. Categorical data analysis using SAS. 3rd ed. Cary: SAS Institute Inc; 2012.","edition":"3"},{"issue":"1","key":"875_CR22","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1071\/WF05021","volume":"15","author":"SW Taylor","year":"2006","unstructured":"Taylor SW, Alexander ME. Science, technology, and human factors in fire danger rating: the Canadian experience. Int J Wildland Fire. 2006;15(1):121\u201335.","journal-title":"Int J Wildland Fire"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-023-00875-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-023-00875-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-023-00875-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T06:06:58Z","timestamp":1705903618000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-023-00875-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,22]]},"references-count":22,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["875"],"URL":"https:\/\/doi.org\/10.1186\/s40537-023-00875-5","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,22]]},"assertion":[{"value":"11 October 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 December 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 January 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable. It is a computer simulation study.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"All authors read and approved the final manuscript for publication.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no conflicts of interest related to the subject matter or materials discussed in this article.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"20"}}