{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T02:17:03Z","timestamp":1771467423418,"version":"3.50.1"},"reference-count":37,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2024,12,16]],"date-time":"2024-12-16T00:00:00Z","timestamp":1734307200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union","award":["101046314 (END-VOC)"],"award-info":[{"award-number":["101046314 (END-VOC)"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Background\/Objectives: The COVID-19 pandemic, caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), led to significant global health challenges, including the urgent need for accurate symptom severity prediction aimed at optimizing treatment. While machine learning (ML) and deep learning (DL) models have shown promise in predicting COVID-19 severity using imaging and clinical data, there is limited research utilizing comprehensive tabular symptom datasets. This study aims to address this gap by leveraging a detailed symptom dataset to develop robust models for categorizing COVID-19 symptom severity, thereby enhancing clinical decision making. Methods: A unique tabular dataset was created using questionnaire responses from 5654 individuals, including demographic information, comorbidities, travel history, and medical data. Both unsupervised and supervised ML techniques were employed, including k-means clustering to categorize symptom severity into mild, moderate, and severe clusters. In addition, classification models, namely, Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), random forest, and a deep neural network (DNN) were used to predict symptom severity levels. 
Feature importance was analyzed using the random forest model for its robustness with high-dimensional data and ability to capture complex non-linear relationships, and statistical significance was evaluated through ANOVA and Chi-square tests. Results: Our study showed that fatigue, joint pain, and headache were the most important features in predicting severity. SVM, AdaBoost, and random forest achieved an accuracy of 94%, while XGBoost achieved an accuracy of 96%. DNN showed robust performance in handling complex patterns with 98% accuracy. In terms of precision and recall metrics, both the XGBoost and DNN models demonstrated robust performance, particularly for the moderate class. XGBoost recorded 98% precision and 97% recall, while DNN achieved 99% precision and recall. The clustering approach improved classification accuracy by reducing noise and dimensionality. Statistical tests confirmed the significance of additional features like Body Mass Index (BMI), age, and dominant variant type. Conclusions: Integrating symptom data with advanced ML models offers a promising approach for accurate COVID-19 severity classification. This method provides a reliable tool for healthcare professionals to optimize patient care and resource management, particularly in managing COVID-19 and potential future pandemics. 
Future work should focus on incorporating imaging and clinical data to further enhance model accuracy and clinical applicability.<\/jats:p>","DOI":"10.3390\/bdcc8120192","type":"journal-article","created":{"date-parts":[[2024,12,16]],"date-time":"2024-12-16T12:09:28Z","timestamp":1734350968000},"page":"192","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Integrating Statistical Methods and Machine Learning Techniques to Analyze and Classify COVID-19 Symptom Severity"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-5184-1630","authenticated-orcid":false,"given":"Yaqeen","family":"Raddad","sequence":"first","affiliation":[{"name":"Faculty of Graduate Studies, Arab American University, Ramallah P.O. Box 240, Palestine"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5794-928X","authenticated-orcid":false,"given":"Ahmad","family":"Hasasneh","sequence":"additional","affiliation":[{"name":"Faculty of Graduate Studies, Arab American University, Ramallah P.O. Box 240, Palestine"}]},{"given":"Obada","family":"Abdallah","sequence":"additional","affiliation":[{"name":"Faculty of Graduate Studies, Arab American University, Ramallah P.O. Box 240, Palestine"}]},{"given":"Camil","family":"Rishmawi","sequence":"additional","affiliation":[{"name":"Faculty of Graduate Studies, Arab American University, Ramallah P.O. Box 240, Palestine"}]},{"given":"Nouar","family":"Qutob","sequence":"additional","affiliation":[{"name":"Faculty of Graduate Studies, Arab American University, Ramallah P.O. 
Box 240, Palestine"}]}],"member":"1968","published-online":{"date-parts":[[2024,12,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"n1648","DOI":"10.1136\/bmj.n1648","article-title":"Long Covid\u2014Mechanisms, Risk Factors, and Management","volume":"374","author":"Crook","year":"2021","journal-title":"BMJ"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1038\/s41564-020-0695-z","article-title":"The Species Severe Acute Respiratory Syndrome-Related Coronavirus: Classifying 2019-nCoV and Naming It SARS-CoV-2","volume":"5","author":"Gorbalenya","year":"2020","journal-title":"Nat. Microbiol."},{"key":"ref_3","first-page":"584","article-title":"Genomic Epidemiology of the First Epidemic Wave of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) in Palestine","volume":"7","author":"Qutob","year":"2021","journal-title":"Microb. Genom."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/j.ijsu.2020.04.018","article-title":"The Socio-Economic Implications of the Coronavirus Pandemic (COVID-19): A Review","volume":"78","author":"Nicola","year":"2020","journal-title":"Int. J. Surg."},{"key":"ref_5","first-page":"1","article-title":"Predictors of COVID-19 Severity: A Literature Review","volume":"31","author":"Aghagoli","year":"2021","journal-title":"Rev. Med. Virol."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1473","DOI":"10.1080\/07853890.2022.2076901","article-title":"Pathophysiology and Mechanism of Long COVID: A Comprehensive Review","volume":"54","author":"Chalon","year":"2022","journal-title":"Ann. Med."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"20210759","DOI":"10.1259\/bjr.20210759","article-title":"The Diagnostic Performance of Deep-Learning-Based CT Severity Score to Identify COVID-19 Pneumonia","volume":"95","author":"Kardos","year":"2022","journal-title":"Br. J. 
Radiol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1435","DOI":"10.1016\/j.jiph.2021.07.015","article-title":"COVID-19 Diagnosis and Severity Detection from CT-Images Using Transfer Learning and Back Propagation Neural Network","volume":"14","author":"Aswathy","year":"2021","journal-title":"J. Infect. Public Health"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1007\/s40846-023-00783-2","article-title":"Predicting the Severity of COVID-19 from Lung CT Images Using Novel Deep Learning","volume":"43","author":"Alaiad","year":"2023","journal-title":"J. Med. Biol. Eng."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1007\/s10586-023-03972-5","article-title":"COVID-19 CT-Images Diagnosis and Severity Assessment Using Machine Learning Algorithm","volume":"27","author":"Albataineh","year":"2024","journal-title":"Clust. Comput."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Qiblawey, Y., Tahir, A., Chowdhury, M.E.H., Khandakar, A., Kiranyaz, S., Rahman, T., Ibtehaz, N., Mahmud, S., Al Maadeed, S., and Musharavati, F. (2021). Detection and Severity Classification of COVID-19 in CT Images Using Deep Learning. Diagnostics, 11.","DOI":"10.3390\/diagnostics11050893"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"e200075","DOI":"10.1148\/ryct.2020200075","article-title":"Serial Quantitative Chest CT Assessment of COVID-19: A Deep Learning Approach","volume":"2","author":"Huang","year":"2020","journal-title":"Radiol. Cardiothorac. Imaging"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.jpha.2020.03.004","article-title":"Quantitative Computed Tomography Analysis for Stratifying the Severity of Coronavirus Disease 2019","volume":"10","author":"Shen","year":"2020","journal-title":"J. Pharm. 
Anal."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yao, H., Zhang, N., Zhang, R., Duan, M., Xie, T., Pan, J., Peng, E., Huang, J., Zhang, Y., and Xu, X. (2020). Severity Detection for the Coronavirus Disease 2019 (COVID-19) Patients Using a Machine Learning Model Based on the Blood and Urine Tests. Front. Cell Dev. Biol., 8.","DOI":"10.3389\/fcell.2020.00683"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"e23948","DOI":"10.2196\/23948","article-title":"A Multimodality Machine Learning Approach to Differentiate Severe and Nonsevere COVID-19: Model Development and Validation","volume":"23","author":"Chen","year":"2021","journal-title":"J. Med. Internet Res."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ribeiro, P., Marques, J.A.L., Pordeus, D., Zacarias, L., Leite, C.F., Sobreira-Neto, M.A., Peixoto, A.A., de Oliveira, A., Madeiro, J.P.D.V., and Rodrigues, P.M. (2024). Machine Learning-Based Cardiac Activity Non-Linear Analysis for Discriminating COVID-19 Patients with Different Degrees of Severity. Biomed. Signal Process. Control, 87.","DOI":"10.1016\/j.bspc.2023.105558"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"42","DOI":"10.47262\/BL\/9.1.20240301","article-title":"An Approach for Detecting the Severity Levels of COVID-19 and Associated Features in District Gujranwala, Pakistan","volume":"10","author":"Saleem","year":"2024","journal-title":"Biomed. Lett."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"413","DOI":"10.33096\/ilkom.v15i3.1945.413-426","article-title":"Cloud-Based Realtime Decision System for Severity Classification of COVID-19 Self-Isolation Patients Using Machine Learning Algorithm","volume":"15","author":"Sugiono","year":"2023","journal-title":"ILKOM J. 
Ilm."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"e587","DOI":"10.1016\/S2589-7500(21)00131-X","article-title":"Early Detection of COVID-19 in the UK Using Self-Reported Symptoms: A Large-Scale, Prospective, Epidemiological Surveillance Study","volume":"3","author":"Canas","year":"2021","journal-title":"Lancet Digit. Health"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"25","DOI":"10.59543\/ijmscs.v1i.7715","article-title":"Severity Classification for COVID-19 Infections Based on Lasso-Logistic Regression Model","volume":"1","author":"Arif","year":"2023","journal-title":"Int. J. Math. Stat. Comput. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1271","DOI":"10.1109\/JBHI.2023.3239366","article-title":"Classification of Patient Recovery From COVID-19 Symptoms Using Consumer Wearables and Machine Learning","volume":"27","author":"Leitner","year":"2023","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Levi, Y., Brandeau, M.L., Shmueli, E., and Yamin, D. (2024). Prediction and Detection of Side Effects Severity Following COVID-19 and Influenza Vaccinations: Utilizing Smartwatches and Smartphones. Sci. Rep., 14.","DOI":"10.1038\/s41598-024-56561-w"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Talib, M.A., Afadar, Y., Nasir, Q., Nassif, A.B., Hijazi, H., and Hasasneh, A. (2024). A Tree-Based Explainable AI Model for Early Detection of COVID-19 Using Physiological Data. BMC Med. Inform. Decis. 
Mak., 24.","DOI":"10.1186\/s12911-024-02576-2"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.46234\/ccdcw2021.255","article-title":"GISAID\u2019s Role in Pandemic Response","volume":"3","author":"Khare","year":"2021","journal-title":"China CDC Wkly."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"715","DOI":"10.1093\/infdis\/jiab422","article-title":"Sex Differences in Influenza: The Challenge Study Experience","volume":"225","author":"Giurgea","year":"2022","journal-title":"J. Infect. Dis."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"e111040","DOI":"10.5812\/asjsm.111040","article-title":"Musculoskeletal Problems in Patients with COVID-19: A Review Study","volume":"12","author":"Motaqi","year":"2021","journal-title":"Asian J. Sports Med."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Erdi, A., Zakavi, M., Amani, M., Fooladi, S., and Abedi, A. (2023). Clinical Manifestations of Pain in Patients Suffering from COVID-19 Infected with Delta Variant of SARS-CoV-2. Front. Pain Res., 4.","DOI":"10.3389\/fpain.2023.1282527"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"17","DOI":"10.21037\/jphe-21-50","article-title":"COVID-19 and Pain: Any Relation?","volume":"6","author":"Divella","year":"2022","journal-title":"J. Public Health Emerg."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"157","DOI":"10.2147\/CLEP.S129785","article-title":"Missing Data and Multiple Imputation in Clinical Epidemiological Research","volume":"9","author":"Pedersen","year":"2017","journal-title":"Clin. Epidemiol."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Chan, J.Y.-L., Leow, S.M.H., Bea, K.T., Cheng, W.K., Phoong, S.W., Hong, Z.-W., and Chen, Y.-L. (2022). Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review. 
Mathematics, 10.","DOI":"10.3390\/math10081283"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"80716","DOI":"10.1109\/ACCESS.2020.2988796","article-title":"Unsupervised K-Means Clustering Algorithm","volume":"8","author":"Sinaga","year":"2020","journal-title":"IEEE Access"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Suthaharan, S. (2016). Support Vector Machine. Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning, Springer.","DOI":"10.1007\/978-1-4899-7641-3"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Natras, R., Soja, B., and Schmidt, M. (2022). Ensemble Machine Learning of Random Forest, AdaBoost and XGBoost for Vertical Total Electron Content Forecasting. Remote Sens., 14.","DOI":"10.3390\/rs14153547"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"13521","DOI":"10.1007\/s10462-023-10466-8","article-title":"Deep Learning Modelling Techniques: Current Progress, Applications, Advantages, and Challenges","volume":"56","author":"Ahmed","year":"2023","journal-title":"Artif. Intell. Rev."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.5121\/ijdkp.2015.5201","article-title":"A Review on Evaluation Metrics for Data Classification Evaluations","volume":"5","author":"Hossin","year":"2015","journal-title":"Int. J. Data Min. Knowl. Manag. Process"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.cytogfr.2020.05.003","article-title":"The Cytokine Storm in COVID-19: An Overview of the Involvement of the Chemokine\/Chemokine-Receptor System","volume":"53","author":"Coperchini","year":"2020","journal-title":"Cytokine Growth Factor Rev."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"465","DOI":"10.11591\/eei.v7i3.1272","article-title":"Improving Classification Accuracy Using Clustering Technique","volume":"7","author":"Mathivanan","year":"2018","journal-title":"Bull. Electr. Eng. 
Inform."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/8\/12\/192\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:53:24Z","timestamp":1760115204000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/8\/12\/192"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,16]]},"references-count":37,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["bdcc8120192"],"URL":"https:\/\/doi.org\/10.3390\/bdcc8120192","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,16]]}}}