{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T09:59:56Z","timestamp":1763719196266,"version":"3.45.0"},"reference-count":51,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T00:00:00Z","timestamp":1763683200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Pandemics emphasize the importance of real-time, interpretable clinical decision-support systems for identifying high-risk patients and assisting with prompt triage, particularly in data-intensive healthcare systems. This paper describes a novel dual big-data pipeline that includes (i) a streaming module for real-time epidemiological hospitalization risk prediction and (ii) a supplementary imaging-based detection and reasoning module for chest X-rays, with COVID-19 as an example. The first pipeline uses state-of-the-art machine learning algorithms to estimate patient-level hospitalization risk based on data from the Centers for Disease Control and Prevention\u2019s (CDC) COVID-19 Case Surveillance dataset. A Bloom filter accelerated triage by constant-time pre-screening of high-risk profiles. Specifically, after significant experimentation and optimization, one of the models, XGBoost, was selected because it achieved the best minority-class F1-score (0.76) and recall (0.80), outperforming baseline models. Synthetic data generation was employed to mimic streaming workloads, including a strategy that used the Conditional Tabular Generative Adversarial Network (CTGAN) to produce the best balanced and realistic distributions. The second pipeline focuses on diagnostic imaging and combines an advanced convolutional neural network, EfficientNet-B0, with Grad-CAM visual explanations, achieving 99.5% internal and 99.3% external accuracy. A lightweight Generative Pre-trained Transformer (GPT)-based reasoning layer converts model predictions into auditable triage comments (ALERT\/FLAG\/LOG), yielding traceable and interpretable decision logs. This scalable, explainable, and near-real-time framework provides a foundation for future multimodal and genomic advancements in public health readiness.<\/jats:p>","DOI":"10.3390\/a18120730","type":"journal-article","created":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T09:50:26Z","timestamp":1763718626000},"page":"730","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Big Data Pipeline Approach for Predicting Real-Time Pandemic Hospitalization Risk"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6494-7832","authenticated-orcid":false,"given":"Vishnu S.","family":"Pendyala","sequence":"first","affiliation":[{"name":"Department Applied Data Science, College of Information, Data, and Society, San Jose State University, San Jose, CA 95192, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-6777-9227","authenticated-orcid":false,"given":"Mayank","family":"Kapadia","sequence":"additional","affiliation":[{"name":"Department Applied Data Science, College of Information, Data, and Society, San Jose State University, San Jose, CA 95192, USA"}]},{"given":"Basanth","family":"Periyapatnaroopakumar","sequence":"additional","affiliation":[{"name":"Department Applied Data Science, College of Information, Data, and Society, San Jose State University, San Jose, CA 95192, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6085-4094","authenticated-orcid":false,"given":"Manav","family":"Anandani","sequence":"additional","affiliation":[{"name":"Department Applied Data Science, College of Information, Data, and Society, San Jose State University, San Jose, CA 95192, USA"}]},{"given":"Nischitha","family":"Nagendran","sequence":"additional","affiliation":[{"name":"Department Applied Data Science, College of Information, Data, and Society, San Jose State University, San Jose, CA 95192, USA"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1177\/2347631120983481","article-title":"A literature review on impact of COVID-19 pandemic on teaching and learning","volume":"8","author":"Pokhrel","year":"2021","journal-title":"High. Educ. Future"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"113069","DOI":"10.1016\/j.psychres.2020.113069","article-title":"COVID-19 pandemic: Impact on psychiatric care in the United States","volume":"289","author":"Bojdani","year":"2020","journal-title":"Psychiatry Res."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"e695","DOI":"10.1093\/cid\/ciaa1419","article-title":"Risk factors for coronavirus disease 2019 (COVID-19)\u2013associated hospitalization: COVID-19\u2013associated hospitalization surveillance network and behavioral risk factor surveillance system","volume":"72","author":"Ko","year":"2021","journal-title":"Clin. Infect. Dis."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017, January 22\u201329). Grad-cam: Visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.74"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Vrdoljak, J., Boban, Z., Vilovi\u0107, M., Kumri\u0107, M., and Bo\u017ei\u0107, J. (2025). A Review of Large Language Models in Medical Education, Clinical Decision Support, and Healthcare Administration. Healthcare, 13.","DOI":"10.3390\/healthcare13060603"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_7","unstructured":"Centers for Disease Control and Prevention (CDC) (2025, July 15). COVID-19 Case Surveillance Public Use Data, Available online: https:\/\/data.cdc.gov\/Case-Surveillance\/COVID-19-Case-Surveillance-Public-Use-Data\/vbim-akqf\/about_data."},{"key":"ref_8","unstructured":"Tan, M., and Le, Q. (2019, January 9\u201315). Efficientnet: Rethinking model scaling for convolutional neural networks. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/362686.362692","article-title":"Space\/time trade-offs in hash coding with allowable errors","volume":"13","author":"Bloom","year":"1970","journal-title":"Commun. ACM"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/BF00140664","article-title":"Random sampling from databases: A survey","volume":"5","author":"Olken","year":"1995","journal-title":"Stat. Comput."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Chakraborty, S., Fremont, D., Meel, K., Seshia, S., and Vardi, M. (2014, January 27\u201331). Distribution-aware sampling and weighted model counting for SAT. Proceedings of the AAAI Conference on Artificial Intelligence, Quebec, QC, Canada.","DOI":"10.1609\/aaai.v28i1.8990"},{"key":"ref_12","unstructured":"Xu, L., Skoularidou, M., Cuesta-Infante, A., and Veeramachaneni, K. (2019, January 8\u201314). Modeling tabular data using conditional gan. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Liang, C., Lyu, T., Weissman, S., Daering, N., Olatosi, B., Hikmet, N., and Li, X. (2023). Early Prediction of COVID-19 Associated Hospitalization at the Time of CDC Contact Tracing using Machine Learning: Towards Pandemic Preparedness. Res. Sq., preprint.","DOI":"10.21203\/rs.3.rs-3213502\/v1"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Schwab, P., DuMont Sch\u00fctte, A., Dietz, B., and Bauer, S. (2020). Clinical predictive models for COVID-19: A systematic study. arXiv, Available online: https:\/\/arxiv.org\/abs\/2005.08302.","DOI":"10.2196\/preprints.21439"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Hwangbo, S., Kim, Y., Lee, C., Lee, S., Oh, B., Moon, M., and Kim, S. (2022). Machine learning models to predict the maximum severity of COVID-19 based on initial hospitalization record. Front. Public Health, 10.","DOI":"10.3389\/fpubh.2022.1007205"},{"key":"ref_16","first-page":"650","article-title":"Real-time healthcare monitoring system using online machine learning and spark streaming","volume":"11","author":"Hassan","year":"2020","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_17","unstructured":"Waehner, K. (2025, July 10). Real Time Analytics with Apache Kafka in the Healthcare Industry. Available online: https:\/\/www.kai-waehner.de\/blog\/2022\/04\/04\/real-time-analytics-machine-learning-with-apache-kafka-in-the-healthcare-industry\/."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"e2899","DOI":"10.7717\/peerj-cs.2899","article-title":"Cloud-based real-time enhancement for disease prediction using Confluent Cloud, Apache Kafka, feature optimization, and explainable artificial intelligence","volume":"11","author":"AlMohimeed","year":"2025","journal-title":"PeerJ Comput. Sci."},{"key":"ref_19","unstructured":"Waehner, K. (2023, November 27). The State of Data Streaming for Healthcare with Apache Kafka and Flink in 2023. Blog Post, Available online: https:\/\/www.kai-waehner.de\/blog\/2023\/11\/27\/the-state-of-data-streaming-for-healthcare-in-2023\/."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Fang, M., Dhami, D., and Kersting, K. (2022). DP-CTGAN: Differentially private medical data generation using CTGANs. Machine Learning and Knowledge Extraction, Springer.","DOI":"10.1007\/978-3-031-09342-5_17"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"e47859","DOI":"10.2196\/47859","article-title":"Synthetic tabular data based on generative adversarial networks in health care: Generation and validation using the divide-and-conquer strategy","volume":"11","author":"Kang","year":"2023","journal-title":"JMIR Med. Inform."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Ziegeldorf, J.H., Pennekamp, J., Hellmanns, D., Schwinger, F., Kunze, I., Henze, M., Hiller, J., Matzutt, R., and Wehrle, K. (2017). BLOOM: Bloom filter based oblivious outsourced matchings. BMC Med. Genom., 10.","DOI":"10.1186\/s12920-017-0277-y"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Akter, S., Shamrat, F.J.M., Chakraborty, S., Karim, A., and Azam, S. (2021). COVID-19 detection using deep learning algorithm on chest X-ray images. Biology, 10.","DOI":"10.3390\/biology10111174"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Constantinou, M., Exarchos, T., Vrahatis, A.G., and Vlamos, P. (2023). COVID-19 classification on chest X-ray images using deep learning methods. Int. J. Environ. Res. Public Health, 20.","DOI":"10.3390\/ijerph20032035"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Singh, T., Mishra, S., Kalra, R., Kumar, M., and Kim, T. (2024). COVID-19 severity detection using chest X-ray segmentation and deep learning. Sci. Rep., 14.","DOI":"10.1038\/s41598-024-70801-z"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Suara, S., Jha, A., Sinha, P., and Sekh, A.A. (2024). Is Grad-CAM Explainable in Medical Images?. Computer Vision and Image Processing, Springer Nature.","DOI":"10.1007\/978-3-031-58181-6_11"},{"key":"ref_27","unstructured":"Moreau, L. (2025, July 20). AI Explainability with Grad-CAM: Visualizing Neural Network Decisions. Available online: https:\/\/www.edgeimpulse.com\/blog\/ai-explainability-with-grad-cam-visualizing-neural-network-decisions."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhang, H., and Ogasawara, K. (2023). Grad-CAM-Based Explainable Artificial Intelligence Related to Medical Text Processing. Bioengineering, 10.","DOI":"10.3390\/bioengineering10091070"},{"key":"ref_29","unstructured":"Centers for Disease Control and Prevention (CDC) (2025, July 10). Coronavirus Disease 2019 (COVID-19), Available online: https:\/\/www.cdc.gov\/covid\/index.html."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"e76557","DOI":"10.2196\/76557","article-title":"Multimodal Integration in Health Care: Development With Applications in Disease Management","volume":"27","author":"Hao","year":"2025","journal-title":"J. Med. Internet Res."},{"key":"ref_31","unstructured":"Vinodhini Ravikumar, F.C.M. (2025, August 10). How Multimodal AI Is Impacting Healthcare. Available online: https:\/\/www.forbes.com\/councils\/forbestechcouncil\/2025\/04\/29\/how-multimodal-ai-is-impacting-healthcare\/."},{"key":"ref_32","unstructured":"Simbo AI (2025, August 15). Exploring the Integration of Multimodal Real-World Data in Healthcare: A New Frontier for Personalized Treatment. Available online: https:\/\/www.simbo.ai\/blog\/exploring-the-use-of-multimodal-real-world-data-in-precision-medicine-and-its-benefits-for-treatment-personalization-1000978\/."},{"key":"ref_33","unstructured":"The Cancer Imaging Archive (TCIA) (2025, July 15). COVID-19-AR: Chest Imaging with Clinical and Genomic Correlates Representing a Rural COVID-19 Positive Population. Available online: https:\/\/www.cancerimagingarchive.net\/collection\/covid-19-ar\/."},{"key":"ref_34","unstructured":"Rahman, T. (2025, July 10). COVID-19 Radiography Database. Available online: https:\/\/www.kaggle.com\/datasets\/tawsifurrahman\/covid19-radiography-database."},{"key":"ref_35","unstructured":"Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv."},{"key":"ref_36","unstructured":"Perez, L., and Wang, J. (2017). The effectiveness of data augmentation in image classification using deep learning. arXiv."},{"key":"ref_37","first-page":"281","article-title":"Random search for hyper-parameter optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Armbrust, M., Das, T., Torres, J., Yavuz, B., Zhu, S., Xin, R., Ghodsi, A., Stoica, I., and Zaharia, M. (2018, January 10\u201315). Structured streaming: A declarative api for real-time applications in apache spark. Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA.","DOI":"10.1145\/3183713.3190664"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Nick, T.G., and Campbell, K.M. (2007). Logistic regression. Topics in Biostatistics, Humana Press.","DOI":"10.1007\/978-1-59745-530-5_14"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Chen, D., Zhang, Q., and Zhu, Y. (2024). Efficient sequential decision making with large language models. arXiv.","DOI":"10.18653\/v1\/2024.emnlp-main.517"},{"key":"ref_42","unstructured":"Liu, Y., Yao, Y., Ton, J.F., Zhang, X., Guo, R., Cheng, H., Klochkov, Y., Taufiq, M.F., and Li, H. (2023). Trustworthy llms: A survey and guideline for evaluating large language models\u2019 alignment. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"2002","journal-title":"Proc. IEEE"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201323). Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_46","unstructured":"Kingma, D.P., and Ba, J. (2015, January 7\u20139). Adam: A Method for Stochastic Optimization. Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA."},{"key":"ref_47","unstructured":"Powers, D.M. (2020). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_49","unstructured":"OpenAI (2025, September 23). GPT-3.5: OpenAI Language Model. Available online: https:\/\/platform.openai.com\/docs\/models\/gpt-3-5."},{"key":"ref_50","first-page":"1877","article-title":"Language Models are Few-Shot Learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst. (NeurIPS)"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020, January 16\u201320). Transformers: State-of-the-art natural language processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/730\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T09:55:40Z","timestamp":1763718940000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/730"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,21]]},"references-count":51,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["a18120730"],"URL":"https:\/\/doi.org\/10.3390\/a18120730","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,21]]}}}