{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:05:01Z","timestamp":1760058301764,"version":"build-2065373602"},"reference-count":50,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,3,25]],"date-time":"2025-03-25T00:00:00Z","timestamp":1742860800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>We study the relationship between tweets referencing Acute Respiratory Infections (ARI) or COVID-19 symptoms and confirmed cases of these diseases. Additionally, we propose a computational methodology for selecting and applying Machine Learning (ML) algorithms to predict public health indicators using social media data. To achieve this, a novel pipeline was developed, integrating three distinct models to predict confirmed cases of ARI and COVID-19. The dataset contains tweets related to respiratory diseases, published between 2020 and 2022 in the state of San Luis Potos\u00ed, Mexico, obtained via the Twitter API (now X). The methodology is composed of three stages, and it involves tools such as Dataiku and Python with ML libraries. The first two stages focus on identifying the best-performing predictive models, while the third stage includes Natural Language Processing (NLP) algorithms for tweet selection. One of our key findings is that tweets contributed to improved predictions of ARI confirmed cases but did not enhance COVID-19 time series predictions. The best-performing NLP approach is the combination of Word2Vec algorithm with the KMeans model for tweet selection. 
Furthermore, predictions for both time series improved by 3% in the second half of 2020 when tweets were included as a feature, where the best prediction algorithm is DeepAR.<\/jats:p>","DOI":"10.3390\/computation13040086","type":"journal-article","created":{"date-parts":[[2025,3,25]],"date-time":"2025-03-25T12:18:52Z","timestamp":1742905132000},"page":"86","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Machine Learning-Based Computational Methodology for Predicting Acute Respiratory Infections Using Social Media Data"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5682-9908","authenticated-orcid":false,"given":"Jose Manuel","family":"Ramos-Varela","sequence":"first","affiliation":[{"name":"Engineering Faculty, Universidad Aut\u00f3noma de San Luis Potos\u00ed (UASLP), Zona Universitaria, Av. Dr. Manuel Nava No. 8, San Luis Potosi 78290, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7566-0412","authenticated-orcid":false,"given":"Juan C.","family":"Cuevas-Tello","sequence":"additional","affiliation":[{"name":"Engineering Faculty, Universidad Aut\u00f3noma de San Luis Potos\u00ed (UASLP), Zona Universitaria, Av. Dr. Manuel Nava No. 8, San Luis Potosi 78290, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4840-6742","authenticated-orcid":false,"given":"Daniel E.","family":"Noyola","sequence":"additional","affiliation":[{"name":"Research Center in Health Sciences and Biomedicine, Universidad Aut\u00f3noma de San Luis Potos\u00ed (UASLP), Av. Sierra Leona 550, San Luis Potosi 78210, Mexico"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Pilipiec, P., Samsten, I., and B\u00f3ta, A. (2023). Surveillance of communicable diseases using social media: A systematic review. 
PLoS ONE, 18.","DOI":"10.1371\/journal.pone.0282101"},{"key":"ref_2","first-page":"501","article-title":"Redefining pandemic preparedness: Multidisciplinary insights from the CERP modelling workshop in infectious diseases, workshop report","volume":"9","author":"Nunes","year":"2024","journal-title":"Infect. Dis. Model."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1111\/1468-0009.12038","article-title":"Social media and internet-based data in global systems for public health surveillance: A systematic review","volume":"92","author":"Velasco","year":"2014","journal-title":"Milbank Q."},{"key":"ref_4","unstructured":"World Health Organization, Regional Office for the Western Pacific (2008). A Guide to Establishing Event-Based Surveillance, WHO Regional Office for the Western Pacific."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1002\/hpm.3864","article-title":"The power of artificial intelligence for managing pandemics: A primer for public health professionals","volume":"40","author":"McKee","year":"2024","journal-title":"Int. J. Health Plan. Manag."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1016\/j.jinf.2023.01.029","article-title":"Infection rates of 70 after release of COVID-19 restrictions in macao, china","volume":"86","author":"Liang","year":"2023","journal-title":"J. Infect."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"100382","DOI":"10.1016\/j.infpip.2024.100382","article-title":"Digital Epidemiology: Harnessing Big Data For Early Detection And Monitoring Of Viral Outbreaks","volume":"6","author":"Fallatah","year":"2024","journal-title":"Infect. Prev. Pract."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Charles, L., Reynolds, T., Cameron, M., Conway, M., Lau, E., Olsen, J., Pavlin, J., Shigematsu, M., Streichert, L., and Suda, K. (2015). 
Using Social Media for Actionable Disease Surveillance and Outbreak Management: A Systematic Literature Review. PLoS ONE, 10.","DOI":"10.1371\/journal.pone.0139701"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Gupta, A., and Katarya, R. (2020). Social Media based Surveillance Systems for Healthcare using Machine Learning: A Systematic Review. J. Biomed. Inform., 108.","DOI":"10.1016\/j.jbi.2020.103500"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"112552","DOI":"10.1016\/j.socscimed.2019.112552","article-title":"Systematic Literature Review on the Spread of Health-related Misinformation on Social Media","volume":"240","author":"Wang","year":"2019","journal-title":"Soc. Sci. Med."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Giancotti, M., Lopreite, M., Mauro, M., and Puliga, M. (2024). Innovating health prevention models in detecting infectious disease outbreaks through social media data: An umbrella review of the evidence. Front. Public Health, 12.","DOI":"10.3389\/fpubh.2024.1435724"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3932","DOI":"10.1038\/s41467-019-11901-7","article-title":"Technology to advance infectious disease forecasting for outbreak management","volume":"10","author":"George","year":"2019","journal-title":"Nat. Commun."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Dai, X., Bikdash, M., and Meyer, B. (April, January 30). From social media to public health surveillance: Word embedding based clustering method for twitter classification. Proceedings of the SoutheastCon 2017, Concord, NC, USA.","DOI":"10.1109\/SECON.2017.7925400"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Talvis, K., Chorianopoulos, K., and Kermanidis, K. (2014, January 6\u20137). Real-time monitoring of flu epidemics through linguistic and statistical analysis of Twitter. 
Proceedings of the 9th International Workshop on Semantic and Social Media Adaptation and Personalization, SMAP 2014, Corfu, Greece.","DOI":"10.1109\/SMAP.2014.38"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Hirose, H., and Wang, L. (2012, January 17\u201320). Prediction of infectious disease spread using Twitter: A case of influenza. Proceedings of the 2012 Fifth International Symposium on Parallel Architectures, Algorithms and Programming, Taipei, Taiwan.","DOI":"10.1109\/PAAP.2012.23"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"e236","DOI":"10.2196\/jmir.3416","article-title":"A Case Study of the New York City 2012-2013 Influenza Season With Daily Geocoded Twitter Data From Temporal and Spatiotemporal Perspectives","volume":"16","author":"Nagar","year":"2014","journal-title":"J. Med. Internet Res."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/2047-2501-3-S1-S4","article-title":"Automatic detection of tweets reporting cases of influenza like illnesses in Australia","volume":"3","author":"Zuccon","year":"2015","journal-title":"Health Inf. Sci. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Santos, J.C., and Matos, S. (2014). Analysing Twitter and web queries for flu trend prediction. Theor. Biol. Med Model., 11.","DOI":"10.1186\/1742-4682-11-S1-S6"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Prieto, V., Matos, S., Alvarez, M., Cacheda, F., and Oliveira, J. (2014). Twitter: A Good Place to Detect Health Conditions. PLoS ONE, 9.","DOI":"10.1371\/journal.pone.0086191"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Jiang, J., Yao, C., and Song, X. (2024). A multidimensional comparative study of help-seeking messages on Weibo under different stages of COVID-19 pandemic in China. Front. 
Public Health, 12.","DOI":"10.3389\/fpubh.2024.1320146"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Agrawal, S., Jain, S., Sharma, S., and Khatri, A. (2022). COVID-19 Public Opinion: A Twitter Healthcare Data Processing Using Machine Learning Methodologies. Int. J. Environ. Res. Public Health, 20.","DOI":"10.3390\/ijerph20010432"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"e57880","DOI":"10.2196\/57880","article-title":"Correction: Verification in the Early Stages of the COVID-19 Pandemic: Sentiment Analysis of Japanese Twitter Users","volume":"4","author":"Ueda","year":"2024","journal-title":"JMIR Infodemiol."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Aldosery, A., Carruthers, R., Kay, K., Cave, C., Reynolds, P., and Kostkova, P. (2024). Enhancing public health response: A framework for topics and sentiment analysis of COVID-19 in the UK using Twitter and the embedded topic model. Front. Public Health, 12.","DOI":"10.3389\/fpubh.2024.1105383"},{"key":"ref_24","first-page":"502","article-title":"Public Opinions toward COVID-19 Vaccine Mandates: A Machine Learning-based Analysis of U.S. Tweets","volume":"2022","author":"Guo","year":"2023","journal-title":"AMIA Annu. Symp. Proc."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.12688\/f1000research.130610.1","article-title":"Sentiment analysis of Indonesian tweets on COVID-19 and COVID-19 vaccinations","volume":"12","author":"Kalanjati","year":"2023","journal-title":"F1000Research"},{"key":"ref_26","first-page":"1","article-title":"Paving the way for COVID survivors\u2019 psychosocial rehabilitation: Mining topics, sentiments, and their trajectories over time from Reddit","volume":"30","author":"Hamedani","year":"2024","journal-title":"Health Inform. 
J."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"e315","DOI":"10.2196\/jmir.7393","article-title":"Enhancing Seasonal Influenza Surveillance: Topic Analysis of Widely Used Medicinal Drugs Using Twitter Data","volume":"19","author":"Kagashe","year":"2017","journal-title":"J. Med. Internet Res."},{"key":"ref_28","first-page":"1","article-title":"Efficient Estimation of Word Representations in Vector Space","volume":"2013","author":"Mikolov","year":"2013","journal-title":"Proc. Workshop ICLR"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Kuang, S., and Davison, B. (2017). Learning Word Embeddings with Chi-Square Weights for Healthcare Tweet Classification. Appl. Sci., 7.","DOI":"10.3390\/app7080846"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Batbaatar, E., and Ryu, K. (2019). Ontology-Based Healthcare Named Entity Recognition from Twitter Messages Using a Recurrent Neural Network Approach. Int. J. Environ. Res. Public Health, 16.","DOI":"10.3390\/ijerph16193628"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"e24889","DOI":"10.2196\/24889","article-title":"Comparison of Viral COVID-19 Sina Weibo and Twitter Contents: A Novel Feature Extraction and Analytical Workflow","volume":"23","author":"Chen","year":"2021","journal-title":"J. Med Internet Res."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"4535541","DOI":"10.1155\/2022\/4535541","article-title":"COVID-19 Outbreak Forecasting Based on Vaccine Rates and Tweets Classification","volume":"2022","author":"Didi","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Volkova, S., Ayton, E., Porterfield, K., and Corley, C. (2017). Forecasting influenza-like illness dynamics for military populations using neural networks and social media. 
PLoS ONE, 12.","DOI":"10.1371\/journal.pone.0188941"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1007\/s12553-019-00309-4","article-title":"Predicting the spread of influenza epidemics by analyzing twitter messages","volume":"9","author":"Molaei","year":"2019","journal-title":"Health Technol."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1181","DOI":"10.1016\/j.ijforecast.2019.07.001","article-title":"DeepAR: Probabilistic forecasting with autoregressive recurrent networks","volume":"36","author":"Salinas","year":"2020","journal-title":"Int. J. Forecast."},{"key":"ref_36","first-page":"1","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_37","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). Bert: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the naacL-HLT, Minneapolis, MN, USA."},{"key":"ref_38","unstructured":"Mitchell, T.M. (1997). Machine Learning, McGraw Hill."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"603","DOI":"10.3390\/make2040032","article-title":"Large-Scale, Language-Agnostic Discourse Classification of Tweets During COVID-19","volume":"2","author":"Gencoglu","year":"2020","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Lande, J., Pillay, A., and Chandra, R. (2023). Deep learning for COVID-19 topic modelling via Twitter: Alpha, Delta and Omicron. PLoS ONE, 18.","DOI":"10.1371\/journal.pone.0288681"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"To, Q., To, K., Huynh, V.A., Nguyen, N., Ngo, D., Alley, S., Tran, A., Tran, A., Thi Thanh Pham, N., and Bui, T. (2021). Applying Machine Learning to Identify Anti-Vaccination Tweets during the COVID-19 Pandemic. Int. J. Environ. Res. 
Public Health, 18.","DOI":"10.3390\/ijerph18084069"},{"key":"ref_42","first-page":"717","article-title":"Automatic Detection of Twitter Users Who Express Chronic Stress Experiences via Supervised Machine Learning and Natural Language Processing","volume":"41","author":"Yang","year":"2022","journal-title":"CIN Comput. Inform. Nurs."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Chen, P., Jin, Y., Ma, X., and Lin, Y. (2024). Public perception on active aging after COVID-19: An unsupervised machine learning analysis of 44,343 posts. Front. Public Health, 12.","DOI":"10.3389\/fpubh.2024.1329704"},{"key":"ref_44","unstructured":"(2024, December 17). Tweepy Documentation. Available online: https:\/\/www.tweepy.org\/."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"G\u00f3nzalez-Bandala, D.A., Cuevas-Tello, J.C., Noyola, D.E., Comas-Garc\u00eda, A., and Garc\u00eda-Sep\u00falveda, C.A. (2020). Computational Forecasting Methodology for Acute Respiratory Infectious Disease Dynamics. Int. J. Environ. Res. Public Health, 17.","DOI":"10.3390\/ijerph17124540"},{"key":"ref_46","unstructured":"Secretar\u00eda de Salud del Gobierno de M\u00e9xico (2024, December 17). Bolet\u00edn Epidemiol\u00f3gico. Sistema Nacional de Vigilancia Epidemiol\u00f3gica. Sistema \u00danico de Informaci\u00f3n, Available online: https:\/\/www.gob.mx\/salud\/acciones-y-programas\/direccion-general-de-epidemiologia-boletin-epidemiologico."},{"key":"ref_47","unstructured":"Secretar\u00eda de Salud del Gobierno de M\u00e9xico (2024, December 17). Datos Abiertos Direcci\u00f3n General de Epidemiolog\u00eda, Available online: https:\/\/www.gob.mx\/salud\/documentos\/datos-abiertos-152127."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Ma, L., and Zhang, Y. (November, January 29). Using Word2Vec to process big text data. 
Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA.","DOI":"10.1109\/BigData.2015.7364114"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"ref_50","first-page":"1","article-title":"GluonTS: Probabilistic and Neural Time Series Modeling in Python","volume":"21","author":"Alexandrov","year":"2020","journal-title":"J. Mach. Learn. Res."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/4\/86\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:59:47Z","timestamp":1760029187000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/4\/86"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,25]]},"references-count":50,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["computation13040086"],"URL":"https:\/\/doi.org\/10.3390\/computation13040086","relation":{},"ISSN":["2079-3197"],"issn-type":[{"type":"electronic","value":"2079-3197"}],"subject":[],"published":{"date-parts":[[2025,3,25]]}}}