{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T18:39:05Z","timestamp":1777660745928,"version":"3.51.4"},"reference-count":20,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,2,15]],"date-time":"2021-02-15T00:00:00Z","timestamp":1613347200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,2,15]],"date-time":"2021-02-15T00:00:00Z","timestamp":1613347200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["NSF RAPID: IIS-2027890"],"award-info":[{"award-number":["NSF RAPID: IIS-2027890"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title \u201cModeling Corona Spread Using Big Data Analytics.\u201d The project is a joint effort between the Department of Computer &amp; Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions.<\/jats:p><jats:p>The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19\u00a0is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. 
Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services.<\/jats:p><jats:p>Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5\u2009% of the population could become infected within 3\u2009months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239\u201342, 2020). A recent large-scale analysis from China suggests that 80\u2009% of those infected either are asymptomatic or have mild symptoms; a finding that implies that demand for advanced medical services might apply to only 20\u2009% of the total infected. Of patients infected with Covid-19, about 15\u2009% have severe illness and 5\u2009% have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25\u2009% to as high as 3.0\u2009% (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80\u2009years (&gt;\u200914\u2009%) and those with coexisting conditions (10\u2009% for those with cardiovascular disease and 7\u2009% for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). 
Overall, Covid-19\u00a0is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1\u2009%.<\/jats:p><jats:p>Public health efforts depend heavily on predicting how diseases such as those caused by Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease\u2019s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual\u2019s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness.<\/jats:p><jats:p>In this paper we describe the development of a model of Corona spread using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC\/KEL System. In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to model Corona spread, with the aim of obtaining new results and helping to reduce the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I\/UCRC for Advanced Knowledge Enablement.<\/jats:p><jats:p>The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. 
Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with informative virus-spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, from the global level down to the county level. It also provides statistical analysis for each level, such as new cases per 100,000 population. The primary analyses, such as Contagion Risk and Infection State, are based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website and has attracted a great volume of traffic. The project is open-sourced and available on GitHub. 
The system was developed on the LexisNexis HPCC Systems, which is briefly described in the paper.<\/jats:p>","DOI":"10.1186\/s40537-021-00423-z","type":"journal-article","created":{"date-parts":[[2021,2,17]],"date-time":"2021-02-17T15:38:19Z","timestamp":1613576299000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["Modeling and tracking Covid-19 cases using Big Data analytics on HPCC system platform"],"prefix":"10.1186","volume":"8","author":[{"given":"Flavio","family":"Villanustre","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Arjuna","family":"Chala","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Roger","family":"Dev","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lili","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jesse Shaw","family":"LexisNexis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Borko","family":"Furht","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Taghi","family":"Khoshgoftaar","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,2,15]]},"reference":[{"key":"423_CR1","doi-asserted-by":"crossref","unstructured":"Emanuel EJ, et al. Fair allocation of scarce medical resources in the time of covid-19. N Engl J Med. 2020.","DOI":"10.1056\/NEJMsb2005114"},{"issue":"14","key":"423_CR2","doi-asserted-by":"publisher","first-page":"1335","DOI":"10.1001\/jama.2020.4344","volume":"323","author":"E Livingston","year":"2020","unstructured":"Livingston E, Bucher K. Coronavirus disease 2019 (COVID-19) in Italy. JAMA. 
2020;323(14):1335.","journal-title":"JAMA."},{"issue":"13","key":"423_CR3","doi-asserted-by":"publisher","first-page":"1239","DOI":"10.1001\/jama.2020.2648","volume":"323","author":"Z Wu","year":"2020","unstructured":"Wu Z, McGoogan JM. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention. JAMA. 2020;323(13):1239\u201342.","journal-title":"JAMA."},{"issue":"6","key":"423_CR4","doi-asserted-by":"publisher","first-page":"1339","DOI":"10.3201\/eid2606.200320","volume":"26","author":"N Wilson","year":"2020","unstructured":"Wilson N, Kvalsvig A, Barnard LT, Baker MG. Case-fatality risk estimates for COVID-19 calculated by using a lag time for fatality. Emerg Infect Dis. 2020;26(6):1339.","journal-title":"Emerg Infect Dis"},{"key":"423_CR5","doi-asserted-by":"crossref","unstructured":"Shaw J, Villanustre F, Furht B, Agarwal A, Jain A. Modeling Ebola Spread and Using HPCC\/KEL System. In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham..","DOI":"10.1007\/978-3-319-44550-2_14"},{"key":"423_CR6","doi-asserted-by":"publisher","first-page":"9780691116174","DOI":"10.1515\/9781400841035","volume-title":"Modeling Infectious Diseases: in Humans and Animals","author":"M Keeling","year":"2008","unstructured":"Keeling M, Rohani P. Modeling Infectious Diseases: in Humans and Animals. ISBN: Princeton University Press; 2008. p.\u00a09780691116174."},{"issue":"6164","key":"423_CR7","first-page":"1337","volume":"342","author":"D Brockmann","year":"2013","unstructured":"Brockmann D, Helbing D. The hidden geometry of complex, network-driven contagion phenomena. Sci Magaz. 2013;342(6164):1337\u201342.","journal-title":"Sci Magaz"},{"key":"423_CR8","first-page":"6","volume":"6","author":"CM Rivers","year":"2014","unstructured":"Rivers CM, Lofgren ET, Marathe M, Eubank S, Lewis BL. 
Modeling the impact of interventions on an epidemic of Ebola in sierra Leone and Liberia. PLoS Curr. 2014;6:6.","journal-title":"PLoS Curr."},{"key":"423_CR9","first-page":"6","volume":"6","author":"DN Fisman","year":"2013","unstructured":"Fisman DN, Hauck TS, Tuite AR, Greer AL. An IDEA for short term outbreak Projection: nearcasting using the basic reproduction number. PLoS Curr. 2013;6:6.","journal-title":"PLoS Curr."},{"key":"423_CR10","doi-asserted-by":"crossref","unstructured":"Li J, Peng W, Li T, Sun T. Social Network User Influence Dynamics Prediction. In Proceedings of the 15th Asia-Pacific Web conference (APWeb), pp.\u00a0310\u2013322, 2013.","DOI":"10.1007\/978-3-642-37401-2_32"},{"key":"423_CR11","doi-asserted-by":"crossref","unstructured":"Song X, Chi Y, Koji H, Tseng B. Information flow modeling based on diffusion rate for prediction and ranking. In: Proceedings of the 16th international conference on World Wide Web (WWW), pp.\u00a0191\u2013200, 2007.","DOI":"10.1145\/1242572.1242599"},{"key":"423_CR12","doi-asserted-by":"crossref","unstructured":"Kempe D, Kleinberg J, Tardos E. Maximizing the Spread of Influence Through A Social Network. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp.\u00a0137\u2013146, 2003.","DOI":"10.1145\/956750.956769"},{"key":"423_CR13","doi-asserted-by":"crossref","unstructured":"Yang J, Leskovec J. Modeling information diffusion in implicit networks. In Proceedings of IEEE international conference on data mining (ICDM), pp.\u00a0599\u2013608, 2010.","DOI":"10.1109\/ICDM.2010.22"},{"issue":"01","key":"423_CR14","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1017\/nws.2014.3","volume":"2","author":"MG Rodriguez","year":"2014","unstructured":"Rodriguez MG, Leskovec J, Balduzzi D, Scholkopf B. Uncovering the Structure and Temporal Dynamics of Information Propagation. In Network Science. 
2014;2(01):26\u201365.","journal-title":"In Network Science"},{"key":"423_CR15","doi-asserted-by":"crossref","unstructured":"Ding C, Li T, Wang D. Label Propagation on K-Partite Graphs. In: Proceedings of IEEE International Conference on Machine Learning and Applications (ICMLA), pp.\u00a0273\u2013278, 2009.","DOI":"10.1109\/ICMLA.2009.89"},{"issue":"1","key":"423_CR16","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1109\/TKDE.2007.190672","volume":"20","author":"F Wang","year":"2008","unstructured":"Wang F, Zhang C. Label propagation through linear neighborhoods. IEEE Trans Knowl Data Eng. 2008;20(1):55\u201367.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"423_CR17","doi-asserted-by":"crossref","unstructured":"Zhu S, Li T, Chen Z, Wang D, Gong Y. Dynamic active probing of helpdesk databases. In: Proceedings of the 34th international conference on very large data bases (VLDB), pp.\u00a0748\u2013760, 2008.","DOI":"10.14778\/1453856.1453937"},{"key":"423_CR18","unstructured":"Silverman JD, Hupert N, Washburne AD. Using Influenza Surveillance Networks to Estimate State-specific case detection rates and forecast SARS-CoV-2 Spread in the United States."},{"key":"423_CR19","volume-title":"Causality: Models, reasoning, and inference","author":"J Pearl","year":"2000","unstructured":"Pearl J. Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press; 2000."},{"key":"423_CR20","volume-title":"Causal inference in statistics","author":"J Pearl","year":"2016","unstructured":"Pearl J, Glymour M, Jewell NP. Causal inference in statistics. 
New York: John Wiley and Sons; 2016."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00423-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s40537-021-00423-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00423-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,28]],"date-time":"2021-02-28T10:02:55Z","timestamp":1614506575000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-021-00423-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,15]]},"references-count":20,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["423"],"URL":"https:\/\/doi.org\/10.1186\/s40537-021-00423-z","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,15]]},"assertion":[{"value":"6 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 February 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 February 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"33"}}