{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:11:22Z","timestamp":1772172682519,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1008277","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T00:00:00Z","timestamp":1607040000000}}],"reference-count":49,"publisher":"Public Library of Science (PLoS)","issue":"11","license":[{"start":{"date-parts":[[2020,11,20]],"date-time":"2020-11-20T00:00:00Z","timestamp":1605830400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>\n                    According to the World Health Organization (WHO), around 60% of all outbreaks are detected using informal sources. In many public health institutes, including the WHO and the Robert Koch Institute (RKI), dedicated groups of public health agents sift through numerous articles and newsletters to detect relevant events. This media screening is one important part of event-based surveillance (EBS). Reading the articles, discussing their relevance, and putting key information into a database is a time-consuming process. To support EBS, but also to gain insights into what makes an article and the event it describes relevant, we developed a natural language processing framework for automated information extraction and relevance scoring. 
First, we scraped relevant sources for EBS as done at the RKI (WHO Disease Outbreak News and ProMED) and automatically extracted the articles\u2019 key data:\n                    <jats:italic>disease<\/jats:italic>\n                    ,\n                    <jats:italic>country<\/jats:italic>\n                    ,\n                    <jats:italic>date<\/jats:italic>\n                    , and\n                    <jats:italic>confirmed-case count<\/jats:italic>\n                    . For this, we performed named entity recognition in two steps. First, EpiTator, an open-source epidemiological annotation tool, suggested many candidates for each of these entities. Second, we selected the key entity among the candidates: we extracted the key country and disease using a heuristic, with good results, and trained a naive Bayes classifier to find the key date and confirmed-case count, using the RKI\u2019s EBS database as labels, which performed modestly. Then, for relevance scoring, we defined two classes to which any article might belong: the article is\n                    <jats:italic>relevant<\/jats:italic>\n                    if it is in the EBS database and\n                    <jats:italic>irrelevant<\/jats:italic>\n                    otherwise. We compared the performance of different classifiers, using bag-of-words, document embeddings, and word embeddings. The best classifier, a logistic regression, achieved a sensitivity of 0.82 and an index of balanced accuracy of 0.61. Finally, we integrated these functionalities into a web application called\n                    <jats:italic>EventEpi<\/jats:italic>\n                    , where relevant sources are automatically analyzed and put into a database. The user can also provide any URL or text, which will be analyzed in the same way and added to the database. Each of these steps could be improved, in particular with larger labeled datasets and fine-tuning of the learning algorithms. The overall framework, however, already works well and can be used in production, promising improvements in EBS. 
The source code and data are publicly available under open licenses.\n                  <\/jats:p>","DOI":"10.1371\/journal.pcbi.1008277","type":"journal-article","created":{"date-parts":[[2020,11,20]],"date-time":"2020-11-20T14:27:35Z","timestamp":1605882455000},"page":"e1008277","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":22,"title":["EventEpi\u2014A natural language processing framework for event-based surveillance"],"prefix":"10.1371","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4428-168X","authenticated-orcid":true,"given":"Auss","family":"Abbood","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4894-6124","authenticated-orcid":true,"given":"Alexander","family":"Ullrich","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9593-8696","authenticated-orcid":true,"given":"R\u00fcdiger","family":"Busche","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3911-9573","authenticated-orcid":true,"given":"St\u00e9phane","family":"Ghozzi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2020,11,20]]},"reference":[{"key":"pcbi.1008277.ref001","unstructured":"WHO. Epidemiology; 2014. Available from: https:\/\/www.who.int\/topics\/epidemiology\/en\/."},{"key":"pcbi.1008277.ref002","unstructured":"WHO. Early detection, assessment and response to acute public health events. WHO. 
2014."},{"issue":"2","key":"pcbi.1008277.ref003","doi-asserted-by":"crossref","DOI":"10.1136\/bmjopen-2015-010204","article-title":"Effect of temperature and precipitation on salmonellosis cases in South-East Queensland, Australia: an observational study","volume":"6","author":"DM Stephen","year":"2016","journal-title":"BMJ Open"},{"issue":"8","key":"pcbi.1008277.ref004","doi-asserted-by":"crossref","first-page":"e0135676","DOI":"10.1371\/journal.pone.0135676","article-title":"The Impact of Water, Sanitation and Hygiene Interventions to Control Cholera: A Systematic Review","volume":"10","author":"DL Taylor","year":"2015","journal-title":"PLOS ONE"},{"issue":"5","key":"pcbi.1008277.ref005","article-title":"What is epidemic intelligence, and how is it being improved in Europe?","volume":"11","author":"R Kaiser","year":"2006","journal-title":"Euro Surveillance"},{"key":"pcbi.1008277.ref006","unstructured":"WHO. Epidemic intelligence\u2014systematic event detection; 2015. Available from: https:\/\/www.who.int\/csr\/alertresponse\/epidemicintelligence\/en\/."},{"issue":"13","key":"pcbi.1008277.ref007","doi-asserted-by":"crossref","first-page":"19162","DOI":"10.2807\/ese.14.13.19162-en","article-title":"Internet surveillance systems for early alerting of health threats","volume":"14","author":"JP Linge","year":"2009","journal-title":"Eurosurveillance"},{"key":"pcbi.1008277.ref008","unstructured":"Source code for EventEpi;. Available from: https:\/\/github.com\/aauss\/EventEpi."},{"key":"pcbi.1008277.ref009","unstructured":"Incidence database (IDB);. Available from: https:\/\/doi.org\/10.6084\/m9.figshare.12575978."},{"key":"pcbi.1008277.ref010","unstructured":"EventEpi word embeddings;. Available from: https:\/\/doi.org\/10.6084\/m9.figshare.12575966."},{"key":"pcbi.1008277.ref011","unstructured":"Global Rapid Identification Tool System (GRITS);. 
Available from: https:\/\/github.com\/ecohealthalliance\/diagnostic-dashboard."},{"key":"pcbi.1008277.ref012","unstructured":"EpiTator;. Available from: https:\/\/github.com\/ecohealthalliance\/EpiTator."},{"key":"pcbi.1008277.ref013","unstructured":"MediSys;. Available from: http:\/\/medisys.newsbrief.eu\/medisys\/helsinkiedition\/en\/home.html."},{"key":"pcbi.1008277.ref014","unstructured":"Disease incidents\u2014MEDISYS;. Available from: http:\/\/medisys.newsbrief.eu\/medisys\/helsinkiedition\/en\/home.html."},{"key":"pcbi.1008277.ref015","unstructured":"PULS Project: Surveillance of Global News Media;. Available from: http:\/\/puls.cs.helsinki.fi\/static\/index.html."},{"key":"pcbi.1008277.ref016","unstructured":"PULS;. Available from: http:\/\/puls.cs.helsinki.fi\/static\/index.html."},{"key":"pcbi.1008277.ref017","unstructured":"Chollet F, et al. Keras; 2015. Available from: https:\/\/github.com\/fchollet\/keras."},{"issue":"Oct","key":"pcbi.1008277.ref018","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"F Pedregosa","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"pcbi.1008277.ref019","unstructured":"spaCy \u00b7 Industrial-strength Natural Language Processing in Python;. Available from: https:\/\/spacy.io\/."},{"key":"pcbi.1008277.ref020","unstructured":"WHO\u2014Disease Outbreak News (DONs);. Available from: https:\/\/www.who.int\/csr\/don\/en\/."},{"issue":"3","key":"pcbi.1008277.ref021","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1093\/inthealth\/ihx014","article-title":"ProMED-mail: 22 years of digital surveillance of emerging infectious diseases","volume":"9","author":"M Carrion","year":"2017","journal-title":"International health"},{"key":"pcbi.1008277.ref022","unstructured":"ProMED-mail;. 
Available from: https:\/\/promedmail.org\/."},{"key":"pcbi.1008277.ref023","volume-title":"Natural Language Processing with Python","author":"S Bird","year":"2009"},{"issue":"1","key":"pcbi.1008277.ref024","first-page":"41","article-title":"A Comparison of Event Models for Naive Bayes Text Classification","volume":"752","author":"A McCallum","year":"1998","journal-title":"AAAI-98 workshop on learning for text categorization"},{"key":"pcbi.1008277.ref025","unstructured":"Johnson R, Zhang T. Supervised and semi-supervised text categorization using LSTM for region embeddings. In: Proceedings of the 33rd International Conference on Machine Learning\u2014Volume 48. New York, USA: JMLR.org; 2016. p. 526\u2013534. Available from: https:\/\/dl.acm.org\/citation.cfm?id=3045447."},{"key":"pcbi.1008277.ref026","doi-asserted-by":"crossref","unstructured":"Conneau A, Schwenk H, Barrault L, Lecun Y. Very Deep Convolutional Networks for Text Classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Valencia, Spain: Association for Computational Linguistics; 2017. p. 1107\u20131116. Available from: https:\/\/www.aclweb.org\/anthology\/papers\/E\/E17\/E17-1104\/.","DOI":"10.18653\/v1\/E17-1104"},{"key":"pcbi.1008277.ref027","unstructured":"GloVe: Global Vectors for Word Representation\u2014Kaggle;. Available from: https:\/\/www.kaggle.com\/rtatman\/glove-global-vectors-for-word-representation."},{"key":"pcbi.1008277.ref028","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed Representations of Words and Phrases and their Compositionality. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, editors. Advances in Neural Information Processing Systems 26. Curran Associates, Inc.; 2013. p. 3111\u20133119. 
Available from: http:\/\/papers.nips.cc\/paper\/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf."},{"key":"pcbi.1008277.ref029","unstructured":"Wikimedia Downloads;. Available from: https:\/\/dumps.wikimedia.org\/."},{"key":"pcbi.1008277.ref030","unstructured":"Code Google. Google Code Archive\u2014Long-term storage for Google Code Project Hosting.; 2013. Available from: https:\/\/code.google.com\/archive\/p\/word2vec\/."},{"key":"pcbi.1008277.ref031","doi-asserted-by":"crossref","unstructured":"Lau JH, Baldwin T. An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation. In: Proceedings of the 1st Workshop on Representation Learning for NLP. Berlin, Germany: Association for Computational Linguistics; 2016. p. 78\u201386. Available from: http:\/\/aclweb.org\/anthology\/W16-1609.","DOI":"10.18653\/v1\/W16-1609"},{"issue":"C","key":"pcbi.1008277.ref032","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1016\/j.patrec.2016.06.012","article-title":"Representation learning for very short texts using weighted word embedding aggregation","volume":"80","author":"C De Boom","year":"2016","journal-title":"Pattern Recognition Letters"},{"key":"pcbi.1008277.ref033","unstructured":"He H, Bai Y, Garcia EA, Li S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE; 2008. p. 1322\u20131328. 
Available from: http:\/\/ieeexplore.ieee.org\/document\/4633969\/."},{"key":"pcbi.1008277.ref034","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.ins.2013.07.007","article-title":"An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics","volume":"250","author":"V L\u00f3pez","year":"2013","journal-title":"Information Sciences"},{"issue":"17","key":"pcbi.1008277.ref035","first-page":"1","article-title":"Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning","volume":"18","author":"G Lemaitre","year":"2017","journal-title":"Journal of Machine Learning Research"},{"issue":"8","key":"pcbi.1008277.ref036","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0181142","article-title":"\u201cWhat is relevant in a text document?\u201d: An interpretable machine learning approach","volume":"12","author":"L Arras","year":"2017","journal-title":"PLOS ONE"},{"issue":"93","key":"pcbi.1008277.ref037","first-page":"1","article-title":"iNNvestigate Neural Networks!","volume":"20","author":"M Alber","year":"2019","journal-title":"Journal of Machine Learning Research"},{"key":"pcbi.1008277.ref038","doi-asserted-by":"crossref","unstructured":"Chinchor N. MUC-4 Evaluation Metrics. In: Proceedings of the 4th Conference on Message Understanding. MUC4\u201992. USA: Association for Computational Linguistics; 1992. p. 22\u201329. Available from: https:\/\/doi.org\/10.3115\/1072064.1072067.","DOI":"10.3115\/1072064.1072067"},{"key":"pcbi.1008277.ref039","doi-asserted-by":"crossref","unstructured":"Garc\u00eda V, Mollineda RA, S\u00e1nchez J. Index of Balanced Accuracy: A Performance Measure for Skewed Class Distributions. In: 4th Iberian Conference. vol. 5524; 2009. p. 441\u2013448.","DOI":"10.1007\/978-3-642-02172-5_57"},{"key":"pcbi.1008277.ref040","unstructured":"Rennie JDM, Shih L, Teevan J, Karger DR. 
Tackling the Poor Assumptions of Naive Bayes Text Classifiers. In: Proceedings of the Twentieth International Conference on Machine Learning. Washington, DC, USA: AAAI Press; 2003. p. 616\u2013623. Available from: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/citations?doi=10.1.1.13.8572."},{"key":"pcbi.1008277.ref041","unstructured":"Flask;. Available from: http:\/\/flask.pocoo.org\/."},{"key":"pcbi.1008277.ref042","unstructured":"DataTables;. Available from: https:\/\/datatables.net\/."},{"key":"pcbi.1008277.ref043","unstructured":"DeepL Translator;. Available from: https:\/\/www.deepl.com\/translator."},{"key":"pcbi.1008277.ref044","doi-asserted-by":"crossref","unstructured":"Chen X, Cardie C. Unsupervised Multilingual Word Embeddings. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2018. p. 261\u2013270. Available from: https:\/\/www.aclweb.org\/anthology\/D18-1024.","DOI":"10.18653\/v1\/D18-1024"},{"key":"pcbi.1008277.ref045","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics; 2019. p. 4171\u20134186. Available from: https:\/\/www.aclweb.org\/anthology\/N19-1423."},{"key":"pcbi.1008277.ref046","doi-asserted-by":"crossref","unstructured":"Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, et al. Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). New Orleans, Louisiana: Association for Computational Linguistics; 2018. p. 2227\u20132237. 
Available from: https:\/\/www.aclweb.org\/anthology\/N18-1202.","DOI":"10.18653\/v1\/N18-1202"},{"key":"pcbi.1008277.ref047","unstructured":"Ribeiro MT, Singh S, Guestrin C. \u201cWhy Should I Trust You?\u201d: Explaining the Predictions of Any Classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016; 2016. p. 1135\u20131144."},{"issue":"3","key":"pcbi.1008277.ref048","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1007\/s10115-013-0679-x","article-title":"Explaining prediction models and individual predictions with feature contributions","volume":"41","author":"E \u0160trumbelj","year":"2014","journal-title":"Knowledge and Information Systems"},{"key":"pcbi.1008277.ref049","doi-asserted-by":"crossref","unstructured":"Kakas AC, Cohn D, Dasgupta S, Barto AG, Carpenter GA, Grossberg S, et al. Active Learning. In: Encyclopedia of Machine Learning. Boston, MA: Springer US; 2011. p. 10\u201314. 
Available from: http:\/\/www.springerlink.com\/index\/10.1007\/978-0-387-30164-8{_}6.","DOI":"10.1007\/978-0-387-30164-8_6"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1008277","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T00:00:00Z","timestamp":1607040000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008277","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,4,14]],"date-time":"2021-04-14T19:27:54Z","timestamp":1618428474000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008277"}},"subtitle":[],"editor":[{"given":"Benjamin Muir","family":"Althouse","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2020,11,20]]},"references-count":49,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2020,11,20]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1008277","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/19006395","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,20]]}}}