{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T11:27:20Z","timestamp":1776079640333,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2019,9,28]],"date-time":"2019-09-28T00:00:00Z","timestamp":1569628800000},"content-version":"vor","delay-in-days":1,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>Twitter posts are now recognized as an important source of patient-generated data, providing unique insights into population health. A fundamental step toward incorporating Twitter data in pharmacoepidemiologic research is to automatically recognize medication mentions in tweets. Given that lexical searches for medication names suffer from low recall due to misspellings or ambiguity with common words, we propose a more advanced method to recognize them.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>We present Kusuri, an Ensemble Learning classifier able to identify tweets mentioning drug products and dietary supplements. Kusuri (\u85ac, \u201cmedication\u201d in Japanese) is composed of 2 modules: first, 4 different classifiers (lexicon based, spelling variant based, pattern based, and a weakly trained neural network) are applied in parallel to discover tweets potentially containing medication names; second, an ensemble of deep neural networks encoding morphological, semantic, and long-range dependencies of important words in the tweets makes the final decision.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>On a class-balanced (50-50) corpus of 15 005 tweets, Kusuri demonstrated performances close to human annotators with an F1 score of 93.7%, the best score achieved thus far on this corpus. On a corpus made of all tweets posted by 112 Twitter users (98 959 tweets, with only 0.26% mentioning medications), Kusuri obtained an F1 score of 78.8%. To the best of our knowledge, Kusuri is the first system to achieve this score on such an extremely imbalanced dataset.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>The system identifies tweets mentioning drug names with performance high enough to ensure its usefulness, and is ready to be integrated in pharmacovigilance, toxicovigilance, or more generally, public health pipelines that depend on medication name mentions.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocz156","type":"journal-article","created":{"date-parts":[[2019,8,15]],"date-time":"2019-08-15T19:11:55Z","timestamp":1565896315000},"page":"1618-1626","source":"Crossref","is-referenced-by-count":33,"title":["Deep neural networks ensemble for detecting medication mentions in tweets"],"prefix":"10.1093","volume":"26","author":[{"given":"Davy","family":"Weissenbacher","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7358-544X","authenticated-orcid":false,"given":"Abeed","family":"Sarker","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8281-3464","authenticated-orcid":false,"given":"Ari","family":"Klein","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA"}]},{"given":"Karen","family":"O\u2019Connor","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA"}]},{"given":"Arjun","family":"Magge","sequence":"additional","affiliation":[{"name":"Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, Tempe, Arizona, USA"}]},{"given":"Graciela","family":"Gonzalez-Hernandez","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,9,27]]},"reference":[{"issue":"1","key":"2020110612455014000_ocz156-B1","doi-asserted-by":"crossref","first-page":"e1\u2013e8","DOI":"10.2105\/AJPH.2016.303512a","article-title":"Twitter as a tool for health research: a systematic review","volume":"107","author":"Sinnenberg","year":"2017","journal-title":"Am J Public Health"},{"issue":"3","key":"2020110612455014000_ocz156-B2","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/j.artmed.2014.01.002","article-title":"Twitter mining for fine-grained syndromic surveillance","volume":"61","author":"Velardi","year":"2014","journal-title":"Artif Intell Med"},{"issue":"9","key":"2020110612455014000_ocz156-B3","doi-asserted-by":"crossref","first-page":"e315.","DOI":"10.2196\/jmir.7393","article-title":"Enhancing seasonal influenza surveillance: topic analysis of widely used medicinal drugs using twitter data","volume":"19","author":"Kagashe","year":"2017","journal-title":"J Med Internet Res"},{"issue":"6","key":"2020110612455014000_ocz156-B4","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1093\/jamia\/ocz013","article-title":"\u201cComment on: \u201cdeep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in twitter posts\u201d","volume":"26","author":"Magge","year":"2019","journal-title":"J Am Med Inform Assoc"},{"issue":"4","key":"2020110612455014000_ocz156-B5","doi-asserted-by":"crossref","first-page":"763","DOI":"10.1093\/pubmed\/fdx020","article-title":"Systematic review of surveillance by social media platforms for illicit drug use","volume":"39","author":"Kazemi","year":"2017","journal-title":"J Public Health (Oxf)"},{"key":"2020110612455014000_ocz156-B6","first-page":"1977","author":"Sekine","year":"2004"},{"key":"2020110612455014000_ocz156-B7","first-page":"359","author":"Liu","year":"2011"},{"issue":"10","key":"2020110612455014000_ocz156-B8","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1093\/jamia\/ocy114","article-title":"Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task","volume":"25","author":"Sarker","year":"2018","journal-title":"J Am Med Inform Assoc"},{"key":"2020110612455014000_ocz156-B9","first-page":"1524","author":"Ritter","year":"2011"},{"key":"2020110612455014000_ocz156-B10","first-page":"55","article-title":"Exploring brand-name drug mentions on twitter for pharmacovigilance","volume":"210","author":"Carbonell","year":"2015","journal-title":"Stud Health Technol Inform"},{"issue":"5","key":"2020110612455014000_ocz156-B11","doi-asserted-by":"crossref","first-page":"667\u2013700","DOI":"10.3233\/SW-170276","article-title":"Lessons learnt from the named entity recognition and linking (NEEL) challenge series","volume":"8","author":"Rizzo","year":"2017","journal-title":"Semant Web"},{"key":"2020110612455014000_ocz156-B12","first-page":"140","author":"Derczynski","year":"2017"},{"key":"2020110612455014000_ocz156-B13","author":"Lopez","year":"2017"},{"key":"2020110612455014000_ocz156-B14","first-page":"13","author":"Weissenbacher","year":"2018"},{"key":"2020110612455014000_ocz156-B15","first-page":"138","author":"Strauss","year":"2016"},{"key":"2020110612455014000_ocz156-B16","author":"Sileo","year":"2017"},{"key":"2020110612455014000_ocz156-B17","first-page":"145","author":"Limsopatham","year":"2016"},{"issue":"4","key":"2020110612455014000_ocz156-B18","doi-asserted-by":"crossref","first-page":"790","DOI":"10.3390\/info6040790","article-title":"Drug name recognition: approaches and resources","volume":"6","author":"Liu","year":"2015","journal-title":"Information"},{"issue":"5","key":"2020110612455014000_ocz156-B19","doi-asserted-by":"crossref","first-page":"514","DOI":"10.1136\/jamia.2010.003947","article-title":"Extracting medication information from clinical text","volume":"17","author":"Uzuner","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2020110612455014000_ocz156-B20","first-page":"341","author":"Segura-Bedmar","year":"2013"},{"key":"2020110612455014000_ocz156-B21","doi-asserted-by":"crossref","DOI":"10.1186\/1758-2946-7-S1-S1","article-title":"CHEMDNER: the drugs and chemical names extraction challenge","volume":"7","author":"Krallinger","year":"2015","journal-title":"J Cheminform"},{"key":"2020110612455014000_ocz156-B22","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1016\/j.dib.2016.11.056","article-title":"A corpus for mining drug-related knowledge from twitter chatter: language models and their utilities","volume":"10","author":"Sarker","year":"2017","journal-title":"Data Brief"},{"key":"2020110612455014000_ocz156-B23","first-page":"643","article-title":"Identifying diseases, drugs, and symptoms in twitter","volume":"216","author":"Jimeno-Yepes","year":"2015","journal-title":"Stud Health Technol Inform"},{"key":"2020110612455014000_ocz156-B24","first-page":"34","author":"Wu","year":"2018"},{"issue":"10","key":"2020110612455014000_ocz156-B25","doi-asserted-by":"crossref","first-page":"e361.","DOI":"10.2196\/jmir.8164","article-title":"Discovering cohorts of pregnant women from social media for safety surveillance and analysis","volume":"19","author":"Sarker","year":"2017","journal-title":"J Med Internet Res"},{"key":"2020110612455014000_ocz156-B26","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.jbi.2018.11.007","article-title":"An unsupervised and customizable misspelling generator for mining noisy health-related text sources","volume":"88","author":"Sarker","year":"2018","journal-title":"J Biomed Inform"},{"key":"2020110612455014000_ocz156-B27","first-page":"2716","author":"Shen","year":"2016"},{"key":"2020110612455014000_ocz156-B28","author":"Grave","year":"2014"},{"key":"2020110612455014000_ocz156-B29","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1007\/s40264-018-0731-6","article-title":"Pharmacoepidemiologic evaluation of birth defects from health-related postings in social media during pregnancy","volume":"42","author":"Golder","year":"2019","journal-title":"Drug Saf"},{"key":"2020110612455014000_ocz156-B30","author":"Vanni","year":"2018"},{"key":"2020110612455014000_ocz156-B31","volume-title":"Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-Learn, and TensorFlow","author":"Raschka","year":"2017","edition":"2nd ed"},{"key":"2020110612455014000_ocz156-B32","first-page":"1","author":"Chalapathy","year":"2016"},{"issue":"7","key":"2020110612455014000_ocz156-B33","doi-asserted-by":"crossref","first-page":"1895","DOI":"10.1162\/089976698300017197","article-title":"Approximate Statistical tests for comparing supervised classification learning algorithms","volume":"10","author":"Dietterich","year":"1998","journal-title":"Neural Comput"},{"key":"2020110612455014000_ocz156-B34","author":"Wang","year":"2018"},{"key":"2020110612455014000_ocz156-B35","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","author":"Devlin","journal-title":"arXiv"},{"key":"2020110612455014000_ocz156-B36","author":"Peters","year":"2018"},{"key":"2020110612455014000_ocz156-B37","first-page":"670","author":"Conneau","year":"2017"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/12\/1618\/34151762\/ocz156.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/12\/1618\/34151762\/ocz156.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,16]],"date-time":"2021-01-16T20:58:46Z","timestamp":1610830726000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/26\/12\/1618\/5575394"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,27]]},"references-count":37,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2019,9,27]]},"published-print":{"date-parts":[[2019,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocz156","relation":{},"ISSN":["1527-974X"],"issn-type":[{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,12]]},"published":{"date-parts":[[2019,9,27]]}}}