{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T16:32:17Z","timestamp":1776184337838,"version":"3.50.1"},"reference-count":59,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2018,10,1]],"date-time":"2018-10-01T00:00:00Z","timestamp":1538352000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"crossref","award":["R01LM011176"],"award-info":[{"award-number":["R01LM011176"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000054","name":"National Cancer Institute","doi-asserted-by":"publisher","award":["R21CA218231"],"award-info":[{"award-number":["R21CA218231"]}],"id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]},{"name":"UK EPSRC","award":["EP\/I028099\/1"],"award-info":[{"award-number":["EP\/I028099\/1"]}]},{"name":"UK EPSRC","award":["EP\/N027280\/1"],"award-info":[{"award-number":["EP\/N027280\/1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1).<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http:\/\/dx.doi.org\/10.17632\/rxwfb3tysd.1).<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocy114","type":"journal-article","created":{"date-parts":[[2018,8,2]],"date-time":"2018-08-02T19:29:38Z","timestamp":1533238178000},"page":"1274-1283","source":"Crossref","is-referenced-by-count":65,"title":["Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task"],"prefix":"10.1093","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7358-544X","authenticated-orcid":false,"given":"Abeed","family":"Sarker","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA"}]},{"given":"Maksim","family":"Belousov","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Manchester, Manchester, UK"}]},{"given":"Jasper","family":"Friedrichs","sequence":"additional","affiliation":[{"name":"Infosys Limited, Palo Alto, California, USA"}]},{"given":"Kai","family":"Hakala","sequence":"additional","affiliation":[{"name":"Turku NLP Group, Department of Future Technologies, University of Turku, Turku, Finland"},{"name":"The University of Turku Graduate School, University of Turku, Turku, Finland"}]},{"given":"Svetlana","family":"Kiritchenko","sequence":"additional","affiliation":[{"name":"Digital Technologies Research Centre, National Research Council Canada, Ottawa, Canada"}]},{"given":"Farrokh","family":"Mehryary","sequence":"additional","affiliation":[{"name":"Turku NLP Group, Department of Future Technologies, University of Turku, Turku, Finland"},{"name":"The University of Turku Graduate School, University of Turku, Turku, Finland"}]},{"given":"Sifei","family":"Han","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA"}]},{"given":"Tung","family":"Tran","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA"}]},{"given":"Anthony","family":"Rios","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA"}]},{"given":"Ramakanth","family":"Kavuluru","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA"},{"name":"Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA"}]},{"given":"Berry","family":"de Bruijn","sequence":"additional","affiliation":[{"name":"Digital Technologies Research Centre, National Research Council Canada, Ottawa, Canada"}]},{"given":"Filip","family":"Ginter","sequence":"additional","affiliation":[{"name":"Turku NLP Group, Department of Future Technologies, University of Turku, Turku, Finland"}]},{"given":"Debanjan","family":"Mahata","sequence":"additional","affiliation":[{"name":"Bloomberg, New York, New York, USA"}]},{"given":"Saif M","family":"Mohammad","sequence":"additional","affiliation":[{"name":"Digital Technologies Research Centre, National Research Council Canada, Ottawa, Canada"}]},{"given":"Goran","family":"Nenadic","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Manchester, Manchester, UK"}]},{"given":"Graciela","family":"Gonzalez-Hernandez","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,10,1]]},"reference":[{"key":"2020110612231719600_ocy114-B1","author":"PEW Research Center","year":"2017"},{"key":"2020110612231719600_ocy114-B2","volume-title":"Public Interest in Science and Health Linked to Gender, Age and Personality.","author":"Kennedy","year":"2015"},{"key":"2020110612231719600_ocy114-B3","first-page":"265","article-title":"You are what you Tweet: analyzing Twitter for public health","author":"Paul","year":"2011","journal-title":"Proc Fifth Int AAAI Conf Weblogs Soc Media"},{"key":"2020110612231719600_ocy114-B4","first-page":"1568","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Aramaki","year":"2011"},{"issue":"5","key":"2020110612231719600_ocy114-B5","doi-asserted-by":"crossref","first-page":"e128","DOI":"10.2196\/jmir.3863","article-title":"Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in the Netherlands in 2013","volume":"17","author":"Mollema","year":"2015","journal-title":"J Med Internet Res"},{"key":"2020110612231719600_ocy114-B6","first-page":"480","article-title":"Text Classification for Automatic Detection of E-cigarette Use and Use for Smoking Cessation from Twitter: A Feasibility Pilot","volume":"21","author":"Aphinyanaphongs","year":"2016","journal-title":"Pac Symp Biocomput"},{"issue":"7","key":"2020110612231719600_ocy114-B7","doi-asserted-by":"crossref","first-page":"e170.","DOI":"10.2196\/jmir.3189","article-title":"The role of Facebook in Crush the Crave, a mobile- and social media-based smoking cessation intervention: qualitative framework analysis of posts","volume":"16","author":"Struik","year":"2014","journal-title":"J Med Internet Res"},{"key":"2020110612231719600_ocy114-B8","first-page":"85","author":"Kumar","year":"2015"},{"key":"2020110612231719600_ocy114-B9","doi-asserted-by":"crossref","first-page":"31","DOI":"10.3115\/v1\/W15-1204","volume-title":"Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. Association for Computational Linguistics","author":"Coppersmith","year":"2015"},{"key":"2020110612231719600_ocy114-B10","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1016\/j.jbi.2015.02.004","article-title":"Utilizing social media data for pharmacovigilance: a review","volume":"54","author":"Sarker","year":"2015","journal-title":"J Biomed Inform"},{"issue":"1","key":"2020110612231719600_ocy114-B11","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1093\/jamia\/ocx146","article-title":"Biomedical informatics and data science: evolving fields with significant overlap","volume":"25","author":"Brennan","year":"2018","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612231719600_ocy114-B12","year":"2017"},{"key":"2020110612231719600_ocy114-B13","author":"National Institute of Standards and Technology","year":"2017"},{"key":"2020110612231719600_ocy114-B14","year":"2017"},{"key":"2020110612231719600_ocy114-B15","author":"BioASQ","year":"2017"},{"key":"2020110612231719600_ocy114-B16","author":"BioCreative","year":"2017"},{"key":"2020110612231719600_ocy114-B17","author":"CLEF eHealth 2018","year":"2018"},{"key":"2020110612231719600_ocy114-B18"},{"issue":"1","key":"2020110612231719600_ocy114-B19","doi-asserted-by":"crossref","first-page":"224","DOI":"10.15265\/IY-2016-017","article-title":"Aspiring to unintended consequences of natural language processing: a review of recent developments in clinical and consumer-generated text processing","volume":"25","author":"Demner-Fushman","year":"2016","journal-title":"Yearb Med Inform"},{"issue":"1","key":"2020110612231719600_ocy114-B20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2414425.2414430","article-title":"Lexical normalization for social media text","volume":"4","author":"Han","year":"2013","journal-title":"ACM Trans Intell Syst Technol"},{"key":"2020110612231719600_ocy114-B21","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1016\/j.jbi.2014.11.002","article-title":"Portable automatic text classification for adverse drug reaction detection via multi-corpus training","volume":"53","author":"Sarker","year":"2015","journal-title":"J Biomed Inform"},{"issue":"01","key":"2020110612231719600_ocy114-B22","doi-asserted-by":"crossref","first-page":"214","DOI":"10.15265\/IY-2017-029","article-title":"Capturing the patient\u2019s perspective: a review of advances in natural language processing of health-related text","volume":"26","author":"Gonzalez-Hernandez","year":"2017","journal-title":"Yearb Med Inform"},{"issue":"3","key":"2020110612231719600_ocy114-B23","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1136\/jamia.2009.002733","article-title":"An overview of MetaMap: historical perspective and recent advances","volume":"17","author":"Aronson","year":"2010","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2020110612231719600_ocy114-B24","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1136\/jamia.2009.001560","article-title":"Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications","volume":"17","author":"Savova","year":"2010","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2020110612231719600_ocy114-B25","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1093\/jamia\/ocu041","article-title":"Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features","volume":"22","author":"Nikfarjam","year":"2015","journal-title":"J Am Med Informatics Assoc"},{"key":"2020110612231719600_ocy114-B26","author":"Sarker","year":"2017"},{"key":"2020110612231719600_ocy114-B27","first-page":"136","volume-title":"Proceedings of the BioNLP 2017 Workshop","author":"Klein"},{"key":"2020110612231719600_ocy114-B28","first-page":"581","article-title":"Social Media Mining Shared Task Workshop","volume":"21","author":"Sarker","year":"2016","journal-title":"Pac Symp Biocomput. World Scientific Publishing Company, Singapore"},{"issue":"1","key":"2020110612231719600_ocy114-B29","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A coefficient of agreement for nominal scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educ Psychol Meas"},{"issue":"2","key":"2020110612231719600_ocy114-B30","doi-asserted-by":"crossref","first-page":"109","DOI":"10.2165\/00002018-199920020-00002","article-title":"The medical dictionary for regulatory activities (MedDRA)","volume":"20","author":"Brown","year":"1999","journal-title":"Drug Saf"},{"key":"2020110612231719600_ocy114-B31","first-page":"924","article-title":"Pharmacovigilance on Twitter? Mining tweets for adverse drug reactions","volume":"2014","author":"O\u2019Connor","year":"2014","journal-title":"AMIA Annu Symp Proc"},{"issue":"4","key":"2020110612231719600_ocy114-B32","doi-asserted-by":"crossref","first-page":"813","DOI":"10.1093\/jamia\/ocw180","article-title":"Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts","volume":"24","author":"Cocos","year":"2017","journal-title":"J Am Med Inform Assoc"},{"key":"2020110612231719600_ocy114-B33","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1016\/j.jbi.2016.06.010","article-title":"OntoADR a semantic resource describing adverse drug reactions to support searching, coding, and information retrieval","volume":"63","author":"Souvignet","year":"2016","journal-title":"J Biomed Inform"},{"key":"2020110612231719600_ocy114-B34","author":"Owoputi","year":"2013"},{"key":"2020110612231719600_ocy114-B35","first-page":"1","volume-title":"Proceedings of the Second Workshop on Social Media Mining for Health Research and Applications Workshop Co-located with the American Medical Informatics Association Annual Symposium (AMIA 2017)","author":"Kiritchenko","year":"2017"},{"issue":"1","key":"2020110612231719600_ocy114-B36","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1613\/jair.4272","article-title":"Sentiment of short informal texts","volume":"50","author":"Kiritchenko","year":"2014","journal-title":"J Artif Intell Res"},{"issue":"3","key":"2020110612231719600_ocy114-B37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3003433","article-title":"Stance and sentiment in tweets","volume":"17","author":"Mohammad","year":"2017","journal-title":"ACM Trans Internet Technol"},{"key":"2020110612231719600_ocy114-B38","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1145\/1014052.1014073","article-title":"Mining and summarizing customer reviews","volume":"4","author":"Hu","year":"2004","journal-title":"Proc 2004 ACM SIGKDD Int Conf Knowl Discov Data Min KDD 04"},{"issue":"4","key":"2020110612231719600_ocy114-B39","doi-asserted-by":"crossref","first-page":"1191","DOI":"10.3758\/s13428-012-0314-x","article-title":"Norms of valence, arousal, and dominance for 13, 915 English lemmas","volume":"45","author":"Warriner","year":"2013","journal-title":"Behav Res Methods"},{"issue":"12","key":"2020110612231719600_ocy114-B40","doi-asserted-by":"crossref","first-page":"e26752.","DOI":"10.1371\/journal.pone.0026752","article-title":"Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter","volume":"6","author":"Dodds","year":"2011","journal-title":"PLoS One"},{"key":"2020110612231719600_ocy114-B41","first-page":"49","volume-title":"Proceedings of the Second Workshop on Social Media Mining for Health Research and Applications Workshop Co-located with the American Medical Informatics Association Annual Symposium (AMIA 2017)","author":"Han","year":"2017"},{"key":"2020110612231719600_ocy114-B42","first-page":"31","article-title":"Normalized (pointwise) mutual information in collocation extraction","author":"Bouma","year":"2009","journal-title":"Proc Ger Soc Comput Linguist (GSCL 2009)"},{"key":"2020110612231719600_ocy114-B43","first-page":"59","volume-title":"Proceedings of the Second Workshop on Social Media Mining for Health Research and Applications Workshop Co-located with the American Medical Informatics Association Annual Symposium (AMIA 2017)","author":"Hakala","year":"2017"},{"key":"2020110612231719600_ocy114-B44","first-page":"68","volume-title":"Proceedings of the Second Workshop on Social Media Mining for Health Research and Applications Workshop Co-located with the American Medical Informatics Association Annual Symposium (AMIA 2017)","author":"Friedrichs","year":"2017"},{"key":"2020110612231719600_ocy114-B45","author":"Godin","year":"2015"},{"key":"2020110612231719600_ocy114-B46","first-page":"149","volume-title":"8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis","author":"Shin","year":"2017"},{"key":"2020110612231719600_ocy114-B47","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J Artif Intell Res"},{"key":"2020110612231719600_ocy114-B48","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1016\/j.dib.2016.11.056","article-title":"A corpus for mining drug-related knowledge from Twitter chatter: language models and their utilities","volume":"10","author":"Sarker","year":"2017","journal-title":"Data Br"},{"key":"2020110612231719600_ocy114-B49","author":"Jozefowicz"},{"key":"2020110612231719600_ocy114-B50","first-page":"76","volume-title":"Proceedings of the Second Workshop on Social Media Mining for Health Research and Applications Workshop Co-located with the American Medical Informatics Association Annual Symposium (AMIA 2017)","author":"Magge","year":"2017"},{"key":"2020110612231719600_ocy114-B51","first-page":"72","volume-title":"Proceedings of the Second Workshop on Social Media Mining for Health Research and Applications Workshop Co-located with the American Medical Informatics Association Annual Symposium (AMIA 2017)","author":"Jain","year":"2017"},{"key":"2020110612231719600_ocy114-B52","first-page":"64","volume-title":"Proceedings of the Second Workshop on Social Media Mining for Health Research and Applications Workshop Co-located with the American Medical Informatics Association Annual Symposium (AMIA 2017)","author":"Tsui","year":"2017"},{"key":"2020110612231719600_ocy114-B53","first-page":"83","volume-title":"Proceedings of the Second Workshop on Social Media Mining for Health Research and Applications Workshop Co-located with the American Medical Informatics Association Annual Symposium (AMIA 2017)","author":"Wang","year":"2017"},{"key":"2020110612231719600_ocy114-B54","first-page":"54","volume-title":"Proceedings of the Second Workshop on Social Media Mining for Health Research and Applications Workshop Co-located with the American Medical Informatics Association Annual Symposium (AMIA 2017)","author":"Belousov","year":"2017"},{"key":"2020110612231719600_ocy114-B55","author":"Kim","year":"2014"},{"key":"2020110612231719600_ocy114-B56","first-page":"679","volume-title":"AMIA Annu Symp Proc","author":"Emadzadeh","year":"2017"},{"issue":"1","key":"2020110612231719600_ocy114-B57","first-page":"183","article-title":"Recent advances in clinical natural language processing in support of semantic analysis","volume":"10","author":"Velupillai","year":"2015","journal-title":"Yearb Med Inform"},{"issue":"10","key":"2020110612231719600_ocy114-B58","doi-asserted-by":"crossref","first-page":"e0139701","DOI":"10.1371\/journal.pone.0139701","article-title":"Using social media for actionable disease surveillance and outbreak management: a systematic literature review. Braunstein LA, ed","volume":"10","author":"Charles-Smith","year":"2015","journal-title":"PLoS One"},{"key":"2020110612231719600_ocy114-B59","author":"Li","year":"2010"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/25\/10\/1274\/34150482\/ocy114.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/25\/10\/1274\/34150482\/ocy114.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,7]],"date-time":"2020-11-07T06:16:01Z","timestamp":1604729761000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/25\/10\/1274\/5113021"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,1]]},"references-count":59,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2018,10,1]]},"published-print":{"date-parts":[[2018,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocy114","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,10]]},"published":{"date-parts":[[2018,10,1]]}}}