{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T20:24:35Z","timestamp":1768595075012,"version":"3.49.0"},"reference-count":52,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2016,5,25]],"date-time":"2016-05-25T00:00:00Z","timestamp":1464134400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004663","name":"Ministry of Science and Technology of Taiwan","doi-asserted-by":"publisher","award":["MOST-104-2221-E-143-005"],"award-info":[{"award-number":["MOST-104-2221-E-143-005"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Social media platforms are emerging digital communication channels that provide an easy way for common people to share their health and medication experiences online. With more people discussing their health information online publicly, social media platforms present a rich source of information for exploring adverse drug reactions (ADRs). ADRs are major public health problems that result in deaths and hospitalizations of millions of people. Unfortunately, not all ADRs are identified before a drug is made available in the market. In this study, an ADR event monitoring system is developed which can recognize ADR mentions from a tweet and classify its assertion. We explored several entity recognition features, feature conjunctions, and feature selection and analyzed their characteristics and impacts on the recognition of ADRs, which have never been studied previously. The results demonstrate that the entity recognition performance for ADR can achieve an F-score of 0.562 on the PSB Social Media Mining shared task dataset, which outperforms the partial-matching-based method by 0.122. 
After feature selection, the F-score can be further improved by 0.026. This novel technique of text mining utilizing shared online social media data will open an array of opportunities for researchers to explore various health related issues.<\/jats:p>","DOI":"10.3390\/info7020027","type":"journal-article","created":{"date-parts":[[2016,5,25]],"date-time":"2016-05-25T09:15:07Z","timestamp":1464167707000},"page":"27","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Feature Engineering for Recognizing Adverse Drug Reactions from Twitter Posts"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1516-7255","authenticated-orcid":false,"given":"Hong-Jie","family":"Dai","sequence":"first","affiliation":[{"name":"Department of Computer Science & Information Engineering, National Taitung University, Taitung 95092, Taiwan"},{"name":"Interdisciplinary Program of Green and Information Technology, National Taitung University, Taitung 95092, Taiwan"}]},{"given":"Musa","family":"Touray","sequence":"additional","affiliation":[{"name":"Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei 11031, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9912-2344","authenticated-orcid":false,"given":"Jitendra","family":"Jonnagaddala","sequence":"additional","affiliation":[{"name":"School of Public Health and Community Medicine, UNSW Australia, Sydney, NSW 2052, Australia"},{"name":"Prince of Wales Clinical School, UNSW Australia, Sydney, NSW 2052, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0412-767X","authenticated-orcid":false,"given":"Shabbir","family":"Syed-Abdul","sequence":"additional","affiliation":[{"name":"Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei 11031, Taiwan"},{"name":"International Center for Health Information Technology, Taipei Medical University, Taipei 11031, 
Taiwan"}]}],"member":"1968","published-online":{"date-parts":[[2016,5,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"e171","DOI":"10.2196\/jmir.4304","article-title":"Adverse Drug Reaction Identification and Extraction in Social Media: A Scoping Review","volume":"17","author":"Lardon","year":"2015","journal-title":"J. Med. Internet Res."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1016\/j.jbi.2015.02.004","article-title":"Utilizing social media data for pharmacovigilance: A review","volume":"54","author":"Sarker","year":"2015","journal-title":"J. Biomed. Inform."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1111\/j.1365-2125.2006.02746.x","article-title":"Patient reporting of suspected adverse drug reactions: a review of published literature and international experience","volume":"63","author":"Blenkinsopp","year":"2007","journal-title":"Br. J. Clin. Pharmacol."},{"key":"ref_4","unstructured":"Cieliebak, M., Egger, D., and Uzdilli, F. Twitter can Help to Find Adverse Drug Reactions. Available online: http:\/\/ercim-news.ercim.eu\/en104\/special\/twitter-can-help-to-find-adverse-drug-reactions."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"989","DOI":"10.1016\/j.jbi.2011.07.005","article-title":"Identifying potential adverse effects using the web: A new approach to medical hypothesis generation","volume":"44","author":"Benton","year":"2011","journal-title":"J. Biomed. Inform."},{"key":"ref_6","unstructured":"Lafferty, J., McCallum, A., and Pereira, F. (2001, January 28). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning (ICML), Williamstown, MA, USA."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. 
Learn."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"913489","DOI":"10.1155\/2015\/913489","article-title":"Feature engineering for drug name recognition in biomedical texts: Feature conjunction and feature selection","volume":"2015","author":"Liu","year":"2015","journal-title":"Comput. Math. Methods Med."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"S14","DOI":"10.1186\/1758-2946-7-S1-S14","article-title":"Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization","volume":"7","author":"Dai","year":"2015","journal-title":"J. Cheminform."},{"key":"ref_10","unstructured":"Tkachenko, M., and Simanovsky, A. (2012, January 19\u201321). Named entity recognition: Exploring features. Proceedings of The 11th Conference on Natural Language Processing (KONVENS 2012), Vienna, Austria."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhou, S., Zhang, S., and Karypis, G. (2012, January 15\u201318). Hierarchical Text Classification for News Articles Based-on Named Entities. Advanced Data Mining and Applications, Proceedings of the 8th International Conference, ADMA 2012, Nanjing, China.","DOI":"10.1007\/978-3-642-35527-1"},{"key":"ref_12","unstructured":"Tsai, R.T.-H., Hung, H.-C., Dai, H.-J., and Lin, Y.-W. (2007, January 6\u20137). Protein-protein interaction abstract identification with contextual bag of words. Proceedings of the 2nd International Symposium on Languages in Biology and Medicine (LBM 2007), Singapore."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Sarker, A., Nikfarjam, A., and Gonzalez, G. (2016, January 4\u20138). Social media mining shared task workshop. 
Proceedings of the Pacific Symposium on Biocomputing 2016, Big Island, HI, USA.","DOI":"10.1142\/9789814749411_0054"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Gimpel, K., Schneider, N., O\u2019Connor, B., Das, D., Mills, D., Eisenstein, J., Heilman, M., Yogatama, D., Flanigan, J., and Smith, N.A. (2011, January 19\u201324). Part-of-speech tagging for Twitter: Annotation, features, and experiments. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA.","DOI":"10.21236\/ADA547371"},{"key":"ref_15","unstructured":"Ritter, A., Clark, S., and Etzioni, O. (2011, January 27\u201331). Named entity recognition in tweets: an experimental study. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Finkel, J.R., Grenager, T., and Manning, C. (2005, January 25\u201330). Incorporating non-local information into information extraction systems by Gibbs sampling. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Ann Arbor, MI, USA.","DOI":"10.3115\/1219840.1219885"},{"key":"ref_17","unstructured":"Eisenstein, J. (2013, January 9\u201315). What to do about bad language on the internet. Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), Atlanta, GA, USA."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1093\/jamia\/ocu041","article-title":"Pharmacovigilance from social media: Mining adverse drug reaction mentions using sequence labeling with word embedding cluster features","volume":"22","author":"Nikfarjam","year":"2015","journal-title":"J. Am. Med. Inform. 
Assoc."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1002\/cpt.302","article-title":"Big Data and Adverse Drug Reaction Detection","volume":"99","author":"Harpaz","year":"2016","journal-title":"Clin. Pharmacol. Ther."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Dai, H.-J., Syed-Abdul, S., Chen, C.-W., and Wu, C.-C. (2015). Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields. BioMed Res. Int.","DOI":"10.1155\/2015\/873012"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"610","DOI":"10.1016\/j.drudis.2013.10.006","article-title":"Drug name recognition in biomedical texts: A machine-learning-based method","volume":"19","author":"He","year":"2014","journal-title":"Drug Discov. Today"},{"key":"ref_22","unstructured":"Kazama, J.I., and Torisawa, K. (2007, January 28\u201330). Exploiting Wikipedia as external knowledge for named entity recognition. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhang, T., and Johnson, D. (June, January 31). A robust risk minimization based named entity recognition system. Proceedings of the Seventh Conference on Natural language Learning at HLT-NAACL 2003, Edmonton, AB, Canada.","DOI":"10.3115\/1119176.1119210"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Tsai, R.T.-H., Sung, C.-L., Dai, H.-J., Hung, H.-C., Sung, T.-Y., and Hsu, W.-L. (2006). NERBio: Using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition. BMC Bioinform., 7.","DOI":"10.1186\/1471-2105-7-S5-S11"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Cohen, W.W., and Sarawagi, S. (2004, January 22\u201325). 
Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA.","DOI":"10.1145\/1014052.1014065"},{"key":"ref_26","unstructured":"Turian, J., Ratinov, L., and Bengio, Y. (2010, January 11\u201316). Word representations: A simple and general method for semi-supervised learning. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden."},{"key":"ref_27","first-page":"467","article-title":"Class-based n-gram models of natural language","volume":"18","author":"Brown","year":"1992","journal-title":"Comput. Linguist."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ratinov, L., and Roth, D. (2009, January 4\u20135). Design challenges and misconceptions in named entity recognition. Proceedings of the 19th Conference on Computational Natural Language Learning, Boulder, CO, USA.","DOI":"10.3115\/1596374.1596399"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lin, W.-S., Dai, H.-J., Jonnagaddala, J., Chang, N.-W., Jue, T.R., Iqbal, U., Shao, J.Y.-H., Chiang, I.J., and Li, Y.-C. (2015, January 20\u201322). Utilizing Different Word Representation Methods for Twitter Data in Adverse Drug Reactions Extraction. Proceedings of the 2015 Conference on Technologies and Applications of Artificial Intelligence (TAAI), Tainan, Taiwan.","DOI":"10.1109\/TAAI.2015.7407070"},{"key":"ref_30","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., and Dean, J. (2013, January 5\u201310). Distributed representations of words and phrases and their compositionality. Proceedings of Advances in Neural Information Processing Systems (NIPS 2013), Lake Tahoe, NV, USA."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C.D. (2014, January 25\u201329). 
Glove: Global vectors for word representation. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Yates, A., Goharian, N., and Frieder, O. (2015, January 25\u201330). Extracting Adverse Drug Reactions from Social Media. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), Austin, TX, USA.","DOI":"10.1609\/aaai.v29i1.9527"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1016\/j.jbi.2014.11.002","article-title":"Portable automatic text classification for adverse drug reaction detection via multi-corpus training","volume":"53","author":"Sarker","year":"2015","journal-title":"J. Biomed. Inform."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1007\/s40264-015-0379-4","article-title":"Social Media Mining for Toxicovigilance: Automatic Monitoring of Prescription Medication Abuse from Twitter","volume":"39","author":"Sarker","year":"2016","journal-title":"Drug Saf."},{"key":"ref_35","unstructured":"Paul, M.J., and Dredze, M. (2011, January 17\u201321). You Are What You Tweet: Analyzing Twitter for Public Health. Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (ICWSM-11), Barcelona, Spain."},{"key":"ref_36","unstructured":"Owoputi, O., O\u2019Connor, B., Dyer, C., Gimpel, K., Schneider, N., and Smith, N.A. (2013, January 9\u201314). Improved part-of-speech tagging for online conversational text with word clusters. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, Atlanta, GA, USA."},{"key":"ref_37","unstructured":"Leaman, R., Wojtulewicz, L., Sullivan, R., Skariah, A., Yang, J., and Gonzalez, G. (2010, January 15). Towards internet-age pharmacovigilance: extracting adverse drug reactions from user posts to health-related social networks. 
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, Uppsala, Sweden."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The unified medical language system (UMLS): Integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Kuhn, M., Campillos, M., Letunic, I., Jensen, L.J., and Bork, P. (2010). A side effect resource to capture phenotypic effects of drugs. Mol. Syst. Biol., 6.","DOI":"10.1038\/msb.2009.98"},{"key":"ref_40","first-page":"570","article-title":"Analysis of Polarity Information in Medical Text","volume":"2005","author":"Niu","year":"2005","journal-title":"AMIA Ann. Symp. Proc."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Tsai, R.T.-H., Wu, S.-H., Chou, W.-C., Lin, C., He, D., Hsiang, J., Sung, T.-Y., and Hsu, W.-L. (2006). Various criteria in the evaluation of biomedical named entity recognition. BMC Bioinform., 7.","DOI":"10.1186\/1471-2105-7-92"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Kim, J.-D., Ohta, T., Tsuruoka, Y., and Tateisi, Y. (2004, January 28\u201329). Introduction to the bio-entity recognition task at JNLPBA. Proceedings of the International Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-04), Geneva, Switzerland.","DOI":"10.3115\/1567594.1567610"},{"key":"ref_43","first-page":"382","article-title":"Developing a robust part-of-speech tagger for biomedical text","volume":"Volume 3746","author":"Bozanis","year":"2005","journal-title":"Advances in Informatics, Proceedings of the 10th Panhellenic Conference on Informatics, PCI 2005"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Fisher, D., and Lenz, H.-J. (1995). 
Learning from Data: Artificial Intelligence and Statistics V, Springer.","DOI":"10.1007\/978-1-4612-2404-4"},{"key":"ref_45","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_46","unstructured":"Klinger, R., and Friedrich, C.M. (2009, January 14\u201316). Feature Subset Selection in Conditional Random Fields for Named Entity Recognition. Proceedings of the International Conference RANLP 2009, Borovets, Bulgaria."},{"key":"ref_47","unstructured":"Brody, S., and Diakopoulos, N. (2011, January 27\u201329). Cooooooooooooooollllllllllllll!!!!!!!!!!!!!!: Using word lengthening to detect sentiment in microblogs. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK."},{"key":"ref_48","unstructured":"Wang, C.-K., Singh, O., Dai, H.-J., Jonnagaddala, J., Jue, T.R., Iqbal, U., Su, E.C.-Y., Abdul, S.S., and Li, J.Y.-C. (2016, January 4\u20138). NTTMUNSW system for adverse drug reactions extraction in Twitter data. Proceedings of the Social Media Mining Shared Task Workshop at the Pacific Symposium on Biocomputing, Big Island, HI, USA."},{"key":"ref_49","unstructured":"Lai, S., Liu, K., Xu, L., and Zhao, J. (2015). 
How to Generate a Good Word Embedding?."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1109\/TSMCC.2011.2161285","article-title":"A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches","volume":"Volume 42","author":"Galar","year":"2012","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews"},{"key":"ref_51","first-page":"147","article-title":"A preliminary study on automatic identification of patient smoking status in unstructured electronic health records","volume":"2015","author":"Jonnagaddala","year":"2015","journal-title":"ACL-IJCNLP"},{"key":"ref_52","unstructured":"Jonnagaddala, J., Jue, T.R., and Dai, H.-J. (2016, January 4\u20138). Binary classification of Twitter posts for adverse drug reactions. Proceedings of the Social Media Mining Shared Task Workshop at the Pacific Symposium on Biocomputing, Big Island, HI, USA."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/7\/2\/27\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T19:24:26Z","timestamp":1760210666000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/7\/2\/27"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,5,25]]},"references-count":52,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2016,6]]}},"alternative-id":["info7020027"],"URL":"https:\/\/doi.org\/10.3390\/info7020027","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,5,25]]}}}