{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T19:53:24Z","timestamp":1772913204841,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1010144","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,6,28]],"date-time":"2022-06-28T00:00:00Z","timestamp":1656374400000}}],"reference-count":51,"publisher":"Public Library of Science (PLoS)","issue":"6","license":[{"start":{"date-parts":[[2022,6,15]],"date-time":"2022-06-15T00:00:00Z","timestamp":1655251200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61702443"],"award-info":[{"award-number":["61702443"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61966038"],"award-info":[{"award-number":["61966038"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61762091"],"award-info":[{"award-number":["61762091"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004663","name":"Ministry of Science and Technology, Taiwan","doi-asserted-by":"publisher","award":["MOST 110-2628-E-155-002"],"award-info":[{"award-number":["MOST 110-2628-E-155-002"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Analysis of health-related texts can be used to detect 
adverse drug reactions (ADR). The greatest challenge for ADR detection lies in imbalanced data distributions, where words related to ADR symptoms often belong to minority classes. As a result, trained models tend to converge to a point that is strongly biased towards the majority class and ignores the minority class. Since the most commonly used criterion, cross-entropy, is an approximation to accuracy, the model focuses more readily on the majority class to achieve high accuracy. To address this issue, existing methods apply either oversampling or down-sampling strategies to balance the data distribution and exploit the most difficult samples of the minority class. However, increasing or reducing the number of individual tokens alone in sequence labeling tasks results in the loss of the syntactic relations of the sentence. This paper proposes a weighted variant of the conditional random field (CRF) for data-imbalanced sequence labeling tasks. Such a weighting strategy can alleviate data distribution imbalances between majority and minority classes. Used in place of <jats:italic>softmax<\/jats:italic> in the output layer, the CRF can capture the relationships between the labels of tokens. The local interpretable model-agnostic explanations (LIME) algorithm was applied to investigate performance differences between models with and without the weighted loss function. 
Experimental results on two different ADR tasks show that the proposed model outperforms previously proposed sequence labeling methods.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1010144","type":"journal-article","created":{"date-parts":[[2022,6,15]],"date-time":"2022-06-15T18:14:55Z","timestamp":1655316895000},"page":"e1010144","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":11,"title":["Explainable detection of adverse drug reaction with imbalanced data distribution"],"prefix":"10.1371","volume":"18","author":[{"given":"Jin","family":"Wang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1443-4347","authenticated-orcid":true,"given":"Liang-Chih","family":"Yu","sequence":"additional","affiliation":[]},{"given":"Xuejie","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,6,15]]},"reference":[{"issue":"5","key":"pcbi.1010144.ref001","doi-asserted-by":"crossref","first-page":"385","DOI":"10.2165\/00002018-200629050-00003","article-title":"Under-Reporting of Adverse A Systematic Review","volume":"29","author":"L Hazell","year":"2006","journal-title":"Drug Safety"},{"issue":"3","key":"pcbi.1010144.ref002","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1136\/amiajnl-2012-000930","article-title":"Combing signals from spontaneous reports and electronic health records for detection of adverse drug reactions","volume":"20","author":"R Harpaz","year":"2013","journal-title":"Journal of the American Medical Informatics Association"},{"issue":"6","key":"pcbi.1010144.ref003","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1038\/clpt.2013.47","article-title":"Pharmacovigilance using clinical notes","volume":"93","author":"P LePendu","year":"2013","journal-title":"Clinical Pharmacology and 
Therapeutics"},{"issue":"3","key":"pcbi.1010144.ref004","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1197\/jamia.M3028","article-title":"Active Computerized Pharmacovigilance Using Natural Language Processing, Statistics, and Electronic Health Records: A Feasibility Study","volume":"16","author":"X Wang","year":"2009","journal-title":"Journal of the American Medical Informatics Association"},{"issue":"6","key":"pcbi.1010144.ref005","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1136\/jamia.2010.008607","article-title":"Drug safety surveillance using de-identified EMR and claims data: Issues and challenges","volume":"17","author":"PM Nadkarni","year":"2010","journal-title":"Journal of the American Medical Informatics Association"},{"key":"pcbi.1010144.ref006","unstructured":"Kiritchenko S, Mohammad SM, Morin J, De Bruijn B. NRC-Canada at SMM4H shared task: Classifying tweets mentioning adverse drug reactions and medication intake. In: CEUR Workshop Proceedings; 2017. p. 1\u201311."},{"issue":"3","key":"pcbi.1010144.ref007","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1093\/jamia\/ocu041","article-title":"Pharmacovigilance from social media: Mining adverse drug reaction mentions using sequence labeling with word embedding cluster features","volume":"22","author":"A Nikfarjam","year":"2015","journal-title":"Journal of the American Medical Informatics Association"},{"key":"pcbi.1010144.ref008","doi-asserted-by":"crossref","unstructured":"Iyyer M, Manjunatha V, Boyd-Graber J, Daum\u00e9 III H. Deep unordered composition rivals syntactic methods for text classification. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL-2015); 2015. p. 1681\u20131691. Available from: http:\/\/www.aclweb.org\/anthology\/P15-1162.","DOI":"10.3115\/v1\/P15-1162"},{"key":"pcbi.1010144.ref009","doi-asserted-by":"crossref","unstructured":"Joulin A, Grave E, Bojanowski P, Mikolov T. 
Bag of tricks for efficient text classification. arXiv preprint arXiv:160701759. 2016.","DOI":"10.18653\/v1\/E17-2068"},{"key":"pcbi.1010144.ref010","doi-asserted-by":"crossref","unstructured":"Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. arXiv preprint arXiv:160704606. 2016.","DOI":"10.1162\/tacl_a_00051"},{"key":"pcbi.1010144.ref011","unstructured":"Mikolov T, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. In: Proceedings of Advances in Neural Information Processing Systems (NIPS-2013); 2013."},{"key":"pcbi.1010144.ref012","unstructured":"Mikolov T, Corrado G, Chen K, Dean J. Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (ICLR-2013); 2013."},{"key":"pcbi.1010144.ref013","unstructured":"Miranda DS. Automated detection of adverse drug reactions in the biomedical literature using convolutional neural networks and biomedical word Embeddings. In: Proceedings of the 3th Swiss Text Analytics Conference; 2018. p. 33\u201341."},{"key":"pcbi.1010144.ref014","unstructured":"Huynh T, He Y, Willis A, Ruger S. Adverse drug reaction classification with deep neural networks. In: Proceedings of the 26th International Conference on Computational Linguistics (COLING-2016); 2016. p. 877\u2013887."},{"issue":"4","key":"pcbi.1010144.ref015","doi-asserted-by":"crossref","first-page":"813","DOI":"10.1093\/jamia\/ocw180","article-title":"Deep learning for pharmacovigilance: Recurrent neural network architectures for labeling adverse drug reactions in Twitter posts","volume":"24","author":"A Cocos","year":"2017","journal-title":"Journal of the American Medical Informatics Association"},{"key":"pcbi.1010144.ref016","unstructured":"Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv preprint arXiv:150801991. 
2015."},{"key":"pcbi.1010144.ref017","doi-asserted-by":"crossref","unstructured":"Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL\/HLT-2016); 2016. p. 260\u2013270.","DOI":"10.18653\/v1\/N16-1030"},{"key":"pcbi.1010144.ref018","doi-asserted-by":"crossref","first-page":"73305","DOI":"10.1109\/ACCESS.2018.2882443","article-title":"An attentive neural sequence labeling model for adverse drug reactions mentions extraction","volume":"6","author":"P Ding","year":"2018","journal-title":"IEEE Access"},{"key":"pcbi.1010144.ref019","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL\/HLT-2019); 2019. p. 4171\u20134186. Available from: http:\/\/arxiv.org\/abs\/1810.04805."},{"key":"pcbi.1010144.ref020","unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:190711692. 2019."},{"key":"pcbi.1010144.ref021","unstructured":"Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv preprint arXiv:190911942. 2019."},{"key":"pcbi.1010144.ref022","doi-asserted-by":"crossref","unstructured":"Akbani R, Kwek S, Japkowicz N. Applying support vector machines to imbalanced datasets. In: Proceddings of the 15th European Conference on Machine Learning (ECML-2004); 2004. p. 39\u201350.","DOI":"10.1007\/978-3-540-30115-8_7"},{"key":"pcbi.1010144.ref023","unstructured":"O\u2019Connor K, Pimpalkhute P, Nikfarjam A, Ginn R, Smith KL, Gonzalez G. Pharmacovigilance on twitter? 
Mining tweets for adverse drug reactions. In: Proceedings of Annual Symposium of American Medical Informatics Association; 2014. p. 924\u2013933."},{"issue":"5","key":"pcbi.1010144.ref024","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1016\/j.jbi.2012.04.008","article-title":"Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports","volume":"45","author":"H Gurulingappa","year":"2012","journal-title":"Journal of Biomedical Informatics"},{"key":"pcbi.1010144.ref025","unstructured":"Ramamoorthy S, Murugan S. An attentive sequence model for adverse drug event extraction from biomedical text. arXiv preprint arXiv:180100625. 2018."},{"key":"pcbi.1010144.ref026","first-page":"309","author":"M Rei","year":"2016","journal-title":"Attending to Characters in Neural Sequence Labeling Models"},{"key":"pcbi.1010144.ref027","doi-asserted-by":"crossref","unstructured":"Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, et al. Deep contextualized word representations. In: Proceedings of the NAACL-HLT 2018; 2018. p. 2227\u20132237.","DOI":"10.18653\/v1\/N18-1202"},{"key":"pcbi.1010144.ref028","doi-asserted-by":"crossref","unstructured":"Pennington J, Socher R, Manning C. Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP-2014). 2014; p. 1532\u20131543.","DOI":"10.3115\/v1\/D14-1162"},{"key":"pcbi.1010144.ref029","doi-asserted-by":"crossref","unstructured":"Lin TY, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV-2017); 2017. p. 2980\u20132988.","DOI":"10.1109\/ICCV.2017.324"},{"key":"pcbi.1010144.ref030","doi-asserted-by":"crossref","unstructured":"Li X, Sun X, Meng Y, Liang J, Wu F, Li J. Dice Loss for Data-imbalanced NLP Tasks. 
2019.","DOI":"10.18653\/v1\/2020.acl-main.45"},{"key":"pcbi.1010144.ref031","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention Is All You Need. In: Advances in neural information processing systems(nips-2017); 2017. p. 5598\u20136008."},{"key":"pcbi.1010144.ref032","unstructured":"Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, et al. Google\u2019s neural machine translation system: bridging the gap between human and machine translation. In: Proceedings of the Conference of the Association for Machine Translation in the Americans; 2016. p. 193\u2013199. Available from: http:\/\/arxiv.org\/abs\/1609.08144."},{"issue":"2017","key":"pcbi.1010144.ref033","first-page":"193","volume":"1","author":"A Vaswani","year":"2018","journal-title":"Tensor2Tensor for Neural Machine Translation"},{"issue":"2","key":"pcbi.1010144.ref034","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/5.18626","article-title":"A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition","volume":"77","author":"LR Rabiner","year":"1989","journal-title":"Proceedings of the IEEE"},{"key":"pcbi.1010144.ref035","unstructured":"Lafferty J, McCallum A, Pereira F. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of 18th. In: Proceedings of the 18th International Conference on Machine Learning (ICML-2001); 2001. p. 282\u2013289. 
Available from: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.23.9849&rep=rep1&type=pdf%5Cnhttp:\/\/portal.acm.org\/citation.cfm?id=645530.655813."},{"issue":"Suppl 8","key":"pcbi.1010144.ref036","first-page":"1","article-title":"Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction mention extraction","volume":"19","author":"S Gupta","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1010144.ref037","doi-asserted-by":"crossref","unstructured":"Lee K, Qadir A, Hasan SA, Datla V, Prakash A, Liu J, et al. Adverse drug event detection in tweets with semi-supervised convolutional neural networks. In: Proceedings of the 26th International World Wide Web Conference (WWW-2017); 2017. p. 705\u2013714.","DOI":"10.1145\/3038912.3052671"},{"issue":"198","key":"pcbi.1010144.ref038","first-page":"1","article-title":"A neural joint model for entity and relation extraction from biomedical text","volume":"18","author":"F Li","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1010144.ref039","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2017\/9451342","article-title":"Combination of Deep Recurrent Neural Networks and Conditional Random Fields for Extracting Adverse Drug Reactions from User Reviews","volume":"2017","author":"E Tutubalina","year":"2017","journal-title":"Journal of Healthcare Engineering"},{"key":"pcbi.1010144.ref040","first-page":"48","article-title":"Medication and Adverse Drug Event Detection Workshop","author":"S Wunnava","year":"2018","journal-title":"Proceedings of Machine Learning Research"},{"key":"pcbi.1010144.ref041","unstructured":"Dandala B, Diwakar M, Murthy D. IBM Research System at TAC 2017: Adverse Drug Reactions Extraction from Drug Labels. Text Analysis Conference (TAC2017). 
2017."},{"key":"pcbi.1010144.ref042","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2018\/2379208","article-title":"Recognizing Continuous and Discontinuous Adverse Drug Reaction Mentions from Social Media Using LSTM-CRF","volume":"2018","author":"B Tang","year":"2018","journal-title":"Wireless Communications and Mobile Computing"},{"key":"pcbi.1010144.ref043","doi-asserted-by":"crossref","unstructured":"Luong MT, Pham H, Manning CD. Effective Approaches to Attention-based Neural Machine Translation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP-2015); 2015. p. 1412\u20131421.","DOI":"10.18653\/v1\/D15-1166"},{"key":"pcbi.1010144.ref044","doi-asserted-by":"crossref","unstructured":"Malisiewicz T, Gupta A, Efros AA. Ensemble of exemplar-SVMs for object detection and beyond. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV-2011); 2011. p. 89\u201396.","DOI":"10.1109\/ICCV.2011.6126229"},{"issue":"10","key":"pcbi.1010144.ref045","doi-asserted-by":"crossref","first-page":"1624","DOI":"10.1109\/TNN.2010.2066988","article-title":"RAMOBoost: Ranked minority oversampling in boosting","volume":"21","author":"S Chen","year":"2010","journal-title":"IEEE Transactions on Neural Networks"},{"key":"pcbi.1010144.ref046","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic Minority Over-sampling Technique","volume":"16","author":"NV Chawla","year":"2002","journal-title":"Journal of Artificial Intelligence Research"},{"key":"pcbi.1010144.ref047","unstructured":"Chang HS, Learned-Miller E, McCallum A. Active bias: Training more accurate neural networks by emphasizing high variance samples. In: Advances in Neural Information Processing Systems (NIPS-2017); 2017. p. 1003\u20131013."},{"key":"pcbi.1010144.ref048","unstructured":"Katharopoulos A, Fleuret F. Not all samples are created equal: Deep learning with importance sampling. 
In: Proceedings of the 35th International Conference on Machine Learning (ICML-2018); 2018. p. 3936\u20133949."},{"key":"pcbi.1010144.ref049","unstructured":"Jiang L, Zhou Z, Leung T, Li LJ, Fei-Fei L. Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels. Proceedings of the 35th International Conference on Machine Learning (ICML-2018). 2018; p. 3601\u20133620."},{"key":"pcbi.1010144.ref050","unstructured":"Fan Y, Tian F, Qin T, Li XY, Liu TY. Learning to teach. In: Proceedings of The 6th International Conference on Learning Representations (ICLR-2018); 2018. p. 1\u201316."},{"issue":"8","key":"pcbi.1010144.ref051","doi-asserted-by":"crossref","first-page":"1062","DOI":"10.1109\/TC.2018.2805683","article-title":"AdBoost: Thermal Aware Performance Boosting Through Dark Silicon Patterning","volume":"67","author":"A Kanduri","year":"2018","journal-title":"IEEE Transactions on Computers"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1010144","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,6,28]],"date-time":"2022-06-28T00:00:00Z","timestamp":1656374400000}}],"container-title":["PLOS Computational 
Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010144","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,28]],"date-time":"2022-06-28T18:52:11Z","timestamp":1656442331000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010144"}},"subtitle":[],"editor":[{"given":"Andrey","family":"Rzhetsky","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,6,15]]},"references-count":51,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,6,15]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010144","relation":{"new_version":[{"id-type":"doi","id":"10.1371\/journal.pcbi.1010144","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,15]]}}}