{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,18]],"date-time":"2026-02-18T02:56:40Z","timestamp":1771383400818,"version":"3.50.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2022,8,23]],"date-time":"2022-08-23T00:00:00Z","timestamp":1661212800000},"content-version":"vor","delay-in-days":234,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004663","name":"Ministry of Science and Technology, Taiwan","doi-asserted-by":"publisher","award":["E-008-062-MY3"],"award-info":[{"award-number":["E-008-062-MY3"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004663","name":"Ministry of Science and Technology, Taiwan","doi-asserted-by":"publisher","award":["MOST 109-2221"],"award-info":[{"award-number":["MOST 109-2221"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,23]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Automatically extracting medication names from tweets is challenging in the real world. There are many tweets; however, only a small proportion mentions medications. Thus, datasets are usually highly imbalanced. Moreover, the length of tweets is very short, which makes it hard to recognize medication names from the limited context. This paper proposes a data-centric approach for extracting medications in the BioCreative VII Track 3 (Automatic Extraction of Medication Names in Tweets). Our approach formulates the sequence labeling problem as text entailment and question\u2013answer tasks. As a result, without using the dictionary and ensemble method, our single model achieved a Strict F1 of 0.77 (the official baseline system is 0.758, and the average performance of participants is 0.696). Moreover, combining the dictionary filtering and ensemble method achieved a Strict F1 of 0.804 and had the highest performance for all participants. Furthermore, domain-specific and task-specific pretrained language models, as well as data-centric approaches, are proposed for further improvements.<\/jats:p><jats:p>Database URL https:\/\/competitions.codalab.org\/competitions\/23925 and https:\/\/biocreative.bioinformatics.udel.edu\/tasks\/biocreative-vii\/track-3\/<\/jats:p>","DOI":"10.1093\/database\/baac067","type":"journal-article","created":{"date-parts":[[2022,8,23]],"date-time":"2022-08-23T16:11:53Z","timestamp":1661271113000},"source":"Crossref","is-referenced-by-count":2,"title":["Task reformulation and data-centric approach for Twitter medication name extraction"],"prefix":"10.1093","volume":"2022","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4934-5873","authenticated-orcid":false,"given":"Yu","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University , No. 300, Zhongda Rd., Zhongli Dist., Taoyuan City 32001, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jong Kang","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University , No. 300, Zhongda Rd., Zhongli Dist., Taoyuan City 32001, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jen-Chieh","family":"Han","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University , No. 300, Zhongda Rd., Zhongli Dist., Taoyuan City 32001, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Richard Tzong-Han","family":"Tsai","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University , No. 300, Zhongda Rd., Zhongli Dist., Taoyuan City 32001, Taiwan"},{"name":"IoX Center, National Taiwan University , No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan"},{"name":"Center for GIS, Research Center for Humanities and Social Sciences, Academia Sinica , 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,8,23]]},"reference":[{"key":"2022082316111972700_R1","article-title":"BioCreative VII\u2013Task 3: automatic extraction of medication names in tweets","author":"Weissenbacher","year":"2021"},{"key":"2022082316111972700_R2","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-030-77211-6_10","article-title":"Addressing extreme imbalance for detecting medications mentioned in twitter user timelines","author":"Weissenbacher","year":"2021"},{"key":"2022082316111972700_R3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1606.05250","article-title":"Squad: 100,000+ questions for machine comprehension of text","author":"Rajpurkar","year":"2016","journal-title":"arXiv Preprint arXiv:1606.05250"},{"key":"2022082316111972700_R4","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2101.00438","article-title":"Few-shot question answering by pretraining span selection","author":"Ram","year":"2021","journal-title":"arXiv Preprint arXiv:2101.00438"},{"key":"2022082316111972700_R5","doi-asserted-by":"publisher","first-page":"1373","DOI":"10.1613\/jair.1.12125","article-title":"Confident learning: estimating uncertainty in dataset labels","volume":"70","author":"Northcutt","year":"2021","journal-title":"J. Artif. Intell. Res."},{"key":"2022082316111972700_R6","article-title":"Named entity recognition in tweets: an experimental study","author":"Ritter","year":"2011"},{"key":"2022082316111972700_R7","article-title":"Annotating named entities in Twitter data with crowdsourcing","author":"Finin","year":"2010"},{"key":"2022082316111972700_R8","article-title":"Making sense of microposts (# msm2013) concept extraction challenge","author":"Cano Basave","year":"2013"},{"key":"2022082316111972700_R9","article-title":"Results of the wnut16 named entity recognition shared task","author":"Strauss","year":"2016"},{"key":"2022082316111972700_R10","doi-asserted-by":"crossref","DOI":"10.1142\/9789814749411_0054","article-title":"Social media mining shared task workshop","author":"Sarker","year":"2016"},{"key":"2022082316111972700_R11","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1016\/j.ipm.2014.10.006","article-title":"Analysis of named entity recognition and linking for tweets","volume":"51","author":"Derczynski","year":"2015","journal-title":"Inf. Process Manag."},{"key":"2022082316111972700_R12","article-title":"Learning with the web: spotting named entities on the intersection of NERD and machine learning. In # MSM. Citeseer","author":"Van Erp","year":"2013"},{"key":"2022082316111972700_R13","article-title":"Overview of the second social media mining for health (SMM4H) shared tasks at AMIA 2017","volume":"1","author":"Sarker","year":"2017","journal-title":"Training"},{"key":"2022082316111972700_R14","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/W18-5904","article-title":"Overview of the third social media mining for health (SMM4H) shared tasks at EMNLP 2018","author":"Weissenbacher","year":"2018"},{"key":"2022082316111972700_R15","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/W19-3203","article-title":"Overview of the fourth social media mining for health (SMM4H) shared tasks at ACL 2019","author":"Weissenbacher","year":"2019"},{"key":"2022082316111972700_R16","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2021.smm4h-1.4","article-title":"Overview of the Sixth Social Media Mining for Health Applications (# SMM4H) shared tasks at NAACL 2021","author":"Magge","year":"2021"},{"key":"2022082316111972700_R17","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2103.08493","article-title":"How many data points is a prompt worth?","author":"Scao","year":"2021","journal-title":"arXiv Preprint arXiv:2103.08493"},{"key":"2022082316111972700_R18","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1910.11476","article-title":"A unified MRC framework for named entity recognition","author":"Li","year":"2019","journal-title":"arXiv Preprint arXiv:1910.11476"},{"key":"2022082316111972700_R19","article-title":"A chat with Andrew on MLOps: from model-centric to data-centric ai","author":"Ng","year":"2021"},{"key":"2022082316111972700_R20","article-title":"Overview of the fifth Social Media Mining for Health Applications (# SMM4H) shared tasks at Coling 2020","author":"Klein","year":"2020"},{"key":"2022082316111972700_R21","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2005.10200","article-title":"BERTweet: a pre-trained language model for English tweets","author":"Nguyen","year":"2020","journal-title":"arXiv Preprint arXiv:2005.10200"},{"key":"2022082316111972700_R22","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2006.03654","article-title":"DeBERTa: decoding-enhanced BERT with disentangled attention","author":"He","year":"2020","journal-title":"arXiv Preprint arXiv:2006.03654"},{"key":"2022082316111972700_R23","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"2022082316111972700_R24","article-title":"BioELECTRA: pretrained biomedical text encoder using discriminators","author":"Raj Kanakarajan","year":"2021"},{"key":"2022082316111972700_R25","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1810.04805","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2018","journal-title":"arXiv Preprint arXiv:1810.04805"},{"key":"2022082316111972700_R26","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2104.14690","article-title":"Entailment as few-shot learner","author":"Wang","year":"2021","journal-title":"arXiv Preprint arXiv:2104.14690"},{"key":"2022082316111972700_R27","doi-asserted-by":"publisher","DOI":"10.2196\/publichealth.6396","article-title":"TwiMed: Twitter and PubMed comparable corpus of drugs, diseases, symptoms, and their relations","volume":"3","author":"Alvaro","year":"2017","journal-title":"JMIR Public Health Surveill."},{"key":"2022082316111972700_R28","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2003.13900","article-title":"A large-scale Twitter dataset for drug safety applications mined from publicly existing resources","author":"Tekumalla","year":"2020","journal-title":"arXiv Preprint arXiv:2003.13900"},{"key":"2022082316111972700_R29","doi-asserted-by":"crossref","DOI":"10.1609\/icwsm.v14i1.7357","article-title":"Mining archive.org\u2019s Twitter stream grab for pharmacovigilance research gold","author":"Tekumalla","year":"2020"}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baac067\/45503092\/baac067.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baac067\/45503092\/baac067.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,15]],"date-time":"2023-02-15T18:02:01Z","timestamp":1676484121000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baac067\/6674007"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,1]]},"references-count":29,"URL":"https:\/\/doi.org\/10.1093\/database\/baac067","relation":{},"ISSN":["1758-0463"],"issn-type":[{"value":"1758-0463","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,1,1]]},"published":{"date-parts":[[2022,1,1]]},"article-number":"baac067"}}