{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T09:31:12Z","timestamp":1775899872589,"version":"3.50.1"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"W1","funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,7,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>PubTator Central (https:\/\/www.ncbi.nlm.nih.gov\/research\/pubtator\/) is a web service for viewing and retrieving bioconcept annotations in full text biomedical articles. PubTator Central (PTC) provides automated annotations from state-of-the-art text mining systems for genes\/proteins, genetic variants, diseases, chemicals, species and cell lines, all available for immediate download. PTC annotates PubMed (29 million abstracts) and the PMC Text Mining subset (3 million full text articles). The new PTC web interface allows users to build full text document collections and visualize concept annotations in each document. Annotations are downloadable in multiple formats (XML, JSON and tab delimited) via the online interface, a RESTful web service and bulk FTP. Improved concept identification systems and a new disambiguation module based on deep learning increase annotation accuracy, and the new server-side architecture is significantly faster. PTC is synchronized with PubMed and PubMed Central, with new articles added daily. The original PubTator service has served annotated abstracts for \u223c300 million requests, enabling third-party research in use cases such as biocuration support, gene prioritization, genetic disease analysis, and literature-based knowledge discovery. We demonstrate the full text results in PTC significantly increase biomedical concept coverage and anticipate this expansion will both enhance existing downstream applications and enable new use cases.<\/jats:p>","DOI":"10.1093\/nar\/gkz389","type":"journal-article","created":{"date-parts":[[2019,5,1]],"date-time":"2019-05-01T07:08:20Z","timestamp":1556694500000},"page":"W587-W593","source":"Crossref","is-referenced-by-count":343,"title":["PubTator central: automated concept annotation for biomedical full text articles"],"prefix":"10.1093","volume":"47","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5094-7321","authenticated-orcid":false,"given":"Chih-Hsuan","family":"Wei","sequence":"first","affiliation":[{"name":"National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM),\u00a0National Institutes of Health (NIH), Bethesda, MD, USA"}]},{"given":"Alexis","family":"Allot","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM),\u00a0National Institutes of Health (NIH), Bethesda, MD, USA"}]},{"given":"Robert","family":"Leaman","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM),\u00a0National Institutes of Health (NIH), Bethesda, MD, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8301-9553","authenticated-orcid":false,"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM),\u00a0National Institutes of Health (NIH), Bethesda, MD, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,5,22]]},"reference":[{"key":"2019062808230262300_B1","doi-asserted-by":"crossref","first-page":"baw161","DOI":"10.1093\/database\/baw161","article-title":"Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges","volume":"2016","author":"Singhal","year":"2016","journal-title":"Database"},{"key":"2019062808230262300_B2","doi-asserted-by":"crossref","first-page":"baw032","DOI":"10.1093\/database\/baw032","article-title":"Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task","volume":"2016","author":"Wei","year":"2016","journal-title":"Database"},{"key":"2019062808230262300_B3","doi-asserted-by":"crossref","first-page":"bay137","DOI":"10.1093\/database\/bay137","article-title":"PubTerm: a web tool for organizing, annotating and curating genes, diseases, molecules and other concepts from PubMed records","volume":"2019","author":"Garcia-Pelaez","year":"2019","journal-title":"Database"},{"key":"2019062808230262300_B4","first-page":"bty871","article-title":"Thalia: Semantic search engine for biomedical abstracts","author":"Soto","year":"2018","journal-title":"Bioinformatics"},{"key":"2019062808230262300_B5","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1186\/s13321-018-0317-4","article-title":"Configurable web-services for biomedical document annotation","volume":"2018","author":"Matos","year":"2018","journal-title":"J. Cheminform."},{"key":"2019062808230262300_B6","doi-asserted-by":"crossref","first-page":"25","DOI":"10.12688\/wellcomeopenres.10210.1","article-title":"SciLite: a platform for displaying text-mined annotations as a means to link research articles with biological data","volume":"1","author":"Venkatesan","year":"2016","journal-title":"Wellcome Open Res."},{"key":"2019062808230262300_B7","doi-asserted-by":"crossref","first-page":"e0164680","DOI":"10.1371\/journal.pone.0164680","article-title":"BEST: next-generation biomedical entity search tool for knowledge discovery from biomedical literature","volume":"11","author":"Lee","year":"2016","journal-title":"PLoS One"},{"key":"2019062808230262300_B8","doi-asserted-by":"crossref","first-page":"W585","DOI":"10.1093\/nar\/gks563","article-title":"GeneView: a comprehensive semantic search engine for PubMed","volume":"40","author":"Thomas","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2019062808230262300_B9","doi-asserted-by":"crossref","first-page":"bas010","DOI":"10.1093\/database\/bas010","article-title":"Argo: an integrative, interactive, text mining-based workbench supporting curation","volume":"2012","author":"Rak","year":"2012","journal-title":"Database"},{"key":"2019062808230262300_B10","doi-asserted-by":"crossref","first-page":"W518","DOI":"10.1093\/nar\/gkt441","article-title":"PubTator: a Web-based text mining tool for assisting Biocuration","volume":"41","author":"Wei","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2019062808230262300_B11","doi-asserted-by":"crossref","first-page":"e1006390","DOI":"10.1371\/journal.pcbi.1006390","article-title":"Scaling up data curation using deep learning: An application to literature triage in genomic variation resources","volume":"14","author":"Lee","year":"2018","journal-title":"PLoS Comput. Biol."},{"key":"2019062808230262300_B12","doi-asserted-by":"crossref","first-page":"3454","DOI":"10.1093\/bioinformatics\/btx439","article-title":"On expert curation and scalability: UniProtKB\/Swiss-Prot as a case study","volume":"33","author":"Poux","year":"2017","journal-title":"Bioinformatics"},{"key":"2019062808230262300_B13","doi-asserted-by":"crossref","first-page":"bau094","DOI":"10.1093\/database\/bau094","article-title":"Hybrid curation of gene\u2013mutation relations combining automated extraction and crowdsourcing","volume":"2014","author":"Burger","year":"2014","journal-title":"Database"},{"key":"2019062808230262300_B14","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1016\/j.ygeno.2016.10.003","article-title":"A PubMed-wide study of endometriosis","volume":"108","author":"Liu","year":"2016","journal-title":"Genomics"},{"key":"2019062808230262300_B15","doi-asserted-by":"crossref","first-page":"275","DOI":"10.4172\/jpb.1000291","article-title":"A proteomic study of human Merkel cell carcinoma","volume":"6","author":"Shao","year":"2013","journal-title":"J. Proteomics Bioinform."},{"key":"2019062808230262300_B16","doi-asserted-by":"crossref","first-page":"6518","DOI":"10.1038\/s41598-018-24457-1","article-title":"Integrative annotation and knowledge discovery of kinase post-translational modifications and cancer-associated mutations through federated protein ontologies and resources","volume":"8","author":"Huang","year":"2018","journal-title":"Sci. Rep."},{"key":"2019062808230262300_B17","doi-asserted-by":"crossref","first-page":"012037","DOI":"10.1088\/1742-6596\/1069\/1\/012037","article-title":"Evaluation of the performance of BioNLP tools for discovering causal genes in terms with pathway enrichment","volume":"1069","author":"Qin","year":"2018","journal-title":"J. Phys. Conf. Ser."},{"key":"2019062808230262300_B18","doi-asserted-by":"crossref","first-page":"2886","DOI":"10.1093\/bioinformatics\/btw511","article-title":"HiPub: translating PubMed and PMC texts to networks for knowledge discovery","volume":"32","author":"Lee","year":"2016","journal-title":"Bioinformatics"},{"key":"2019062808230262300_B19","first-page":"bty845","article-title":"LION LBD: a literature-based discovery system for cancer biology","author":"Pyysalo","year":"2018","journal-title":"Bioinformatics"},{"key":"2019062808230262300_B20","doi-asserted-by":"crossref","first-page":"2614","DOI":"10.1093\/bioinformatics\/bty114","article-title":"A global network of biomedical relationships derived from text","volume":"34","author":"Percha","year":"2018","journal-title":"Bioinformatics"},{"key":"2019062808230262300_B21","first-page":"48","article-title":"Results of the fifth edition of the BioASQ Challenge","author":"Nentidis","year":"2017","journal-title":"BioNLP"},{"key":"2019062808230262300_B22","doi-asserted-by":"crossref","first-page":"e1005017","DOI":"10.1371\/journal.pcbi.1005017","article-title":"Text mining genotype-phenotype relationships from biomedical literature for database curation and precision medicine","volume":"12","author":"Singhal","year":"2016","journal-title":"PLoS Comput Biol."},{"key":"2019062808230262300_B23","doi-asserted-by":"crossref","first-page":"e0152725","DOI":"10.1371\/journal.pone.0152725","article-title":"DiMeX: a text mining system for mutation-disease association extraction","volume":"11","author":"Mahmood","year":"2016","journal-title":"PLoS One"},{"key":"2019062808230262300_B24","doi-asserted-by":"crossref","first-page":"baw043","DOI":"10.1093\/database\/baw043","article-title":"BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations","volume":"2016","author":"Lee","year":"2016","journal-title":"Database"},{"key":"2019062808230262300_B25","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1186\/s13326-017-0113-5","article-title":"12 years on - Is the NLM medical text indexer still useful and relevant","volume":"8","author":"Mork","year":"2017","journal-title":"J. Biomed. Semantics"},{"key":"2019062808230262300_B26","doi-asserted-by":"crossref","first-page":"bas043","DOI":"10.1093\/database\/bas043","article-title":"Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II","volume":"2012","author":"Lu","year":"2012","journal-title":"Database"},{"key":"2019062808230262300_B27","doi-asserted-by":"crossref","first-page":"e1005962","DOI":"10.1371\/journal.pcbi.1005962","article-title":"A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts","volume":"14","author":"Westergaard","year":"2018","journal-title":"PLoS Comput. Biol."},{"key":"2019062808230262300_B28","doi-asserted-by":"crossref","first-page":"W530","DOI":"10.1093\/nar\/gky355","article-title":"LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC","volume":"46","author":"Allot","year":"2018","journal-title":"Nucleic. Acids. Res."},{"key":"2019062808230262300_B29","first-page":"btz070","article-title":"PMC text mining subset in BioC: about 3 million full text articles and growing","author":"Comeau","year":"2019","journal-title":"Bioinformatics"},{"key":"2019062808230262300_B30","doi-asserted-by":"crossref","first-page":"bau038","DOI":"10.1093\/database\/bau038","article-title":"iSimp in BioC standard format: enhancing the interoperability of a sentence simplification system","volume":"2014","author":"Peng","year":"2014","journal-title":"Database"},{"key":"2019062808230262300_B31","doi-asserted-by":"crossref","first-page":"1433","DOI":"10.1093\/bioinformatics\/btt156","article-title":"tmVar: A text mining approach for extracting sequence variants in biomedical literature","volume":"29","author":"Wei","year":"2013","journal-title":"Bioinformatics"},{"key":"2019062808230262300_B32","doi-asserted-by":"crossref","first-page":"2909","DOI":"10.1093\/bioinformatics\/btt474","article-title":"DNorm: disease name normalization with pairwise learning to rank","volume":"29","author":"Leaman","year":"2013","journal-title":"Bioinformatics"},{"key":"2019062808230262300_B33","doi-asserted-by":"crossref","first-page":"e38460","DOI":"10.1371\/journal.pone.0038460","article-title":"SR4GN: a species recognition software tool for gene normalization","volume":"7","author":"Wei","year":"2012","journal-title":"PLoS One"},{"key":"2019062808230262300_B34","doi-asserted-by":"crossref","first-page":"S5","DOI":"10.1186\/1471-2105-12-S8-S5","article-title":"Cross-species gene normalization by species inference","volume":"12","author":"Wei","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2019062808230262300_B35","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1155\/2015\/918710","article-title":"GNormPlus: An integrative approach for tagging genes, gene families, and protein domains","volume":"2015","author":"Wei","year":"2015","journal-title":"Biomed Res Int."},{"key":"2019062808230262300_B36","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1186\/1471-2105-9-402","article-title":"Abbreviation definition identification based on automatic precision estimates","volume":"9","author":"Sohn","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2019062808230262300_B37","doi-asserted-by":"crossref","first-page":"1385","DOI":"10.1109\/JBHI.2015.2422651","article-title":"SimConcept: a hybrid approach for simplifying composite named entities in biomedical text","volume":"19","author":"Wei","year":"2015","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"2019062808230262300_B38","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1093\/bioinformatics\/btx541","article-title":"tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine","volume":"34","author":"Wei","year":"2017","journal-title":"Bioinformatics"},{"key":"2019062808230262300_B39","doi-asserted-by":"crossref","first-page":"2839","DOI":"10.1093\/bioinformatics\/btw343","article-title":"TaggerOne: joint named entity recognition and normalization with semi-Markov Model","volume":"32","author":"Leaman","year":"2016","journal-title":"Bioinformatics"},{"key":"2019062808230262300_B40","doi-asserted-by":"crossref","first-page":"25","DOI":"10.7171\/jbt.18-2902-002","article-title":"The Cellosaurus, a Cell-Line Knowledge Resource","volume":"29","author":"Bairoch","year":"2018","journal-title":"J. Biomol. Tech."},{"key":"2019062808230262300_B41","doi-asserted-by":"crossref","first-page":"baw068","DOI":"10.1093\/database\/baw068","article-title":"BioCreative V CDR task corpus: a resource for chemical disease relation extraction","volume":"2016","author":"Li","year":"2016","journal-title":"Database"},{"key":"2019062808230262300_B42","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1758-2946-7-S1-S2","article-title":"The CHEMDNER corpus of chemicals and drugs and its annotation principles","volume":"7","author":"Krallinger","year":"2015","journal-title":"J. Cheminform."},{"key":"2019062808230262300_B43","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1186\/1471-2105-11-85","article-title":"LINNAEUS: a species name identification system for biomedical literature","volume":"11","author":"Gerner","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2019062808230262300_B44","first-page":"376","article-title":"Bio-ID track overview","volume":"482","author":"Arighi","year":"2017","journal-title":"Proc. BioCreative Workshop"},{"key":"2019062808230262300_B45","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/gb-2008-9-s2-s3","article-title":"Overview of BioCreative II gene normalization","volume":"9","author":"Morgan","year":"2008","journal-title":"Genome Biol."},{"key":"2019062808230262300_B46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jbi.2013.12.006","article-title":"NCBI disease corpus: a resource for disease name recognition and concept normalization","volume":"47","author":"Do\u011fan","year":"2014","journal-title":"J. Biomed. Inform."},{"key":"2019062808230262300_B47","first-page":"1746","article-title":"Convolutional neural networks for sentence classification","author":"Kim","year":"2014","journal-title":"EMNLP"},{"key":"2019062808230262300_B48","doi-asserted-by":"crossref","first-page":"bat064","DOI":"10.1093\/database\/bat064","article-title":"BioC: a minimalist approach to interoperability for biomedical text processing","volume":"2013","author":"Comeau","year":"2013","journal-title":"Database"},{"key":"2019062808230262300_B49","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1186\/1471-2105-11-492","article-title":"The structural and content aspects of abstracts versus bodies of full text journal articles are different","volume":"11","author":"Cohen","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2019062808230262300_B50","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1186\/1471-2105-10-46","article-title":"Is searching full text more effective than searching abstracts","volume":"10","author":"Lin","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2019062808230262300_B51","doi-asserted-by":"crossref","first-page":"bas020","DOI":"10.1093\/database\/bas020","article-title":"Text mining for the biocuration workflow","volume":"2012","author":"Hirschman","year":"2012","journal-title":"Database"},{"key":"2019062808230262300_B52","doi-asserted-by":"crossref","first-page":"bau003","DOI":"10.1093\/database\/bau003","article-title":"Literature mining of genetic variants for curation: quantifying the importance of supplementary material","volume":"2014","author":"Yepes","year":"2014","journal-title":"Database"}],"container-title":["Nucleic Acids Research"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/nar\/advance-article-pdf\/doi\/10.1093\/nar\/gkz389\/28684414\/gkz389.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/nar\/article-pdf\/47\/W1\/W587\/28880193\/gkz389.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,11,24]],"date-time":"2019-11-24T08:23:48Z","timestamp":1574583828000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/nar\/article\/47\/W1\/W587\/5494727"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,22]]},"references-count":52,"journal-issue":{"issue":"W1","published-online":{"date-parts":[[2019,5,22]]},"published-print":{"date-parts":[[2019,7,2]]}},"URL":"https:\/\/doi.org\/10.1093\/nar\/gkz389","relation":{},"ISSN":["0305-1048","1362-4962"],"issn-type":[{"value":"0305-1048","type":"print"},{"value":"1362-4962","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,7,2]]},"published":{"date-parts":[[2019,5,22]]}}}