{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T08:47:43Z","timestamp":1770972463835,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2020,3,18]],"date-time":"2020-03-18T00:00:00Z","timestamp":1584489600000},"content-version":"vor","delay-in-days":77,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Biological knowledgebases rely on expert biocuration of the research literature to maintain up-to-date collections of data organized in machine-readable form. To enter information into knowledgebases, curators need to follow three steps: (i) identify papers containing relevant data, a process called triaging; (ii) recognize named entities; and (iii) extract and curate data in accordance with the underlying data models. WormBase (WB), the authoritative repository for research data on Caenorhabditis elegans and other nematodes, uses text mining (TM) to semi-automate its curation pipeline. In addition, WB engages its community, via an Author First Pass (AFP) system, to help recognize entities and classify data types in their recently published papers. In this paper, we present a new WB AFP system that combines TM and AFP into a single application to enhance community curation. The system employs string-searching algorithms and statistical methods (e.g. support vector machines (SVMs)) to extract biological entities and classify data types, and it presents the results to authors in a web form where they validate the extracted information, rather than enter it de novo as the previous form required. With this new system, we lessen the burden for authors, while at the same time receive valuable feedback on the performance of our TM tools. The new user interface also links out to specific structured data submission forms, e.g. for phenotype or expression pattern data, giving the authors the opportunity to contribute a more detailed curation that can be incorporated into WB with minimal curator review. Our approach is generalizable and could be applied to additional knowledgebases that would like to engage their user community in assisting with the curation. In the five months succeeding the launch of the new system, the response rate has been comparable with that of the previous AFP version, but the quality and quantity of the data received has greatly improved.<\/jats:p>","DOI":"10.1093\/database\/baaa006","type":"journal-article","created":{"date-parts":[[2020,1,16]],"date-time":"2020-01-16T20:09:14Z","timestamp":1579205354000},"source":"Crossref","is-referenced-by-count":23,"title":["Text mining meets community curation: a newly designed curation platform to improve author experience and participation at WormBase"],"prefix":"10.1093","volume":"2020","author":[{"given":"Valerio","family":"Arnaboldi","sequence":"first","affiliation":[{"name":"Division of Biology and Biological Engineering 156\u201329, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA"}]},{"given":"Daniela","family":"Raciti","sequence":"first","affiliation":[{"name":"Division of Biology and Biological Engineering 156\u201329, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA"}]},{"given":"Kimberly","family":"Van Auken","sequence":"first","affiliation":[{"name":"Division of Biology and Biological Engineering 156\u201329, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA"}]},{"given":"Juancarlos N","family":"Chan","sequence":"first","affiliation":[{"name":"Division of Biology and Biological Engineering 156\u201329, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA"}]},{"given":"Hans-Michael","family":"M\u00fcller","sequence":"first","affiliation":[{"name":"Division of Biology and Biological Engineering 156\u201329, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA"}]},{"given":"Paul W","family":"Sternberg","sequence":"first","affiliation":[{"name":"Division of Biology and Biological Engineering 156\u201329, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,3,17]]},"reference":[{"key":"2020031722081795700_ref1","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw110","article-title":"How much does curation cost?","volume":"2016","author":"Karp","year":"2016","journal-title":"Database (Oxford)"},{"key":"2020031722081795700_ref2","doi-asserted-by":"crossref","first-page":"13439","DOI":"10.1073\/pnas.1511912112","article-title":"Accelerating scientific publication in biology","volume":"112","author":"Vale","year":"2015","journal-title":"Proc. Natl. Acad. Sci. U. S. A."},{"key":"2020031722081795700_ref3","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw149","article-title":"Crowd-sourcing and author submission as alternatives to professional curation","volume":"2016","author":"Karp","year":"2016","journal-title":"Database (Oxford)"},{"key":"2020031722081795700_ref4","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1186\/1471-2105-13-16","article-title":"Automatic categorization of diverse experimental information in the bioscience literature","volume":"13","author":"Fang","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2020031722081795700_ref5","doi-asserted-by":"crossref","first-page":"e309","DOI":"10.1371\/journal.pbio.0020309","article-title":"Textpresso: an ontology-based information retrieval and extraction system for biological literature","volume":"2","author":"M\u00fcller","year":"2004","journal-title":"PLoS Biol."},{"key":"2020031722081795700_ref6","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1186\/s12859-018-2103-8","article-title":"Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature","volume":"19","author":"M\u00fcller","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2020031722081795700_ref7","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baz045","article-title":"An effective biomedical document classification scheme in support of biocuration: addressing class imbalance","volume":"2019","author":"Jiang","year":"2019","journal-title":"Database (Oxford)"},{"key":"2020031722081795700_ref8","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw161","article-title":"Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges","volume":"2016","author":"Singhal","year":"2016","journal-title":"Database (Oxford)"},{"key":"2020031722081795700_ref9","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bay013","article-title":"Micropublication: incentivizing community curation and placing unpublished data into the public domain","volume":"2018","author":"Raciti","year":"2018","journal-title":"Database (Oxford)"},{"key":"2020031722081795700_ref10","doi-asserted-by":"crossref","first-page":"D759","DOI":"10.1093\/nar\/gky1003","article-title":"FlyBase 2.0: the next generation","volume":"47","author":"Thurmond","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"2020031722081795700_ref11","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bas024","article-title":"Directly e-mailing authors of newly published papers encourages community curation","volume":"2012","author":"Bunt","year":"2012","journal-title":"Database (Oxford)"},{"key":"2020031722081795700_ref12","doi-asserted-by":"crossref","first-page":"1791","DOI":"10.1093\/bioinformatics\/btu103","article-title":"Canto: an online tool for community literature curation","volume":"30","author":"Rutherford","year":"2014","journal-title":"Bioinformatics"},{"key":"2020031722081795700_ref13","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bas030","article-title":"Assessment of community-submitted ontology annotations from a novel database-journal partnership","volume":"2012","author":"Berardini","year":"2012","journal-title":"Database (Oxford)"},{"key":"2020031722081795700_ref14","doi-asserted-by":"crossref","first-page":"1.11.1","DOI":"10.1002\/cpbi.36","article-title":"Using the Arabidopsis Information Resource (TAIR) to find information about Arabidopsis genes","volume":"60","author":"Reiser","year":"2017","journal-title":"Curr. Protoc. Bioinformatics"},{"issue":"D1","key":"2020031722081795700_ref15","first-page":"D762","article-title":"WormBase: a modern model organism information resource","volume":"48","author":"Harris","year":"2019","journal-title":"Nucleic Acids Res."},{"issue":"D1","key":"2020031722081795700_ref16","doi-asserted-by":"crossref","first-page":"D650","DOI":"10.1093\/nar\/gkz813","article-title":"Alliance of Genome Resources Portal: unified model organism research platform","volume":"48","author":"Alliance of Genome Resources Consortium","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"2020031722081795700_ref17","doi-asserted-by":"crossref","first-page":"D463","DOI":"10.1093\/nar\/gkp952","article-title":"WormBase: a comprehensive resource for nematode research","volume":"38","author":"Harris","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2020031722081795700_ref18","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1186\/1471-2105-12-175","article-title":"Toward an interactive article: integrating journals and biological databases","volume":"12","author":"Rangarajan","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2020031722081795700_ref19","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bas040","article-title":"Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR","volume":"2012","author":"Van Auken","year":"2012","journal-title":"Database (Oxford)"},{"key":"2020031722081795700_ref20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1895\/wormbook.1.183.1","article-title":"Caenorhabditis nomenclature","volume":"2018","author":"Tuli","year":"2018","journal-title":"WormBook"},{"key":"2020031722081795700_ref21","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1186\/1751-0473-7-7","article-title":"Layout-aware text extraction from full-text PDF of scientific articles","volume":"7","author":"Ramakrishnan","year":"2012","journal-title":"Source Code Biol. Med."},{"issue":"21","key":"2020031722081795700_ref22","doi-asserted-by":"crossref","first-page":"D4381","DOI":"10.1093\/bioinformatics\/btz228","article-title":"Figure and caption extraction from biomedical documents","volume":"35","author":"Li","year":"2019","journal-title":"Bioinformatics"},{"key":"2020031722081795700_ref23","doi-asserted-by":"crossref","first-page":"e2001414","DOI":"10.1371\/journal.pbio.2001414","article-title":"Identifiers for the 21st century: how to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data","volume":"15","author":"McMurry","year":"2017","journal-title":"PLoS Biol."}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baaa006\/32923929\/baaa006.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baaa006\/32923929\/baaa006.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,3,18]],"date-time":"2020-03-18T02:08:37Z","timestamp":1584497317000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baaa006\/5809234"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,1]]},"references-count":23,"URL":"https:\/\/doi.org\/10.1093\/database\/baaa006","relation":{},"ISSN":["1758-0463"],"issn-type":[{"value":"1758-0463","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020]]},"published":{"date-parts":[[2020,1,1]]},"article-number":"baaa006"}}