{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T12:38:33Z","timestamp":1775911113392,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2020,5,5]],"date-time":"2020-05-05T00:00:00Z","timestamp":1588636800000},"content-version":"vor","delay-in-days":125,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>In the UniProt Knowledgebase (UniProtKB), publications providing evidence for a specific protein annotation entry are organized across different categories, such as function, interaction and expression, based on the type of data they contain. To provide a systematic way of categorizing computationally mapped bibliographies in UniProt, we investigate a convolutional neural network (CNN) model to classify publications with accession annotations according to UniProtKB categories. The main challenge of categorizing publications at the accession annotation level is that the same publication can be annotated with multiple proteins and thus be associated with different category sets according to the evidence provided for the protein. We propose a model that divides the document into parts containing and not containing evidence for the protein annotation. Then, we use these parts to create different feature sets for each accession and feed them to separate layers of the network. The CNN model achieved a micro F1-score of 0.72 and a macro F1-score of 0.62, outperforming baseline models based on logistic regression and support vector machine by up to 22 and 18 percentage points, respectively. We believe that such an approach could be used to systematically categorize the computationally mapped bibliography in UniProtKB, which represents a significant set of the publications, and help curators to decide whether a publication is relevant for further curation for a protein accession.<\/jats:p>\n                  <jats:p>Database URL: https:\/\/goldorak.hesge.ch\/bioexpclass\/upclass\/.<\/jats:p>","DOI":"10.1093\/database\/baaa026","type":"journal-article","created":{"date-parts":[[2020,3,12]],"date-time":"2020-03-12T08:35:54Z","timestamp":1584002154000},"source":"Crossref","is-referenced-by-count":11,"title":["UPCLASS: a deep learning-based classifier for UniProtKB entry publications"],"prefix":"10.1093","volume":"2020","author":[{"given":"Douglas","family":"Teodoro","sequence":"first","affiliation":[{"name":"Geneva School of Business Administration, CH-1227, University of Applied Sciences and Arts Western Switzerland, HES-SO, Geneva, Switzerland"},{"name":"Text Mining Group, Rue Michel-Servet 1, CH-1206, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Julien","family":"Knafou","sequence":"first","affiliation":[{"name":"Geneva School of Business Administration, CH-1227, University of Applied Sciences and Arts Western Switzerland, HES-SO, Geneva, Switzerland"},{"name":"Text Mining Group, Rue Michel-Servet 1, CH-1206, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nona","family":"Naderi","sequence":"first","affiliation":[{"name":"Geneva School of Business Administration, CH-1227, University of Applied Sciences and Arts Western Switzerland, HES-SO, Geneva, Switzerland"},{"name":"Text Mining Group, Rue Michel-Servet 1, CH-1206, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emilie","family":"Pasche","sequence":"first","affiliation":[{"name":"Geneva School of Business Administration, CH-1227, University of Applied Sciences and Arts Western Switzerland, HES-SO, Geneva, Switzerland"},{"name":"Text Mining Group, Rue Michel-Servet 1, CH-1206, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Julien","family":"Gobeill","sequence":"first","affiliation":[{"name":"Geneva School of Business Administration, CH-1227, University of Applied Sciences and Arts Western Switzerland, HES-SO, Geneva, Switzerland"},{"name":"Text Mining Group, Rue Michel-Servet 1, CH-1206, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cecilia N","family":"Arighi","sequence":"first","affiliation":[{"name":"Center of Bioinformatics and Computational Biology, 15 Innovation Way, 19711, Department of Computer and Information Sciences, University of Delaware, Newark, DE, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Patrick","family":"Ruch","sequence":"first","affiliation":[{"name":"Geneva School of Business Administration, CH-1227, University of Applied Sciences and Arts Western Switzerland, HES-SO, Geneva, Switzerland"},{"name":"Text Mining Group, Rue Michel-Servet 1, CH-1206, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,5,4]]},"reference":[{"key":"2020050421174459700_ref1","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1186\/s12915-016-0276-z","article-title":"Model organism databases: essential resources that need the support of both funders and users","volume":"14","author":"Oliver","year":"2016","journal-title":"BMC Biol"},{"key":"2020050421174459700_ref2","doi-asserted-by":"crossref","first-page":"e2002846","DOI":"10.1371\/journal.pbio.2002846","article-title":"Biocuration: distilling data into knowledge","volume":"16","author":"International Society for Biocuration","year":"2018","journal-title":"PLoS Biol"},{"key":"2020050421174459700_ref3","doi-asserted-by":"crossref","first-page":"D506","DOI":"10.1093\/nar\/gky1049","article-title":"UniProt: a worldwide hub of protein knowledge","volume":"47","author":"UniProt Consortium","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"2020050421174459700_ref4","doi-asserted-by":"crossref","first-page":"3454","DOI":"10.1093\/bioinformatics\/btx439","article-title":"On expert curation and scalability: UniProtKB\/Swiss-Prot as a case study","volume":"33","author":"Poux","year":"2017","journal-title":"Bioinformatics"},{"key":"2020050421174459700_ref5","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1007\/978-3-319-21569-3_6","volume-title":"New Horizons for a Data-Driven Economy","author":"Freitas","year":"2016"},{"key":"2020050421174459700_ref6","volume-title":"A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques","author":"Allahyari","year":"2017"},{"key":"2020050421174459700_ref7","first-page":"132","article-title":"Community challenges in biomedical text mining over 10 years: Success, failure and the future","volume-title":"Brief. Bioinform","author":"Huang","year":"2016"},{"key":"2020050421174459700_ref8","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw161","article-title":"Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges","volume":"2016","author":"Singhal","year":"2016","journal-title":"Database"},{"key":"2020050421174459700_ref9","first-page":"8","article-title":"Customizing a variant annotation-support tool: an inquiry into probability ranking principles for TREC precision medicine","author":"Pasche","year":"2017","journal-title":"Proceedings of the Twenty-Sixth Text REtrieval Conference (TREC 2017)"},{"key":"2020050421174459700_ref10","volume-title":"Database","author":"Teodoro","year":"2017"},{"key":"2020050421174459700_ref11","first-page":"94","article-title":"Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature","volume-title":"BMC Bioinformatics","author":"M\u00fcller","year":"2018"},{"key":"2020050421174459700_ref12","first-page":"e2002846","article-title":"Biocuration: distilling data into knowledge","volume-title":"PLOS Biology","author":"International Society for Biocuration","year":"2019"},{"key":"2020050421174459700_ref13","volume-title":"Database","author":"Cejuela","year":"2014"},{"key":"2020050421174459700_ref14","first-page":"15","article-title":"LocText: relation extraction of protein localizations to assist database curation","volume-title":"BMC Bioinformatics","author":"Cejuela","year":"2018"},{"key":"2020050421174459700_ref15","volume-title":"Database","author":"Jiang","year":"2019"},{"key":"2020050421174459700_ref16","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"Lecun","year":"2015","journal-title":"Nature"},{"key":"2020050421174459700_ref17","article-title":"Efficient estimation of word representations in vector space","volume-title":"1st International Conference on Learning Representations, ICLR 2013, Workshop Track Proceedings, Scottsdale, Arizona, USA, 2-4 May 2013","author":"Mikolov","year":"2015"},{"key":"2020050421174459700_ref18","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.3115\/v1\/D14-1162","article-title":"GloVe: global vectors for word representation","author":"Pennington","year":"2014","journal-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)"},{"key":"2020050421174459700_ref19","first-page":"2227","article-title":"Deep Contextualized Word Representations","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Peters","year":"2018"},{"key":"2020050421174459700_ref20","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","author":"Devlin","year":"2018"},{"key":"2020050421174459700_ref21","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1006390","article-title":"Scaling up data curation using deep learning: An application to literature triage in genomic variation resources","volume-title":"PLoS Comput. Biol","author":"Lee","year":"2018"},{"key":"2020050421174459700_ref22","volume-title":"Database, 2019","author":"Burns","year":"2019"},{"key":"2020050421174459700_ref23","volume-title":"Database","author":"Ding","year":"2017"},{"key":"2020050421174459700_ref24","volume-title":"31st International Conference on Machine Learning, ICML 2014","author":"Le","year":"2014"},{"key":"2020050421174459700_ref25","doi-asserted-by":"crossref","first-page":"367","DOI":"10.18653\/v1\/P16-1035","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Diaz","year":"2016"},{"key":"2020050421174459700_ref26","first-page":"309","volume-title":"Proceedings of the 8th NTCIR Workshop Meeting","author":"Teodoro","year":"2010"},{"key":"2020050421174459700_ref27","volume-title":"CEUR Workshop Proceedings (CEUR-WS.org)","author":"Teodoro","year":"2010"},{"key":"2020050421174459700_ref28","first-page":"2873","volume-title":"IJCAI'16: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence","author":"Liu","year":"2016"},{"key":"2020050421174459700_ref29","first-page":"1279","article-title":"ML-Net: multi-label classification of biomedical texts with deep neural networks","volume-title":"J. Am. Med. Informatics Assoc","author":"Du","year":"2019"},{"key":"2020050421174459700_ref30","first-page":"20","article-title":"A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data","volume-title":"ACM SIGKDD Explor. Newsl","author":"Batista","year":"2004"},{"key":"2020050421174459700_ref31","doi-asserted-by":"crossref","first-page":"e62874","DOI":"10.1371\/journal.pone.0062874","article-title":"Assisted knowledge discovery for the maintenance of clinical guidelines","volume":"8","author":"Pasche","year":"2013","journal-title":"PLoS One"},{"key":"2020050421174459700_ref32","first-page":"85","article-title":"Character-Level neural network for biomedical named entity recognition","volume-title":"J. Biomed. Inform","author":"Gridach","year":"2017"}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baaa026\/33165294\/baaa026.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baaa026\/33165294\/baaa026.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,5,4]],"date-time":"2020-05-04T21:18:52Z","timestamp":1588627132000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baaa026\/5822772"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,1]]},"references-count":32,"URL":"https:\/\/doi.org\/10.1093\/database\/baaa026","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/842062","asserted-by":"object"}]},"ISSN":["1758-0463"],"issn-type":[{"value":"1758-0463","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020]]},"published":{"date-parts":[[2020,1,1]]},"article-number":"baaa026"}}