{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,22]],"date-time":"2026-06-22T16:59:42Z","timestamp":1782147582966,"version":"3.54.5"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: The last decade has seen a remarkable growth in protein databases. This growth comes at a price: a growing number of submitted protein sequences lack functional annotation. Approximately 32% of sequences submitted to the most comprehensive protein database UniProtKB are labelled as \u2018Unknown protein\u2019 or alike. Also the functionally annotated parts are reported to contain 30\u201340% of errors. Here, we introduce a high-throughput tool for more reliable functional annotation called Protein ANNotation with Z-score (PANNZER). PANNZER predicts Gene Ontology (GO) classes and free text descriptions about protein functionality. PANNZER uses weighted k-nearest neighbour methods with statistical testing to maximize the reliability of a functional annotation.<\/jats:p><jats:p>Results: Our results in free text description line prediction show that we outperformed all competing methods with a clear margin. In GO prediction we show clear improvement to our older method that performed well in CAFA 2011 challenge.<\/jats:p><jats:p>Availability and implementation: The PANNZER program was developed using the Python programming language (Version 2.6). The stand-alone installation of the PANNZER requires MySQL database for data storage and the BLAST (BLASTALL v.2.2.21) tools for the sequence similarity search. The tutorial, evaluation test sets and results are available on the PANNZER web site. PANNZER is freely available at http:\/\/ekhidna.biocenter.helsinki.fi\/pannzer.<\/jats:p><jats:p>Contact: patrik.koskinen@helsinki.fi<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu851","type":"journal-article","created":{"date-parts":[[2015,1,10]],"date-time":"2015-01-10T04:10:59Z","timestamp":1420863059000},"page":"1544-1552","source":"Crossref","is-referenced-by-count":130,"title":["PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment"],"prefix":"10.1093","volume":"31","author":[{"given":"Patrik","family":"Koskinen","sequence":"first","affiliation":[{"name":"Department of Biosciences, University of Helsinki, 00014 Helsinki, Finland and Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Petri","family":"T\u00f6r\u00f6nen","sequence":"additional","affiliation":[{"name":"Department of Biosciences, University of Helsinki, 00014 Helsinki, Finland and Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jussi","family":"Nokso-Koivisto","sequence":"additional","affiliation":[{"name":"Department of Biosciences, University of Helsinki, 00014 Helsinki, Finland and Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Liisa","family":"Holm","sequence":"additional","affiliation":[{"name":"Department of Biosciences, University of Helsinki, 00014 Helsinki, Finland and Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2015,1,8]]},"reference":[{"key":"2023020115452676700_btu851-B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023020115452676700_btu851-B2","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1093\/bioinformatics\/14.7.600","article-title":"Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families","volume":"14","author":"Andrade","year":"1998","journal-title":"Bioinformatics"},{"key":"2023020115452676700_btu851-B3","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1016\/S0168-9525(99)01706-0","article-title":"Errors in genome annotation","volume":"15","author":"Brenner","year":"1999","journal-title":"Trends Genet.: TIG"},{"key":"2023020115452676700_btu851-B4","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-14-S3-S1","article-title":"Protein function prediction by massive integration of evolutionary analyses and multiple data sources","volume":"14","author":"Cozzetto","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020115452676700_btu851-B5","doi-asserted-by":"crossref","first-page":"S14","DOI":"10.1186\/1471-2105-13-S4-S14","article-title":"Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms","volume":"13","author":"Falda","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020115452676700_btu851-B6","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1093\/bioinformatics\/17.1.44","article-title":"Functional and structural genomics using PEDANT","volume":"17","author":"Frishman","year":"2001","journal-title":"Bioinformatics"},{"key":"2023020115452676700_btu851-B7","doi-asserted-by":"crossref","first-page":"1641","DOI":"10.1093\/bioinformatics\/18.12.1641","article-title":"Modeling the percolation of annotation errors in a database of protein sequences","volume":"18","author":"Gilks","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020115452676700_btu851-B8","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1016\/j.mbs.2004.08.001","article-title":"Percolation of annotation errors through hierarchically structured protein sequence databases","volume":"193","author":"Gilks","year":"2005","journal-title":"Math. Biosci."},{"key":"2023020115452676700_btu851-B9","doi-asserted-by":"crossref","first-page":"3420","DOI":"10.1093\/nar\/gkn176","article-title":"High-throughput functional annotation and data mining with the Blast2GO suite","volume":"36","author":"G\u00f6tz","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023020115452676700_btu851-B10","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1038\/sj.embor.embor932","article-title":"Righting the wrongs","volume":"4","author":"Hadley","year":"2003","journal-title":"EMBO Rep."},{"key":"2023020115452676700_btu851-B11","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1186\/1471-2105-8-170","article-title":"Estimating the annotation error rate of curated GO database sequence annotations","volume":"8","author":"Jones","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023020115452676700_btu851-B12","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1186\/1471-2105-13-33","article-title":"Blannotator: enhanced homology-based function prediction of bacterial proteins","volume":"13","author":"Kankainen","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020115452676700_btu851-B13","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1186\/1471-2105-6-151","article-title":"AutoFACT: an automatic functional annotation and classification tool","volume":"6","author":"Koski","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020115452676700_btu851-B14","doi-asserted-by":"crossref","first-page":"i438","DOI":"10.1093\/bioinformatics\/bts417","article-title":"SANS: high-throughput retrieval of protein sequences allowing 50% mismatches","volume":"28","author":"Koskinen","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020115452676700_btu851-B15","first-page":"296","article-title":"An information-theoretic definition of similarity","volume":"Vol. 98","author":"Lin","year":"1998","journal-title":"International Conference on Machine Learning (ICML)"},{"key":"2023020115452676700_btu851-B16","doi-asserted-by":"crossref","first-page":"bar009","DOI":"10.1093\/database\/bar009","article-title":"UniProt knowledgebase: a hub of integrated protein data","volume":"2011","author":"Magrane","year":"2011","journal-title":"Database (Oxford)"},{"key":"2023020115452676700_btu851-B17","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1186\/1471-2105-5-178","article-title":"GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes","volume":"5","author":"Martin","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023020115452676700_btu851-B18","doi-asserted-by":"crossref","first-page":"6643","DOI":"10.1093\/nar\/gkp698","article-title":"FIGfams: yet another set of protein families","volume":"37","author":"Meyer","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023020115452676700_btu851-B19","volume-title":"Subset Selection in Regression","author":"Miller","year":"2012"},{"key":"2023020115452676700_btu851-B20","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1186\/1471-2164-5-52","article-title":"Retrieving sequences of enzymes experimentally characterized but erroneously annotated: the case of the putrescine carbamoyltransferase","volume":"5","author":"Naumoff","year":"2004","journal-title":"BMC Genomics"},{"key":"2023020115452676700_btu851-B21","doi-asserted-by":"crossref","first-page":"D206","DOI":"10.1093\/nar\/gkt1226","article-title":"The SEED and the rapid annotation of microbial genomes using subsystems technology (RAST)","volume":"42","author":"Overbeek","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023020115452676700_btu851-B22","doi-asserted-by":"crossref","first-page":"e1000443","DOI":"10.1371\/journal.pcbi.1000443","article-title":"Semantic similarity in biomedical ontologies","volume":"5","author":"Pesquita","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023020115452676700_btu851-B23","doi-asserted-by":"crossref","first-page":"e1000160","DOI":"10.1371\/journal.pcbi.1000160","article-title":"The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function","volume":"4","author":"Punta","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023020115452676700_btu851-B24","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"A large-scale evaluation of computational protein function prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat. Methods"},{"key":"2023020115452676700_btu851-B25","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc."},{"key":"2023020115452676700_btu851-B26","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1186\/gb-2011-12-8-125","article-title":"The real cost of sequencing: higher than you think!","volume":"12","author":"Sboner","year":"2011","journal-title":"Genome Biol."},{"key":"2023020115452676700_btu851-B27","first-page":"348","article-title":"GeneQuiz: a workbench for sequence analysis","volume":"Vol. 2","author":"Scharf","year":"1994","journal-title":"Intelligent Systems for Molecular Biology (ISMB)"},{"key":"2023020115452676700_btu851-B28","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1186\/1471-2105-7-302","article-title":"A new measure for functional similarity of gene products based on Gene Ontology","volume":"7","author":"Schlicker","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023020115452676700_btu851-B29","doi-asserted-by":"crossref","first-page":"e1000605","DOI":"10.1371\/journal.pcbi.1000605","article-title":"Annotation error in public databases: misannotation of molecular function in enzyme superfamilies","volume":"5","author":"Schnoes","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023020115452676700_btu851-B30","doi-asserted-by":"crossref","first-page":"e1003063","DOI":"10.1371\/journal.pcbi.1003063","article-title":"Biases in the experimental annotations of protein function and their effect on our understanding of protein function space","volume":"9","author":"Schnoes","year":"2013","journal-title":"PLoS Comput. Biol."},{"key":"2023020115452676700_btu851-B31","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1186\/1471-2105-10-307","article-title":"Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function","volume":"10","author":"Toronen","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023020115452676700_btu851-B32","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1186\/1471-2105-5-116","article-title":"Applying support vector machines for Gene Ontology based gene function prediction","volume":"5","author":"Vinayagam","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023020115452676700_btu851-B33","doi-asserted-by":"crossref","first-page":"i342","DOI":"10.1093\/bioinformatics\/bth938","article-title":"Filtering erroneous protein annotation","volume":"20","author":"Wieser","year":"2004","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/10\/1544\/49012732\/bioinformatics_31_10_1544.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/10\/1544\/49012732\/bioinformatics_31_10_1544.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,6]],"date-time":"2024-06-06T13:38:28Z","timestamp":1717681108000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/10\/1544\/176441"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,8]]},"references-count":33,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2015,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu851","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,5,15]]},"published":{"date-parts":[[2015,1,8]]}}}