{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T18:10:21Z","timestamp":1706811021987},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>One of the most evident achievements of bioinformatics is the development of methods that transfer biological knowledge from characterised proteins to uncharacterised sequences. This mode of protein function assignment is mostly based on the detection of sequence similarity and the premise that functional properties are conserved during evolution. Most automatic approaches developed to date rely on the identification of clusters of homologous proteins and the mapping of new proteins onto these clusters, which are expected to share functional characteristics.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>Here, we inverse the logic of this process, by considering the mapping of sequences directly to a functional classification instead of mapping functions to a sequence clustering. In this mode, the starting point is a database of labelled proteins according to a functional classification scheme, and the subsequent use of sequence similarity allows defining the membership of new proteins to these functional classes. In this framework, we define the Correspondence Indicators as measures of relationship between sequence and function and further formulate two Bayesian approaches to estimate the probability for a sequence of unknown function to belong to a functional class. This approach allows the parametrisation of different sequence search strategies and provides a direct measure of annotation error rates. We validate this approach with a database of enzymes labelled by their corresponding four-digit EC numbers and analyse specific cases.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>The performance of this method is significantly higher than the simple strategy consisting in transferring the annotation from the highest scoring BLAST match and is expected to find applications in automated functional annotation pipelines.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/1471-2105-6-302","type":"journal-article","created":{"date-parts":[[2005,12,15]],"date-time":"2005-12-15T19:14:11Z","timestamp":1134674051000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Probabilistic annotation of protein sequences based on functional classifications"],"prefix":"10.1186","volume":"6","author":[{"given":"Emmanuel D","family":"Levy","sequence":"first","affiliation":[]},{"given":"Christos A","family":"Ouzounis","sequence":"additional","affiliation":[]},{"given":"Walter R","family":"Gilks","sequence":"additional","affiliation":[]},{"given":"Benjamin","family":"Audit","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2005,12,14]]},"reference":[{"key":"626_CR1","doi-asserted-by":"publisher","first-page":"402","DOI":"10.1186\/gb-2003-4-5-402","volume":"4","author":"P Janssen","year":"2003","unstructured":"Janssen P, Audit B, Cases I, Darzentas N, Goldovsky L, Kunin V, Lopez-Bigas N, Peregrin-Alvarez JM, Pereira-Leal JB, Tsoka S, Ouzounis CA: Beyond 100 genomes. Genome Biol 2003, 4: 402. 10.1186\/gb-2003-4-5-402","journal-title":"Genome Biol"},{"key":"626_CR2","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1016\/S0958-1669(97)80118-8","volume":"8","author":"MA Andrade","year":"1997","unstructured":"Andrade MA, Sander C: Bioinformatics: from genome data to biological knowledge. Curr Opin Biotechnol 1997, 8: 675\u2013683. 10.1016\/S0958-1669(97)80118-8","journal-title":"Curr Opin Biotechnol"},{"key":"626_CR3","doi-asserted-by":"publisher","first-page":"753","DOI":"10.1093\/bioinformatics\/14.9.753","volume":"14","author":"PD Karp","year":"1998","unstructured":"Karp PD: What we do not know about sequence analysis and sequence databases. Bioinformatics 1998, 14: 753\u2013754. 10.1093\/bioinformatics\/14.9.753","journal-title":"Bioinformatics"},{"key":"626_CR4","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1016\/0076-6879(90)83007-V","volume":"183","author":"WR Pearson","year":"1990","unstructured":"Pearson WR: Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol 1990, 183: 63\u201398.","journal-title":"Methods Enzymol"},{"key":"626_CR5","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","volume":"215","author":"SF Altschul","year":"1990","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403\u2013410. 10.1006\/jmbi.1990.9999","journal-title":"J Mol Biol"},{"key":"626_CR6","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389\u20133402. 10.1093\/nar\/25.17.3389","journal-title":"Nucleic Acids Res"},{"key":"626_CR7","doi-asserted-by":"publisher","first-page":"1501","DOI":"10.1006\/jmbi.1994.1104","volume":"235","author":"A Krogh","year":"1994","unstructured":"Krogh A, Brown M, Mian IS, Sjolander K, Haussler D: Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol 1994, 235: 1501\u20131531. 10.1006\/jmbi.1994.1104","journal-title":"J Mol Biol"},{"key":"626_CR8","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","volume":"19","author":"S Vinga","year":"2003","unstructured":"Vinga S, Almeida J: Alignment-free sequence comparison-a review. Bioinformatics 2003, 19: 513\u2013523. 10.1093\/bioinformatics\/btg005","journal-title":"Bioinformatics"},{"key":"626_CR9","doi-asserted-by":"publisher","first-page":"137","DOI":"10.2165\/00822942-200403020-00008","volume":"3","author":"JK Vries","year":"2004","unstructured":"Vries JK, Munshi R, Tobi D, Klein-Seetharaman J, Benos PV, Bahar I: A sequence alignment-independent method for protein classification. Appl Bioinformatics 2004, 3: 137\u2013148. 10.2165\/00822942-200403020-00008","journal-title":"Appl Bioinformatics"},{"key":"626_CR10","doi-asserted-by":"publisher","first-page":"683","DOI":"10.1002\/prot.10449","volume":"53","author":"F Abascal","year":"2003","unstructured":"Abascal F, Valencia A: Automatic annotation of protein function based on family identification. Proteins 2003, 53: 683\u2013692. 10.1002\/prot.10449","journal-title":"Proteins"},{"key":"626_CR11","doi-asserted-by":"publisher","first-page":"1066","DOI":"10.1093\/bioinformatics\/bth039","volume":"20","author":"WG Krebs","year":"2004","unstructured":"Krebs WG, Bourne PE: Statistically rigorous automated protein annotation. Bioinformatics 2004, 20: 1066\u20131073. 10.1093\/bioinformatics\/bth039","journal-title":"Bioinformatics"},{"key":"626_CR12","doi-asserted-by":"publisher","first-page":"838","DOI":"10.1093\/bioinformatics\/18.6.838","volume":"18","author":"AM Leontovich","year":"2002","unstructured":"Leontovich AM, Brodsky LI, Drachev VA, Nikolaev VK: Adaptive algorithm of automated annotation. Bioinformatics 2002, 18: 838\u2013844. 10.1093\/bioinformatics\/18.6.838","journal-title":"Bioinformatics"},{"key":"626_CR13","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1093\/nar\/28.1.33","volume":"28","author":"RL Tatusov","year":"2000","unstructured":"Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 2000, 28: 33\u201336. 10.1093\/nar\/28.1.33","journal-title":"Nucleic Acids Res"},{"key":"626_CR14","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1093\/bioinformatics\/15.5.391","volume":"15","author":"MA Andrade","year":"1999","unstructured":"Andrade MA, Brown NP, Leroy C, Hoersch S, de Daruvar A, Reich C, Franchini A, Tamames J, Valencia A, Ouzounis C, Sander C: Automated genome sequence analysis and annotation. Bioinformatics 1999, 15: 391\u2013412. 10.1093\/bioinformatics\/15.5.391","journal-title":"Bioinformatics"},{"key":"626_CR15","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1006\/jmbi.2000.3550","volume":"297","author":"CA Wilson","year":"2000","unstructured":"Wilson CA, Kreychman J, Gerstein M: Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol 2000, 297: 233\u2013249. 10.1006\/jmbi.2000.3550","journal-title":"J Mol Biol"},{"key":"626_CR16","doi-asserted-by":"publisher","first-page":"886","DOI":"10.1046\/j.1365-2958.1999.01380.x","volume":"32","author":"NC Kyrpides","year":"1999","unstructured":"Kyrpides NC, Ouzounis CA: Whole-genome sequence annotation: 'Going wrong with confidence'. Mol Microbiol 1999, 32: 886\u2013887. 10.1046\/j.1365-2958.1999.01380.x","journal-title":"Mol Microbiol"},{"key":"626_CR17","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1038\/ng0498-313","volume":"18","author":"P Bork","year":"1998","unstructured":"Bork P, Koonin EV: Predicting functions from protein sequences--where are the bottlenecks? Nat Genet 1998, 18: 313\u2013318. 10.1038\/ng0498-313","journal-title":"Nat Genet"},{"key":"626_CR18","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1016\/S0168-9525(01)02348-4","volume":"17","author":"D Devos","year":"2001","unstructured":"Devos D, Valencia A: Intrinsic errors in genome annotation. Trends Genet 2001, 17: 429\u2013431. 10.1016\/S0168-9525(01)02348-4","journal-title":"Trends Genet"},{"key":"626_CR19","doi-asserted-by":"publisher","first-page":"REVIEWS0005","DOI":"10.1186\/gb-2000-1-5-reviews0005","volume":"1","author":"JA Gerlt","year":"2000","unstructured":"Gerlt JA, Babbitt PC: Can sequence determine function? Genome Biol 2000, 1: REVIEWS0005. 10.1186\/gb-2000-1-5-reviews0005","journal-title":"Genome Biol"},{"key":"626_CR20","doi-asserted-by":"publisher","first-page":"1641","DOI":"10.1093\/bioinformatics\/18.12.1641","volume":"18","author":"WR Gilks","year":"2002","unstructured":"Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA: Modeling the percolation of annotation errors in a database of protein sequences. Bioinformatics 2002, 18: 1641\u20131649. 10.1093\/bioinformatics\/18.12.1641","journal-title":"Bioinformatics"},{"key":"626_CR21","doi-asserted-by":"publisher","first-page":"955","DOI":"10.1002\/prot.20373","volume":"58","author":"BY Cheng","year":"2005","unstructured":"Cheng BY, Carbonell JG, Klein-Seetharaman J: Protein classification based on text document classification techniques. Proteins 2005, 58: 955\u2013970. 10.1002\/prot.20373","journal-title":"Proteins"},{"key":"626_CR22","first-page":"92","volume":"5","author":"M des Jardins","year":"1997","unstructured":"des Jardins M, Karp PD, Krummenacker M, Lee TJ, Ouzounis CA: Prediction of enzyme classification from protein sequence without the use of sequence similarity. Proc Int Conf Intell Syst Mol Biol 1997, 5: 92\u201399.","journal-title":"Proc Int Conf Intell Syst Mol Biol"},{"key":"626_CR23","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1093\/bioinformatics\/18.1.147","volume":"18","author":"R Karchin","year":"2002","unstructured":"Karchin R, Karplus K, Haussler D: Classifying G-protein coupled receptors with support vector machines. Bioinformatics 2002, 18: 147\u2013159. 10.1093\/bioinformatics\/18.1.147","journal-title":"Bioinformatics"},{"key":"626_CR24","doi-asserted-by":"publisher","first-page":"14031","DOI":"10.1074\/jbc.275.19.14031","volume":"275","author":"S Fillinger","year":"2000","unstructured":"Fillinger S, Boschi-Muller S, Azza S, Dervyn E, Branlant G, Aymerich S: Two glyceraldehyde-3-phosphate dehydrogenases with opposite physiological roles in a nonphotosynthetic bacterium. J Biol Chem 2000, 275: 14031\u201314037. 10.1074\/jbc.275.19.14031","journal-title":"J Biol Chem"},{"key":"626_CR25","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1093\/bib\/3.3.265","volume":"3","author":"CJ Sigrist","year":"2002","unstructured":"Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P: PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform 2002, 3: 265\u2013274. 10.1093\/bib\/3.3.265","journal-title":"Brief Bioinform"},{"key":"626_CR26","doi-asserted-by":"crossref","first-page":"3826","DOI":"10.1128\/aem.62.10.3826-3833.1996","volume":"62","author":"Z Wen","year":"1996","unstructured":"Wen Z, Morrison M: The NAD(P)H-dependent glutamate dehydrogenase activities of Prevotella ruminicola B(1)4 can be attributed to one enzyme (GdhA), and gdhA expression is regulated in response to the nitrogen source available for growth. Appl Environ Microbiol 1996, 62: 3826\u20133833.","journal-title":"Appl Environ Microbiol"},{"key":"626_CR27","doi-asserted-by":"publisher","first-page":"630","DOI":"10.1016\/0006-291X(90)90855-H","volume":"166","author":"P Itkor","year":"1990","unstructured":"Itkor P, Tsukagoshi N, Udaka S: Nucleotide sequence of the raw-starch-digesting amylase gene from Bacillus sp. B1018 and its strong homology to the cyclodextrin glucanotransferase genes. Biochem Biophys Res Commun 1990, 166: 630\u2013636. 10.1016\/0006-291X(90)90855-H","journal-title":"Biochem Biophys Res Commun"},{"key":"626_CR28","first-page":"276","volume":"5","author":"I Shah","year":"1997","unstructured":"Shah I, Hunter L: Predicting enzyme function from sequence: a systematic appraisal. Proc Int Conf Intell Syst Mol Biol 1997, 5: 276\u2013283.","journal-title":"Proc Int Conf Intell Syst Mol Biol"},{"key":"626_CR29","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1002\/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S","volume":"41","author":"D Devos","year":"2000","unstructured":"Devos D, Valencia A: Practical limits of function prediction. Proteins 2000, 41: 98\u2013107. 10.1002\/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S","journal-title":"Proteins"},{"key":"626_CR30","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"M Ashburner","year":"2000","unstructured":"Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. Nat Genet 2000, 25: 25\u201329. 10.1038\/75556","journal-title":"Nat Genet"},{"key":"626_CR31","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1016\/S1476-9271(02)00094-4","volume":"27","author":"A Gattiker","year":"2003","unstructured":"Gattiker A, Michoud K, Rivoire C, Auchincloss AH, Coudert E, Lima T, Kersey P, Pagni M, Sigrist CJ, Lachaize C, Veuthey AL, Gasteiger E, Bairoch A: Automated annotation of microbial proteomes in SWISS-PROT. Comput Biol Chem 2003, 27: 49\u201358. 10.1016\/S1476-9271(02)00094-4","journal-title":"Comput Biol Chem"},{"key":"626_CR32","doi-asserted-by":"publisher","first-page":"I342","DOI":"10.1093\/bioinformatics\/bth938","volume":"20 Suppl 1","author":"D Wieser","year":"2004","unstructured":"Wieser D, Kretschmann E, Apweiler R: Filtering erroneous protein annotation. Bioinformatics 2004, 20 Suppl 1: I342-I347. 10.1093\/bioinformatics\/bth938","journal-title":"Bioinformatics"},{"key":"626_CR33","doi-asserted-by":"publisher","first-page":"536","DOI":"10.1006\/jmbi.1995.0159","volume":"247","author":"AG Murzin","year":"1995","unstructured":"Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247: 536\u2013540. 10.1006\/jmbi.1995.0159","journal-title":"J Mol Biol"},{"key":"626_CR34","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1126\/science.273.5275.595","volume":"273","author":"L Holm","year":"1996","unstructured":"Holm L, Sander C: Mapping the protein universe. Science 1996, 273: 595\u2013603.","journal-title":"Science"},{"key":"626_CR35","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1089\/10665270050081405","volume":"7","author":"T Jaakkola","year":"2000","unstructured":"Jaakkola T, Diekhans M, Haussler D: A discriminative framework for detecting remote protein homologies. J Comput Biol 2000, 7: 95\u2013114. 10.1089\/10665270050081405","journal-title":"J Comput Biol"},{"key":"626_CR36","doi-asserted-by":"publisher","first-page":"304","DOI":"10.1093\/nar\/28.1.304","volume":"28","author":"A Bairoch","year":"2000","unstructured":"Bairoch A: The ENZYME database in 2000. Nucleic Acids Res 2000, 28: 304\u2013305. 10.1093\/nar\/28.1.304","journal-title":"Nucleic Acids Res"},{"key":"626_CR37","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1093\/nar\/gkg095","volume":"31","author":"B Boeckmann","year":"2003","unstructured":"Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31: 365\u2013370. 10.1093\/nar\/gkg095","journal-title":"Nucleic Acids Res"},{"key":"626_CR38","doi-asserted-by":"publisher","first-page":"915","DOI":"10.1093\/bioinformatics\/16.10.915","volume":"16","author":"VJ Promponas","year":"2000","unstructured":"Promponas VJ, Enright AJ, Tsoka S, Kreil DP, Leroy C, Hamodrakas S, Sander C, Ouzounis CA: CAST: an iterative algorithm for the complexity analysis of sequence tracts. Complexity analysis of sequence tracts. Bioinformatics 2000, 16: 915\u2013922. 10.1093\/bioinformatics\/16.10.915","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-302.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T17:50:08Z","timestamp":1706809808000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-302"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,12,14]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2005,12]]}},"alternative-id":["626"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-302","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,12,14]]},"assertion":[{"value":"20 May 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 December 2005","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 December 2005","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"302"}}