{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T06:44:54Z","timestamp":1776408294480,"version":"3.51.2"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2004,8,26]],"date-time":"2004-08-26T00:00:00Z","timestamp":1093478400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"},{"start":{"date-parts":[[2004,8,26]],"date-time":"2004-08-26T00:00:00Z","timestamp":1093478400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                        <jats:title>Background<\/jats:title>\n                        <jats:p>The current progress in sequencing projects calls for rapid, reliable and accurate function assignments of gene products. A variety of methods has been designed to annotate sequences on a large scale. However, these methods can either only be applied for specific subsets, or their results are not formalised, or they do not provide precise confidence estimates for their predictions.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Results<\/jats:title>\n                        <jats:p>We have developed a large-scale annotation system that tackles all of these shortcomings. In our approach, annotation was provided through Gene Ontology terms by applying multiple Support Vector Machines (SVM) for the classification of correct and false predictions. The general performance of the system was benchmarked with a large dataset. An organism-wise cross-validation was performed to define confidence estimates, resulting in an average precision of 80% for 74% of all test sequences. The validation results show that the prediction performance was organism-independent and could reproduce the annotation of other automated systems as well as high-quality manual annotations. We applied our trained classification system to <jats:italic>Xenopus laevis<\/jats:italic> sequences, yielding functional annotation for more than half of the known expressed genome. Compared to the currently available annotation, we provided more than twice the number of contigs with good quality annotation, and additionally we assigned a confidence value to each predicted GO term.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Conclusions<\/jats:title>\n                        <jats:p>We present a complete automated annotation system that overcomes many of the usual problems by applying a controlled vocabulary of Gene Ontology and an established classification method on large and well-described sequence data sets. In a case study, the function for <jats:italic>Xenopus laevis<\/jats:italic> contig sequences was predicted and the results are publicly available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"ftp:\/\/genome.dkfz-heidelberg.de\/pub\/agd\/gene_association.agd_Xenopus\">ftp:\/\/genome.dkfz-heidelberg.de\/pub\/agd\/gene_association.agd_Xenopus<\/jats:ext-link>.<\/jats:p>\n                     <\/jats:sec>","DOI":"10.1186\/1471-2105-5-116","type":"journal-article","created":{"date-parts":[[2004,8,28]],"date-time":"2004-08-28T06:23:22Z","timestamp":1093674202000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":60,"title":["Applying Support Vector Machines for Gene ontology based gene function prediction"],"prefix":"10.1186","volume":"5","author":[{"given":"Arunachalam","family":"Vinayagam","sequence":"first","affiliation":[]},{"given":"Rainer","family":"K\u00f6nig","sequence":"additional","affiliation":[]},{"given":"Jutta","family":"Moormann","sequence":"additional","affiliation":[]},{"given":"Falk","family":"Schubert","sequence":"additional","affiliation":[]},{"given":"Roland","family":"Eils","sequence":"additional","affiliation":[]},{"given":"Karl-Heinz","family":"Glatting","sequence":"additional","affiliation":[]},{"given":"S\u00e1ndor","family":"Suhai","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2004,8,26]]},"reference":[{"key":"232_CR1","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1016\/S0959-440X(00)00095-6","volume":"10","author":"S Lewis","year":"2000","unstructured":"Lewis S, Ashburner M, Reese MG: Annotating eukaryote genomes.\n                           Curr Opin Struct Biol 2000, 10: 349\u2013354. 10.1016\/S0959-440X(00)00095-6","journal-title":"Curr Opin Struct Biol"},{"key":"232_CR2","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1016\/S1359-6446(99)01457-9","volume":"5","author":"DB Searls","year":"2000","unstructured":"Searls DB: Using bioinformatics in gene and drug discovery.\n                           Drug Discov Today 2000, 5: 135\u2013143. 10.1016\/S1359-6446(99)01457-9","journal-title":"Drug Discov Today"},{"key":"232_CR3","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1038\/ng0498-313","volume":"18","author":"P Bork","year":"1998","unstructured":"Bork P, Koonin EV: Predicting function from protein sequence: Where are the bottlenecks?\n                           Nat Genet 1998, 18: 313\u2013318. 10.1038\/ng0498-313","journal-title":"Nat Genet"},{"key":"232_CR4","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1016\/S0168-9525(98)01508-X","volume":"14","author":"TF Smith","year":"1998","unstructured":"Smith TF: Functional genomics \u2013 bioinformatics is ready for the challenge.\n                           Trends Genet 1998, 14: 291\u2013293. 10.1016\/S0168-9525(98)01508-X","journal-title":"Trends Genet"},{"key":"232_CR5","doi-asserted-by":"publisher","first-page":"162","DOI":"10.1016\/S0076-6879(96)66013-3","volume":"266","author":"P Bork","year":"1996","unstructured":"Bork P, Gibson TJ: Applying motif and profile searches.\n                           Methods Enzymol 1996, 266: 162\u2013184. 10.1016\/S0076-6879(96)66013-3","journal-title":"Methods Enzymol"},{"key":"232_CR6","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1093\/bioinformatics\/15.5.391","volume":"15","author":"MA Andrade","year":"1999","unstructured":"Andrade MA, Brown NP, Leroy C, Hoersch S, de Daruvar A, Reich C, Franchini A, Tamames J, Valencia A, Ouzounis C, Sander C: Automated genome sequence analysis and annotation.\n                           Bioinformatics 1999, 15: 391\u2013412. 10.1093\/bioinformatics\/15.5.391","journal-title":"Bioinformatics"},{"key":"232_CR7","doi-asserted-by":"publisher","first-page":"425","DOI":"10.1016\/0168-9525(96)60040-7","volume":"12","author":"P Bork","year":"1996","unstructured":"Bork P, Bairoch A: Go hunting in sequence databases but watch out for the traps.\n                           Trends Genet 1996, 12: 425\u2013427. 10.1016\/0168-9525(96)60040-7","journal-title":"Trends Genet"},{"key":"232_CR8","first-page":"0007","volume":"1","author":"MY Galperin","year":"1998","unstructured":"Galperin MY, Koonin EV: Sources of systematic errors in functional annotation of genomes: domain rearrangements, non-orthologous gene displacement, and operon distribution.\n                           In Silico Biol 1998, 1: 0007. [http:\/\/www.bioinfo.de\/isb\/1998\/01\/0007\/]","journal-title":"In Silico Biol"},{"key":"232_CR9","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1093\/nar\/30.1.98","volume":"30","author":"K Sakata","year":"2002","unstructured":"Sakata K, Nagamura Y, Numa H, Antonio BA, Nagasaki H, Idonuma A, Watanabe W, Shimizu Y, Horiuchi I, Matsumoto T, Sasaki T, Higo K: RiceGAAS: an automated annotation system and database for rice genome sequence.\n                           Nucleic Acids Res 2002, 30: 98\u2013102. 10.1093\/nar\/30.1.98","journal-title":"Nucleic Acids Res"},{"key":"232_CR10","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1101\/gr.8.3.234","volume":"8","author":"LC Bailey","year":"1998","unstructured":"Bailey LC, Fischer S Jr, Schug J, Crabtree J, Gibson M, Overton GC: GAIA: framework annotation of genomic sequence.\n                           Genome Res 1998, 8: 234\u2013250.","journal-title":"Genome Res"},{"key":"232_CR11","doi-asserted-by":"crossref","first-page":"754","DOI":"10.1101\/gr.7.7.754","volume":"7","author":"NL Harris","year":"1997","unstructured":"Harris NL: Genotator: a workbench for sequence annotation.\n                           Genome Res 1997, 7: 754\u2013762.","journal-title":"Genome Res"},{"key":"232_CR12","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1016\/0168-9525(96)81406-5","volume":"12","author":"T Gaasterland","year":"1996","unstructured":"Gaasterland T, Sensen CW: MAGPIE: automated genome interpretation.\n                           Trends Genet 1996, 12: 76\u201378. 10.1016\/0168-9525(96)81406-5","journal-title":"Trends Genet"},{"key":"232_CR13","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1093\/bib\/3.1.32","volume":"3","author":"DH Kitson","year":"2002","unstructured":"Kitson DH, Badretdinov A, Zhu ZY, Velikanov M, Edwards DJ, Olszewski K, Szalma S, Yan L: Functional annotation of proteomic sequences based on consensus of sequence and structural analysis.\n                           Brief Bioinform 2002, 3: 32\u201344. 10.1186\/1471-2105-3-32","journal-title":"Brief Bioinform"},{"key":"232_CR14","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1093\/bioinformatics\/17.1.44","volume":"17","author":"D Frishman","year":"2001","unstructured":"Frishman D, Albermann K, Hani J, Heumann K, Metanomski A, Zollner A, Mewes HW: Functional and structural genomics using PEDANT.\n                           Bioinformatics 2001, 17: 44\u201357. 10.1093\/bioinformatics\/17.1.44","journal-title":"Bioinformatics"},{"key":"232_CR15","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"The Gene Ontology Consortium","year":"2000","unstructured":"The Gene Ontology Consortium: Gene Ontology: tool for the unification of biology.\n                           Nat Genet 2000, 25: 25\u201329. 10.1038\/75556","journal-title":"Nat Genet"},{"key":"232_CR16","doi-asserted-by":"publisher","first-page":"1425","DOI":"10.1101\/gr.180801","volume":"11","author":"The Gene Ontology Consortium","year":"2001","unstructured":"The Gene Ontology Consortium: Creating the gene ontology resource: design and implementation.\n                           Genome Res 2001, 11: 1425\u20131433. 10.1101\/gr.180801","journal-title":"Genome Res"},{"key":"232_CR17","doi-asserted-by":"publisher","first-page":"1982","DOI":"10.1101\/gr.580102","volume":"12","author":"DP Hill","year":"2002","unstructured":"Hill DP, Blake JA, Richardson JE, Ringwald M: Extension and Integration of the Gene Ontology (GO): Combining GO vocabularies with external vocabularies.\n                           Genome Res 2002, 12: 1982\u20131991. 10.1101\/gr.580102","journal-title":"Genome Res"},{"key":"232_CR18","doi-asserted-by":"publisher","first-page":"785","DOI":"10.1101\/gr.86902","volume":"12","author":"H Xie","year":"2002","unstructured":"Xie H, Wasserman A, Levine Z, Novik A, Grebinskiy V, Shoshan A, Mintz L: Large-Scale Protein Annotation through Gene Ontology.\n                           Genome Res 2002, 12: 785\u2013794. 10.1101\/gr.86902","journal-title":"Genome Res"},{"key":"232_CR19","doi-asserted-by":"publisher","first-page":"662","DOI":"10.1101\/gr.461403","volume":"13","author":"E Camon","year":"2003","unstructured":"Camon E, Magrane M, Barrell D, Binns D, Fleischmann W, Kersey P, Mulder N, Oinn T, Maslen J, Cox A, Apweiler R: The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro.\n                           Genome Res 2003, 13: 662\u2013672. 10.1101\/gr.461403","journal-title":"Genome Res"},{"key":"232_CR20","unstructured":"TIGR Gene Indices[http:\/\/www.tigr.org\/tdb\/tgi.shtml]"},{"key":"232_CR21","doi-asserted-by":"publisher","first-page":"3799","DOI":"10.1093\/nar\/gkg555","volume":"31","author":"G Zehetner","year":"2003","unstructured":"Zehetner G: OntoBLAST function: from sequence similarities directly to potential functional annotations by ontology terms.\n                           Nucleic Acids Res 2003, 31: 3799\u20133803. 10.1093\/nar\/gkg555","journal-title":"Nucleic Acids Res"},{"key":"232_CR22","doi-asserted-by":"publisher","first-page":"3712","DOI":"10.1093\/nar\/gkg582","volume":"31","author":"S Hennig","year":"2003","unstructured":"Hennig S, Groth D, Lehrach H: Automated Gene Ontology annotation for anonymous sequence data.\n                           Nucleic Acids Res 2003, 31: 3712\u20133715. 10.1093\/nar\/gkg582","journal-title":"Nucleic Acids Res"},{"key":"232_CR23","doi-asserted-by":"publisher","first-page":"635","DOI":"10.1093\/bioinformatics\/btg036","volume":"19","author":"LJ Jensen","year":"2003","unstructured":"Jensen LJ, Gupta R, Staerfeldt HH, Brunak S: Prediction of human protein function according to Gene Ontology categories.\n                           Bioinformatics 2003, 19: 635\u2013642. 10.1093\/bioinformatics\/btg036","journal-title":"Bioinformatics"},{"key":"232_CR24","doi-asserted-by":"publisher","first-page":"648","DOI":"10.1101\/gr.222902","volume":"12","author":"J Schug","year":"2002","unstructured":"Schug J, Diskin S, Mazzarelli J, Brunk BP, Stoeckert CJ Jr: Predicting Gene Ontology Functions from ProDom and CDD Protein Domains.\n                           Genome Res 2002, 12: 648\u2013655. 10.1101\/gr.222902","journal-title":"Genome Res"},{"key":"232_CR25","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1023\/A:1007515423169","volume":"36","author":"E Bauer","year":"1999","unstructured":"Bauer E, Kohavi R: An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants.\n                           Machine Learning 1999, 36: 105\u2013139. 10.1023\/A:1007515423169","journal-title":"Machine Learning"},{"key":"232_CR26","doi-asserted-by":"publisher","first-page":"665","DOI":"10.2174\/1389202033490097","volume":"4","author":"DA Peiffer","year":"2003","unstructured":"Peiffer DA, Cho KWY, Shin Y: Xenopus\n                           DNA Microarrays.\n                           Current Genomics 2003, 4: 665\u2013672.","journal-title":"Current Genomics"},{"key":"232_CR27","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Sch\u00e4ffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.\n                           Nucleic Acids Res 1997, 25: 3389\u20133402. 10.1093\/nar\/25.17.3389","journal-title":"Nucleic Acids Res"},{"key":"232_CR28","unstructured":"TIGR Xenopus laevis Gene Index[http:\/\/www.tigr.org\/tdb\/tgi\/xgi\/]"},{"key":"232_CR29","unstructured":"Gene Ontology Consortium[http:\/\/www.geneontology.org]"},{"key":"232_CR30","volume-title":"Principles of Data Mining,","author":"D Hand","year":"2001","unstructured":"Hand D, Mannila H, Smyth P: Principles of Data Mining,. MIT Press, Cambridge, London 2001."},{"key":"232_CR31","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-21606-5","volume-title":"The Elements of Statistical Learning,","author":"T Hastie","year":"2001","unstructured":"Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning,. Springer, New York, Berlin, Heidelberg 2001."},{"key":"232_CR32","unstructured":"Swiss-Prot[http:\/\/www.ebi.ac.uk\/ebi_docs\/swissprot_db\/swisshome.html]"},{"key":"232_CR33","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1093\/bioinformatics\/14.5.452","volume":"14","author":"M Senger","year":"1998","unstructured":"Senger M, Flores T, Glatting K, Ernst P, Hotz-Wagenblatt A, Suhai S: W2H: WWW interface to the GCG sequence analysis package.\n                           Bioinformatics 1998, 14: 452\u2013457. 10.1093\/bioinformatics\/14.5.452","journal-title":"Bioinformatics"},{"key":"232_CR34","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1093\/bioinformatics\/19.2.278","volume":"19","author":"P Ernst","year":"2003","unstructured":"Ernst P, Glatting KH, Suhai S: A task framework for the web interface W2H.\n                           Bioinformatics 2003, 19: 278\u2013282. 10.1093\/bioinformatics\/19.2.278","journal-title":"Bioinformatics"},{"issue":"1","key":"232_CR35","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1186\/1471-2105-4-39","volume":"4","author":"C Del Val","year":"2003","unstructured":"Del Val C, Glatting KH, Suhai S: cDNA2Genome: A tool for mapping and annotating cDNAs.\n                           BMC Bioinformatics 2003, 4(1):39. 10.1186\/1471-2105-4-39","journal-title":"BMC Bioinformatics"},{"key":"232_CR36","unstructured":"LIBSVM; version 2.4[http:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvm\/index.html]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-5-116.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-5-116\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-5-116.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T12:16:48Z","timestamp":1728303408000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-5-116"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,8,26]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2004,12]]}},"alternative-id":["232"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-5-116","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2004,8,26]]},"assertion":[{"value":"11 May 2004","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 August 2004","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 August 2004","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"116"}}