{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T15:31:39Z","timestamp":1772033499366,"version":"3.50.1"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"S4","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2007,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Existing methods for whole-genome comparisons require prior knowledge of related species and provide little automation in the function prediction process. Bacteriophage genomes are an example that cannot be easily analyzed by these methods. This work addresses these shortcomings and aims to provide an automated prediction system of gene function.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We have developed a novel system called SynFPS to perform gene function prediction over completed genomes. The prediction system is initialized by clustering a large collection of weakly related genomes into groups based on their resemblance in gene distribution. From each individual group, data are then extracted and used to train a Support Vector Machine that makes gene function predictions. Experiments were conducted with 9 different gene functions over 296 bacteriophage genomes. Cross validation results gave an average prediction accuracy of ~80%, which is comparable to other genomic-context based prediction methods. Functional predictions are also made on 3 uncharacterized genes and 12 genes that cannot be identified by sequence alignment. The software is publicly available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/www.synteny.net\/\" ext-link-type=\"uri\">http:\/\/www.synteny.net\/<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The proposed system employs genomic context to predict gene function and detect gene correspondence in whole-genome comparisons. Although our experimental focus is on bacteriophages, the method may be extended to other microbial genomes as they share a number of similar characteristics with phage genomes such as gene order conservation.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-8-s4-s6","type":"journal-article","created":{"date-parts":[[2007,5,23]],"date-time":"2007-05-23T16:35:21Z","timestamp":1179938121000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages"],"prefix":"10.1186","volume":"8","author":[{"given":"Jason","family":"Li","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Saman K","family":"Halgamuge","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christopher I","family":"Kells","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sen-Lin","family":"Tang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2007,5,22]]},"reference":[{"issue":"17","key":"1909_CR1","doi-asserted-by":"publisher","first-page":"3461","DOI":"10.1093\/bioinformatics\/bti555","volume":"21","author":"X Pan","year":"2005","unstructured":"Pan X, Stein L, Brendel V: SynBrowse: a synteny browser for comparative sequence analysis. Bioinformatics 2005, 21(17):3461\u20133468. 10.1093\/bioinformatics\/bti555","journal-title":"Bioinformatics"},{"key":"1909_CR2","doi-asserted-by":"crossref","unstructured":"Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I: VISTA: computational tools for comparative genomics. Nucleic Acids Res 2004, (32 Web Server):W273\u2013279. 10.1093\/nar\/gkh458","DOI":"10.1093\/nar\/gkh458"},{"issue":"4","key":"1909_CR3","doi-asserted-by":"publisher","first-page":"721","DOI":"10.1101\/gr.926603","volume":"13","author":"M Brudno","year":"2003","unstructured":"Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Program NCS, Green ED, Sidow A, Batzoglou S: LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA. Genome Res 2003, 13(4):721\u2013731. 10.1101\/gr.926603","journal-title":"Genome Res"},{"issue":"4","key":"1909_CR4","doi-asserted-by":"publisher","first-page":"577","DOI":"10.1101\/gr.10.4.577","volume":"10","author":"S Schwartz","year":"2000","unstructured":"Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W: PipMaker \u2013 a web server for aligning two genomic DNA sequences. Genome Res 2000, 10(4):577\u2013586. 10.1101\/gr.10.4.577","journal-title":"Genome Res"},{"issue":"1","key":"1909_CR5","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1093\/nar\/gkg083","volume":"31","author":"M Clamp","year":"2003","unstructured":"Clamp M, Andrews D, Barker D, Bevan P, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, et al.: Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res 2003, 31(1):38\u201342. 10.1093\/nar\/gkg083","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"1909_CR6","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1016\/S0955-0674(03)00009-7","volume":"15","author":"MA Huynen","year":"2003","unstructured":"Huynen MA, Snel B, von Mering C, Bork P: Function prediction and protein networks. Curr Opin Cell Biol 2003, 15(2):191\u2013198. 10.1016\/S0955-0674(03)00009-7","journal-title":"Curr Opin Cell Biol"},{"issue":"Database issue","key":"1909_CR7","doi-asserted-by":"publisher","first-page":"D433","DOI":"10.1093\/nar\/gki005","volume":"33","author":"C von Mering","year":"2005","unstructured":"von Mering C, Jensen LJ, Snel B, Hooper SD, Krupp M, Foglierini M, Jouffre N, Huynen MA, Bork P: STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res 2005, 33(Database issue):D433-D437. 10.1093\/nar\/gki005","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"1909_CR8","doi-asserted-by":"publisher","first-page":"RESEARCH0020","DOI":"10.1186\/gb-2001-2-6-research0020","volume":"2","author":"J Tamames","year":"2001","unstructured":"Tamames J: Evolution of gene order conservation in prokaryotes. Genome Biol 2001, 2(6):RESEARCH0020. 10.1186\/gb-2001-2-6-research0020","journal-title":"Genome Biol"},{"issue":"4","key":"1909_CR9","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1016\/S0168-9525(01)02621-X","volume":"18","author":"I Yanai","year":"2002","unstructured":"Yanai I, Mellor JC, DeLisi C: Identifying functional links between genes using conserved chromosomal proximity. Trends in Genetics 2002, 18(4):176\u2013179. 10.1016\/S0168-9525(01)02621-X","journal-title":"Trends in Genetics"},{"key":"1909_CR10","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1101\/gr.789803","volume":"13","author":"N Bray","year":"2003","unstructured":"Bray N, Dubchak I, Pachter L: AVID: A Global Alignment Program. Genome Res 2003, 13: 97\u2013102. 10.1101\/gr.789803","journal-title":"Genome Res"},{"issue":"suppl_1","key":"1909_CR11","doi-asserted-by":"publisher","first-page":"i54","DOI":"10.1093\/bioinformatics\/btg1005","volume":"19","author":"M Brudno","year":"2003","unstructured":"Brudno M, Malde S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S: Glocal alignment: finding rearrangements during alignment. Bioinformatics 2003, 19(suppl_1):i54\u201362. 10.1093\/bioinformatics\/btg1005","journal-title":"Bioinformatics"},{"issue":"3","key":"1909_CR12","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1016\/S0168-9525(01)02597-5","volume":"18","author":"JO Korbel","year":"2002","unstructured":"Korbel JO, Snel B, Huynen MA, Bork P: SHOT: a web server for the construction of genome phylogenies. Trends Genet 2002, 18(3):158\u2013162. 10.1016\/S0168-9525(01)02597-5","journal-title":"Trends Genet"},{"key":"1909_CR13","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1016\/S0092-8674(01)00637-7","volume":"108","author":"H Brussow","year":"2002","unstructured":"Brussow H, Hendrix RW: Phage Genomics: Small Is Beautiful. Cell 2002, 108: 13\u201316. 10.1016\/S0092-8674(01)00637-7","journal-title":"Cell"},{"issue":"5","key":"1909_CR14","doi-asserted-by":"publisher","first-page":"506","DOI":"10.1016\/j.mib.2003.09.004","volume":"6","author":"RW Hendrix","year":"2003","unstructured":"Hendrix RW: Bacteriophage genomics. Curr Opin Microbiol 2003, 6(5):506\u2013511. 10.1016\/j.mib.2003.09.004","journal-title":"Curr Opin Microbiol"},{"issue":"2","key":"1909_CR15","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1038\/nsb891","volume":"10","author":"W Jiang","year":"2003","unstructured":"Jiang W, Li Z, Zhang Z, Baker ML, Prevelige PE Jr, Chiu W: Coat protein fold and maturation transition of bacteriophage P22 seen at subnanometer resolutions. Nat Struct Biol 2003, 10(2):131\u2013135. 10.1038\/nsb891","journal-title":"Nat Struct Biol"},{"issue":"6","key":"1909_CR16","doi-asserted-by":"publisher","first-page":"e92","DOI":"10.1371\/journal.pgen.0020092","volume":"2","author":"GF Hatfull","year":"2006","unstructured":"Hatfull GF, Pedulla ML, Jacobs-Sera D, Cichon PM, Foley A, Ford ME, Gonda RM, Houtz JM, Hryckowian AJ, Kelchner VA, et al.: Exploring the mycobacteriophage metaproteome: phage genomics as an educational platform. PLoS Genet 2006, 2(6):e92. 10.1371\/journal.pgen.0020092","journal-title":"PLoS Genet"},{"key":"1909_CR17","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511801389","volume-title":"An introduction to support vector machines: And other kernel-based learning methods","author":"N Cristianini","year":"2000","unstructured":"Cristianini N, Shawe-Taylor J: An introduction to support vector machines: And other kernel-based learning methods. Cambridge, England: Cambridge Press; 2000."},{"key":"1909_CR18","doi-asserted-by":"publisher","first-page":"799","DOI":"10.1146\/annurev.micro.54.1.799","volume":"54","author":"IN Wang","year":"2000","unstructured":"Wang IN, Smith DL, Young R: Holins: the protein clocks of bacteriophage infections. Annu Rev Microbiol 2000, 54: 799\u2013825. 10.1146\/annurev.micro.54.1.799","journal-title":"Annu Rev Microbiol"},{"issue":"3","key":"1909_CR19","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1016\/S0168-9525(00)02212-5","volume":"17","author":"J Tamames","year":"2001","unstructured":"Tamames J, Gonzalez-Moreno M, Mingorance J, Valencia A, Vicente M: Bringing gene order into bacterial shape. Trends in Genetics 2001, 17(3):124\u2013126. 10.1016\/S0168-9525(00)02212-5","journal-title":"Trends in Genetics"},{"issue":"3","key":"1909_CR20","doi-asserted-by":"publisher","first-page":"356","DOI":"10.1101\/gr.GR-1619R","volume":"11","author":"YI Wolf","year":"2001","unstructured":"Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res 2001, 11(3):356\u2013372. 10.1101\/gr.GR-1619R","journal-title":"Genome Res"},{"issue":"20","key":"1909_CR21","doi-asserted-by":"publisher","first-page":"4029","DOI":"10.1093\/nar\/28.20.4029","volume":"28","author":"W Fujibuchi","year":"2000","unstructured":"Fujibuchi W, Ogata H, Matsuda H, Kanehisa M: Automatic detection of conserved gene clusters in multiple genomes by graph comparison and P-quasi grouping. Nucleic Acids Res 2000, 28(20):4029\u20134036. 10.1093\/nar\/28.20.4029","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"1909_CR22","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1093\/nar\/30.1.42","volume":"30","author":"M Kanehisa","year":"2002","unstructured":"Kanehisa M, Goto S, Kawashima S, Nakaya A: The KEGG databases at GenomeNet. Nucl Acids Res 2002, 30(1):42\u201346. 10.1093\/nar\/30.1.42","journal-title":"Nucl Acids Res"},{"issue":"10","key":"1909_CR23","doi-asserted-by":"publisher","first-page":"1611","DOI":"10.1101\/gr.361602","volume":"12","author":"JE Stajich","year":"2002","unstructured":"Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, et al.: The Bioperl toolkit: Perl modules for the life sciences. Genome Res 2002, 12(10):1611\u20131618. 10.1101\/gr.361602","journal-title":"Genome Res"},{"key":"1909_CR24","first-page":"31","volume-title":"Introduction to the theory of computation","author":"M Sipser","year":"2006","unstructured":"Sipser M: Chapter 1: Regular languages. In Introduction to the theory of computation. 2nd edition. Boston: Thomson Course Technology; 2006:31\u201390.","edition":"2"},{"key":"1909_CR25","volume-title":"Regular Expression Language Elements","author":"Microsoft","year":"2006","unstructured":"Microsoft: Regular Expression Language Elements. MSDN Library: .NET Framework General Reference, Microsoft Corporation; 2006."},{"issue":"2\u20133","key":"1909_CR26","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1016\/S0888-613X(02)00086-5","volume":"32","author":"AL Hsu","year":"2003","unstructured":"Hsu AL, Halgamuge SK: Enhancement of topology preservation and hierarchical dynamic self-organising maps for data visualisation. International Journal of Approximate Reasoning 2003, 32(2\u20133):259\u2013279. 10.1016\/S0888-613X(02)00086-5","journal-title":"International Journal of Approximate Reasoning"},{"issue":"16","key":"1909_CR27","doi-asserted-by":"publisher","first-page":"2131","DOI":"10.1093\/bioinformatics\/btg296","volume":"19","author":"AL Hsu","year":"2003","unstructured":"Hsu AL, Tang SL, Halgamuge SK: An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data. Bioinformatics 2003, 19(16):2131\u20132140. 10.1093\/bioinformatics\/btg296","journal-title":"Bioinformatics"},{"issue":"3","key":"1909_CR28","doi-asserted-by":"publisher","first-page":"637","DOI":"10.1162\/089976601300014493","volume":"13","author":"SS Keerthi","year":"2001","unstructured":"Keerthi SS, Shevade SK, Bhattacharyya C, Murthy KRK: Improvements to Platt's SMO Algorithm for SVM Classifier Design. Neural Comp 2001, 13(3):637\u2013649. 10.1162\/089976601300014493","journal-title":"Neural Comp"},{"issue":"6","key":"1909_CR29","doi-asserted-by":"publisher","first-page":"857","DOI":"10.1089\/106652703322756113","volume":"10","author":"L Liao","year":"2003","unstructured":"Liao L, Noble WS: Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J Comput Biol 2003, 10(6):857\u2013868. 10.1089\/106652703322756113","journal-title":"J Comput Biol"},{"issue":"1","key":"1909_CR30","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1002\/prot.20045","volume":"55","author":"CZ Cai","year":"2004","unstructured":"Cai CZ, Han LY, Ji ZL, Chen YZ: Enzyme family classification by support vector machines. Proteins 2004, 55(1):66\u201376. 10.1002\/prot.20045","journal-title":"Proteins"},{"issue":"Suppl 5","key":"1909_CR31","doi-asserted-by":"publisher","first-page":"S15","DOI":"10.1186\/1471-2105-7-S5-S15","volume":"7","author":"A Baten","year":"2006","unstructured":"Baten A, Chang BCH, Halgamuge SK, Li J: Splice site identification using probabilistic parameters and SVM classification. BMC Bioinformatics 2006, 7(Suppl 5):S15. 10.1186\/1471-2105-7-S5-S15","journal-title":"BMC Bioinformatics"},{"issue":"19","key":"1909_CR32","doi-asserted-by":"publisher","first-page":"8700","DOI":"10.1073\/pnas.92.19.8700","volume":"92","author":"I Dubchak","year":"1995","unstructured":"Dubchak I, Muchnik I, Holbrook SR, Kim SH: Prediction of protein folding class using global description of amino acid sequence. Proc Natl Acad Sci USA 1995, 92(19):8700\u20138704. 10.1073\/pnas.92.19.8700","journal-title":"Proc Natl Acad Sci USA"},{"issue":"2","key":"1909_CR33","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1111\/j.1574-6968.1999.tb13575.x","volume":"174","author":"TA Tatusova","year":"1999","unstructured":"Tatusova TA, Madden TL: BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett 1999, 174(2):247\u2013250. 10.1111\/j.1574-6968.1999.tb13575.x","journal-title":"FEMS Microbiol Lett"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-8-S4-S6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,31]],"date-time":"2021-08-31T21:26:46Z","timestamp":1630445206000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-8-S4-S6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,5]]},"references-count":33,"journal-issue":{"issue":"S4","published-print":{"date-parts":[[2007,5]]}},"alternative-id":["1909"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-8-s4-s6","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,5]]},"assertion":[{"value":"22 May 2007","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S6"}}