{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T01:18:04Z","timestamp":1776734284654,"version":"3.51.2"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2010,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Genome context methods have been introduced in the last decade as automatic methods to predict functional relatedness between genes in a target genome using the patterns of existence and relative locations of the homologs of those genes in a set of reference genomes. Much work has been done in the application of these methods to different bioinformatics tasks, but few papers present a systematic study of the methods and their combination necessary for their optimal use.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We present a thorough study of the four main families of genome context methods found in the literature: phylogenetic profile, gene fusion, gene cluster, and gene neighbor. We find that for most organisms the gene neighbor method outperforms the phylogenetic profile method by as much as 40% in sensitivity, being competitive with the gene cluster method at low sensitivities. Gene fusion is generally the worst performing of the four methods. A thorough exploration of the parameter space for each method is performed and results across different target organisms are presented.<\/jats:p><jats:p>We propose the use of normalization procedures as those used on microarray data for the genome context scores. We show that substantial gains can be achieved from the use of a simple normalization technique. 
In particular, the sensitivity of the phylogenetic profile method is improved by around 25% after normalization, resulting, to our knowledge, in the best-performing phylogenetic profile system in the literature.<\/jats:p><jats:p>Finally, we show results from combining the various genome context methods into a single score. When using a cross-validation procedure to train the combiners, with both original and normalized scores as input, a decision tree combiner results in gains of up to 20% with respect to the gene neighbor method. Overall, this represents a gain of around 15% over what can be considered the state of the art in this area: the four original genome context methods combined using a procedure like that used in the STRING database. Unfortunately, we find that these gains disappear when the combiner is trained only with organisms that are phylogenetically distant from the target organism.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Our experiments indicate that gene neighbor is the best individual genome context method and that gains from the combination of individual methods are very sensitive to the training data used to obtain the combiner's parameters. 
If adequate training data is not available, using the gene neighbor score by itself instead of a combined score might be the best choice.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-11-493","type":"journal-article","created":{"date-parts":[[2010,10,1]],"date-time":"2010-10-01T18:14:10Z","timestamp":1285956850000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["A systematic study of genome context methods: calibration, normalization and combination"],"prefix":"10.1186","volume":"11","author":[{"given":"Luciana","family":"Ferrer","sequence":"first","affiliation":[]},{"given":"Joseph M","family":"Dale","sequence":"additional","affiliation":[]},{"given":"Peter D","family":"Karp","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,10,1]]},"reference":[{"key":"4076_CR1","doi-asserted-by":"publisher","first-page":"4285","DOI":"10.1073\/pnas.96.8.4285","volume":"96","author":"M Pellegrini","year":"1999","unstructured":"Pellegrini M, Marcotte E, Thompson M, Eisenberg D, Yeates T: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. PNAS 1999, 96: 4285\u20138. 10.1073\/pnas.96.8.4285","journal-title":"PNAS"},{"key":"4076_CR2","doi-asserted-by":"publisher","first-page":"751","DOI":"10.1126\/science.285.5428.751","volume":"285","author":"E Marcotte","year":"1999","unstructured":"Marcotte E, Pellegrini M, Ng H, Rice D, Yeates T, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285: 751\u20133. 10.1126\/science.285.5428.751","journal-title":"Science"},{"key":"4076_CR3","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1038\/47056","volume":"402","author":"A Enright","year":"1999","unstructured":"Enright A, Iliopoulos I, Kyrpides N, Ouzounis C: Protein interaction maps for complete genomes based on gene fusion events. 
Nature 1999, 402: 86\u201390. 10.1038\/47056","journal-title":"Nature"},{"issue":"5","key":"4076_CR4","doi-asserted-by":"publisher","first-page":"R35","DOI":"10.1186\/gb-2004-5-5-r35","volume":"5","author":"P Bowers","year":"2004","unstructured":"Bowers P, Pellegrini M, Thompson M, Fierro J, Yeates T, Eisenberg D: Prolinks: a database of protein functional linkages derived from coevolution. Genome Biology 2004, 5(5):R35. 10.1186\/gb-2004-5-5-r35","journal-title":"Genome Biology"},{"issue":"2","key":"4076_CR5","doi-asserted-by":"crossref","first-page":"93","DOI":"10.3233\/ISB-00009","volume":"1","author":"R Overbeek","year":"1999","unstructured":"Overbeek R, Fonstein M, D'Souza M, Pusch G, Maltsev N: Use of contiguity on the chromosome to predict functional coupling. In Silico Biol 1999, 1(2):93\u2013108.","journal-title":"In Silico Biol"},{"issue":"5644","key":"4076_CR6","doi-asserted-by":"publisher","first-page":"449","DOI":"10.1126\/science.1087361","volume":"302","author":"R Jansen","year":"2003","unstructured":"Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan N, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 2003, 302(5644):449\u2013453. 10.1126\/science.1087361","journal-title":"Science"},{"issue":"7","key":"4076_CR7","doi-asserted-by":"publisher","first-page":"945","DOI":"10.1101\/gr.3610305","volume":"15","author":"L Lu","year":"2005","unstructured":"Lu L, Xia Y, Paccanaro A, Yu H, Gerstein M: Assessing the limits of genomic data integration for predicting protein networks. Genome Research 2005, 15(7):945\u201353. 
10.1101\/gr.3610305","journal-title":"Genome Research"},{"key":"4076_CR8","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1002\/jcb.10073","volume-title":"Journal of Cellular Biochemistry","author":"E Schadt","year":"2001","unstructured":"Schadt E, Li C, Ellis B, Wong W: Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data. Journal of Cellular Biochemistry 2001, (Suppl 37):120\u20135. 10.1002\/jcb.10073"},{"issue":"2","key":"4076_CR9","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1093\/bioinformatics\/19.2.185","volume":"19","author":"B Bolstad","year":"2003","unstructured":"Bolstad B, Irizarry R, Astrand M, Speed T: A Comparison of normalization methods for high density oligonucleotide array data based on bias and variance. Bioinformatics 2003, 19(2):185\u2013193. 10.1093\/bioinformatics\/19.2.185","journal-title":"Bioinformatics"},{"key":"4076_CR10","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1038\/47048","volume":"402","author":"EM Marcotte","year":"1999","unstructured":"Marcotte EM, Pellegrini M, Thompson MJ, Yeates T, Eisenberg D: A combined algorithm for genome-wide prediction of protein function. Nature 1999, 402: 83\u201386. 10.1038\/47048","journal-title":"Nature"},{"issue":"4","key":"4076_CR11","doi-asserted-by":"publisher","first-page":"527","DOI":"10.1101\/gr.5900607","volume":"17","author":"S Yellaboina","year":"2007","unstructured":"Yellaboina S, Goyal K, Mande S: Inferring genome-wide functional linkages in E. coli by combining improved genome context methods: comparison with high-throughput experimental data. Genome Research 2007, 17(4):527\u201335. 
10.1101\/gr.5900607","journal-title":"Genome Research"},{"key":"4076_CR12","doi-asserted-by":"publisher","first-page":"414","DOI":"10.1186\/1471-2105-8-414","volume":"8","author":"J Sun","year":"2007","unstructured":"Sun J, Sun Y, Ding G, Liu Q, Wang C, He Y, Shi T, Li Y, Zhao Z: InPrePPI: an integrated evaluation method based on genomic context for predicting protein-protein interactions in prokaryotic genomes. BMC Bioinformatics 2007, 8: 414. 10.1186\/1471-2105-8-414","journal-title":"BMC Bioinformatics"},{"issue":"9","key":"4076_CR13","doi-asserted-by":"publisher","first-page":"R59","DOI":"10.1186\/gb-2003-4-9-r59","volume":"4","author":"M Strong","year":"2003","unstructured":"Strong M, Mallick P, Pellegrini M, Thompson MJ, Eisenberg D: Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach. Genome Biology 2003, 4(9):R59. 10.1186\/gb-2003-4-9-r59","journal-title":"Genome Biology"},{"issue":"1","key":"4076_CR14","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1093\/nar\/gkg034","volume":"31","author":"C von Mering","year":"2003","unstructured":"von Mering C, Huynen M, Jaeggi D, Schmidt S, Bork P, Snel B: STRING: a database of predicted functional associations between proteins. Nucleic Acids Research 2003, 31(1):258\u201361. 10.1093\/nar\/gkg034","journal-title":"Nucleic Acids Research"},{"key":"4076_CR15","volume-title":"PLoS Biol","author":"P Hu","year":"2009","unstructured":"Hu P, Janga SC, Babu M, D\u00edaz-Mej\u00eda J, Butland G, Yang W, Pogoutse O, Guo X, Phanse S, Wong P, Chandran S, Christopoulos C, Nazarians-Armavil A, Nasseri NK, Musso G, Ali M, Nazemof N, Eroukova V, Golshani A, Paccanaro A, Greenblatt J, Moreno-Hagelsieb G, Emili A: Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. 
PLoS Biol 2009., 7(4): 10.1371\/journal.pbio.1000096","edition":"7"},{"key":"4076_CR16","doi-asserted-by":"publisher","first-page":"433","DOI":"10.1093\/nar\/gki005","volume":"33","author":"C von Mering","year":"2005","unstructured":"von Mering C, Jensen L, Snel B, Hooper SD, Krupp M, Foglierini M, Jouffre N, Huynen MA, Bork P: STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Research 2005, 33: 433\u20137. 10.1093\/nar\/gki005","journal-title":"Nucleic Acids Research"},{"issue":"16","key":"4076_CR17","doi-asserted-by":"publisher","first-page":"3409","DOI":"10.1093\/bioinformatics\/bti532","volume":"21","author":"J Sun","year":"2005","unstructured":"Sun J, Xu J, Liu Z, Liu Q, Zhao A, Shi T, Li Y: Refined phylogenetic profiles method for predicting protein-protein interactions. Bioinformatics 2005, 21(16):3409\u201315. 10.1093\/bioinformatics\/bti532","journal-title":"Bioinformatics"},{"key":"4076_CR18","doi-asserted-by":"publisher","first-page":"393","DOI":"10.1186\/1471-2164-8-393","volume":"8","author":"A Karimpour-Fard","year":"2007","unstructured":"Karimpour-Fard A, Hunter L, Gill R: Investigation of factors affecting prediction of protein-protein interaction networks by phylogenetic profiling. BMC Genomics 2007, 8: 393. 10.1186\/1471-2164-8-393","journal-title":"BMC Genomics"},{"key":"4076_CR19","volume-title":"BMC Bioinformatics","author":"S Cokus","year":"2007","unstructured":"Cokus S, Mizutani S, Pellegrini M: An improved method for identifying functionally linked proteins using phylogenetic profiles. BMC Bioinformatics 2007., 8: 10.1186\/1471-2105-8-S4-S7"},{"key":"4076_CR20","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1093\/nar\/29.1.123","volume":"29","author":"J Peterson","year":"2001","unstructured":"Peterson J, Umayam L, Dickinson T, Hickey E, White O: The Comprehensive Microbial Resource. Nucleic Acids Research 2001, 29: 123\u20135. 
10.1093\/nar\/29.1.123","journal-title":"Nucleic Acids Research"},{"key":"4076_CR21","doi-asserted-by":"publisher","first-page":"170","DOI":"10.1186\/1471-2105-7-170","volume":"7","author":"T Lee","year":"2006","unstructured":"Lee T, Pouliot Y, Wagner V, Gupta P, Stringer-Calvert D, Tenenbaum J, Karp P: BioWarehouse: A bioinformatics database warehouse toolkit. BMC Bioinformatics 2006, 7: 170. 10.1186\/1471-2105-7-170","journal-title":"BMC Bioinformatics"},{"key":"4076_CR22","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1016\/0097-8485(93)85006-X","volume":"17","author":"JC Wootton","year":"1993","unstructured":"Wootton JC, Federhen S: Statistics of local complexity in amino acid sequences and sequence databases. Computers and Chemistry 1993, 17: 149\u2013163. 10.1016\/0097-8485(93)85006-X","journal-title":"Computers and Chemistry"},{"key":"4076_CR23","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1016\/0097-8485(93)85010-A","volume":"17","author":"JM Claverie","year":"1993","unstructured":"Claverie JM, States DJ: Information enhancement methods for large scale sequence analysis. Computers and Chemistry 1993, 17: 191\u2013201. 10.1016\/0097-8485(93)85010-A","journal-title":"Computers and Chemistry"},{"key":"4076_CR24","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1186\/1471-2105-7-177","volume":"7","author":"P Kharchenko","year":"2006","unstructured":"Kharchenko P, Chen L, Freund Y, Vitkup D, Church G: Identifying metabolic enzymes with multiple types of association evidence. BMC Bioinformatics 2006, 7: 177. 10.1186\/1471-2105-7-177","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"4076_CR25","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1093\/bioinformatics\/btl558","volume":"23","author":"D Barker","year":"2007","unstructured":"Barker D, Meade A, Pagel M: Constrained models of evolution lead to improved prediction of functional linkage from correlated gain and loss of genes. 
Bioinformatics 2007, 23(1):14\u201320. 10.1093\/bioinformatics\/btl558","journal-title":"Bioinformatics"},{"key":"4076_CR26","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1007\/PL00006122","volume":"74","author":"J Tamames","year":"1997","unstructured":"Tamames J, Casari G, Ouzounis C, Valencia A: Conserved clusters of functionally related genes in two bacterial genomes. J Mol Evol 1997, 74: 66\u201373. 10.1007\/PL00006122","journal-title":"J Mol Evol"},{"issue":"5","key":"4076_CR27","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1093\/bib\/bbn019","volume":"9","author":"R Brouwer","year":"2008","unstructured":"Brouwer R, Kuipers O, van Hijum S: The relative value of operon predictions. Briefings in Bioinformatics 2008, 9(5):367\u201375. 10.1093\/bib\/bbn019","journal-title":"Briefings in Bioinformatics"},{"key":"4076_CR28","first-page":"376","volume":"0","author":"G Pandey","year":"2008","unstructured":"Pandey G, Ramakrishnan LN, Steinbach M, Kumar V: Systematic evaluation of scaling methods for gene expression data. Bioinformatics and Biomedicine, IEEE International Conference on 2008, 0: 376\u2013381. full_text","journal-title":"Bioinformatics and Biomedicine, IEEE International Conference on"},{"issue":"19","key":"4076_CR29","doi-asserted-by":"publisher","first-page":"6083","DOI":"10.1093\/nar\/gki892","volume":"33","author":"P Karp","year":"2005","unstructured":"Karp P, Ouzounis C, Moore-Kochlacs C, Goldovsky L, Kaipa P, Ahren D, Tsoka S, Darzentas N, Kunin V, Lopez-Bigas N: Expansion of the BioCyc collection of pathway\/genome databases to 160 genomes. Nucleic Acids Research 2005, 33(19):6083\u201389. 
10.1093\/nar\/gki892","journal-title":"Nucleic Acids Research"},{"key":"4076_CR30","doi-asserted-by":"publisher","first-page":"D623","DOI":"10.1093\/nar\/gkm900","volume":"36","author":"R Caspi","year":"2008","unstructured":"Caspi R, Foerster H, Fulcher C, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer A, Tissier C, Walk T, Zhang P, Karp PD: The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway\/Genome Databases. Nucleic Acids Research 2008, 36: D623\u201331. 10.1093\/nar\/gkm900","journal-title":"Nucleic Acids Research"},{"key":"4076_CR31","doi-asserted-by":"publisher","first-page":"D464","DOI":"10.1093\/nar\/gkn751","volume":"37","author":"I Keseler","year":"2009","unstructured":"Keseler I, Bonavides-Martinez C, Collado-Vides J, Gama-Castro S, Gunsalus R, Johnson DA, Krummenacker M, Nolan L, Paley S, Paulsen I, Peralta-Gil M, Santos-Zavaleta A, Shearer A, Karp P: EcoCyc: A comprehensive view of E. coli biology. Nucleic Acids Research 2009, 37: D464\u201370. 10.1093\/nar\/gkn751","journal-title":"Nucleic Acids Research"},{"key":"4076_CR32","doi-asserted-by":"publisher","first-page":"D473","DOI":"10.1093\/nar\/gkp875","volume":"38","author":"R Caspi","year":"2010","unstructured":"Caspi R, Altman T, Dale J, Dreher K, Fulcher C, Gilham F, Kaipa P, Karthikeyan A, Kothari A, Krummenacker M, Latendresse M, Mueller L, Paley S, Popescu L, Pujar A, Shearer A, Zhang P, Karp P: The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway\/Genome Databases. Nucleic Acids Research 2010, 38: D473\u20139. 10.1093\/nar\/gkp875","journal-title":"Nucleic Acids Research"},{"key":"4076_CR33","doi-asserted-by":"publisher","first-page":"3687","DOI":"10.1093\/nar\/gkl438","volume":"34","author":"M Green","year":"2006","unstructured":"Green M, Karp P: The outcomes of pathway database computations depend on pathway ontology. Nucleic Acids Research 2006, 34: 3687\u201397. 
10.1093\/nar\/gkl438","journal-title":"Nucleic Acids Research"},{"key":"4076_CR34","doi-asserted-by":"publisher","first-page":"D277","DOI":"10.1093\/nar\/gkh063","volume":"32","author":"M Kanehisa","year":"2004","unstructured":"Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource for deciphering the genome. Nucleic Acids Research 2004, 32: D277-D280. 10.1093\/nar\/gkh063","journal-title":"Nucleic Acids Research"},{"key":"4076_CR35","doi-asserted-by":"publisher","first-page":"e3","DOI":"10.1371\/journal.pcbi.0010003","volume":"1","author":"D Barker","year":"2005","unstructured":"Barker D, Pagel M: Predicting functional gene links from phylogenetic-statistical analyses of whole genomes. PLoS Computational Biology 2005, 1: e3. 10.1371\/journal.pcbi.0010003","journal-title":"PLoS Computational Biology"},{"key":"4076_CR36","volume-title":"Statistical Models in S. Wadsworth and BrooksCole","author":"JM Chambers","year":"1992","unstructured":"Chambers JM, Hastie TJ: Statistical Models in S. Wadsworth and BrooksCole. 1992."},{"key":"4076_CR37","volume-title":"R: A language and environment for statistical computing","author":"R Development Core Team","year":"2005","unstructured":"R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2005."},{"key":"4076_CR38","volume-title":"Tech Rep FIA-91-28, NASA Ames Research Center","author":"W Buntine","year":"1991","unstructured":"Buntine W, Caruana R: Introduction to IND and recursive partitioning. Tech Rep FIA-91\u201328, NASA Ames Research Center 1991."},{"key":"4076_CR39","unstructured":"Buntine W: IND software package.[http:\/\/opensource.arc.nasa.gov\/project\/ind\/]"},{"issue":"2","key":"4076_CR40","first-page":"123","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman L: Bagging predictors. 
Machine Learning 1996, 24(2):123\u2013140.","journal-title":"Machine Learning"},{"key":"4076_CR41","volume-title":"Sequence - Evolution - Function: Computational Approaches in Comparative Genomics","author":"EV Koonin","year":"2002","unstructured":"Koonin EV, Galperin MY: Sequence - Evolution - Function: Computational Approaches in Comparative Genomics. Kluwer Academic; 2002."},{"key":"4076_CR42","volume-title":"Nucleic Acids Research","author":"LJ Jensen","year":"2009","unstructured":"Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C: STRING 8-a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Research 2009., 37: 10.1093\/nar\/gkn760"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-11-493.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,26]],"date-time":"2025-02-26T04:47:15Z","timestamp":1740545235000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-11-493"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,10,1]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["4076"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-11-493","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,10,1]]},"assertion":[{"value":"13 February 2010","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 October 2010","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 October 
2010","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"493"}}