{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T04:18:23Z","timestamp":1773289103836,"version":"3.50.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Orthologs (genes that have diverged after a speciation event) tend to have similar function, and so their prediction has become an important component of comparative genomics and genome annotation. The gold standard phylogenetic analysis approach of comparing available organismal phylogeny to gene phylogeny is not easily automated for genome-wide analysis; therefore, ortholog prediction for large genome-scale datasets is typically performed using a reciprocal-best-BLAST-hits (RBH) approach. One problem with RBH is that it will incorrectly predict a paralog as an ortholog when incomplete genome sequences or gene loss is involved. In addition, there is an increasing interest in identifying orthologs most likely to have retained similar function.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>To address these issues, we present here a high-throughput computational method named Ortholuge that further evaluates previously predicted orthologs (including those predicted using an RBH-based approach) \u2013 identifying which orthologs most closely reflect species divergence and may more likely have similar function. Ortholuge analyzes phylogenetic distance ratios involving two comparison species and an outgroup species, noting cases where relative gene divergence is atypical. 
It also identifies some cases of gene duplication after species divergence. Through simulations of incomplete genome data\/gene loss, we show that the vast majority of genes falsely predicted as orthologs by an RBH-based method can be identified. Ortholuge was then used to estimate the number of false-positives (predominantly paralogs) in selected RBH-predicted ortholog datasets, identifying approximately 10% paralogs in a eukaryotic data set (mouse-rat comparison) and 5% in a bacterial data set (<jats:italic>Pseudomonas putida<\/jats:italic> \u2013 <jats:italic>Pseudomonas syringae<\/jats:italic> species comparison). Higher quality (more precise) datasets of orthologs, which we term \"ssd-orthologs\" (<jats:underline>s<\/jats:underline>upporting-<jats:underline>s<\/jats:underline>pecies-<jats:underline>d<\/jats:underline>ivergence-orthologs), were also constructed. These datasets, as well as Ortholuge software that may be used to characterize other species' datasets, are available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/www.pathogenomics.ca\/ortholuge\/\" ext-link-type=\"uri\">http:\/\/www.pathogenomics.ca\/ortholuge\/<\/jats:ext-link> (software under GNU General Public License).<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The Ortholuge method reported here appears to significantly improve the specificity (precision) of high-throughput ortholog prediction for both bacterial and eukaryotic species. 
This method, and its associated software, will aid those performing various comparative genomics-based analyses, such as the prediction of conserved regulatory elements upstream of orthologous genes.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-270","type":"journal-article","created":{"date-parts":[[2006,5,30]],"date-time":"2006-05-30T06:27:32Z","timestamp":1148970452000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":74,"title":["Improving the specificity of high-throughput ortholog prediction"],"prefix":"10.1186","volume":"7","author":[{"given":"Debra L","family":"Fulton","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yvonne Y","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthew R","family":"Laird","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Benjamin GS","family":"Horsman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fiona M","family":"Roche","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fiona SL","family":"Brinkman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2006,5,28]]},"reference":[{"key":"1009_CR1","doi-asserted-by":"publisher","first-page":"99","DOI":"10.2307\/2412448","volume":"19","author":"WM Fitch","year":"1970","unstructured":"Fitch WM: Distinguishing homologous from analogous proteins. Syst Zool 1970, 19: 99\u2013113. 
10.2307\/2412448","journal-title":"Syst Zool"},{"key":"1009_CR2","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","volume":"215","author":"SF Altschul","year":"1990","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403\u2013410. 10.1006\/jmbi.1990.9999","journal-title":"J Mol Biol"},{"key":"1009_CR3","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1186\/1471-2105-4-41","volume":"4","author":"RL Tatusov","year":"2003","unstructured":"Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: An updated version includes eukaryotes. BMC Bioinformatics 2003, 4: 41. 10.1186\/1471-2105-4-41","journal-title":"BMC Bioinformatics"},{"key":"1009_CR4","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1093\/nar\/29.1.22","volume":"29","author":"RL Tatusov","year":"2001","unstructured":"Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV: The COG database: New developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 2001, 29: 22\u201328. 10.1093\/nar\/29.1.22","journal-title":"Nucleic Acids Res"},{"key":"1009_CR5","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1101\/gr.212002","volume":"12","author":"Y Lee","year":"2002","unstructured":"Lee Y, Sultana R, Pertea G, Cho J, Karamycheva S, Tsai J, Parvizi B, Cheung F, Antonescu V, White J, Holt I, Liang F, Quackenbush J: Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res 2002, 12: 493\u2013502. 
10.1101\/gr.212002","journal-title":"Genome Res"},{"key":"1009_CR6","doi-asserted-by":"publisher","first-page":"1041","DOI":"10.1006\/jmbi.2000.5197","volume":"314","author":"M Remm","year":"2001","unstructured":"Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol 2001, 314: 1041\u20131052. 10.1006\/jmbi.2000.5197","journal-title":"J Mol Biol"},{"key":"1009_CR7","doi-asserted-by":"publisher","first-page":"D476","DOI":"10.1093\/nar\/gki107","volume":"33","author":"KP O'Brien","year":"2005","unstructured":"O'Brien KP, Remm M, Sonnhammer EL: Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res 2005, 33: D476\u2013480. 10.1093\/nar\/gki107","journal-title":"Nucleic Acids Res"},{"key":"1009_CR8","doi-asserted-by":"publisher","first-page":"1589","DOI":"10.1101\/gr.1092603","volume":"13","author":"V Kunin","year":"2003","unstructured":"Kunin V, Ouzounis CA: The balance of driving forces during genome evolution in prokaryotes. Genome Res 2003, 13: 1589\u20131594. 10.1101\/gr.1092603","journal-title":"Genome Res"},{"key":"1009_CR9","doi-asserted-by":"publisher","first-page":"R56","DOI":"10.1186\/gb-2003-4-9-r56","volume":"4","author":"P Zhang","year":"2003","unstructured":"Zhang P, Gu Z, Li WH: Different evolutionary patterns between young duplicate genes in the human genome. Genome Biol 2003, 4: R56. 10.1186\/gb-2003-4-9-r56","journal-title":"Genome Biol"},{"key":"1009_CR10","doi-asserted-by":"publisher","first-page":"1453","DOI":"10.1126\/science.277.5331.1453","volume":"277","author":"FR Blattner","year":"1997","unstructured":"Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y: The complete genome sequence of Escherichia coli K-12. Science 1997, 277: 1453\u20131474. 
10.1126\/science.277.5331.1453","journal-title":"Science"},{"key":"1009_CR11","doi-asserted-by":"publisher","first-page":"10181","DOI":"10.1073\/pnas.1731982100","volume":"100","author":"CR Buell","year":"2003","unstructured":"Buell CR, Joardar V, Lindeberg M, Selengut J, Paulsen IT, Gwinn ML, Dodson RJ, Deboy RT, Durkin AS, Kolonay JF, Madupu R, Daugherty S, Brinkac L, Beanan MJ, Haft DH, Nelson WC, Davidsen T, Zafar N, Zhou L, Liu J, Yuan Q, Khouri H, Fedorova N, Tran B, Russell D, Berry K, Utterback T, Van Aken SE, Feldblyum TV, D'Ascenzo M, Deng WL, Ramos AR, Alfano JR, Cartinhour S, Chatterjee AK, Delaney TP, Lazarowitz SG, Martin GB, Schneider DJ, Tang X, Bender CL, White O, Fraser CM, Collmer A: The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc Natl Acad Sci U S A 2003, 100: 10181\u201310186. 10.1073\/pnas.1731982100","journal-title":"Proc Natl Acad Sci U S A"},{"key":"1009_CR12","doi-asserted-by":"publisher","first-page":"799","DOI":"10.1046\/j.1462-2920.2002.00366.x","volume":"4","author":"KE Nelson","year":"2002","unstructured":"Nelson KE, Weinel C, Paulsen IT, Dodson RJ, Hilbert H, Martins dos Santos VA, Fouts DE, Gill SR, Pop M, Holmes M, Brinkac L, Beanan M, DeBoy RT, Daugherty S, Kolonay J, Madupu R, Nelson W, White O, Peterson J, Khouri H, Hance I, Chris Lee P, Holtzapple E, Scanlan D, Tran K, Moazzez A, Utterback T, Rizzo M, Lee K, Kosack D, Moestl D, Wedler H, Lauber J, Stjepandic D, Hoheisel J, Straetz M, Heim S, Kiewitz C, Eisen JA, Timmis KN, Dusterhoft A, Tummler B, Fraser CM: Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ Microbiol 2002, 4: 799\u2013808. 
10.1046\/j.1462-2920.2002.00366.x","journal-title":"Environ Microbiol"},{"key":"1009_CR13","doi-asserted-by":"publisher","first-page":"703","DOI":"10.1093\/bioinformatics\/bti045","volume":"21","author":"XH Zheng","year":"2005","unstructured":"Zheng XH, Lu F, Wang ZY, Zhong F, Hoover J, Mural R: Using shared genomic synteny and shared protein functions to enhance the identification of orthologous gene pairs. Bioinformatics 2005, 21: 703\u2013710. 10.1093\/bioinformatics\/bti045","journal-title":"Bioinformatics"},{"key":"1009_CR14","doi-asserted-by":"publisher","first-page":"1530","DOI":"10.1101\/gr.2662504","volume":"14","author":"CI Castillo-Davis","year":"2004","unstructured":"Castillo-Davis CI, Hartl DL, Achaz G: Cis-regulatory and protein evolution in orthologous and duplicate genes. Genome Res 2004, 14: 1530\u20131536. 10.1101\/gr.2662504","journal-title":"Genome Res"},{"key":"1009_CR15","volume-title":"Genome Biol","author":"RA Jensen","year":"2001","unstructured":"Jensen RA: Orthologs and paralogs \u2013 we need to get it right. Genome Biol 2001, 2: INTERACTIONS1002"},{"key":"1009_CR16","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1016\/S0168-9525(00)02005-9","volume":"16","author":"WM Fitch","year":"2000","unstructured":"Fitch WM: Homology a personal view on some of the problems. Trends Genet 2000, 16: 227\u2013231. 10.1016\/S0168-9525(00)02005-9","journal-title":"Trends Genet"},{"issue":"Suppl 1","key":"1009_CR17","doi-asserted-by":"publisher","first-page":"i54","DOI":"10.1093\/bioinformatics\/btg1005","volume":"19","author":"M Brudno","year":"2003","unstructured":"Brudno M, Malde S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S: Glocal alignment: Finding rearrangements during alignment. Bioinformatics 2003, 19(Suppl 1):i54\u201362. 
10.1093\/bioinformatics\/btg1005","journal-title":"Bioinformatics"},{"key":"1009_CR18","doi-asserted-by":"publisher","first-page":"540","DOI":"10.1007\/s002390010184","volume":"52","author":"LB Koski","year":"2001","unstructured":"Koski LB, Golding GB: The closest BLAST hit is often not the nearest neighbor. J Mol Evol 2001, 52: 540\u2013542.","journal-title":"J Mol Evol"},{"key":"1009_CR19","doi-asserted-by":"publisher","first-page":"D471","DOI":"10.1093\/nar\/gki113","volume":"33","author":"JT Eppig","year":"2005","unstructured":"Eppig JT, Bult CJ, Kadin JA, Richardson JE, Blake JA, Anagnostopoulos A, Baldarelli RM, Baya M, Beal JS, Bello SM, Boddy WJ, Bradt DW, Burkart DL, Butler NE, Campbell J, Cassell MA, Corbani LE, Cousins SL, Dahmen DJ, Dene H, Diehl AD, Drabkin HJ, Frazer KS, Frost P, Glass LH, Goldsmith CW, Grant PL, Lennon-Pierce M, Lewis J, Lu I, Maltais LJ, McAndrews-Hill M, McClellan L, Miers DB, Miller LA, Ni L, Ormsby JE, Qi D, Reddy TB, Reed DJ, Richards-Smith B, Shaw DR, Sinclair R, Smith CL, Szauter P, Walker MB, Walton DO, Washburn LL, Witham IT, Zhu Y, Mouse Genome Database Group: The Mouse Genome Database (MGD): from genes to mice \u2013 a community resource for mouse biology. Nucleic Acids Res 2005, 33: D471\u2013475. 10.1093\/nar\/gki113","journal-title":"Nucleic Acids Res"},{"key":"1009_CR20","doi-asserted-by":"publisher","first-page":"D501","DOI":"10.1093\/nar\/gki025","volume":"33","author":"KD Pruitt","year":"2005","unstructured":"Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequence (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2005, 33: D501\u2013504. 
10.1093\/nar\/gki025","journal-title":"Nucleic Acids Res"},{"key":"1009_CR21","doi-asserted-by":"publisher","first-page":"E19","DOI":"10.1371\/journal.pbio.0000019","volume":"1","author":"E Lerat","year":"2003","unstructured":"Lerat E, Daubin V, Moran NA: From gene trees to organismal phylogeny in prokaryotes: The case of the gamma-proteobacteria. PLoS Biol 2003, 1: E19. 10.1371\/journal.pbio.0000019","journal-title":"PLoS Biol"},{"key":"1009_CR22","doi-asserted-by":"publisher","first-page":"D363","DOI":"10.1093\/nar\/gkj123","volume":"34","author":"F Chen","year":"2006","unstructured":"Chen F, Mackey AJ, Stoeckert CJ Jr, Roos DS: OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res 2006, 34: D363\u2013368. 10.1093\/nar\/gkj123","journal-title":"Nucleic Acids Res"},{"key":"1009_CR23","doi-asserted-by":"publisher","first-page":"3497","DOI":"10.1093\/nar\/gkg500","volume":"31","author":"R Chenna","year":"2003","unstructured":"Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 2003, 31: 3497\u20133500. 10.1093\/nar\/gkg500","journal-title":"Nucleic Acids Res"},{"key":"1009_CR24","doi-asserted-by":"publisher","first-page":"1159","DOI":"10.1101\/gr.341802","volume":"12","author":"FS Brinkman","year":"2002","unstructured":"Brinkman FS, Blanchard JL, Cherkasov A, Av-Gay Y, Brunham RC, Fernandez RC, Finlay BB, Otto SP, Ouellette BF, Keeling PJ, Rose AM, Hancock RE, Jones SJ, Greberg H: Evidence that plant-like genes in Chlamydia species reflect an ancestral relationship between Chlamydiaceae, cyanobacteria, and the chloroplast. Genome Res 2002, 12: 1159\u20131167. 
10.1101\/gr.341802","journal-title":"Genome Res"},{"key":"1009_CR25","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1016\/S0168-9525(00)02024-2","volume":"16","author":"P Rice","year":"2000","unstructured":"Rice P, Longden I, Bleasby A: EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet 2000, 16: 276\u2013277. 10.1016\/S0168-9525(00)02024-2","journal-title":"Trends Genet"},{"key":"1009_CR26","first-page":"164","volume":"5","author":"J Felsenstein","year":"1989","unstructured":"Felsenstein J: PHYLIP - Phylogeny Inference Package. Cladistics 1989, 5: 164\u2013166.","journal-title":"Cladistics"},{"key":"1009_CR27","doi-asserted-by":"publisher","first-page":"13994","DOI":"10.1073\/pnas.0404142101","volume":"101","author":"DG Hwang","year":"2004","unstructured":"Hwang DG, Green P: Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci U S A 2004, 101: 13994\u201314001. 10.1073\/pnas.0404142101","journal-title":"Proc Natl Acad Sci U S A"},{"key":"1009_CR28","unstructured":"Ortholuge [http:\/\/www.pathogenomics.ca\/ortholuge\/]"}],"container-title":["BMC 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-270.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T11:03:55Z","timestamp":1630494235000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-270"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,5,28]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["1009"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-270","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,5,28]]},"assertion":[{"value":"3 October 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 May 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 May 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"270"}}