{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T03:32:30Z","timestamp":1767929550878,"version":"3.49.0"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2021,1,1]],"date-time":"2021-01-01T00:00:00Z","timestamp":1609459200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","award":["RGPIN-2019-04266"],"award-info":[{"award-number":["RGPIN-2019-04266"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Discovery Accelerator Supplement","award":["RGPAS-2019-00004"],"award-info":[{"award-number":["RGPAS-2019-00004"]}]},{"DOI":"10.13039\/501100004490","name":"University of Waterloo","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004490","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,4,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Statistical detection of co-occurring genes across genomes, known as \u2018phylogenetic profiling\u2019, is a powerful bioinformatic technique for inferring gene\u2013gene functional associations. 
However, this can be a challenging task given the size and complexity of phylogenomic databases, difficulty in accounting for phylogenetic structure, inconsistencies in genome annotation and substantial computational requirements.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce PhyloCorrelate\u2014a computational framework for gene co-occurrence analysis across large phylogenomic datasets. PhyloCorrelate implements a variety of co-occurrence metrics including standard correlation metrics and model-based metrics that account for phylogenetic history. By combining multiple metrics, we developed an optimized score that exhibits a superior ability to link genes with overlapping GO terms and KEGG pathways, enabling gene function prediction. Using genomic and functional annotation data from the Genome Taxonomy Database and AnnoTree, we performed all-by-all comparisons of gene occurrence profiles across the bacterial tree of life, totaling 154\u00a0217\u00a0052 comparisons for 28\u00a0315 genes across 27\u00a0372 bacterial genomes. All predictions are available in an online database, which instantaneously returns the top correlated genes for any PFAM, TIGRFAM or KEGG query. In total, PhyloCorrelate detected 29\u00a0762 high confidence associations between bacterial gene\/protein pairs, and generated functional predictions for 834 DUFs and proteins of unknown function.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>PhyloCorrelate is available as a web-server at phylocorrelate.uwaterloo.ca as well as an R package for analysis of custom datasets. 
We anticipate that PhyloCorrelate will be broadly useful as a tool for predicting function and interactions for gene families.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa1105","type":"journal-article","created":{"date-parts":[[2020,12,29]],"date-time":"2020-12-29T20:13:07Z","timestamp":1609272787000},"page":"17-22","source":"Crossref","is-referenced-by-count":23,"title":["PhyloCorrelate: inferring bacterial gene\u2013gene functional associations through large-scale phylogenetic profiling"],"prefix":"10.1093","volume":"37","author":[{"given":"Benjamin J -M","family":"Tremblay","sequence":"first","affiliation":[{"name":"Department of Biology , University of Waterloo, Waterloo, ON N2L 3G1, Canada"}]},{"given":"Briallen","family":"Lobb","sequence":"additional","affiliation":[{"name":"Department of Biology , University of Waterloo, Waterloo, ON N2L 3G1, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2015-099X","authenticated-orcid":false,"given":"Andrew C","family":"Doxey","sequence":"additional","affiliation":[{"name":"Department of Biology , University of Waterloo, Waterloo, ON N2L 3G1, Canada"}]}],"member":"286","published-online":{"date-parts":[[2021,1,8]]},"reference":[{"key":"2023051510472381000_btaa1105-B1","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1093\/bioinformatics\/btl558","article-title":"Constrained models of evolution lead to improved prediction of functional linkage from correlated gain and loss of genes","volume":"23","author":"Barker","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051510472381000_btaa1105-B2","doi-asserted-by":"crossref","first-page":"e3","DOI":"10.1371\/journal.pcbi.0010003","article-title":"Predicting functional gene links from 
phylogenetic-statistical analyses of whole genomes","volume":"1","author":"Barker","year":"2005","journal-title":"PLoS Comput. Biol"},{"key":"2023051510472381000_btaa1105-B3","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1186\/1471-2105-9-551","article-title":"Analysis of plasmid genes by phylogenetic profiling and visualization of homology relationships using Blast2Network","volume":"9","author":"Brilli","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023051510472381000_btaa1105-B4","doi-asserted-by":"crossref","first-page":"S7","DOI":"10.1186\/1471-2105-8-S4-S7","article-title":"An improved method for identifying functionally linked proteins using phylogenetic profiles","volume":"8","author":"Cokus","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023051510472381000_btaa1105-B5","doi-asserted-by":"crossref","first-page":"1910","DOI":"10.1093\/bioinformatics\/btq315","article-title":"Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood","volume":"26","author":"Csur\u00f6s","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051510472381000_btaa1105-B6","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1016\/j.cels.2015.08.006","article-title":"Phylogenetic profiling for probing the modular architecture of the human genome","volume":"1","author":"Dey","year":"2015","journal-title":"Cell Syst"},{"key":"2023051510472381000_btaa1105-B7","doi-asserted-by":"crossref","first-page":"R32","DOI":"10.1186\/2004-5-5-r32","article-title":"Detection of evolutionarily stable fragments of cellular pathways by hierarchical clustering of phyletic patterns","volume":"5","author":"Glazko","year":"2004","journal-title":"Genome Biol"},{"key":"2023051510472381000_btaa1105-B8","doi-asserted-by":"crossref","first-page":"1204","DOI":"10.1101\/gr.10.8.1204","article-title":"Predicting protein function by genomic context: quantitative evaluation and qualitative 
inferences","volume":"10","author":"Huynen","year":"2000","journal-title":"Genome Res"},{"key":"2023051510472381000_btaa1105-B9","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1098\/rsif.2007.1047","article-title":"Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution","volume":"5","author":"Kensche","year":"2008","journal-title":"J. R. Soc. Interface"},{"key":"2023051510472381000_btaa1105-B10","doi-asserted-by":"crossref","first-page":"e1002340","DOI":"10.1371\/journal.pcbi.1002340","article-title":"Genetic co-occurrence network across sequenced microbes","volume":"7","author":"Kim","year":"2011","journal-title":"PLoS Comput. Biol"},{"key":"2023051510472381000_btaa1105-B11","doi-asserted-by":"crossref","first-page":"2255","DOI":"10.1093\/gbe\/evy178","article-title":"Phylogenetic clustering of genes reveals shared evolutionary trajectories and putative gene functions","volume":"10","author":"Liu","year":"2018","journal-title":"Genome Biol. Evol"},{"key":"2023051510472381000_btaa1105-B12","doi-asserted-by":"crossref","first-page":"e000341","DOI":"10.1099\/mgen.0.000341","article-title":"An assessment of genome annotation coverage across the bacterial tree of life","volume":"6","author":"Lobb","year":"2020","journal-title":"Microb. Genomics"},{"key":"2023051510472381000_btaa1105-B13","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.sbi.2016.05.017","article-title":"Novel function discovery through sequence and structural data mining","volume":"38","author":"Lobb","year":"2016","journal-title":"Curr. Opin. Struct. 
Biol"},{"key":"2023051510472381000_btaa1105-B14","doi-asserted-by":"crossref","first-page":"4442","DOI":"10.1093\/nar\/gkz246","article-title":"AnnoTree: visualization and exploration of a functionally annotated microbial tree of life","volume":"47","author":"Mendler","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023051510472381000_btaa1105-B15","doi-asserted-by":"crossref","first-page":"e3712","DOI":"10.7717\/peerj.3712","article-title":"PrePhyloPro: phylogenetic profile-based prediction of whole proteome linkages","volume":"5","author":"Niu","year":"2017","journal-title":"PeerJ"},{"key":"2023051510472381000_btaa1105-B16","doi-asserted-by":"crossref","first-page":"1331","DOI":"10.1016\/j.jmb.2004.10.019","article-title":"A domain interaction map based on phylogenetic profiling","volume":"344","author":"Pagel","year":"2004","journal-title":"J. Mol. Biol"},{"key":"2023051510472381000_btaa1105-B17","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1038\/nbt.4229","article-title":"A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life","volume":"36","author":"Parks","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023051510472381000_btaa1105-B18","doi-asserted-by":"crossref","first-page":"4285","DOI":"10.1073\/pnas.96.8.4285","article-title":"Assigning protein functions by comparative genome analysis: protein phylogenetic profiles","volume":"96","author":"Pellegrini","year":"1999","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051510472381000_btaa1105-B19","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1038\/nchembio768","article-title":"Completing the uric acid degradation pathway through phylogenetic comparison of whole genomes","volume":"2","author":"Ramazzina","year":"2006","journal-title":"Nat. Chem. 
Biol"},{"key":"2023051510472381000_btaa1105-B20","doi-asserted-by":"crossref","first-page":"D380","DOI":"10.1093\/nar\/gkt984","article-title":"FunCoup 3.0: database of genome-wide functional coupling networks","volume":"42","author":"Schmitt","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023051510472381000_btaa1105-B21","doi-asserted-by":"crossref","first-page":"D447","DOI":"10.1093\/nar\/gku1003","article-title":"STRING v10: protein-protein interaction networks, integrated over the tree of life","volume":"43","author":"Szklarczyk","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023051510472381000_btaa1105-B22","doi-asserted-by":"crossref","first-page":"S276","DOI":"10.1093\/bioinformatics\/18.suppl_1.S276","article-title":"A tree kernel to analyse phylogenetic profiles","volume":"18","author":"Vert","year":"2002","journal-title":"Bioinformatics"},{"key":"2023051510472381000_btaa1105-B24","doi-asserted-by":"crossref","first-page":"e0205749","DOI":"10.1371\/journal.pone.0205749","article-title":"A most wanted list of conserved microbial protein families with no known domains","volume":"13","author":"Wyman","year":"2018","journal-title":"PLoS 
One"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa1105\/35925999\/btaa1105.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/1\/17\/50321435\/btaa1105.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/1\/17\/50321435\/btaa1105.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T10:48:58Z","timestamp":1684147738000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/1\/17\/6070124"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1,1]]},"references-count":23,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,4,9]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa1105","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,1,1]]},"published":{"date-parts":[[2021,1,1]]}}}