{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T08:41:53Z","timestamp":1771922513139,"version":"3.50.1"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Finding biologically causative genotype\u2013phenotype associations from whole-genome data is difficult due to the large gene feature space to mine, the potential for interactions among genes and phylogenetic correlations between genomes. Associations within phylogenetically distinct organisms with unusual molecular mechanisms underlying their phenotype may be particularly difficult to assess.<\/jats:p><jats:p>Results: We have developed a new genotype\u2013phenotype association approach that uses Classification based on Predictive Association Rules (CPAR), and compare it with NETCAR, a recently published association algorithm. Our implementation of CPAR gave on average slightly higher classification accuracy, with approximately 100 times faster running times. 
Given the influence of phylogenetic correlations in the extraction of genotype\u2013phenotype association rules, we furthermore propose a novel measure for downweighting the dependence among samples by modeling shared ancestry using conditional mutual information, and demonstrate its complementary nature to traditional mining approaches.<\/jats:p><jats:p>Availability: Software implemented for this study is available under the Creative Commons Attribution 3.0 license from the author at http:\/\/kiwi.cs.dal.ca\/Software\/PICA<\/jats:p><jats:p>Contact: \u00a0beiko@cs.dal.ca<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq305","type":"journal-article","created":{"date-parts":[[2010,6,8]],"date-time":"2010-06-08T01:20:28Z","timestamp":1275960028000},"page":"1834-1840","source":"Crossref","is-referenced-by-count":21,"title":["Efficient learning of microbial genotype\u2013phenotype association rules"],"prefix":"10.1093","volume":"26","author":[{"given":"Norman J.","family":"MacDonald","sequence":"first","affiliation":[{"name":"Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada"}]},{"given":"Robert G.","family":"Beiko","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada"}]}],"member":"286","published-online":{"date-parts":[[2010,6,6]]},"reference":[{"key":"2023012507582496300_B1","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1145\/170035.170072","article-title":"Mining association rules between sets of items in large databases","volume-title":"SIGMOD '93: Proceedings of the 1993 ACM SIGMOD international conference on management of data.","author":"Agrawal","year":"1993"},{"key":"2023012507582496300_B2","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1093\/bioinformatics\/16.5.412","article-title":"Assessing the accuracy of prediction algorithms for classification: 
an overview","volume":"16","author":"Baldi","year":"2000","journal-title":"Bioinformatics"},{"key":"2023012507582496300_B3","doi-asserted-by":"crossref","first-page":"14332","DOI":"10.1073\/pnas.0504068102","article-title":"Highways of gene sharing in prokaryotes","volume":"102","author":"Beiko","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507582496300_B4","doi-asserted-by":"crossref","first-page":"e1000225","DOI":"10.1371\/journal.pcbi.1000225","article-title":"Phylogenetic dependency networks: inferring patterns of CTL escape and codon covariation in HIV-1 Gag","volume":"4","author":"Carlson","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023012507582496300_B5","author":"Chang","year":"2001","journal-title":"LIBSVM: a Library for Support Vector Machines."},{"key":"2023012507582496300_B6","doi-asserted-by":"crossref","first-page":"1550","DOI":"10.1128\/jb.127.3.1550-1557.1976","article-title":"Deoxyribonucleic acid polymerase from the extreme thermophile thermus aquaticus","volume":"127","author":"Chien","year":"1976","journal-title":"J. Bacteriol."},{"key":"2023012507582496300_B7","volume-title":"Elements of Information Theory","author":"Cover","year":"2006","edition":"2"},{"key":"2023012507582496300_B8","doi-asserted-by":"crossref","first-page":"7687","DOI":"10.1073\/pnas.122108599","article-title":"The evolutionary history of methicillin-resistant staphylococcus aureus (MRSA)","volume":"99","author":"Enright","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507582496300_B9","first-page":"1531","article-title":"Fast binary feature selection with conditional mutual information","volume":"5","author":"Fleuret","year":"2004","journal-title":"J. Mach. Learn. 
Res."},{"key":"2023012507582496300_B10","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1089\/omi.1.1998.3.177","article-title":"Constructing multigenome views of whole microbial genomes","volume":"3","author":"Gaasterland","year":"1998","journal-title":"Microb. Comp. Genomics"},{"key":"2023012507582496300_B11","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1186\/1471-2164-7-257","article-title":"Integration of curated databases to identify genotype-phenotype associations","volume":"7","author":"Goh","year":"2006","journal-title":"BMC Genomics"},{"key":"2023012507582496300_B12","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"2023012507582496300_B13","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1023\/A:1012487302797","article-title":"Gene selection for cancer classification using support vector machines","volume":"46","author":"Guyon","year":"2002","journal-title":"Mach. 
Learn."},{"key":"2023012507582496300_B14","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198546412.001.0001","volume-title":"The Comparative Method in Evolutionary Biology.","author":"Harvey","year":"1991"},{"key":"2023012507582496300_B15","doi-asserted-by":"crossref","first-page":"D250","DOI":"10.1093\/nar\/gkm796","article-title":"eggNOG: automated construction and annotation of orthologous groups of genes","volume":"36","author":"Jensen","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012507582496300_B16","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkn760","article-title":"STRING 8\u2013a global view on proteins and their functional interactions in 630 organisms","volume":"37","author":"Jensen","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012507582496300_B17","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1101\/gr.1586704","article-title":"A cross-genomic approach for systematic mapping of phenotypic traits to genes","volume":"14","author":"Jim","year":"2004","journal-title":"Genome Res."},{"key":"2023012507582496300_B18","doi-asserted-by":"crossref","first-page":"R28","DOI":"10.1186\/gb-2009-10-3-r28","article-title":"Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes","volume":"10","author":"Kastenm\u00fcller","year":"2009","journal-title":"Genome Biol."},{"key":"2023012507582496300_B19","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1016\/S0960-9822(03)00009-5","article-title":"Trait-to-gene: a computational method for predicting the function of uncharacterized genes","volume":"13","author":"Levesque","year":"2003","journal-title":"Curr. 
Biol."},{"key":"2023012507582496300_B20","first-page":"80","article-title":"Integrating classification and association rule mining","volume-title":"Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining","author":"Liu","year":"1998"},{"key":"2023012507582496300_B21","doi-asserted-by":"crossref","first-page":"e159","DOI":"10.1371\/journal.pcbi.0020159","article-title":"An integrative genomic approach to uncover molecular mechanisms of prokaryotic traits","volume":"2","author":"Liu","year":"2006","journal-title":"PLoS Comput. Biol."},{"key":"2023012507582496300_B22","doi-asserted-by":"crossref","first-page":"482","DOI":"10.1093\/nar\/30.2.482","article-title":"A DNA repair system specific for thermophilic archaea and bacteria predicted by genomic context analysis","volume":"30","author":"Makarova","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023012507582496300_B23","doi-asserted-by":"crossref","first-page":"D528","DOI":"10.1093\/nar\/gkm846","article-title":"The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions","volume":"36","author":"Markowitz","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012507582496300_B24","doi-asserted-by":"crossref","first-page":"991","DOI":"10.1101\/gr.678303","article-title":"Comparing bacterial genomes through conservation profiles","volume":"13","author":"Martin","year":"2003","journal-title":"Genome Res."},{"key":"2023012507582496300_B25","doi-asserted-by":"crossref","first-page":"12146","DOI":"10.1073\/pnas.0700687104","article-title":"Deep-sea vent epsilon-proteobacterial genomes provide insights into emergence of pathogens","volume":"104","author":"Nakagawa","year":"2007","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012507582496300_B26","doi-asserted-by":"crossref","first-page":"4285","DOI":"10.1073\/pnas.96.8.4285","article-title":"Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles","volume":"96","author":"Pellegrini","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507582496300_B27","first-page":"3","article-title":"FOIL: a midterm report","volume-title":"Proceedings of the 1993 European Conference on Machine Learning","author":"Quinlan","year":"1993"},{"key":"2023012507582496300_B28","doi-asserted-by":"crossref","first-page":"1616","DOI":"10.1126\/science.1075558","article-title":"Whole-genome analysis of photosynthetic prokaryotes","volume":"298","author":"Raymond","year":"2002","journal-title":"Science"},{"key":"2023012507582496300_B29","doi-asserted-by":"crossref","first-page":"2006.0005","DOI":"10.1038\/msb4100047","article-title":"Ab initio genotype-phenotype association reveals intrinsic modularity in genetic networks","volume":"2","author":"Slonim","year":"2006","journal-title":"Mol. Syst. 
Biol."},{"key":"2023012507582496300_B30","doi-asserted-by":"crossref","first-page":"1523","DOI":"10.1093\/bioinformatics\/btn210","article-title":"Microbial genotype-phenotype mapping by class association rule mining","volume":"24","author":"Tamura","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012507582496300_B31","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1126\/science.278.5338.631","article-title":"A genomic perspective on protein families","volume":"278","author":"Tatusov","year":"1997","journal-title":"Science"},{"key":"2023012507582496300_B32","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/1471-2105-4-41","article-title":"The COG database: an updated version includes eukaryotes","volume":"4","author":"Tatusov","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023012507582496300_B33","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1145\/1031171.1031241","article-title":"Feature selection with conditional mutual information maximin in text categorization","volume-title":"Proceedings of the thirteenth ACM international conference on information and knowledge management.","author":"Wang","year":"2004"},{"key":"2023012507582496300_B34","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.compbiolchem.2004.11.001","article-title":"Gene selection from microarray data for cancer classification-a machine learning approach","volume":"29","author":"Wang","year":"2005","journal-title":"Comput. Biol. 
Chem."},{"key":"2023012507582496300_B35","doi-asserted-by":"crossref","DOI":"10.1137\/1.9781611972733.40","article-title":"CPAR: Classification based on predictive association rules","volume-title":"Proceedings of the Third SIAM International Conference on Data Mining.","author":"Yin","year":"2003"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/15\/1834\/48852841\/bioinformatics_26_15_1834.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/15\/1834\/48852841\/bioinformatics_26_15_1834.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,27]],"date-time":"2024-03-27T11:11:26Z","timestamp":1711537886000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/15\/1834\/189538"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,6,6]]},"references-count":35,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2010,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq305","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,8,1]]},"published":{"date-parts":[[2010,6,6]]}}}