{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,5,24]],"date-time":"2024-05-24T17:40:07Z","timestamp":1716572407216},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2014,2,25]],"date-time":"2014-02-25T00:00:00Z","timestamp":1393286400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,6,15]]},"abstract":"<jats:p>Motivation: Gene set enrichment has become a critical tool for interpreting the results of high-throughput genomic experiments. Inconsistent annotation quality and lack of annotation specificity, however, limit the statistical power of enrichment methods and make it difficult to replicate enrichment results across biologically similar datasets.<\/jats:p><jats:p>Results: We propose a novel algorithm for optimizing gene set annotations to best match the structure of specific empirical data sources. Our proposed method, entropy minimization over variable clusters (EMVC), filters the annotations for each gene set to minimize a measure of entropy across disjoint gene clusters computed for a range of cluster sizes over multiple bootstrap resampled datasets. 
As shown using simulated gene sets with simulated data and Molecular Signatures Database collections with microarray gene expression data, the EMVC algorithm accurately filters annotations unrelated to the experimental outcome resulting in increased gene set enrichment power and better replication of enrichment results.<\/jats:p><jats:p>Availability and implementation: \u00a0http:\/\/cran.r-project.org\/web\/packages\/EMVC\/index.html.<\/jats:p><jats:p>Contact: \u00a0jason.h.moore@dartmouth.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary Data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu110","type":"journal-article","created":{"date-parts":[[2014,2,27]],"date-time":"2014-02-27T01:26:41Z","timestamp":1393464401000},"page":"1698-1706","source":"Crossref","is-referenced-by-count":9,"title":["Optimization of gene set annotations via entropy minimization over variable clusters (EMVC)"],"prefix":"10.1093","volume":"30","author":[{"given":"H. Robert","family":"Frost","sequence":"first","affiliation":[{"name":"Departments of Genetics and Community and Family Medicine, Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, USA"}]},{"given":"Jason H.","family":"Moore","sequence":"additional","affiliation":[{"name":"Departments of Genetics and Community and Family Medicine, Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, USA"}]}],"member":"286","published-online":{"date-parts":[[2014,2,25]]},"reference":[{"key":"2023012711064203300_btu110-B1","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1038\/nbt0210-128","article-title":"Ontology engineering","volume":"28","author":"Alterovitz","year":"2010","journal-title":"Nat. 
Biotechnol."},{"key":"2023012711064203300_btu110-B2","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/ng765","article-title":"MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia","volume":"30","author":"Armstrong","year":"2002","journal-title":"Nat. Genet."},{"key":"2023012711064203300_btu110-B3","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023012711064203300_btu110-B4","doi-asserted-by":"crossref","first-page":"i562","DOI":"10.1093\/bioinformatics\/bts372","article-title":"An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB","volume":"28","author":"Bell","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012711064203300_btu110-B5","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. B Methodol."},{"key":"2023012711064203300_btu110-B6","doi-asserted-by":"crossref","first-page":"3045","DOI":"10.1093\/bioinformatics\/btp536","article-title":"QuickGO: a web-based tool for gene ontology searching","volume":"25","author":"Binns","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012711064203300_btu110-B7","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. 
Learn."},{"key":"2023012711064203300_btu110-B8","doi-asserted-by":"crossref","first-page":"288","DOI":"10.1093\/bioinformatics\/btn615","article-title":"AmiGO: online access to ontology and annotation data","volume":"25","author":"Carbon","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012711064203300_btu110-B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v050.i13","article-title":"ClustOfVar: an R package for the clustering of variables","volume":"50","author":"Chavent","year":"2012","journal-title":"J. Stat. Softw."},{"key":"2023012711064203300_btu110-B10","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1038\/ng0502-19","article-title":"GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways","volume":"31","author":"Dahlquist","year":"2002","journal-title":"Nat. Genet."},{"key":"2023012711064203300_btu110-B11","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1186\/1471-2105-11-498","article-title":"Automatic, context-specific generation of gene ontology slims","volume":"11","author":"Davis","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012711064203300_btu110-B12","doi-asserted-by":"crossref","first-page":"i136","DOI":"10.1093\/bioinformatics\/bti1019","article-title":"A procedure for assessing GO annotation consistency","volume":"21","author":"Dolan","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012711064203300_btu110-B13","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1093\/bib\/bbr002","article-title":"The what, where, how and why of gene ontology\u2013a primer for bioinformaticians","volume":"12","author":"du Plessis","year":"2011","journal-title":"Brief. Bioinform."},{"key":"2023012711064203300_btu110-B14","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1214\/07-AOAS101","article-title":"On testing the significance of sets of genes","volume":"1","author":"Efron","year":"2007","journal-title":"Ann. Appl. 
Stat."},{"key":"2023012711064203300_btu110-B15","doi-asserted-by":"crossref","first-page":"e40519","DOI":"10.1371\/journal.pone.0040519","article-title":"Mining GO annotations for improving annotation consistency","volume":"7","author":"Faria","year":"2012","journal-title":"PLoS One"},{"key":"2023012711064203300_btu110-B16","doi-asserted-by":"crossref","DOI":"10.1186\/gb-2000-1-2-research0003","article-title":"\u2018Gene shaving\u2019 as a method for identifying distinct sets of genes with similar expression patterns","volume":"1","author":"Hastie","year":"2000","journal-title":"Genome Biol."},{"key":"2023012711064203300_btu110-B17","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics","author":"Hastie","year":"2009","edition":"2nd edn"},{"key":"2023012711064203300_btu110-B18","first-page":"1469","article-title":"Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks","volume":"10","author":"Hausser","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"2023012711064203300_btu110-B19","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1093\/bib\/bbr049","article-title":"Gene set enrichment analysis: performance evaluation and usage guidelines","volume":"13","author":"Hung","year":"2012","journal-title":"Brief. 
Bioinform."},{"key":"2023012711064203300_btu110-B20","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1186\/gb-2009-10-2-206","article-title":"Sequence-based feature prediction and annotation of proteins","volume":"10","author":"Juncker","year":"2009","journal-title":"Genome Biol."},{"key":"2023012711064203300_btu110-B21","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"KEGG: Kyoto Encyclopedia of Genes and Genomes","volume":"28","author":"Kanehisa","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023012711064203300_btu110-B22","article-title":"Finding Groups in Data: An Introduction to Cluster Analysis","author":"Kaufman","year":"2005","journal-title":"Wiley"},{"key":"2023012711064203300_btu110-B23","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1214\/aoms\/1177732186","article-title":"The problem of m rankings","volume":"10","author":"Kendall","year":"1939","journal-title":"Ann. Math. Stat."},{"key":"2023012711064203300_btu110-B24","doi-asserted-by":"crossref","first-page":"e1002375","DOI":"10.1371\/journal.pcbi.1002375","article-title":"Ten years of pathway analysis: current approaches and outstanding challenges","volume":"8","author":"Khatri","year":"2012","journal-title":"PLoS Comput. 
Biol."},{"key":"2023012711064203300_btu110-B25","doi-asserted-by":"crossref","first-page":"1739","DOI":"10.1093\/bioinformatics\/btr260","article-title":"Molecular Signatures Database (MSigDB) 3.0","volume":"27","author":"Liberzon","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012711064203300_btu110-B26","doi-asserted-by":"crossref","first-page":"S40","DOI":"10.1186\/1471-2105-12-S1-S40","article-title":"GOChase-II: correcting semantic inconsistencies from gene ontology-based annotations for gene products","volume":"12","author":"Park","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012711064203300_btu110-B27","doi-asserted-by":"crossref","first-page":"1518","DOI":"10.1126\/science.1205438","article-title":"Detecting novel associations in large data sets","volume":"334","author":"Reshef","year":"2011","journal-title":"Science"},{"key":"2023012711064203300_btu110-B28","doi-asserted-by":"crossref","first-page":"e1000605","DOI":"10.1371\/journal.pcbi.1000605","article-title":"Annotation error in public databases: misannotation of molecular function in enzyme superfamilies","volume":"5","author":"Schnoes","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023012711064203300_btu110-B29","doi-asserted-by":"crossref","first-page":"1090","DOI":"10.1038\/ng1434","article-title":"A module map showing conditional activity of expression modules in cancer","volume":"36","author":"Segal","year":"2004","journal-title":"Nat. Genet."},{"key":"2023012711064203300_btu110-B30","doi-asserted-by":"crossref","first-page":"e1002533","DOI":"10.1371\/journal.pcbi.1002533","article-title":"Quality of computationally inferred gene ontology annotations","volume":"8","author":"Skunca","year":"2012","journal-title":"PLoS Comput. 
Biol."},{"key":"2023012711064203300_btu110-B31","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1007\/0-387-29362-0_23","article-title":"Limma: linear models for microarray data","volume-title":"Bioinformatics and Computational Biology Solutions Using R and Bioconductor","author":"Smyth","year":"2005"},{"key":"2023012711064203300_btu110-B32","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"issue":"Pt 2","key":"2023012711064203300_btu110-B33","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J. R. Stat. Soc. B Methodol."},{"key":"2023012711064203300_btu110-B34","doi-asserted-by":"crossref","first-page":"383","DOI":"10.1186\/1471-2105-8-383","article-title":"ProbCD: enrichment analysis accounting for categorization uncertainty","volume":"8","author":"V\u00eancio","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023012711064203300_btu110-B35","doi-asserted-by":"crossref","first-page":"e133","DOI":"10.1093\/nar\/gks461","article-title":"Camera: a competitive gene set test accounting for inter-gene correlation","volume":"40","author":"Wu","year":"2012","journal-title":"Nucleic Acids 
Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/12\/1698\/48927427\/bioinformatics_30_12_1698.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/12\/1698\/48927427\/bioinformatics_30_12_1698.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,24]],"date-time":"2024-05-24T16:42:20Z","timestamp":1716568940000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/12\/1698\/2748166"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,2,25]]},"references-count":35,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2014,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu110","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2014,6,15]]},"published":{"date-parts":[[2014,2,25]]}}}