{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,10,2]],"date-time":"2023-10-02T11:03:05Z","timestamp":1696244585931},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features associated with illness. We propose a new approach, called <jats:italic>gene set bagging<\/jats:italic>, for measuring the probability that a gene set replicates in future studies. Gene set bagging involves resampling the original high-throughput data, performing gene-set analysis on the resampled data, and confirming that biological categories replicate in the bagged samples.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Using both simulated and publicly-available genomics data, we demonstrate that significant categories in a gene set enrichment analysis may be unstable when subjected to resampling. 
We show our method estimates the replication probability (<jats:italic>R<\/jats:italic>), the probability that a gene set will replicate as a significant result in future studies, and show in simulations that this method reflects replication better than each set\u2019s p-value.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Our results suggest that gene lists based on p-values are not necessarily stable, and therefore additional steps like gene set bagging may improve biological inference on gene sets.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-14-360","type":"journal-article","created":{"date-parts":[[2013,12,12]],"date-time":"2013-12-12T20:01:17Z","timestamp":1386878477000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Gene set bagging for estimating the probability a statistically significant result will replicate"],"prefix":"10.1186","volume":"14","author":[{"given":"Andrew E","family":"Jaffe","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John D","family":"Storey","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongkai","family":"Ji","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeffrey T","family":"Leek","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2013,12,12]]},"reference":[{"issue":"5696","key":"6862_CR1","doi-asserted-by":"publisher","first-page":"640","DOI":"10.1126\/science.1104635","volume":"306","author":"L Hood","year":"2004","unstructured":"Hood L, Heath J, Phelps M, Lin B: Systems biology and new technologies enable predictive and preventative medicine. Science. 
2004, 306 (5696): 640-10.1126\/science.1104635.","journal-title":"Science"},{"issue":"8","key":"6862_CR2","doi-asserted-by":"publisher","first-page":"789","DOI":"10.1038\/nm1087","volume":"10","author":"B Vogelstein","year":"2004","unstructured":"Vogelstein B, Kinzler K: Cancer genes and the pathways they control. Nat Med. 2004, 10 (8): 789-799. 10.1038\/nm1087.","journal-title":"Nat Med"},{"key":"6862_CR3","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The elements of statistical learning: data mining, inference, and prediction","author":"T Hastie","year":"2009","unstructured":"Hastie T, Tibshirani R, Friedman JH: The elements of statistical learning: data mining, inference, and prediction. 2009, New York: Springer"},{"issue":"457","key":"6862_CR4","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1198\/016214502753479248","volume":"97","author":"S Dudoit","year":"2002","unstructured":"Dudoit S, Fridlyand J, Speed T: Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc. 2002, 97 (457): 77-87. 10.1198\/016214502753479248.","journal-title":"J Am Stat Assoc"},{"key":"6862_CR5","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1186\/1471-2105-9-289","volume":"9","author":"F Baty","year":"2008","unstructured":"Baty F, Jaeger D, Preiswerk F, Schumacher M, Brutsche M: Stability of gene contributions and identification of outliers in multivariate analysis of microarray data. BMC Bioinformatics. 2008, 9: 289-10.1186\/1471-2105-9-289.","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"6862_CR6","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1109\/tcbb.2007.1078","volume":"5","author":"LL Elo","year":"2008","unstructured":"Elo LL, Fil\u00e9n S, Lahesmaa R, Aittokallio T: Reproducibility-optimized test statistic for ranking genes in microarray studies. Comput Biol Bioinformatics, IEEE\/ACM Trans. 
2008, 5 (3): 423-431.","journal-title":"Comput Biol Bioinformatics, IEEE\/ACM Trans"},{"key":"6862_CR7","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1186\/1471-2105-11-277","volume":"11","author":"G Abraham","year":"2010","unstructured":"Abraham G, Kowalczyk A, Loi S, Haviv I, Zobel J: Prediction of breast cancer prognosis using gene set statistics provides signature stability and biological context. BMC Bioinformatics. 2010, 11: 277-10.1186\/1471-2105-11-277.","journal-title":"BMC Bioinformatics"},{"key":"6862_CR8","doi-asserted-by":"publisher","first-page":"162","DOI":"10.1186\/1471-2105-11-162","volume":"11","author":"C Wang","year":"2010","unstructured":"Wang C, Xuan J, Li H, Wang Y, Zhan M, Hoffman E, Clarke R: Knowledge-guided gene ranking by coordinative component analysis. BMC Bioinformatics. 2010, 11: 162-10.1186\/1471-2105-11-162.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"6862_CR9","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1214\/07-AOAS101","volume":"1","author":"B Efron","year":"2007","unstructured":"Efron B, Tibshirani R: On testing the significance of sets of genes. Ann Appl Stat. 2007, 1 (1): 107-129. 10.1214\/07-AOAS101.","journal-title":"Ann Appl Stat"},{"key":"6862_CR10","doi-asserted-by":"publisher","first-page":"574","DOI":"10.1186\/1471-2164-11-574","volume":"11","author":"DM Gatti","year":"2010","unstructured":"Gatti DM, Barry WT, Nobel AB, Rusyn I, Wright FA: Heading down the wrong pathway: on the influence of correlation within gene sets. BMC Genomics. 2010, 11: 574-10.1186\/1471-2164-11-574.","journal-title":"BMC Genomics"},{"key":"6862_CR11","doi-asserted-by":"crossref","unstructured":"Nature Editorial Staff: Announcement: Reducing our irreproducibility. Nature. 
496 (398):","DOI":"10.1038\/496398a"},{"key":"6862_CR12","doi-asserted-by":"crossref","first-page":"3","DOI":"10.2202\/1544-6115.1027","volume":"3","author":"G Smyth","year":"2004","unstructured":"Smyth G, et al: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3: 3-","journal-title":"Stat Appl Genet Mol Biol"},{"key":"6862_CR13","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1186\/1471-2164-9-363","volume":"9","author":"J Michaud","year":"2008","unstructured":"Michaud J, Simpson K, Escher R, Buchet-Poyau K, Beissbarth T, Carmichael C, Ritchie M, Sch\u00fctz F, Cannon P, Liu M, et al: Integrative analysis of RUNX1 downstream pathways and target genes. BMC Genomics. 2008, 9: 363-10.1186\/1471-2164-9-363.","journal-title":"BMC Genomics"},{"key":"6862_CR14","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1007\/0-387-29362-0_23","volume-title":"Bioinformatics and Computational Biology Solutions Using R and Bioconductor","author":"GK Smyth","year":"2005","unstructured":"Smyth GK: Limma: linear models for microarray data. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Edited by: Gentleman R, Carey V, Dudoit S, Irizarry R, Huber W. 2005, New York: Springer, 397-420."},{"issue":"4","key":"6862_CR15","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1111\/j.1467-9868.2010.00740.x","volume":"72","author":"N Meinshausen","year":"2010","unstructured":"Meinshausen N, B\u00fchlmann P: Stability selection. J R Stat Soc: Series B (Stat Method). 2010, 72 (4): 417-473. 
10.1111\/j.1467-9868.2010.00740.x.","journal-title":"J R Stat Soc: Series B (Stat Method)"},{"issue":"3","key":"6862_CR16","doi-asserted-by":"publisher","first-page":"266","DOI":"10.1158\/1940-6207.CAPR-09-0192","volume":"3","author":"J Boyle","year":"2010","unstructured":"Boyle J, Gumus Z, Kacker A, Choksi V, Bocker J, Zhou X, Yantiss R, Hughes D, Du B, Judson B, et al: Effects of cigarette smoke on the human oral mucosal transcriptome. Cancer Prev Res. 2010, 3 (3): 266-10.1158\/1940-6207.CAPR-09-0192.","journal-title":"Cancer Prev Res"},{"issue":"2","key":"6862_CR17","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1093\/biostatistics\/4.2.249","volume":"4","author":"R Irizarry","year":"2003","unstructured":"Irizarry R, Hobbs B, Collin F, Beazer-Barclay Y, Antonellis K, Scherf U, Speed T: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4 (2): 249-10.1093\/biostatistics\/4.2.249.","journal-title":"Biostatistics"},{"issue":"9","key":"6862_CR18","doi-asserted-by":"publisher","first-page":"e161","DOI":"10.1371\/journal.pgen.0030161","volume":"3","author":"J Leek","year":"2007","unstructured":"Leek J, Storey J: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics. 2007, 3 (9): e161-10.1371\/journal.pgen.0030161.","journal-title":"PLoS Genetics"},{"issue":"48","key":"6862_CR19","doi-asserted-by":"publisher","first-page":"18718","DOI":"10.1073\/pnas.0808709105","volume":"105","author":"J Leek","year":"2008","unstructured":"Leek J, Storey J: A general framework for multiple testing dependence. Proc Natl Acad Sci. 
2008, 105 (48): 18718-10.1073\/pnas.0808709105.","journal-title":"Proc Natl Acad Sci"},{"key":"6862_CR20","doi-asserted-by":"crossref","first-page":"3","DOI":"10.2202\/1544-6115.1027","volume":"3","author":"G Smyth","year":"2004","unstructured":"Smyth G: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3: 3-","journal-title":"Stat Appl Genet Mol Biol"},{"issue":"16","key":"6862_CR21","doi-asserted-by":"publisher","first-page":"9440","DOI":"10.1073\/pnas.1530509100","volume":"100","author":"J Storey","year":"2003","unstructured":"Storey J, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003, 100 (16): 9440-10.1073\/pnas.1530509100.","journal-title":"Proc Natl Acad Sci USA"},{"issue":"5","key":"6862_CR22","doi-asserted-by":"publisher","first-page":"e1000952","DOI":"10.1371\/journal.pgen.1000952","volume":"6","author":"J Gibbs","year":"2010","unstructured":"Gibbs J, Van Der Brug M, Hernandez D, Traynor B, Nalls M, Lai S, Arepalli S, Dillman A, Rafferty I, Troncoso J, et al: Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genetics. 2010, 6 (5): e1000952-10.1371\/journal.pgen.1000952.","journal-title":"PLoS Genetics"},{"issue":"6","key":"6862_CR23","doi-asserted-by":"publisher","first-page":"1304","DOI":"10.1086\/524110","volume":"81","author":"C Ladd-Acosta","year":"2007","unstructured":"Ladd-Acosta C, Pevsner J, Sabunciyan S, Yolken R, Webster M, Dinkins T, Callinan P, Fan J, Potash J, Feinberg A: DNA methylation signatures within the human brain. Am J Hum Genet. 2007, 81 (6): 1304-1315. 10.1086\/524110.","journal-title":"Am J Hum Genet"},{"issue":"9","key":"6862_CR24","doi-asserted-by":"publisher","first-page":"1511","DOI":"10.1093\/carcin\/23.9.1511","volume":"23","author":"RJ Anto","year":"2002","unstructured":"Anto RJ, Mukhopadhyay A, Shishodia S, Gairola CG, Aggarwal BB: Cigarette smoke condensate activates nuclear transcription factor-kappaB through phosphorylation and degradation of IkappaB(alpha): correlation with induction of cyclooxygenase-2. Carcinogenesis. 2002, 23 (9): 1511-1518. 10.1093\/carcin\/23.9.1511.","journal-title":"Carcinogenesis"},{"issue":"5","key":"6862_CR25","doi-asserted-by":"publisher","first-page":"1687","DOI":"10.1214\/aos\/1024691353","volume":"26","author":"B Efron","year":"1998","unstructured":"Efron B, Tibshirani R: The problem of regions. Ann Stat. 1998, 26 (5): 1687-1718.","journal-title":"Ann Stat"},{"issue":"4","key":"6862_CR26","doi-asserted-by":"publisher","first-page":"783","DOI":"10.2307\/2408678","volume":"39","author":"J Felsenstein","year":"1985","unstructured":"Felsenstein J: Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985, 39 (4): 783-791. 10.2307\/2408678.","journal-title":"Evolution"},{"issue":"23","key":"6862_CR27","doi-asserted-by":"publisher","first-page":"13429","DOI":"10.1073\/pnas.93.23.13429","volume":"93","author":"B Efron","year":"1996","unstructured":"Efron B, Halloran E, Holmes S: Bootstrap confidence levels for phylogenetic trees. Proc Natl Acad Sci. 1996, 93 (23): 13429-13429. 10.1073\/pnas.93.23.13429.","journal-title":"Proc Natl Acad Sci"},{"key":"6862_CR28","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1098\/rspa.1946.0056","volume":"186","author":"H Jeffreys","year":"1946","unstructured":"Jeffreys H: An invariant form for the prior probability in estimation problems. Proc R Soc Lond A Math Phys Sci. 1946, 186: 453-461.","journal-title":"Proc R Soc Lond A Math Phys Sci"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-14-360.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T19:53:53Z","timestamp":1630612433000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-14-360"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,12]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["6862"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-14-360","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,12]]},"assertion":[{"value":"13 February 2013","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 November 2013","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 December 2013","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"360"}}