{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,2,6]],"date-time":"2023-02-06T22:51:09Z","timestamp":1675723869382},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The advent of high throughput data has led to a massive increase in the number of hypothesis tests conducted in many types of biological studies and a concomitant increase in stringency of significance thresholds. Filtering methods, which use independent information to eliminate less promising tests and thus reduce multiple testing, have been widely and successfully applied. However, key questions remain about how to best apply them: When is filtering beneficial and when is it detrimental? How good does the independent information need to be in order for filtering to be effective? How should one choose the filter cutoff that separates tests that pass the filter from those that don\u2019t?<\/jats:p>\n               <jats:p>Result: We quantify the effect of the quality of the filter information, the filter cutoff and other factors on the effectiveness of the filter and show a number of results: If the filter has a high probability (e.g. 70%) of ranking true positive features highly (e.g. top 10%), then filtering can lead to dramatic increase (e.g. 10-fold) in discovery probability when there is high redundancy in information between hypothesis tests. Filtering is less effective when there is low redundancy between hypothesis tests and its benefit decreases rapidly as the quality of the filter information decreases. Furthermore, the outcome is highly dependent on the choice of filter cutoff. Choosing the cutoff without reference to the data will often lead to a large loss in discovery probability. 
However, na\u00efve optimization of the cutoff using the data will lead to inflated type I error. We introduce a data-based method for choosing the cutoff that maintains control of the family-wise error rate via a correction factor to the significance threshold. Application of this approach offers as much as a several-fold advantage in discovery probability relative to no filtering, while maintaining type I error control. We also introduce a closely related method of P-value weighting that further improves performance.<\/jats:p>\n               <jats:p>Availability and implementation: R code for calculating the correction factor is available at http:\/\/www.stat.uga.edu\/people\/faculty\/paul-schliekelman.<\/jats:p>\n               <jats:p>Contact: \u00a0pdschlie@stat.uga.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv608","type":"journal-article","created":{"date-parts":[[2015,11,18]],"date-time":"2015-11-18T01:29:26Z","timestamp":1447810166000},"page":"850-858","source":"Crossref","is-referenced-by-count":7,"title":["Prioritizing hypothesis tests for high throughput data"],"prefix":"10.1093","volume":"32","author":[{"given":"Sangjin","family":"Kim","sequence":"first","affiliation":[{"name":"Department of Statistics, University of Georgia, Athens, GA 30602, USA"}]},{"given":"Paul","family":"Schliekelman","sequence":"additional","affiliation":[{"name":"Department of Statistics, University of Georgia, Athens, GA 30602, USA"}]}],"member":"286","published-online":{"date-parts":[[2015,11,16]]},"reference":[{"key":"2023020111565605900_btv608-B1","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1111\/1467-9469.00072","article-title":"Multiple hypotheses testing with weights","volume":"24","author":"Benjamini","year":"1997","journal-title":"Scand. J. 
Stat."},{"key":"2023020111565605900_btv608-B2","doi-asserted-by":"crossref","first-page":"9546","DOI":"10.1073\/pnas.0914005107","article-title":"Independent filtering increases detection power for high-throughput experiments","volume":"107","author":"Bourgon","year":"2010","journal-title":"Proc. Natl Acad. Sci."},{"key":"2023020111565605900_btv608-B3","doi-asserted-by":"crossref","first-page":"E175","DOI":"10.1073\/pnas.1011698107","article-title":"Reply to Talloen et\u00a0al.: independent filtering is a generic approach that needs domain specific adaptation","volume":"107","author":"Bourgon","year":"2010","journal-title":"Proc. Natl Acad. Sci."},{"key":"2023020111565605900_btv608-B4","doi-asserted-by":"crossref","first-page":"6532","DOI":"10.1002\/sim.3431","article-title":"Improving strategies for detecting genetic patterns of disease susceptibility in association studies","volume":"27","author":"Calle","year":"2008","journal-title":"Stat. Med."},{"key":"2023020111565605900_btv608-B5","doi-asserted-by":"crossref","first-page":"929","DOI":"10.1093\/biomet\/ass044","article-title":"Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction","volume":"99","author":"Dai","year":"2012","journal-title":"Biometrika"},{"key":"2023020111565605900_btv608-B6","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1016\/j.ygeno.2008.05.012","article-title":"Genomics and genome-wide association studies: an integrative approach to expression QTL mapping","volume":"92","author":"Degnan","year":"2008","journal-title":"Genomics"},{"key":"2023020111565605900_btv608-B7","doi-asserted-by":"crossref","first-page":"e157","DOI":"10.1371\/journal.pgen.0020157","article-title":"Two-stage two-locus models in genome-wide association","volume":"2","author":"Evans","year":"2006","journal-title":"PLoS 
Genet"},{"key":"2023020111565605900_btv608-B8","doi-asserted-by":"crossref","first-page":"3859","DOI":"10.1016\/j.jspi.2007.04.004","article-title":"FDR- and FWE-controlling methods using data-driven weights","volume":"137","author":"Finos","year":"2007","journal-title":"J. Stat. Plan. Inference"},{"key":"2023020111565605900_btv608-B9","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1080\/03610910701790269","article-title":"Calculation methods for Wallenius' noncentral hypergeometric distribution","volume":"37","author":"Fog","year":"2008","journal-title":"Commun. Stat. Simul. C"},{"key":"2023020111565605900_btv608-B10","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1080\/03610910701790236","article-title":"Sampling methods for Wallenius' and Fisher's noncentral hypergeometric distributions","volume":"37","author":"Fog","year":"2008","journal-title":"Commun. Stat. Simul. C"},{"key":"2023020111565605900_btv608-B11","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1093\/biomet\/93.3.509","article-title":"False discovery control with p-value weighting","volume":"93","author":"Genovese","year":"2006","journal-title":"Biometrika"},{"key":"2023020111565605900_btv608-B12","doi-asserted-by":"crossref","first-page":"1182","DOI":"10.1371\/journal.pgen.0020130","article-title":"Integrating genetic and network analysis to characterize genes related to mouse weight","volume":"2","author":"Ghazalpour","year":"2006","journal-title":"PLoS Genet."},{"key":"2023020111565605900_btv608-B13","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1038\/nrg3118","article-title":"Rare and common variants: twenty arguments","volume":"13","author":"Gibson","year":"2011","journal-title":"Nat. Rev. 
Genet."},{"key":"2023020111565605900_btv608-B14","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/1471-2105-10-11","article-title":"Filtering for increased power for microarray data analysis","volume":"10","author":"Hackstadt","year":"2009","journal-title":"BMC Bioinf."},{"key":"2023020111565605900_btv608-B15","first-page":"65","article-title":"A simple sequentially rejective multiple test procedure","volume":"6","author":"Holm","year":"1979","journal-title":"Scand. J. Stat."},{"key":"2023020111565605900_btv608-B16","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1086\/519748","article-title":"Genomewide weighted hypothesis testing in family-based association studies, with an application to a 100K scan","volume":"81","author":"Ionita-Laza","year":"2007","journal-title":"Am. J. Hum. Genet."},{"key":"2023020111565605900_btv608-B17","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1223","article-title":"A two-step multiple comparison procedure for a large number of tests and multiple treatments","volume":"5","author":"Jiang","year":"2006","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023020111565605900_btv608-B18","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1016\/j.jspi.2003.07.021","article-title":"Nonparametric multiple test procedures with data-driven order of hypotheses and with weighted hypotheses","volume":"125","author":"Kropf","year":"2004","journal-title":"J. Stat. Plan. Inference"},{"key":"2023020111565605900_btv608-B19","doi-asserted-by":"crossref","first-page":"103","DOI":"10.3389\/fgene.2013.00103","article-title":"Using eQTL weights to improve power for genome-wide association studies: a genetic study of childhood asthma","volume":"4","author":"Li","year":"2013","journal-title":"Front. 
Genet."},{"key":"2023020111565605900_btv608-B20","doi-asserted-by":"crossref","first-page":"e86","DOI":"10.1093\/nar\/gkr241","article-title":"Principal component analysis-based filtering improves detection for Affymetrix gene expression arrays","volume":"39","author":"Lu","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023020111565605900_btv608-B21","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1186\/1471-2105-7-49","article-title":"Effects of filtering by Present call on analysis of microarray experiments","volume":"7","author":"McClintick","year":"2006","journal-title":"BMC Bioinf."},{"key":"2023020111565605900_btv608-B22","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1007\/s00439-008-0522-8","article-title":"Exploiting the proteome to improve the genome-wide genetic analysis of epistasis in common human diseases","volume":"124","author":"Pattin","year":"2008","journal-title":"Hum. Genet."},{"key":"2023020111565605900_btv608-B23","first-page":"277","article-title":"Variant priorization and analysis incorporating problematic regions of the genome","author":"Patwardhan","year":"2014","journal-title":"Pac. Symp. Biocomput."},{"key":"2023020111565605900_btv608-B24","doi-asserted-by":"crossref","first-page":"e1000598","DOI":"10.1371\/journal.pcbi.1000598","article-title":"An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data","volume":"5","author":"Ramskold","year":"2009","journal-title":"PLoS Comput. 
Biol."},{"key":"2023020111565605900_btv608-B25","first-page":"1","article-title":"HTSFilter : independent data-based filtering for replicated transcriptome sequencing experiments","author":"Rau","year":"2013"},{"key":"2023020111565605900_btv608-B26","doi-asserted-by":"crossref","first-page":"2146","DOI":"10.1093\/bioinformatics\/btt350","article-title":"Data-based filtering for replicated high-throughput transcriptome sequencing experiments","volume":"29","author":"Rau","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020111565605900_btv608-B27","first-page":"398","article-title":"Genome-wide significance levels and weighted hypothesis testing","volume":"24","author":"Roeder","year":"2009","journal-title":"Stat. Sci. Rev. J. Inst. Math. Stat."},{"key":"2023020111565605900_btv608-B28","doi-asserted-by":"crossref","first-page":"678","DOI":"10.1214\/09-EJS430","article-title":"Optimal weighting for false discovery rate control","volume":"3","author":"Roquain","year":"2009","journal-title":"Electron. J. Stat."},{"key":"2023020111565605900_btv608-B29","doi-asserted-by":"crossref","first-page":"19","DOI":"10.2202\/1544-6115.1148","article-title":"A method to increase the power of multiple testing procedures through sample splitting","volume":"5","author":"Rubin","year":"2006","journal-title":"Stat. Appl. Genet. Mol. 
Biol."},{"key":"2023020111565605900_btv608-B30","doi-asserted-by":"crossref","first-page":"D818","DOI":"10.1093\/nar\/gkt954","article-title":"The mouse Gene Expression Database (GXD): 2014 update","volume":"42","author":"Smith","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"2023020111565605900_btv608-B31","doi-asserted-by":"crossref","first-page":"956","DOI":"10.1126\/science.1160342","article-title":"A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome","volume":"321","author":"Sultan","year":"2008","journal-title":"Science (New York, N.Y)"},{"key":"2023020111565605900_btv608-B32","doi-asserted-by":"crossref","first-page":"2897","DOI":"10.1093\/bioinformatics\/btm478","article-title":"I\/NI-calls for the exclusion of non-informative genes: a highly effective filtering tool for microarray data","volume":"23","author":"Talloen","year":"2007","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023020111565605900_btv608-B33","doi-asserted-by":"crossref","first-page":"E173","DOI":"10.1073\/pnas.1010604107","article-title":"Filtering data from high-throughput experiments based on measurement reliability","volume":"107","author":"Talloen","year":"2010","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020111565605900_btv608-B34","first-page":"398","article-title":"Genome-wide significance levels and weighted hypothesis testing","volume":"24","author":"Wasserman","year":"2006","journal-title":"Stat. Sci. 2009"},{"key":"2023020111565605900_btv608-B35","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1214\/lnms\/1196285632","article-title":"Weighted FWE-controlling methods in high-dimensional situations","volume":"47","author":"Westfall","year":"2004","journal-title":"Lect. Notes Monogr. Ser. Recent Dev. 
Multiple Comparison Proced."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/6\/850\/49018532\/bioinformatics_32_6_850.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/6\/850\/49018532\/bioinformatics_32_6_850.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:20:59Z","timestamp":1675290059000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/6\/850\/1743699"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,11,16]]},"references-count":35,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2016,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv608","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,3,15]]},"published":{"date-parts":[[2015,11,16]]}}}