{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T20:31:49Z","timestamp":1774816309688,"version":"3.50.1"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"20","license":[{"start":{"date-parts":[[2019,3,15]],"date-time":"2019-03-15T00:00:00Z","timestamp":1552608000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Australian Research Council Discovery Project","award":["DP160104292"],"award-info":[{"award-number":["DP160104292"]}]},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>When comparing two biologically different conditions, we are often interested in identifying differentially expressed genes. The assumption of equal variances in the two groups is usually violated for many genes when a large number of them must be filtered or ranked. In these cases, exact tests are unavailable and Welch\u2019s approximate test is the most reliable one. Welch\u2019s test involves two layers of approximation: approximating the distribution of the statistic by a t-distribution, which in turn depends on approximate degrees of freedom. 
This study attempts to improve upon Welch\u2019s approximate test by avoiding one layer of approximation.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We introduce a new distribution that generalizes the t-distribution and propose a Monte Carlo based test that uses only one layer of approximation for statistical inference. Experimental results based on extensive simulation studies show that the Monte Carlo based tests enhance statistical power and perform better than Welch\u2019s t-approximation, especially when the equal variance assumption is not met and the sample with the larger variance has the smaller sample size. We analyzed two gene-expression datasets, namely the childhood acute lymphoblastic leukemia gene-expression dataset with 22\u2009283 genes and the Golden Spike dataset, produced by a controlled experiment, with 13\u2009966 genes. The new test identified additional genes of interest in both datasets. Some of these genes have been reported in the medical literature to play important roles.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The R package mcBFtest is available on CRAN, and R scripts to reproduce all reported results are available at the GitHub repository, https:\/\/github.com\/iullah1980\/MCTcodes.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz189","type":"journal-article","created":{"date-parts":[[2019,3,14]],"date-time":"2019-03-14T14:58:43Z","timestamp":1552575523000},"page":"3996-4003","source":"Crossref","is-referenced-by-count":10,"title":["Significance tests for analyzing gene expression data with small sample 
sizes"],"prefix":"10.1093","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9476-6837","authenticated-orcid":false,"given":"Insha","family":"Ullah","sequence":"first","affiliation":[{"name":"School of Mathematical Sciences, Queensland University of Technology , Brisbane, QLD, Australia"}]},{"given":"Sudhir","family":"Paul","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, University of Windsor , Windsor, ON, Canada"}]},{"given":"Zhenjie","family":"Hong","sequence":"additional","affiliation":[{"name":"College of Mathematics and Physics, Wenzhou University , Wenzhou, Zhejiang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0901-4671","authenticated-orcid":false,"given":"You-Gan","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Mathematical Sciences, Queensland University of Technology , Brisbane, QLD, Australia"}]}],"member":"286","published-online":{"date-parts":[[2019,3,15]]},"reference":[{"key":"2023013108271770500_btz189-B1","doi-asserted-by":"crossref","first-page":"266","DOI":"10.2307\/2347702","article-title":"Comparing the means of two independent samples","volume":"33","author":"Barnard","year":"1984","journal-title":"Appl. Stat"},{"key":"2023013108271770500_btz189-B2","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1111\/j.1467-9876.2004.00428.x","article-title":"Chebyshev\u2019s inequality for nonparametric testing with small n and \u03b1 in microarray research","volume":"53","author":"Beasley","year":"2004","journal-title":"J. R. Stat. Soc. Ser. C Appl. Stat"},{"key":"2023013108271770500_btz189-B3","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. 
Ser B Method"},{"key":"2023013108271770500_btz189-B4","first-page":"205","article-title":"Welch\u2019s approximate solution for the Behrens\u2013Fisher problem","volume":"29","author":"Best","year":"1987","journal-title":"Technometrics"},{"key":"2023013108271770500_btz189-B5","volume-title":"Statistical Inference","author":"Casella","year":"2002"},{"key":"2023013108271770500_btz189-B6","doi-asserted-by":"crossref","first-page":"R16.","DOI":"10.1186\/gb-2005-6-2-r16","article-title":"Preferred analysis methods for affymetrix genechips revealed by a wholly defined control dataset","volume":"6","author":"Choe","year":"2005","journal-title":"Genome Biol"},{"key":"2023013108271770500_btz189-B7","doi-asserted-by":"crossref","first-page":"4511","DOI":"10.1038\/srep04511","article-title":"Statistical physics approach to quantifying differences in myelinated nerve fibers","volume":"4","author":"Comin","year":"2014","journal-title":"Sci. Rep"},{"key":"2023013108271770500_btz189-B8","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1080\/10543400903572720","article-title":"Optimized ranking and selection methods for feature selection with application in microarray experiments","volume":"20","author":"Cui","year":"2010","journal-title":"J. Biopharm. 
Stat"},{"key":"2023013108271770500_btz189-B9","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/S1470-2045(08)70339-5","article-title":"A subtype of childhood acute lymphoblastic leukaemia with poor treatment outcome: a genome-wide classification study","volume":"10","author":"Den Boer","year":"2009","journal-title":"Lancet Oncol"},{"key":"2023013108271770500_btz189-B10","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/S0143-4004(03)00214-5","article-title":"Gtt1\/stard7, a novel phosphatidylcholine transfer protein-like highly expressed in gestational trophoblastic tumour: cloning and characterization","volume":"25","author":"Durand","year":"2004","journal-title":"Placenta"},{"key":"2023013108271770500_btz189-B11","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1093\/biomet\/70.1.300","article-title":"A comparison between the u and v tests in the Behrens\u2013Fisher problem","volume":"70","author":"Fenstad","year":"1983","journal-title":"Biometrika"},{"key":"2023013108271770500_btz189-B12","doi-asserted-by":"crossref","first-page":"5648","DOI":"10.1073\/pnas.81.18.5648","article-title":"An 8-kilobase abl RNA transcript in chronic myelogenous leukemia","volume":"81","author":"Gale","year":"1984","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023013108271770500_btz189-B13","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1093\/bioinformatics\/btg405","article-title":"affy\u2014analysis of affymetrix genechip data at the probe level","volume":"20","author":"Gautier","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013108271770500_btz189-B14","doi-asserted-by":"crossref","first-page":"943","DOI":"10.1093\/bioinformatics\/btl033","article-title":"A new summarization method for affymetrix probe level data","volume":"22","author":"Hochreiter","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013108271770500_btz189-B15","doi-asserted-by":"crossref","first-page":"e12336.","DOI":"10.1371\/journal.pone.0012336","article-title":"Should we abandon the t-test in the analysis of gene expression microarray data: a comparison of variance modeling strategies","volume":"5","author":"Jeanmougin","year":"2010","journal-title":"PLoS One"},{"key":"2023013108271770500_btz189-B16","first-page":"215","article-title":"Comparing samples\u2014part I","volume-title":"Nat. Methods","author":"Krzywinski","year":"2014"},{"key":"2023013108271770500_btz189-B17","first-page":"355","article-title":"Comparing samples\u2014part II","volume-title":"Nat. Methods","author":"Krzywinski","year":"2014"},{"key":"2023013108271770500_btz189-B18","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1007\/s13577-017-0178-6","article-title":"Essential role of sh3gl1 in interleukin-6 (il-6)-and vascular endothelial growth factor (vegf)-triggered p130cas-mediated proliferation and migration of osteosarcoma cells","volume":"30","author":"Li","year":"2017","journal-title":"Hum. 
Cell"},{"key":"2023013108271770500_btz189-B19","doi-asserted-by":"crossref","first-page":"165.","DOI":"10.1186\/1471-2105-6-165","article-title":"Identifying differential expression in multiple sage libraries: an overdispersed log-linear model approach","volume":"6","author":"Lu","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023013108271770500_btz189-B20","doi-asserted-by":"crossref","first-page":"e47","DOI":"10.1093\/nar\/gkv007","article-title":"limma powers differential expression analyses for RNA-sequencing and microarray studies","volume":"43","author":"Ritchie","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023013108271770500_btz189-B21","doi-asserted-by":"crossref","first-page":"2881","DOI":"10.1093\/bioinformatics\/btm453","article-title":"Moderated statistical tests for assessing differences in tag abundance","volume":"23","author":"Robinson","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013108271770500_btz189-B22","doi-asserted-by":"crossref","first-page":"42460","DOI":"10.1038\/srep42460","article-title":"Variation-preserving normalization unveils blind spots in gene expression profiling","volume":"7","author":"Roca","year":"2017","journal-title":"Sci. Rep"},{"key":"2023013108271770500_btz189-B23","doi-asserted-by":"crossref","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","article-title":"A review of feature selection techniques in bioinformatics","volume":"23","author":"Saeys","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013108271770500_btz189-B24","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"73","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. 
B Method"},{"key":"2023013108271770500_btz189-B25","doi-asserted-by":"crossref","first-page":"1454","DOI":"10.1093\/bioinformatics\/18.11.1454","article-title":"Nonparametric methods for identifying differentially expressed genes in microarray data","volume":"18","author":"Troyanskaya","year":"2002","journal-title":"Bioinformatics"},{"key":"2023013108271770500_btz189-B26","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1093\/biomet\/29.3-4.350","article-title":"The significance of the difference between two means when the population variances are unequal","volume":"29","author":"Welch","year":"1938","journal-title":"Biometrika"},{"key":"2023013108271770500_btz189-B27","doi-asserted-by":"crossref","first-page":"210.","DOI":"10.1186\/s12864-017-3498-8","article-title":"A clustering-based approach for efficient identification of microRNA combinatorial biomarkers","volume":"18","author":"Yang","year":"2017","journal-title":"BMC Genomics"},{"key":"2023013108271770500_btz189-B28","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1016\/S1535-6108(02)00032-6","article-title":"Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling","volume":"1","author":"Yeoh","year":"2002","journal-title":"Cancer 
Cell"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz189\/28405176\/btz189.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/20\/3996\/48975900\/bioinformatics_35_20_3996.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/20\/3996\/48975900\/bioinformatics_35_20_3996.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T20:14:42Z","timestamp":1721074482000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/20\/3996\/5381541"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,3,15]]},"references-count":28,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2019,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz189","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,10,15]]},"published":{"date-parts":[[2019,3,15]]}}}