{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,26]],"date-time":"2026-01-26T00:51:12Z","timestamp":1769388672397,"version":"3.49.0"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2017,5,3]],"date-time":"2017-05-03T00:00:00Z","timestamp":1493769600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>In genome-wide rate comparison studies, there is a big challenge for effective identification of an appropriate number of significant features objectively, since traditional statistical comparisons without multi-testing correction can generate a large number of false positives while multi-testing correction tremendously decreases the statistic power.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In this study, we proposed a new exact test based on the translation of rate comparison to two binomial distributions. With modeling and real datasets, the exact binomial test (EBT) showed an advantage in balancing the statistical precision and power, by providing an appropriate size of significant features for further studies. Both correlation analysis and bootstrapping tests demonstrated that EBT is as robust as the typical rate-comparison methods, e.g. \u03c72 test, Fisher\u2019s exact test and Binomial test. Performance comparison among machine learning models with features identified by different statistical tests further demonstrated the advantage of EBT. The new test was also applied to analyze the genome-wide somatic gene mutation rate difference between lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), two main lung cancer subtypes and a list of new markers were identified that could be lineage-specifically associated with carcinogenesis of LUAD and LUSC, respectively. Interestingly, three cilia genes were found selectively with high mutation rates in LUSC, possibly implying the importance of cilia dysfunction in the carcinogenesis.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>An R package implementing EBT could be downloaded from the website freely: http:\/\/www.szu-bioinf.org\/EBT.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx294","type":"journal-article","created":{"date-parts":[[2017,5,2]],"date-time":"2017-05-02T11:11:48Z","timestamp":1493723508000},"page":"2631-2641","source":"Crossref","is-referenced-by-count":13,"title":["EBT: a statistic test identifying moderate size of significant features with balanced power and precision for genome-wide rate comparisons"],"prefix":"10.1093","volume":"33","author":[{"given":"Xinjie","family":"Hui","sequence":"first","affiliation":[{"name":"Department of Cell Biology and Genetics, School of Basic Medical Sciences, Shenzhen University Health Science Center, Shenzhen, China"}]},{"given":"Yueming","family":"Hu","sequence":"additional","affiliation":[{"name":"Department of Cell Biology and Genetics, School of Basic Medical Sciences, Shenzhen University Health Science Center, Shenzhen, China"}]},{"given":"Ming-An","family":"Sun","sequence":"additional","affiliation":[{"name":"Epigenomics and Computational Biology Lab, Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA"}]},{"given":"Xingsheng","family":"Shu","sequence":"additional","affiliation":[{"name":"Department of Cell Biology and Genetics, School of Basic Medical Sciences, Shenzhen University Health Science Center, Shenzhen, China"}]},{"given":"Rongfei","family":"Han","sequence":"additional","affiliation":[{"name":"Department of Cell Biology and Genetics, School of Basic Medical Sciences, Shenzhen University Health Science Center, Shenzhen, China"}]},{"given":"Qinggang","family":"Ge","sequence":"additional","affiliation":[{"name":"Department of Critical Care Unit, Peking University Third Hospital, Beijing, China"}]},{"given":"Yejun","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Cell Biology and Genetics, School of Basic Medical Sciences, Shenzhen University Health Science Center, Shenzhen, China"}]}],"member":"286","published-online":{"date-parts":[[2017,5,3]]},"reference":[{"key":"2023020206283125000_btx294-B1","doi-asserted-by":"crossref","first-page":"15765","DOI":"10.1073\/pnas.0704344104","article-title":"The mouse homeobox gene Noto regulates node morphogenesis, notochordal ciliogenesis, and left right patterning","volume":"104","author":"Beckers","year":"2007","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020206283125000_btx294-B2","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multi-testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023020206283125000_btx294-B3","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/978-3-319-24932-2_2","article-title":"Emerging biomarkers in personalized therapy of lung cancer","volume":"890","author":"Cagle","year":"2016","journal-title":"Adv. Exp. Med. Biol"},{"key":"2023020206283125000_btx294-B4","doi-asserted-by":"crossref","first-page":"718","DOI":"10.1038\/ng.374","article-title":"Common variations in BARD1 influence susceptibility to high-risk neuroblastoma","volume":"41","author":"Capasso","year":"2009","journal-title":"Nat. Genet"},{"key":"2023020206283125000_btx294-B5","first-page":"1111","article-title":"Immunohistochemical localization of LLC1 in human tissues and its limited expression in non-small cell lung cancer","volume":"30","author":"Chandra","year":"2015","journal-title":"Histol. Histopathol"},{"key":"2023020206283125000_btx294-B6","doi-asserted-by":"crossref","first-page":"e1004497.","DOI":"10.1371\/journal.pcbi.1004497","article-title":"A gene gravity model for the evolution of cancer genomes: a study of 3,000 cancer genomes across 9 cancer types","volume":"11","author":"Cheng","year":"2015","journal-title":"PLoS Comput. Biol"},{"key":"2023020206283125000_btx294-B7","doi-asserted-by":"crossref","first-page":"588","DOI":"10.1038\/nature14659","article-title":"Sparse whole-genome sequencing identifies two loci for major depressive disorder","volume":"523","author":"CONVERGE consortium","year":"2015","journal-title":"Nature"},{"key":"2023020206283125000_btx294-B8","doi-asserted-by":"crossref","first-page":"R17.","DOI":"10.1186\/bcr3101","article-title":"Breast cancer risk assessment with five independent genetic variants and two risk factors in Chinese women","volume":"14","author":"Dai","year":"2012","journal-title":"Breast Cancer Res"},{"key":"2023020206283125000_btx294-B9","doi-asserted-by":"crossref","first-page":"1603","DOI":"10.1007\/s10552-015-0654-9","article-title":"The effects of height and BMI on prostate cancer incidence and mortality: a Mendelian randomization study in 20,848 cases and 20,214 controls from the PRACTICAL consortium","volume":"26","author":"Davies","year":"2015","journal-title":"Cancer Causes Control"},{"key":"2023020206283125000_btx294-B10","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1142\/S0219720005001004","article-title":"Minimum redundancy feature selection from microarray gene expression data","volume":"3","author":"Ding","year":"2005","journal-title":"J. Bioinf. Comp. Biol"},{"key":"2023020206283125000_btx294-B11","doi-asserted-by":"crossref","first-page":"1547","DOI":"10.1007\/s13277-013-0683-5","article-title":"A support vector machine model for predicting non-sentinel lymph node status in patients with sentinel lymph node positive breast cancer","volume":"34","author":"Ding","year":"2013","journal-title":"Tumour Biol"},{"key":"2023020206283125000_btx294-B12","doi-asserted-by":"crossref","first-page":"1096","DOI":"10.1080\/01621459.1955.10501294","article-title":"A multiple comparisons procedure for comparing several treatments with a control","volume":"50","author":"Dunnett","year":"1955","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020206283125000_btx294-B13","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1056\/NEJMoa1008862","article-title":"Diabetes mellitus, fasting glucose, and risk of cause-specific death","volume":"364","author":"Emerging Risk Factors Collaboration","year":"2011","journal-title":"N. Engl. J. Med"},{"key":"2023020206283125000_btx294-B14","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1159\/000128567","article-title":"DNAI1 mutations explain only 2% of primary ciliary dykinesia","volume":"76","author":"Failly","year":"2008","journal-title":"Respiration"},{"key":"2023020206283125000_btx294-B15","doi-asserted-by":"crossref","first-page":"1149","DOI":"10.3758\/BRM.41.4.1149","article-title":"Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses","volume":"41","author":"Faul","year":"2009","journal-title":"Behav. Res. Methods"},{"key":"2023020206283125000_btx294-B16","doi-asserted-by":"crossref","first-page":"87","DOI":"10.2307\/2340521","article-title":"On the interpretation of \u03c72 from contingency tables, and the calculation of P","volume":"85","author":"Fisher","year":"1922","journal-title":"J. R. Stat. Soc"},{"key":"2023020206283125000_btx294-B17","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1159\/000357567","article-title":"A network-based kernel machine test for the identification of risk pathways in genome-wide association studies","volume":"76","author":"Freytag","year":"2013","journal-title":"Hum. Hered"},{"key":"2023020206283125000_btx294-B18","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1038\/nature05610","article-title":"Patterns of somatic mutation in human cancer genomes","volume":"446","author":"Greenman","year":"2009","journal-title":"Nature"},{"key":"2023020206283125000_btx294-B19","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res"},{"key":"2023020206283125000_btx294-B20","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1023\/A:1012487302797","article-title":"Gene selection for cancer classification using support vector machines","volume":"46","author":"Guyon","year":"2002","journal-title":"Mach. Learn"},{"key":"2023020206283125000_btx294-B21","doi-asserted-by":"crossref","first-page":"3261.","DOI":"10.1038\/ncomms4261","article-title":"Transdifferentiation of lung adenocarcinoma in mice with Lkb1 deficiency to squamous cell carcinoma","volume":"5","author":"Han","year":"2014","journal-title":"Nat. Commun"},{"key":"2023020206283125000_btx294-B22","first-page":"30110","article-title":"Four susceptibility loci for gallstone disease identified in a meta-analysis of genome-wide association studies","volume":"S0016-5085","author":"Joshi","year":"2016","journal-title":"Gastroenterology"},{"key":"2023020206283125000_btx294-B23","doi-asserted-by":"crossref","first-page":"432","DOI":"10.1002\/gepi.1012","article-title":"Issues concerning association studies for fine mapping a susceptibility gene for a complex disease","volume":"20","author":"Kaplan","year":"2001","journal-title":"Genet. Epidemiol"},{"key":"2023020206283125000_btx294-B24","doi-asserted-by":"crossref","first-page":"239","DOI":"10.5808\/GI.2013.11.4.239","article-title":"Somatic mutaome profile in human cancer tissues","volume":"11","author":"Kim","year":"2013","journal-title":"Genomics Inform"},{"key":"2023020206283125000_btx294-B25","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1200\/JCO.2013.50.8556","article-title":"Integrative and comparative genomic analysis of lung squamous cell carcinomas in East asian patients","volume":"32","author":"Kim","year":"2014","journal-title":"J. Clin. Oncol"},{"key":"2023020206283125000_btx294-B26","doi-asserted-by":"crossref","first-page":"2549","DOI":"10.1056\/NEJMoa033179","article-title":"Absence of an effect of liposuction on insulin action and risk factors for coronary heart disease","volume":"350","author":"Klein","year":"2004","journal-title":"N. Engl. J. Med"},{"key":"2023020206283125000_btx294-B27","doi-asserted-by":"crossref","DOI":"10.1201\/9781420011371","volume-title":"Handbook of Statistical Distributions with Applications","author":"Krishnamoorthy","year":"2006"},{"key":"2023020206283125000_btx294-B28","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1016\/j.ccell.2015.04.001","article-title":"LKB1 inactivation elicits a redox imbalance to modulate non-small cell lung cancer plasticity and therapeutic response","volume":"27","author":"Li","year":"2015","journal-title":"Cancer Cell"},{"key":"2023020206283125000_btx294-B29","doi-asserted-by":"crossref","first-page":"118.","DOI":"10.1186\/1471-2350-13-118","article-title":"Prediction of lung cancer risk in a Chinese population using a multifactorial genetic model","volume":"13","author":"Li","year":"2012","journal-title":"BMC Med. Genet"},{"key":"2023020206283125000_btx294-B30","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1038\/nature15390","article-title":"A novel locus of resistance to severe malaria in a region of ancient balancing selection","volume":"526","author":"Malaria Genomic Epidemiology Network","year":"2015","journal-title":"Nature"},{"key":"2023020206283125000_btx294-B31","volume-title":"Machine Learning","author":"Mitchell","year":"1997"},{"key":"2023020206283125000_btx294-B32","doi-asserted-by":"crossref","first-page":"592","DOI":"10.1016\/j.ajhg.2010.02.011","article-title":"A follow-up study of a genome-wide association scan identifies a susceptibility locus for venous thrombosis on chromosome 6p24.1","volume":"86","author":"Morange","year":"2010","journal-title":"Am. J. Hum. Genet"},{"key":"2023020206283125000_btx294-B33","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/j.ajhg.2015.05.018","article-title":"A powerful pathway-based adaptive test for genetic association with common or rare variants","volume":"97","author":"Pan","year":"2015","journal-title":"Am. J. Hum. Genet"},{"key":"2023020206283125000_btx294-B34","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1038\/nrg3706","article-title":"Statistical power and significance testing in large-scale genetic studies","volume":"15","author":"Sham","year":"2014","journal-title":"Nat. Rev. Genet"},{"key":"2023020206283125000_btx294-B35","first-page":"13","article-title":"Cilia gene expression patterns in cancer","volume":"11","author":"Shpak","year":"2014","journal-title":"Cancer Genomics Proteomics"},{"key":"2023020206283125000_btx294-B36","first-page":"1193","article-title":"Smoking and lung cancer risk in American and Japanese men: an international case-control study","volume":"10","author":"Stellman","year":"2001","journal-title":"Cancer Epidemiol. Biomarkers Prev"},{"key":"2023020206283125000_btx294-B37","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1080\/07853890510011985","article-title":"Primary ciliary dyskinesia: clinical presentation, diagnosis and genetics","volume":"37","author":"Storm Van\u2019s Gravesande","year":"2005","journal-title":"Ann. Med"},{"key":"2023020206283125000_btx294-B38","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1038\/nature11404","article-title":"Comprehensive genomic characterization of squamous cell lung cancers","volume":"489","author":"The Cancer Genome Atlas Research Network","year":"2012","journal-title":"Nature"},{"key":"2023020206283125000_btx294-B39","doi-asserted-by":"crossref","first-page":"543","DOI":"10.1038\/nature13385","article-title":"Comprehensive molecular profiling of lung adenocarcinoma","volume":"511","author":"The Cancer Genome Atlas Research Network","year":"2014","journal-title":"Nature"},{"key":"2023020206283125000_btx294-B40","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1097\/CCO.0000000000000051","article-title":"The clinical relevance of KRAS gene mutation in non-small-cell lung cancer","volume":"26","author":"T\u00edm\u00e1r","year":"2014","journal-title":"Curr. Opin. Oncol"},{"key":"2023020206283125000_btx294-B41","doi-asserted-by":"crossref","first-page":"784","DOI":"10.1056\/NEJMoa001999","article-title":"Helicobacter pylori infection and the development of gastric cancer","volume":"345","author":"Uemura","year":"2001","journal-title":"N. Engl. J. Med"},{"key":"2023020206283125000_btx294-B42","doi-asserted-by":"crossref","first-page":"e58173.","DOI":"10.1371\/journal.pone.0058173","article-title":"T3_MM: a Markov model effectively classifies bacterial type III secretion signals","volume":"8","author":"Wang","year":"2013","journal-title":"PLoS One"},{"key":"2023020206283125000_btx294-B43","doi-asserted-by":"crossref","first-page":"50.","DOI":"10.1186\/1471-2164-15-50","article-title":"Prediction of bacterial type IV secreted effectors by C-terminal features","volume":"15","author":"Wang","year":"2014","journal-title":"BMC Genomics"},{"key":"2023020206283125000_btx294-B44","doi-asserted-by":"crossref","first-page":"359.","DOI":"10.1186\/s12864-015-1555-8","article-title":"An empirical strategy to detect bacterial transcript structure from directional RNA-seq transcriptome data","volume":"16","author":"Wang","year":"2015","journal-title":"BMC Genomics"},{"key":"2023020206283125000_btx294-B45","doi-asserted-by":"crossref","first-page":"1538","DOI":"10.1097\/JTO.0000000000000666","article-title":"Lung cancer risk prediction using common SNPs located in GWAS-identified susceptibility regions","volume":"10","author":"Weissfeld","year":"2015","journal-title":"J. Thorac. Oncol"},{"key":"2023020206283125000_btx294-B46","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1007\/s00251-016-0914-1","article-title":"Novel genetic risk factors for asthma in African American children: Precision Medicine and the SAGE II Study","volume":"68","author":"White","year":"2016","journal-title":"Immunogenetics"},{"key":"2023020206283125000_btx294-B47","doi-asserted-by":"crossref","first-page":"929","DOI":"10.1016\/j.ajhg.2010.05.002","article-title":"Powerful SNP set analysis for case-control genome wide association studies","volume":"86","author":"Wu","year":"2010","journal-title":"Am. J. Hum. Genet"},{"key":"2023020206283125000_btx294-B48","article-title":"Developing a clinical utility framework to evaluate prediction models in radiogenomics","volume":"9416","author":"Wu","year":"2015","journal-title":"Proc SPIE Int Soc Opt Eng"},{"key":"2023020206283125000_btx294-B49","doi-asserted-by":"crossref","first-page":"217","DOI":"10.2307\/2983604","article-title":"Contingency table involving small numbers and the \u03c72 test","volume":"S1","author":"Yates","year":"1934","journal-title":"J. R Stat. Soc"},{"key":"2023020206283125000_btx294-B50","doi-asserted-by":"crossref","first-page":"6036.","DOI":"10.1038\/srep06036","article-title":"Exome sequencing identifies frequent mutation of MLL2 in non-small cell lung carcinoma from Chinese patients","volume":"4","author":"Yin","year":"2014","journal-title":"Sci. Rep"},{"key":"2023020206283125000_btx294-B51","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1007\/s10555-015-9558-0","article-title":"Global analysis of chromosome 1 genes among patients with lung adenocarcinoma, squamous carcinoma, large-cell carcinoma, small-cell carcinoma, or non-cancer","volume":"34","author":"Zhang","year":"2015","journal-title":"Cancer Metastasis Rev"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/17\/2631\/49041241\/bioinformatics_33_17_2631.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/17\/2631\/49041241\/bioinformatics_33_17_2631.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:01:22Z","timestamp":1750215682000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/17\/2631\/3791807"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,5,3]]},"references-count":51,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2017,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx294","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,9,1]]},"published":{"date-parts":[[2017,5,3]]}}}