{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T17:46:04Z","timestamp":1772905564442,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2017,11,27]],"date-time":"2017-11-27T00:00:00Z","timestamp":1511740800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11701385"],"award-info":[{"award-number":["11701385"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61501389"],"award-info":[{"award-number":["61501389"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11671268"],"award-info":[{"award-number":["11671268"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001747","name":"Hong Kong Baptist University","doi-asserted-by":"publisher","award":["FRG1\/16-17\/018 and FRG2\/16-17\/074"],"award-info":[{"award-number":["FRG1\/16-17\/018 and FRG2\/16-17\/074"]}],"id":[{"id":"10.13039\/501100001747","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005847","name":"Health and Medical Research Fund","doi-asserted-by":"publisher","award":["04150476"],"award-info":[{"award-number":["04150476"]}],"id":[{"id":"10.13039\/501100005847","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of 
China","doi-asserted-by":"publisher","award":["11671338"],"award-info":[{"award-number":["11671338"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,4,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>With the development of high-throughput techniques, RNA-sequencing (RNA-seq) is becoming increasingly popular as an alternative for gene expression analysis, such as RNAs profiling and classification. Identifying which type of diseases a new patient belongs to with RNA-seq data has been recognized as a vital problem in medical research. As RNA-seq data are discrete, statistical methods developed for classifying microarray data cannot be readily applied for RNA-seq data classification. Witten proposed a Poisson linear discriminant analysis (PLDA) to classify the RNA-seq data in 2011. Note, however, that the count datasets are frequently characterized by excess zeros in real RNA-seq or microRNA sequence data (i.e. when the sequence depth is not enough or small RNAs with the length of 18\u201330 nucleotides). Therefore, it is desired to develop a new model to analyze RNA-seq data with an excess of zeros.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In this paper, we propose a Zero-Inflated Poisson Logistic Discriminant Analysis (ZIPLDA) for RNA-seq data with an excess of zeros. The new method assumes that the data are from a mixture of two distributions: one is a point mass at zero, and the other follows a Poisson distribution. We then consider a logistic relation between the probability of observing zeros and the mean of the genes and the sequencing depth in the model. Simulation studies show that the proposed method performs better than, or at least as well as, the existing methods in a wide range of settings. 
Two real datasets including a breast cancer RNA-seq dataset and a microRNA-seq dataset are also analyzed, and they coincide with the simulation results that our proposed method outperforms the existing competitors.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The software is available at http:\/\/www.math.hkbu.edu.hk\/~tongt.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx768","type":"journal-article","created":{"date-parts":[[2017,11,24]],"date-time":"2017-11-24T12:10:32Z","timestamp":1511525432000},"page":"1329-1335","source":"Crossref","is-referenced-by-count":16,"title":["Classifying next-generation sequencing data using a zero-inflated Poisson model"],"prefix":"10.1093","volume":"34","author":[{"given":"Yan","family":"Zhou","sequence":"first","affiliation":[{"name":"College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen University, Shenzhen, China"}]},{"given":"Xiang","family":"Wan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, and Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Kowloon Tong, Hong Kong"}]},{"given":"Baoxue","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Statistics, Capital University of Economics and Business, Beijing, China"}]},{"given":"Tiejun","family":"Tong","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong"}]}],"member":"286","published-online":{"date-parts":[[2017,11,27]]},"reference":[{"key":"2023012713004607600_btx768-B1","doi-asserted-by":"crossref","first-page":"R106.","DOI":"10.1186\/gb-2010-11-10-r106","article-title":"Differential expression analysis for sequence count 
data","volume":"11","author":"Anders","year":"2010","journal-title":"Genome Biol"},{"key":"2023012713004607600_btx768-B2","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1126\/science.1159018","article-title":"Slicing and dicing for small RNAs","volume":"320","author":"Birchler","year":"2008","journal-title":"Science"},{"key":"2023012713004607600_btx768-B3","doi-asserted-by":"crossref","first-page":"94.","DOI":"10.1186\/1471-2105-11-94","article-title":"Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments","volume":"11","author":"Bullard","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012713004607600_btx768-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood estimation from incomplete data via the EM Algorithm","volume":"9","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023012713004607600_btx768-B5","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1093\/bib\/bbs046","article-title":"A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis","volume":"14","author":"Dillies","year":"2013","journal-title":"Brief. Bioinf"},{"key":"2023012713004607600_btx768-B6","doi-asserted-by":"crossref","first-page":"369.","DOI":"10.1186\/s12859-016-1208-1","article-title":"NBLDA: negative binomial linear discriminant analysis for RNA-Seq data","volume":"17","author":"Dong","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023012713004607600_btx768-B7","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1198\/016214502753479248","article-title":"Comparison of discrimination methods for the classification of tumors using gene expression data","volume":"97","author":"Dudoit","year":"2002","journal-title":"J. Am. Stat. 
Assoc"},{"key":"2023012713004607600_btx768-B9","doi-asserted-by":"crossref","first-page":"1096","DOI":"10.1111\/j.1541-0420.2010.01395.x","article-title":"Bias-corrected diagonal discriminant rules for high-dimensional classification","volume":"66","author":"Huang","year":"2010","journal-title":"Biometrics"},{"key":"2023012713004607600_btx768-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2307\/1269547","article-title":"Zero-inflated Poisson regression, with an application to defects in manufacturing","volume":"34","author":"Lambert","year":"1992","journal-title":"Technometrics"},{"key":"2023012713004607600_btx768-B11","doi-asserted-by":"crossref","first-page":"S7.","DOI":"10.1186\/1471-2164-15-S10-S7","article-title":"LFCseq: a nonparametric approach for differential expression analysis of RNA-seq data","volume":"15","author":"Lin","year":"2014","journal-title":"BMC Genomics"},{"key":"2023012713004607600_btx768-B12","doi-asserted-by":"crossref","first-page":"1701","DOI":"10.1093\/bioinformatics\/btw061","article-title":"A zero-inflated Poisson model for insertion tolerance analysis of genes based on Tn-seq data","volume":"32","author":"Liu","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713004607600_btx768-B13","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/978-3-319-07212-8_2","volume-title":"Statistical Analysis of Next Generation Sequencing Data","author":"Lorenz","year":"2014"},{"key":"2023012713004607600_btx768-B14","doi-asserted-by":"crossref","first-page":"550.","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol."},{"key":"2023012713004607600_btx768-B15","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1146\/annurev.genom.9.081307.164359","article-title":"Next-generation DNA sequencing 
methods","volume":"9","author":"Mardis","year":"2008","journal-title":"Annu. Rev. Genomics Hum. Genet"},{"key":"2023012713004607600_btx768-B16","doi-asserted-by":"crossref","first-page":"1509","DOI":"10.1101\/gr.079558.108","article-title":"RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays","volume":"18","author":"Marioni","year":"2008","journal-title":"Genome Res"},{"key":"2023012713004607600_btx768-B17","volume-title":"Support Vector Machines on Large Data Sets: Simple Parallel Approaches","author":"Meyer","year":"2014"},{"key":"2023012713004607600_btx768-B18","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1146\/annurev-genom-082908-145957","article-title":"Applications of new sequencing technologies for transcriptome analysis","volume":"10","author":"Morozova","year":"2009","journal-title":"Annu. Rev. Genomics Hum. Genet"},{"key":"2023012713004607600_btx768-B19","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/s13385-012-0056-2","article-title":"Poisson regression and Zero-inflated Poisson regression: application to private health insurance data","volume":"2","author":"Mouatassim","year":"2012","journal-title":"Eur. 
Actuarial J"},{"key":"2023012713004607600_btx768-B20","author":"Ridout","year":"1998"},{"key":"2023012713004607600_btx768-B21","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511812651","volume-title":"Pattern Recognition and Neural Networks","author":"Ripley","year":"1996"},{"key":"2023012713004607600_btx768-B22","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1093\/biostatistics\/kxm030","article-title":"Small-sample estimation of negative binomial dispersion, with applications to SAGE data","volume":"9","author":"Robinson","year":"2008","journal-title":"Biostatistics"},{"key":"2023012713004607600_btx768-B23","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a Bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012713004607600_btx768-B24","doi-asserted-by":"crossref","first-page":"R25.","DOI":"10.1186\/gb-2010-11-3-r25","article-title":"A scaling normalization method for differential expression analysis of RNA-seq data","volume":"11","author":"Robinson","year":"2010","journal-title":"Genome Biol"},{"key":"2023012713004607600_btx768-B25","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1038\/nrm2347","article-title":"Small non-coding RNAs in animal development","volume":"9","author":"Stefani","year":"2008","journal-title":"Nat. Rev. Mol. 
Cell Biol"},{"key":"2023012713004607600_btx768-B26","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1007\/978-3-319-07212-8_11","volume-title":"Statistical Analysis of Next Generation Sequencing Data","author":"Tan","year":"2014"},{"key":"2023012713004607600_btx768-B27","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1038\/nature13480","article-title":"Comprehensive molecular characterization of gastric adenocarcinoma","volume":"513","author":"The Cancer Genome Atlas Research Network","year":"2014","journal-title":"Nature"},{"key":"2023012713004607600_btx768-B28","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nrg2484","article-title":"RNA-Seq: a revolutionary tool for transcriptomics","volume":"10","author":"Wang","year":"2009","journal-title":"Nat. Rev. Genet"},{"key":"2023012713004607600_btx768-B29","doi-asserted-by":"crossref","first-page":"58.","DOI":"10.1186\/1741-7007-8-58","article-title":"Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls","volume":"8","author":"Witten","year":"2010","journal-title":"BMC Biology"},{"key":"2023012713004607600_btx768-B30","doi-asserted-by":"crossref","first-page":"2493","DOI":"10.1214\/11-AOAS493","article-title":"Classification and clustering of sequencing data using a Poisson model","volume":"5","author":"Witten","year":"2011","journal-title":"Ann. Appl. Stat"},{"key":"2023012713004607600_btx768-B31","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1002\/jat.3358","article-title":"Identification of microRNA biomarker candidates in urine and plasma from rats with kidney or liver damage","volume":"37","author":"Wolenski","year":"2017","journal-title":"J. Appl. 
Toxicol"},{"key":"2023012713004607600_btx768-B32","doi-asserted-by":"crossref","first-page":"e0169594.","DOI":"10.1371\/journal.pone.0169594","article-title":"A hypothesis testing based method for normalization and differential expression analysis of RNA-Seq data","volume":"12","author":"Zhou","year":"2017","journal-title":"PLoS One"},{"key":"2023012713004607600_btx768-B33","doi-asserted-by":"crossref","first-page":"1099","DOI":"10.1089\/cmb.2017.0029","article-title":"GD-RDA: a new regularized discriminant analysis for high dimensional data","volume":"24","author":"Zhou","year":"2017","journal-title":"J. Comput. Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/8\/1329\/48914929\/bioinformatics_34_8_1329.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/8\/1329\/48914929\/bioinformatics_34_8_1329.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,29]],"date-time":"2024-06-29T00:50:40Z","timestamp":1719622240000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/8\/1329\/4665421"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,11,27]]},"references-count":32,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2018,4,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx768","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,4,15]]},"published":{"date-parts":[[2017,11,27]]}}}