{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T07:48:48Z","timestamp":1769845728185,"version":"3.49.0"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2023,7,28]],"date-time":"2023-07-28T00:00:00Z","timestamp":1690502400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/100000065","name":"National Institute of Neurological Disorders and Stroke","doi-asserted-by":"publisher","award":["R01 NS117372"],"award-info":[{"award-number":["R01 NS117372"]}],"id":[{"id":"10.13039\/100000065","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000065","name":"National Institute of Neurological Disorders and Stroke","doi-asserted-by":"publisher","award":["R21 NS121284"],"award-info":[{"award-number":["R21 NS121284"]}],"id":[{"id":"10.13039\/100000065","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014370","name":"Simons Foundation Autism Research Initiative","doi-asserted-by":"publisher","award":["551354"],"award-info":[{"award-number":["551354"]}],"id":[{"id":"10.13039\/100014370","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000874","name":"Brain and Behavior Research Foundation","doi-asserted-by":"publisher","award":["27792"],"award-info":[{"award-number":["27792"]}],"id":[{"id":"10.13039\/100000874","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,9,20]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Single cell RNA-sequencing (scRNA-seq) technology has significantly advanced the understanding of transcriptomic signatures. Although various statistical models have been used to describe the distribution of gene expression across cells, a comprehensive assessment of the different models is missing. Moreover, the growing number of features associated with scRNA-seq datasets creates new challenges for analytical accuracy and computing speed. Here, we developed a Python-based package (TensorZINB) to solve the zero-inflated negative binomial (ZINB) model using the TensorFlow deep learning framework. We used a sequential initialization method to solve the numerical stability issues associated with hurdle and zero-inflated models. A recursive feature selection protocol was used to optimize feature selections for data processing and downstream differentially expressed gene (DEG) analysis. We proposed a class of hybrid models combining nested models to further improve the model\u2019s performance. Additionally, we developed a new method to convert a continuous distribution to its equivalent discrete form, so that statistical models can be fairly compared. Finally, we showed that the proposed TensorFlow algorithm (TensorZINB) was numerically stable and that its computing speed and performance were superior to those of existing ZINB solvers. Moreover, we implemented seven hurdle and zero-inflated statistical models in Python and systematically assessed their performance using a real scRNA-seq dataset. We demonstrated that the ZINB model achieved the lowest Akaike information criterion compared with other models tested. Taken together, TensorZINB was accurate, efficient and scalable for the implementation of ZINB and for large-scale scRNA-seq data analysis with DEG identification.<\/jats:p>","DOI":"10.1093\/bib\/bbad272","type":"journal-article","created":{"date-parts":[[2023,7,28]],"date-time":"2023-07-28T23:36:09Z","timestamp":1690587369000},"source":"Crossref","is-referenced-by-count":6,"title":["A comprehensive assessment of hurdle and zero-inflated models for single cell RNA-sequencing analysis"],"prefix":"10.1093","volume":"24","author":[{"given":"Tao","family":"Cui","sequence":"first","affiliation":[{"name":"Department of Pharmacology and Physiology, Georgetown University Medical Center, SE407 Med\/Dent, 3900 Reservoir Road, N.W., Washington, D.C., USA"}]},{"given":"Tingting","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Pharmacology and Physiology, Georgetown University Medical Center, SE407 Med\/Dent, 3900 Reservoir Road, N.W., Washington, D.C., USA"}]}],"member":"286","published-online":{"date-parts":[[2023,7,28]]},"reference":[{"issue":"8","key":"2023092216531198400_ref1","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1038\/s41581-018-0021-7","article-title":"Single-cell RNA sequencing for the study of development, physiology and disease","volume":"14","author":"Potter","year":"2018","journal-title":"Nat Rev Nephrol"},{"issue":"3","key":"2023092216531198400_ref2","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1038\/nrg.2015.16","article-title":"Single-cell genome sequencing: current state of the science","volume":"17","author":"Gawad","year":"2016","journal-title":"Nat Rev Genet"},{"issue":"1","key":"2023092216531198400_ref3","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1038\/nn.3881","article-title":"Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing","volume":"18","author":"Usoskin","year":"2015","journal-title":"Nat Neurosci"},{"issue":"6335","key":"2023092216531198400_ref4","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1126\/science.aah4573","article-title":"Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors","volume":"356","author":"Villani","year":"2017","journal-title":"Science"},{"key":"2023092216531198400_ref5","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1186\/s13059-015-0844-5","article-title":"MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data","volume":"16","author":"Finak","year":"2015","journal-title":"Genome Biol"},{"issue":"1","key":"2023092216531198400_ref6","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1186\/s13059-018-1438-9","article-title":"UMI-count modeling and differential expression analysis for single-cell RNA sequencing","volume":"19","author":"Chen","year":"2018","journal-title":"Genome Biol"},{"issue":"1","key":"2023092216531198400_ref7","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1038\/s41467-017-02554-5","article-title":"A general and flexible method for signal extraction from single-cell RNA-seq data","volume":"9","author":"Risso","year":"2018","journal-title":"Nat Commun"},{"issue":"6441","key":"2023092216531198400_ref8","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1126\/science.aav8130","article-title":"Single-cell genomics identifies cell type-specific molecular changes in autism","volume":"364","author":"Velmeshev","year":"2019","journal-title":"Science"},{"issue":"Suppl 6","key":"2023092216531198400_ref9","doi-asserted-by":"crossref","first-page":"689","DOI":"10.1186\/s12864-017-4019-5","article-title":"SAIC: an iterative clustering approach for analysis of single cell RNA-seq data","volume":"18","author":"Yang","year":"2017","journal-title":"BMC Genomics"},{"key":"2023092216531198400_ref10","volume-title":"The elements of statistical learning: data mining, inference, and prediction","author":"Hastie","year":"2016"},{"issue":"1","key":"2023092216531198400_ref11","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1186\/s12864-020-07302-6","article-title":"JOINT for large-scale single-cell RNA-sequencing analysis via soft-clustering and parallel computing","volume":"22","author":"Cui","year":"2021","journal-title":"BMC Genomics"},{"key":"2023092216531198400_ref12","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1214\/aoms\/1177732360","article-title":"The large-sample distribution of the likelihood ratio for testing composite hypotheses","volume":"9","author":"Wilks","year":"1938","journal-title":"Ann Math Stat"},{"key":"2023092216531198400_ref13","volume-title":"Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation OSDI'16","author":"Abadi","year":"2016"},{"key":"2023092216531198400_ref14","first-page":"1\u201334","article-title":"The VGAM package for categorical data analysis","volume-title":"J Stat Softw","author":"Yee","year":"2010"},{"key":"2023092216531198400_ref15","doi-asserted-by":"crossref","DOI":"10.25080\/Majora-92bf1922-011","article-title":"Statsmodels: econometric and statistical modeling with Python","volume-title":"Proceedings of the 9th Python in Science Conference (SCIPY 2010)","author":"Seabold","year":"2010"},{"key":"2023092216531198400_ref16","doi-asserted-by":"crossref","DOI":"10.18637\/jss.v076.i01","article-title":"Stan: a probabilistic programming language","volume-title":"J Stat Softw","author":"Carpenter","year":"2017"},{"issue":"1","key":"2023092216531198400_ref17","doi-asserted-by":"crossref","first-page":"5692","DOI":"10.1038\/s41467-021-25960-2","article-title":"Confronting false discoveries in single-cell differential expression","volume":"12","author":"Squair","year":"2021","journal-title":"Nat Commun"},{"issue":"1","key":"2023092216531198400_ref18","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: A practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J R Stat Soc B Methodol"},{"key":"2023092216531198400_ref19","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511814365","volume-title":"Regression Analysis of Count Data","author":"Cameron","year":"1998"},{"key":"2023092216531198400_ref20","volume-title":"Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists","author":"Zheng","year":"2018"},{"issue":"2","key":"2023092216531198400_ref21","doi-asserted-by":"crossref","first-page":"307","DOI":"10.2307\/1912557","article-title":"Likelihood ratio tests for model selection and non-nested hypotheses","volume":"57","author":"Vuong","year":"1989","journal-title":"Econometrica"},{"issue":"2","key":"2023092216531198400_ref22","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1016\/j.econlet.2014.12.029","article-title":"The misuse of the Vuong test for non-nested models to test for zero-inflation","volume":"127","author":"Wilson","year":"2015","journal-title":"Econ Lett"},{"key":"2023092216531198400_ref23","volume-title":"Statistical Tables for Biological, Agricultural and Medical Research","author":"Fisher","year":"1938"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/5\/bbad272\/51711145\/bbad272.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/5\/bbad272\/51711145\/bbad272.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T20:42:56Z","timestamp":1729716176000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad272\/7233057"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,28]]},"references-count":23,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,9,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad272","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,9]]},"published":{"date-parts":[[2023,7,28]]},"article-number":"bbad272"}}