{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T22:48:33Z","timestamp":1767998913082,"version":"3.49.0"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2021,5,10]],"date-time":"2021-05-10T00:00:00Z","timestamp":1620604800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["NHLBI K01HL140261"],"award-info":[{"award-number":["NHLBI K01HL140261"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["NIDDK DK54759"],"award-info":[{"award-number":["NIDDK DK54759"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["NIEHS ES005605"],"award-info":[{"award-number":["NIEHS ES005605"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,10,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Single-cell RNA-sequencing (scRNA-seq) provides more granular biological information than bulk RNA-sequencing; bulk RNA sequencing remains popular due to lower costs which allows processing more biological replicates and design more powerful studies. As scRNA-seq costs have decreased, collecting data from more than one biological replicate has become more feasible, but careful modeling of different layers of biological variation remains challenging for many users. Here, we propose a statistical model for scRNA-seq gene counts, describe a simple method for estimating model parameters and show that failing to account for additional biological variation in scRNA-seq studies can inflate false discovery rates (FDRs) of statistical tests.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>First, in a simulation study, we show that when the gene expression distribution of a population of cells varies between subjects, a na\u00efve approach to differential expression analysis will inflate the FDR. We then compare multiple differential expression testing methods on scRNA-seq datasets from human samples and from animal models. These analyses suggest that a na\u00efve approach to differential expression testing could lead to many false discoveries; in contrast, an approach based on pseudobulk counts has better FDR control.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>A software package, aggregateBioVar, is freely available on Bioconductor (https:\/\/www.bioconductor.org\/packages\/release\/bioc\/html\/aggregateBioVar.html) to accommodate compatibility with upstream and downstream methods in scRNA-seq data analysis pipelines.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Raw gene-by-cell count matrices for pig scRNA-seq data are available as GEO accession GSE150211. Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab337","type":"journal-article","created":{"date-parts":[[2021,4,30]],"date-time":"2021-04-30T19:18:09Z","timestamp":1619810289000},"page":"3243-3251","source":"Crossref","is-referenced-by-count":25,"title":["Differential gene expression analysis for multi-subject single-cell RNA-sequencing studies with<i>aggregateBioVar<\/i>"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7777-4165","authenticated-orcid":false,"given":"Andrew L","family":"Thurman","sequence":"first","affiliation":[{"name":"Department of Internal Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa , Iowa City, IA 52242, USA"}]},{"given":"Jason A","family":"Ratcliff","sequence":"additional","affiliation":[{"name":"Iowa Institute of Human Genetics, Roy J. and Lucille A. Carver College of Medicine, University of Iowa , Iowa City, IA 52242, USA"}]},{"given":"Michael S","family":"Chimenti","sequence":"additional","affiliation":[{"name":"Iowa Institute of Human Genetics, Roy J. and Lucille A. Carver College of Medicine, University of Iowa , Iowa City, IA 52242, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7544-5109","authenticated-orcid":false,"given":"Alejandro A","family":"Pezzulo","sequence":"additional","affiliation":[{"name":"Department of Internal Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa , Iowa City, IA 52242, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,5,10]]},"reference":[{"key":"2023051608284627600_btab337-B1","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1007\/978-1-4939-9240-9_8","article-title":"Seq-Well: a sample-efficient, portable picowell platform for massively parallel single-cell RNA sequencing","volume":"1979","author":"Aicher","year":"2019","journal-title":"Methods Mol. Biol"},{"key":"2023051608284627600_btab337-B2","doi-asserted-by":"crossref","first-page":"845","DOI":"10.1164\/rccm.201510-2112OC","article-title":"Newborn cystic fibrosis pigs have a blunted early response to an inflammatory stimulus","volume":"194","author":"Bartlett","year":"2016","journal-title":"Am. J. Respir. Crit. Care Med"},{"key":"2023051608284627600_btab337-B3","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023051608284627600_btab337-B4","doi-asserted-by":"crossref","first-page":"eaar5780","DOI":"10.1126\/science.aar5780","article-title":"The dynamics of gene expression in vertebrate embryogenesis at single-cell resolution","volume":"360","author":"Briggs","year":"2018","journal-title":"Science"},{"key":"2023051608284627600_btab337-B5","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023051608284627600_btab337-B6","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1126\/science.aam8940","article-title":"Comprehensive single-cell transcriptional profiling of a multicellular organism","volume":"357","author":"Cao","year":"2017","journal-title":"Science"},{"key":"2023051608284627600_btab337-B7","doi-asserted-by":"crossref","first-page":"1540","DOI":"10.1164\/rccm.201904-0792OC","article-title":"Single-cell reconstruction of human basal cell diversity in normal and idiopathic pulmonary fibrosis lungs","volume":"202","author":"Carraro","year":"2020","journal-title":"Am. J. Respir. Crit. Care Med"},{"key":"2023051608284627600_btab337-B8","doi-asserted-by":"crossref","first-page":"317","DOI":"10.3389\/fgene.2019.00317","article-title":"Single-cell RNA-seq technologies and related computational data analysis","volume":"10","author":"Chen","year":"2019","journal-title":"Front. Genet"},{"key":"2023051608284627600_btab337-B9","doi-asserted-by":"crossref","first-page":"6077","DOI":"10.1038\/s41467-020-19894-4","article-title":"Muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data","volume":"11","author":"Crowell","year":"2020","journal-title":"Nat. Commun"},{"key":"2023051608284627600_btab337-B10","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1186\/s12859-016-0944-6","article-title":"Discrete distributional differential expression (D3E)\u2013a tool for gene expression analysis of single-cell RNA-seq data","volume":"17","author":"Delmans","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023051608284627600_btab337-B11","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1186\/s13059-015-0844-5","article-title":"MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data","volume":"16","author":"Finak","year":"2015","journal-title":"Genome Biol"},{"key":"2023051608284627600_btab337-B12","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baz046","article-title":"PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data","volume":"2019","author":"Franzen","year":"2019","journal-title":"Database (Oxford)"},{"key":"2023051608284627600_btab337-B13","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1038\/s41587-019-0372-z","article-title":"Highly multiplexed single-cell RNA-seq by DNA oligonucleotide tagging of cellular proteins","volume":"38","author":"Gehring","year":"2020","journal-title":"Nat. Biotechnol"},{"key":"2023051608284627600_btab337-B14","volume-title":"Data Analysis Using Regression and Multilevel\/Hierarchical Models","author":"Gelman","year":"2007"},{"key":"2023051608284627600_btab337-B15","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1038\/nmeth.4179","article-title":"Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput","volume":"14","author":"Gierahn","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051608284627600_btab337-B16","doi-asserted-by":"crossref","first-page":"e1004575","DOI":"10.1371\/journal.pcbi.1004575","article-title":"SINCERA: a pipeline for single-cell RNA-seq profiling analysis","volume":"11","author":"Guo","year":"2015","journal-title":"PLoS Comput. Biol"},{"key":"2023051608284627600_btab337-B17","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1186\/1471-2105-11-422","article-title":"baySeq: empirical Bayesian methods for identifying differential expression in sequence count data","volume":"11","author":"Hardcastle","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023051608284627600_btab337-B18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s12276-018-0071-8","article-title":"Single-cell RNA sequencing technologies and bioinformatics pipelines","volume":"50","author":"Hwang","year":"2018","journal-title":"Exp. Mol. Med"},{"key":"2023051608284627600_btab337-B19","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1038\/nbt.4042","article-title":"Multiplexed droplet single-cell RNA-sequencing using natural genetic variation","volume":"36","author":"Kang","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023051608284627600_btab337-B20","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1038\/nmeth.2967","article-title":"Bayesian approach to single-cell differential expression analysis","volume":"11","author":"Kharchenko","year":"2014","journal-title":"Nat. Methods"},{"key":"2023051608284627600_btab337-B21","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1016\/j.cell.2015.04.044","article-title":"Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells","volume":"161","author":"Klein","year":"2015","journal-title":"Cell"},{"key":"2023051608284627600_btab337-B22","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1186\/s13059-016-1077-y","article-title":"A statistical approach for identifying differential distributions in single-cell RNA-seq experiments","volume":"17","author":"Korthauer","year":"2016","journal-title":"Genome Biol"},{"key":"2023051608284627600_btab337-B23","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1186\/s13059-020-1926-6","article-title":"Eleven grand challenges in single-cell data science","volume":"21","author":"Lahnemann","year":"2020","journal-title":"Genome Biol"},{"key":"2023051608284627600_btab337-B24","doi-asserted-by":"crossref","first-page":"1035","DOI":"10.1093\/bioinformatics\/btt087","article-title":"EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments","volume":"29","author":"Leng","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051608284627600_btab337-B25","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol"},{"key":"2023051608284627600_btab337-B26","doi-asserted-by":"crossref","first-page":"e8746","DOI":"10.15252\/msb.20188746","article-title":"Current best practices in single-cell RNA-seq analysis: a tutorial","volume":"15","author":"Luecken","year":"2019","journal-title":"Mol. Syst. Biol"},{"key":"2023051608284627600_btab337-B27","first-page":"2122","article-title":"A step-by-step workflow for low-level analysis of single-cell RNA-seq data with bioconductor","volume":"5","author":"Lun","year":"2016","journal-title":"F1000Res"},{"key":"2023051608284627600_btab337-B28","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1016\/j.cell.2015.05.002","article-title":"Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets","volume":"161","author":"Macosko","year":"2015","journal-title":"Cell"},{"key":"2023051608284627600_btab337-B29","doi-asserted-by":"crossref","first-page":"1179","DOI":"10.1093\/bioinformatics\/btw777","article-title":"Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R","volume":"33","author":"McCarthy","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051608284627600_btab337-B30","doi-asserted-by":"crossref","first-page":"3223","DOI":"10.1093\/bioinformatics\/bty332","article-title":"DEsingle for detecting three types of differential expression in single-cell RNA-seq data","volume":"34","author":"Miao","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051608284627600_btab337-B31","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1186\/s13059-019-1676-5","article-title":"Comparative analysis of sequencing technologies for single-cell transcriptomics","volume":"20","author":"Natarajan","year":"2019","journal-title":"Genome Biol"},{"key":"2023051608284627600_btab337-B32","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1038\/nmeth.4150","article-title":"Single-cell mRNA quantification and differential analysis with Census","volume":"14","author":"Qiu","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051608284627600_btab337-B33","doi-asserted-by":"crossref","first-page":"979","DOI":"10.1038\/nmeth.4402","article-title":"Reversed graph embedding resolves complex single-cell trajectories","volume":"14","author":"Qiu","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051608284627600_btab337-B34","doi-asserted-by":"crossref","first-page":"1517","DOI":"10.1164\/rccm.201712-2410OC","article-title":"Single-cell transcriptomic analysis of human lung provides insights into the pathobiology of pulmonary fibrosis","volume":"199","author":"Reyfman","year":"2019","journal-title":"Am. J. Respir. Crit. Care Med"},{"key":"2023051608284627600_btab337-B35","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a Bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051608284627600_btab337-B36","doi-asserted-by":"crossref","first-page":"1837","DOI":"10.1126\/science.1163600","article-title":"Disruption of the CFTR gene produces a model of cystic fibrosis in newborn pigs","volume":"321","author":"Rogers","year":"2008","journal-title":"Science"},{"key":"2023051608284627600_btab337-B37","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1126\/science.aam8999","article-title":"Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding","volume":"360","author":"Rosenberg","year":"2018","journal-title":"Science"},{"key":"2023051608284627600_btab337-B38","volume-title":"A First Course in Probability","author":"Ross","year":"2019"},{"key":"2023051608284627600_btab337-B39","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1038\/nbt.3192","article-title":"Spatial reconstruction of single-cell gene expression data","volume":"33","author":"Satija","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023051608284627600_btab337-B40","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1038\/s42003-020-0922-4","article-title":"Single-cell transcriptomes of the human skin reveal age-related loss of fibroblast priming","volume":"3","author":"Sole-Boldo","year":"2020","journal-title":"Commun. Biol"},{"key":"2023051608284627600_btab337-B41","doi-asserted-by":"crossref","first-page":"29ra31","DOI":"10.1126\/scitranslmed.3000928","article-title":"Cystic fibrosis pigs develop lung disease and exhibit defective bacterial eradication at birth","volume":"2","author":"Stoltz","year":"2010","journal-title":"Sci. Transl. Med"},{"key":"2023051608284627600_btab337-B42","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"key":"2023051608284627600_btab337-B43","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1038\/nbt.2859","article-title":"The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells","volume":"32","author":"Trapnell","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023051608284627600_btab337-B44","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1146\/annurev-biodatasci-072018-021255","article-title":"RNA sequencing data: Hitchhiker\u2019s guide to expression analysis","volume":"2","author":"Van den Berge","year":"2019","journal-title":"Annu. Rev. Biomed. Da S"},{"key":"2023051608284627600_btab337-B45","doi-asserted-by":"crossref","first-page":"4667","DOI":"10.1038\/s41467-019-12266-7","article-title":"A systematic evaluation of single cell RNA-seq analysis pipelines","volume":"10","author":"Vieth","year":"2019","journal-title":"Nat. Commun"},{"key":"2023051608284627600_btab337-B46","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1038\/nmeth.4154","article-title":"Sequencing thousands of single-cell genomes with combinatorial indexing","volume":"14","author":"Vitak","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051608284627600_btab337-B47","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1186\/s12859-019-2599-6","article-title":"Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data","volume":"20","author":"Wang","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023051608284627600_btab337-B48","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.ymeth.2018.04.017","article-title":"SigEMD: A powerful method for differential gene expression analysis in single-cell RNA sequencing data","volume":"145","author":"Wang","year":"2018","journal-title":"Methods"},{"key":"2023051608284627600_btab337-B49","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1042\/BST20191010","article-title":"Using single-cell RNA sequencing to unravel cell lineage relationships in the respiratory tract","volume":"48","author":"Zaragosi","year":"2020","journal-title":"Biochem. Soc. Trans"},{"key":"2023051608284627600_btab337-B50","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1016\/j.molcel.2018.10.020","article-title":"Comparative analysis of droplet-based ultra-high-throughput single-cell RNA-seq systems","volume":"73","author":"Zhang","year":"2019","journal-title":"Mol. Cell"},{"key":"2023051608284627600_btab337-B51","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1016\/j.molcel.2017.01.023","article-title":"Comparative analysis of single-cell RNA sequencing methods","volume":"65","author":"Ziegenhain","year":"2017","journal-title":"Mol. Cell"},{"key":"2023051608284627600_btab337-B52","doi-asserted-by":"crossref","first-page":"738","DOI":"10.1038\/s41467-021-21038-1","article-title":"A practical solution to pseudoreplication bias in single-cell studies","volume":"12","author":"Zimmerman","year":"2021","journal-title":"Nat. Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab337\/38602833\/btab337.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3243\/50338723\/btab337.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3243\/50338723\/btab337.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T00:13:44Z","timestamp":1724976824000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/19\/3243\/6273181"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,5,10]]},"references-count":52,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2021,10,11]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab337","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,10,1]]},"published":{"date-parts":[[2021,5,10]]}}}