{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,15]],"date-time":"2025-08-15T02:24:51Z","timestamp":1755224691149,"version":"3.43.0"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T00:00:00Z","timestamp":1754956800000},"content-version":"vor","delay-in-days":42,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Differential expression analysis provides insights into fundamental biological processes and with the advent of single-cell transcriptomics, gene expression can now be studied at the level of individual cells. Many analyses treat cells as samples and assume statistical independence. As cells are pseudoreplicates, this assumption does not hold, leading to reduced robustness, reproducibility, and an inflated type 1 error rate. In this study, we investigate various methods for differential expression analysis on single-cell data, conduct extensive benchmarking, and give recommendations for method choice. The tested methods include DESeq2, MAST, DREAM, scVI, the permutation test, distinct, and the t-test. We additionally adapt hierarchical bootstrapping to differential expression analysis on single-cell data and include it in our benchmark. We found that differential expression analysis methods designed specifically for single-cell data do not offer performance advantages over conventional pseudobulk methods such as DESeq2 when applied to individual datasets. In addition, they mostly require significantly longer run times. For atlas-level analysis, permutation-based methods excel in performance but show poor runtime, suggesting to use DREAM as a compromise between quality and runtime. Overall, our study offers the community a valuable benchmark of methods across diverse scenarios and offers guidelines on method selection.<\/jats:p>","DOI":"10.1093\/bib\/bbaf397","type":"journal-article","created":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T11:50:48Z","timestamp":1754567448000},"source":"Crossref","is-referenced-by-count":0,"title":["Single-cell differential expression analysis between conditions within nested settings"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2705-1727","authenticated-orcid":false,"given":"Leon","family":"Hafner","sequence":"first","affiliation":[{"name":"Data Science in Systems Biology, School of Life Sciences, Technical University of Munich , Maximus-von-Imhof-Forum 3, 85354 Freising ,","place":["Germany"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9584-7842","authenticated-orcid":false,"given":"Gregor","family":"Sturm","sequence":"additional","affiliation":[{"name":"Biocenter, Institute of Bioinformatics, Medical University of Innsbruck , Innrain 80-82\/Level 4, 6020 Innsbruck ,","place":["Austria"]},{"name":"Boehringer Ingelheim International Pharma GmbH & Co KG , Birkendorfer Strasse 65, 88397 Biberach\/Riss ,","place":["Germany"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-0993-5439","authenticated-orcid":false,"given":"Sarah","family":"Lumpp","sequence":"additional","affiliation":[{"name":"Mathematical Statistics, Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich , Boltzmannstrasse 3, 85748 Garching (Munich) ,","place":["Germany"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5614-3025","authenticated-orcid":false,"given":"Mathias","family":"Drton","sequence":"additional","affiliation":[{"name":"Mathematical Statistics, Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich , Boltzmannstrasse 3, 85748 Garching (Munich) ,","place":["Germany"]},{"name":"Munich Center for Machine Learning , Ludwig Maximilian University Munich, Institute for Informatics, Oettingenstrasse 67, 80538 Munich,","place":["Germany"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0941-4168","authenticated-orcid":false,"given":"Markus","family":"List","sequence":"additional","affiliation":[{"name":"Data Science in Systems Biology, School of Life Sciences, Technical University of Munich , Maximus-von-Imhof-Forum 3, 85354 Freising ,","place":["Germany"]},{"name":"Munich Data Science Institute, Technical University of Munich , Walther-von-Dyck-Strasse 10, 85748 Garching (Munich) ,","place":["Germany"]}]}],"member":"286","published-online":{"date-parts":[[2025,8,12]]},"reference":[{"key":"2025081220575108000_ref1","doi-asserted-by":"publisher","first-page":"738","DOI":"10.1038\/s41467-021-21038-1","article-title":"A practical solution to pseudoreplication bias in single-cell studies","volume":"12","author":"Zimmerman","year":"2021","journal-title":"Nat Commun"},{"key":"2025081220575108000_ref2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-25960-2","article-title":"Confronting false discoveries in single-cell differential expression","volume":"12","author":"Squair","year":"2021","journal-title":"Nat Commun"},{"key":"2025081220575108000_ref3","doi-asserted-by":"publisher","first-page":"7851","DOI":"10.1038\/s41467-022-35519-4","article-title":"A balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis","volume":"13","author":"Murphy","year":"2022","journal-title":"Nat Commun"},{"key":"2025081220575108000_ref4","doi-asserted-by":"publisher","first-page":"7852","DOI":"10.1038\/s41467-022-35520-x","article-title":"Reply to: a balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis","volume":"13","author":"Zimmerman","year":"2022","journal-title":"Nat Commun"},{"key":"2025081220575108000_ref5","doi-asserted-by":"publisher","DOI":"10.1214\/22-AOAS1689","article-title":"distinct: a novel approach to differential distribution analyses","volume":"17","author":"Tiberi","year":"2023","journal-title":"The Annals of Applied Statistics"},{"key":"2025081220575108000_ref6","doi-asserted-by":"publisher","first-page":"1487","DOI":"10.1080\/02664760903046102","article-title":"Nonparametric bootstrapping for hierarchical data","volume":"37","author":"Ren","year":"2010","journal-title":"J Appl Stat"},{"key":"2025081220575108000_ref7","article-title":"Application of the hierarchical bootstrap to multi-level data in neuroscience","volume":"3","author":"Saravanan","year":"2020","journal-title":"Neuron Behav Data Anal Theory"},{"key":"2025081220575108000_ref8","doi-asserted-by":"publisher","first-page":"550","DOI":"10.1038\/s41576-023-00586-w","article-title":"Best practices for single-cell analysis across modalities","volume":"24","author":"Heumos","year":"2023","journal-title":"Nat Rev Genet"},{"key":"2025081220575108000_ref9","doi-asserted-by":"crossref","DOI":"10.32614\/CRAN.package.arm","volume-title":"Data Analysis Using Regression and Multilevel Hierarchical Models","author":"Gelman","year":"2007"},{"key":"2025081220575108000_ref10","doi-asserted-by":"crossref","DOI":"10.3389\/fpsyg.2011.00074","article-title":"Data with hierarchical structure: impact of intraclass correlation and sample size on Type-I error","volume":"2","author":"Musca","year":"2011","journal-title":"Front. Psychology"},{"key":"2025081220575108000_ref11","volume-title":"The Problem of Pseudoreplication in Neuroscientific Studies","author":"Lazic","year":"2010"},{"key":"2025081220575108000_ref12","doi-asserted-by":"crossref","first-page":"187","DOI":"10.2307\/1942661","article-title":"Pseudoreplication and the design of ecological field experiments","volume":"54","author":"Hurlbert","year":"1984","journal-title":"Ecol Monogr"},{"key":"2025081220575108000_ref13","volume-title":"Introduction to Modern Statistics","author":"Cetinkaya-Rundel","year":"2021"},{"key":"2025081220575108000_ref14","volume-title":"Extending the Linear Model with R","author":"Faraway","year":"2016"},{"key":"2025081220575108000_ref15","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1038\/s41562-017-0189-z","article-title":"Redefine statistical significance","volume":"2","author":"Benjamin","year":"2017","journal-title":"Nat Hum Behav"},{"key":"2025081220575108000_ref16","doi-asserted-by":"crossref","first-page":"384","DOI":"10.1037\/h0078860","article-title":"Some myths concerning parametric and nonparametric tests","volume":"34","author":"Hunter","year":"1993","journal-title":"Can Psychol"},{"key":"2025081220575108000_ref17","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol"},{"key":"2025081220575108000_ref18","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1093\/bioinformatics\/btaa687","article-title":"Dream: powerful differential expression analysis for repeated measures designs","volume":"37","author":"Hoffman","year":"2020","journal-title":"Bioinformatics"},{"key":"2025081220575108000_ref19","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-015-0844-5","article-title":"MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data","volume":"16","author":"Finak","year":"2015","journal-title":"Genome Biol"},{"key":"2025081220575108000_ref20","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat Methods"},{"key":"2025081220575108000_ref21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2307\/2331554","article-title":"The probable error of a mean","author":"Student","year":"1908","journal-title":"Biometrika"},{"key":"2025081220575108000_ref22","doi-asserted-by":"crossref","first-page":"R29","DOI":"10.1186\/gb-2014-15-2-r29","article-title":"voom: precision weights unlock linear model analysis tools for RNA-seq read counts","volume":"15","author":"Law","year":"2014","journal-title":"Genome Biol"},{"key":"2025081220575108000_ref23","volume-title":"Introduction to Robust Estimation and Hypothesis Testing","author":"Wilcox","year":"2022"},{"key":"2025081220575108000_ref24","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1038\/s41587-021-01206-w","article-title":"A python library for probabilistic analysis of single-cell omics data","volume":"40","author":"Gayoso","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2025081220575108000_ref25","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/s41592-021-01336-8","article-title":"Benchmarking atlas-level data integration in single-cell genomics","volume":"19","author":"Luecken","year":"2021","journal-title":"Nat Methods"},{"key":"2025081220575108000_ref26","doi-asserted-by":"crossref","first-page":"1563","DOI":"10.1038\/s41591-023-02327-2","article-title":"An integrated cell atlas of the lung in health and disease","volume":"29","author":"Sikkema","year":"2023","journal-title":"Nat Med"},{"key":"2025081220575108000_ref27","doi-asserted-by":"crossref","first-page":"1503","DOI":"10.1016\/j.ccell.2022.10.008","article-title":"High-resolution single-cell atlas reveals diversity and plasticity of tissue-resident neutrophils in non-small cell lung cancer","volume":"40","author":"Salcher","year":"2022","journal-title":"Cancer Cell"},{"key":"2025081220575108000_ref28","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1038\/nbt.3820","article-title":"Nextflow enables reproducible computational workflows","volume":"35","author":"Di Tommaso","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2025081220575108000_ref29","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-021-02546-1","article-title":"splatPop: simulating population scale single-cell RNA sequencing data","volume":"22","author":"Azodi","year":"2021","journal-title":"Genome Biol"},{"key":"2025081220575108000_ref30","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"},{"key":"2025081220575108000_ref31","doi-asserted-by":"crossref","first-page":"3573","DOI":"10.1016\/j.cell.2021.04.048","article-title":"Integrated analysis of multimodal single-cell data","volume":"184","author":"Hao","year":"2021","journal-title":"Cell"},{"key":"2025081220575108000_ref32","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1038\/nbt.4042","article-title":"Multiplexed droplet single-cell RNA-sequencing using natural genetic variation","volume":"36","author":"Kang","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2025081220575108000_ref33","doi-asserted-by":"crossref","first-page":"D687","DOI":"10.1093\/nar\/gkab1028","article-title":"The reactome pathway knowledgebase 2022","volume":"50","author":"Gillespie","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025081220575108000_ref34","first-page":"01","article-title":"Evaluation: from precision, recall and F-factor to ROC, informedness, markedness & correlation","volume":"2","author":"Powers","year":"2008","journal-title":"Mach Learn Technol"},{"key":"2025081220575108000_ref35","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1111\/2041-210X.13140","article-title":"The area under the precision-recall curve as a performance metric for rare binary events","volume":"10","author":"Sofaer","year":"2019","journal-title":"Methods Ecol Evol"},{"key":"2025081220575108000_ref36","doi-asserted-by":"crossref","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PloS One"},{"key":"2025081220575108000_ref37","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1038\/s41587-019-0379-5","article-title":"Droplet scRNA-seq is not zero-inflated","volume":"38","author":"Svensson","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2025081220575108000_ref38","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.ymeth.2018.04.017","article-title":"SigEMD: a powerful method for differential gene expression analysis in single-cell RNA sequencing data","volume":"145","author":"Wang","year":"2018","journal-title":"Methods"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf397\/64027663\/bbaf397.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf397\/64027663\/bbaf397.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,13]],"date-time":"2025-08-13T00:58:02Z","timestamp":1755046682000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf397\/8232550"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf397","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,7]]},"article-number":"bbaf397"}}