{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T04:24:03Z","timestamp":1773980643649,"version":"3.50.1"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2022,12,24]],"date-time":"2022-12-24T00:00:00Z","timestamp":1671840000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Hunan Postgraduate Research and Innovation Project","award":["CX20220108"],"award-info":[{"award-number":["CX20220108"]}]},{"name":"Hunan Postgraduate Research and Innovation Project","award":["2021RC4008"],"award-info":[{"award-number":["2021RC4008"]}]},{"name":"Hunan Postgraduate Research and Innovation Project","award":["2019CB1007"],"award-info":[{"award-number":["2019CB1007"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62225209"],"award-info":[{"award-number":["62225209"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,1,19]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Single-cell RNA-sequencing technology (scRNA-seq) brings research to single-cell resolution. However, a major drawback of scRNA-seq is large sparsity, i.e. expressed genes with no reads due to technical noise or limited sequence depth during the scRNA-seq protocol. This phenomenon is also called \u2018dropout\u2019 events, which likely affect downstream analyses such as differential expression analysis, the clustering and visualization of cell subpopulations, cellular trajectory inference, etc. Therefore, there is a need to develop a method to identify and impute these dropout events. We propose Bubble, which first identifies dropout events from all zeros based on expression rate and coefficient of variation of genes within cell subpopulation, and then leverages an autoencoder constrained by bulk RNA-seq data to only impute those values. Unlike other deep learning-based imputation methods, Bubble fuses the matched bulk RNA-seq data as a constraint to reduce the introduction of false positive signals. Using simulated and several real scRNA-seq datasets, we demonstrate that Bubble enhances the recovery of missing values, gene-to-gene and cell-to-cell correlations, and reduces the introduction of false positive signals. Regarding some crucial downstream analyses of scRNA-seq data, Bubble facilitates the identification of differentially expressed genes, improves the performance of clustering and visualization, and aids the construction of cellular trajectory. More importantly, Bubble provides fast and scalable imputation with minimal memory usage.<\/jats:p>","DOI":"10.1093\/bib\/bbac580","type":"journal-article","created":{"date-parts":[[2022,12,26]],"date-time":"2022-12-26T03:11:09Z","timestamp":1672024269000},"source":"Crossref","is-referenced-by-count":15,"title":["Bubble: a fast single-cell RNA-seq imputation using an autoencoder constrained by bulk RNA-seq data"],"prefix":"10.1093","volume":"24","author":[{"given":"Siqi","family":"Chen","sequence":"first","affiliation":[{"name":"Central South University School of Computer Science and Engineering, , Changsha 410083 , China"}]},{"given":"Xuhua","family":"Yan","sequence":"additional","affiliation":[{"name":"Central South University School of Computer Science and Engineering, , Changsha 410083 , China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6372-6798","authenticated-orcid":false,"given":"Ruiqing","family":"Zheng","sequence":"additional","affiliation":[{"name":"Central South University School of Computer Science and Engineering, , Changsha 410083 , China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0188-1394","authenticated-orcid":false,"given":"Min","family":"Li","sequence":"additional","affiliation":[{"name":"Central South University School of Computer Science and Engineering, , Changsha 410083 , China"}]}],"member":"286","published-online":{"date-parts":[[2022,12,24]]},"reference":[{"key":"2023011917110495600_ref1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-022-02601-5","article-title":"Statistics or biology: the zero-inflation controversy about scRNA-seq data","volume":"23","author":"Jiang","year":"2022","journal-title":"Genome Biol"},{"key":"2023011917110495600_ref2","first-page":"1","article-title":"An accurate and robust imputation method scImpute for single-cell RNA-seq data","volume":"9","author":"Li","year":"2018","journal-title":"Nat Commun"},{"key":"2023011917110495600_ref3","doi-asserted-by":"crossref","first-page":"e0190152","DOI":"10.1371\/journal.pone.0190152","article-title":"RNA-Seq differential expression analysis: an extended review and a software tool","volume":"12","author":"Costa-Silva","year":"2017","journal-title":"PLoS One"},{"key":"2023011917110495600_ref4","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1038\/s41576-018-0088-9","article-title":"Challenges in unsupervised clustering of single-cell RNA-seq data","volume":"20","author":"Kiselev","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2023011917110495600_ref5","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1016\/j.cell.2018.05.061","article-title":"Recovering gene interactions from single-cell data using data diffusion","volume":"174","author":"Van Dijk","year":"2018","journal-title":"Cell"},{"key":"2023011917110495600_ref6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-018-2226-y","article-title":"DrImpute: imputing dropout events in single cell RNA sequencing data","volume":"19","author":"Gong","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023011917110495600_ref7","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/s41592-018-0033-z","article-title":"SAVER: gene expression recovery for single-cell RNA sequencing","volume":"15","author":"Huang","year":"2018","journal-title":"Nat Methods"},{"key":"2023011917110495600_ref8","first-page":"397588","article-title":"Zero-preserving imputation of scRNA-seq data using low-rank approximation","author":"Linderman","year":"2018"},{"key":"2023011917110495600_ref9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-018-07931-2","article-title":"Single-cell RNA-seq denoising using a deep count autoencoder","volume":"10","author":"Eraslan","year":"2019","journal-title":"Nat Commun"},{"key":"2023011917110495600_ref10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-34688-x","article-title":"AutoImpute: autoencoder based imputation of single-cell RNA-seq data","volume":"8","author":"Talwar","year":"2018","journal-title":"Sci Rep"},{"key":"2023011917110495600_ref11","first-page":"1","article-title":"Sparse autoencoder","volume":"72","author":"Ng","year":"2011","journal-title":"CS294A Lecture Notes"},{"key":"2023011917110495600_ref12","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1002\/wics.101","article-title":"Principal component analysis","volume":"2","author":"Abdi","year":"2010","journal-title":"WIREs Comput Stat"},{"key":"2023011917110495600_ref13","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1109\/3477.764879","article-title":"Genetic K-means algorithm","volume":"29","author":"Krishna","year":"1999","journal-title":"IEEE Trans Syst Man Cybern B Cybern. Part B (Cybernetics)"},{"key":"2023011917110495600_ref14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-020-02132-x","article-title":"A systematic evaluation of single-cell RNA-sequencing imputation methods","volume":"21","author":"Hou","year":"2020","journal-title":"Genome Biol"},{"key":"2023011917110495600_ref15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"},{"key":"2023011917110495600_ref16","first-page":"541433","article-title":"Comparative analysis of commercially available single-cell RNA sequencing platforms for their performance in complex human tissues","author":"Wang","year":"2019","journal-title":"BioRxiv"},{"key":"2023011917110495600_ref17","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1038\/s41592-019-0425-8","article-title":"Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments","volume":"16","author":"Tian","year":"2019","journal-title":"Nat Methods"},{"key":"2023011917110495600_ref18","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1038\/nature11243","article-title":"A map of the cis-regulatory sequences in the mouse genome","volume":"488","author":"Shen","year":"2012","journal-title":"Nature"},{"key":"2023011917110495600_ref19","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1016\/j.cell.2018.02.001","article-title":"Mapping the mouse cell atlas by microwell-seq","volume":"172","author":"Han","year":"2018","journal-title":"Cell"},{"key":"2023011917110495600_ref20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-019-09990-5","article-title":"Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures","volume":"10","author":"Zaitsev","year":"2019","journal-title":"Nat Commun"},{"key":"2023011917110495600_ref21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat Commun"},{"key":"2023011917110495600_ref22","doi-asserted-by":"crossref","first-page":"1193","DOI":"10.1038\/ng.3646","article-title":"Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution","volume":"48","author":"Corces","year":"2016","journal-title":"Nat Genet"},{"key":"2023011917110495600_ref23","first-page":"e27041","volume":"6","author":"Regev","year":"2017","journal-title":"Science forum: the human cell atlas. elife"},{"key":"2023011917110495600_ref24","doi-asserted-by":"crossref","first-page":"eabl4896","DOI":"10.1126\/science.abl4896","article-title":"The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans","volume":"376","author":"Consortium","year":"2022","journal-title":"Science"},{"key":"2023011917110495600_ref25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-016-0938-8","article-title":"CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq","volume":"17","author":"Hashimshony","year":"2016","journal-title":"Genome Biol"},{"key":"2023011917110495600_ref26","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1016\/j.gpb.2020.02.005","article-title":"Direct comparative analyses of 10X genomics chromium and smart-seq2","volume":"19","author":"Wang","year":"2021","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"2023011917110495600_ref27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-018-1575-1","article-title":"VIPER: variability-preserving imputation for accurate gene expression recovery in single-cell RNA sequencing studies","volume":"19","author":"Chen","year":"2018","journal-title":"Genome Biol"},{"key":"2023011917110495600_ref28","doi-asserted-by":"crossref","first-page":"1174","DOI":"10.1093\/bioinformatics\/btz726","article-title":"bayNorm: Bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data","volume":"36","author":"Tang","year":"2020","journal-title":"Bioinformatics"},{"key":"2023011917110495600_ref29","first-page":"665323","article-title":"Discriminating true and false zeros in single-cell RNA-seq data for imputation","author":"Miao","year":"2019","journal-title":"BioRxiv"},{"key":"2023011917110495600_ref30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-021-27729-z","article-title":"Zero-preserving imputation of single-cell RNA-seq data","volume":"13","author":"Linderman","year":"2022","journal-title":"Nat Commun"},{"key":"2023011917110495600_ref31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1681-8","article-title":"SCRABBLE: single-cell RNA-seq imputation constrained by bulk RNA-seq data","volume":"20","author":"Peng","year":"2019","journal-title":"Genome Biol"},{"key":"2023011917110495600_ref32","doi-asserted-by":"crossref","first-page":"908","DOI":"10.1198\/016214504000001583","article-title":"Rank-sum tests for clustered data","volume":"100","author":"Datta","year":"2005","journal-title":"J Am Stat Assoc"},{"key":"2023011917110495600_ref33","doi-asserted-by":"crossref","first-page":"3642","DOI":"10.1093\/bioinformatics\/btz139","article-title":"SinNLRR: a robust subspace clustering method for cell type detection by non-negative and low-rank representation","volume":"35","author":"Zheng","year":"2019","journal-title":"Bioinformatics"},{"key":"2023011917110495600_ref34","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1016\/j.gpb.2020.09.004","article-title":"SSRE: cell type detection based on sparse subspace representation and similarity enhancement","volume":"19","author":"Liang","year":"2021","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"2023011917110495600_ref35","article-title":"On spectral clustering: analysis and an algorithm","volume":"14","author":"Ng","year":"2001","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023011917110495600_ref36","doi-asserted-by":"crossref","first-page":"bbac311","DOI":"10.1093\/bib\/bbac311","article-title":"GLOBE: a contrastive learning-based framework for integrating single-cell transcriptome datasets","volume":"23","author":"Yan","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023011917110495600_ref37","doi-asserted-by":"crossref","first-page":"100723","DOI":"10.1016\/j.margen.2019.100723","article-title":"t-Distributed Stochastic Neighbor Embedding (t-SNE): a tool for eco-physiological transcriptomic analysis","volume":"51","author":"Cieslak","year":"2020","journal-title":"Marine genomics"},{"key":"2023011917110495600_ref38","doi-asserted-by":"crossref","first-page":"772","DOI":"10.26599\/TST.2020.9010028","article-title":"A data-driven clustering recommendation method for single-cell RNA-sequencing data","volume":"26","author":"Tian","year":"2021","journal-title":"Tsinghua Sci Technol"},{"key":"2023011917110495600_ref39","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1038\/s41587-019-0071-9","article-title":"A comparison of single-cell trajectory inference methods","volume":"37","author":"Saelens","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2023011917110495600_ref40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2004-5-10-r80","article-title":"Bioconductor: open software development for computational biology and bioinformatics","volume":"5","author":"Gentleman","year":"2004","journal-title":"Genome Biol"},{"key":"2023011917110495600_ref41","article-title":"Package \u201cStats.\u201d, The R Stats Package","author":"Team","year":"2018"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/1\/bbac580\/48782247\/bbac580.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/1\/bbac580\/48782247\/bbac580.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,14]],"date-time":"2023-03-14T23:15:39Z","timestamp":1678835739000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac580\/6960616"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,24]]},"references-count":41,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac580","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,1]]},"published":{"date-parts":[[2022,12,24]]},"article-number":"bbac580"}}