{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:43:52Z","timestamp":1753875832656,"version":"3.41.2"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2022,7,12]],"date-time":"2022-07-12T00:00:00Z","timestamp":1657584000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Disciplinary funding of Central University of Finance and Economics"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,7,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Developments of single-cell RNA sequencing (scRNA-seq) technologies have enabled biological discoveries at the single-cell resolution with high throughput. However, large scRNA-seq datasets always suffer from massive technical noises, including batch effects and dropouts, and the dropout is often shown to be batch-dependent. Most existing methods only address one of the problems, and we show that the popularly used methods failed in trading off batch effect correction and dropout imputation. Here, inspired by the idea of causal inference, we propose a novel propensity score matching method for scRNA-seq data (scPSM) by borrowing information and taking the weighted average from similar cells in the deep sequenced batch, which simultaneously removes the batch effect, imputes dropout and denoises data in the entire gene expression space. The proposed method is testified on two simulation datasets and a variety of real scRNA-seq datasets, and the results show that scPSM is superior to other state-of-the-art methods. First, scPSM improves clustering accuracy and mixes cells of the same type, suggesting its ability to keep cell type separation while correcting for batch. Besides, using the scPSM-integrated data as input yields results free of batch effects or dropouts in the differential expression analysis. Moreover, scPSM not only achieves ideal denoising but also preserves real biological structure for downstream gene-based analyses. Furthermore, scPSM is robust to hyperparameters and small datasets with a few cells but enormous genes. Comprehensive evaluations demonstrate that scPSM jointly provides desirable batch effect correction, imputation and denoising for recovering the biologically meaningful expression in scRNA-seq data.<\/jats:p>","DOI":"10.1093\/bib\/bbac275","type":"journal-article","created":{"date-parts":[[2022,7,13]],"date-time":"2022-07-13T03:12:20Z","timestamp":1657681940000},"source":"Crossref","is-referenced-by-count":5,"title":["Propensity score matching enables batch-effect-corrected imputation in single-cell RNA-seq analysis"],"prefix":"10.1093","volume":"23","author":[{"given":"Xinyi","family":"Xu","sequence":"first","affiliation":[{"name":"School of Statistics and Mathematics, Central University of Finance and Economics , Beijing, 100081, \u00a0 China"}]},{"given":"Xiaokang","family":"Yu","sequence":"additional","affiliation":[{"name":"Center for Applied Statistics, School of Statistics, Renmin University of China , Beijing, 100872, \u00a0 China"}]},{"given":"Gang","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Statistics and Data Science, Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin, Nankai University , Tianjin 300071, \u00a0 China"}]},{"given":"Kui","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Statistics and Data Science, Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin, Nankai University , Tianjin 300071, \u00a0 China"}]},{"given":"Jingxiao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Center for Applied Statistics, School of Statistics, Renmin University of China , Beijing, 100872, \u00a0 China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9600-8984","authenticated-orcid":false,"given":"Xiangjie","family":"Li","sequence":"additional","affiliation":[{"name":"School of Statistics and Data Science, Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin, Nankai University , Tianjin 300071, \u00a0 China"}]}],"member":"286","published-online":{"date-parts":[[2022,7,12]]},"reference":[{"key":"2022071906210285700_ref1","doi-asserted-by":"crossref","first-page":"e85","DOI":"10.1093\/nar\/gkaa506","article-title":"scIGANs: single-cell RNA-seq imputation using generative adversarial networks","volume":"48","author":"Xu","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022071906210285700_ref2","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1038\/s41467-018-03405-7","article-title":"An accurate and robust imputation method scImpute for single-cell RNA-seq data","volume":"9","author":"Li","year":"2018","journal-title":"Nat Commun"},{"key":"2022071906210285700_ref3","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1038\/nprot.2016.154","article-title":"Single-cell barcoding and sequencing using droplet microfluidics","volume":"12","author":"Zilionis","year":"2017","journal-title":"Nat Protoc"},{"key":"2022071906210285700_ref4","doi-asserted-by":"crossref","first-page":"1308","DOI":"10.1016\/j.cell.2016.07.054","article-title":"Comprehensive classification of retinal bipolar neurons by single-cell transcriptomics","volume":"166","author":"Shekhar","year":"2016","journal-title":"Cell"},{"key":"2022071906210285700_ref5","doi-asserted-by":"crossref","first-page":"562","DOI":"10.1093\/biostatistics\/kxx053","article-title":"Missing data and technical variability in single-cell RNA-sequencing experiments","volume":"19","author":"Hicks","year":"2018","journal-title":"Biostatistics"},{"key":"2022071906210285700_ref6","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1186\/s13059-020-1926-6","article-title":"Eleven grand challenges in single-cell data science","volume":"21","author":"L\u00e4hnemann","year":"2020","journal-title":"Genome Biol"},{"key":"2022071906210285700_ref7","doi-asserted-by":"crossref","DOI":"10.1101\/gr.271874.120","article-title":"A joint deep learning model enables simultaneous batch effect correction, denoising and clustering in single-cell transcriptomics","volume":"31","author":"Lakkis","year":"2021","journal-title":"Genome Res"},{"key":"2022071906210285700_ref8","article-title":"Evaluation of methods in removing batch effects on RNA-seq data","volume-title":"Infect Dis Transl Med","author":"Liu","year":"2016"},{"key":"2022071906210285700_ref9","article-title":"Recovering gene interactions from single-cell data using data diffusion","volume-title":"Cell","author":"van Dijk","year":"2018"},{"key":"2022071906210285700_ref10","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/s41592-018-0033-z","article-title":"SAVER: gene expression recovery for single-cell RNA sequencing","volume":"15","author":"Huang","year":"2018","journal-title":"Nat Methods"},{"key":"2022071906210285700_ref11","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1038\/s41467-018-07931-2","article-title":"Single-cell RNA-seq denoising using a deep count autoencoder","volume":"10","author":"Eraslan","year":"2019","journal-title":"Nat Commun"},{"key":"2022071906210285700_ref12","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1186\/s13059-020-02096-y","article-title":"Demystifying \u201cdrop-outs\u201d in single-cell UMI data","volume":"21","author":"Kim","year":"2020","journal-title":"Genome Biol"},{"key":"2022071906210285700_ref13","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1186\/s13059-022-02601-5","article-title":"Statistics or biology: the zero-inflation controversy about scRNA-seq data","volume":"23","author":"Jiang","year":"2022","journal-title":"Genome Biol"},{"key":"2022071906210285700_ref14","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1186\/s13059-020-02132-x","article-title":"A systematic evaluation of single-cell RNA-sequencing imputation methods","volume":"21","author":"Hou","year":"2020","journal-title":"Genome Biol"},{"key":"2022071906210285700_ref15","doi-asserted-by":"crossref","first-page":"e47","DOI":"10.1093\/nar\/gkv007","article-title":"Limma powers differential expression analyses for RNA-sequencing and microarray studies","volume":"43","author":"Ritchie","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2022071906210285700_ref16","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2022071906210285700_ref17","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1038\/nbt.4091","article-title":"Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors","volume":"36","author":"Haghverdi","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2022071906210285700_ref18","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"key":"2022071906210285700_ref19","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1038\/s41587-019-0113-3","article-title":"Efficient integration of heterogeneous single-cell transcriptomes using Scanorama","volume":"37","author":"Hie","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2022071906210285700_ref20","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat Method"},{"key":"2022071906210285700_ref21","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/s41592-021-01336-8","article-title":"Benchmarking atlas-level data integration in single-cell genomics","volume":"19","author":"Luecken","year":"2022","journal-title":"Nat Method"},{"key":"2022071906210285700_ref22","first-page":"41","article-title":"The central role of the propensity score in observational studies for causal effects","volume-title":"Biometrika","author":"Rosenbaum","year":"1983"},{"key":"2022071906210285700_ref23","doi-asserted-by":"crossref","first-page":"1289","DOI":"10.1038\/s41592-019-0619-0","article-title":"Fast, sensitive and accurate integration of single-cell data with harmony","volume":"16","author":"Korsunsky","year":"2019","journal-title":"Nat Method"},{"key":"2022071906210285700_ref24","doi-asserted-by":"crossref","DOI":"10.1080\/01621459.1984.10478078","article-title":"Reducing bias in observational studies using subclassification on the propensity score","volume":"79","author":"Rosenbaum","year":"1984","journal-title":"J Am Stat Assoc"},{"key":"2022071906210285700_ref25","doi-asserted-by":"crossref","DOI":"10.1037\/1082-989X.9.4.403","article-title":"Propensity score estimation with boosted regression for evaluating causal effects in observational studies","volume-title":"Psychol Methods","author":"McCaffrey","year":"2004"},{"issue":"5769","key":"2022071906210285700_ref26","first-page":"175","article-title":"On the use of the adjusted Rand index as a metric for evaluating supervised classification","volume":"2009","author":"Santos","year":"2009","journal-title":"Artif Neural Netw \u2013 ICANN"},{"key":"2022071906210285700_ref27","article-title":"Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance","volume-title":"J Mach Learn Res","author":"Vinh","year":"2010"},{"key":"2022071906210285700_ref28","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J Comput. Appl. Math"},{"key":"2022071906210285700_ref29","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1038\/s41592-018-0254-1","article-title":"A test metric for assessing single-cell RNA-seq batch correction","volume":"16","author":"B\u00fcttner","year":"2019","journal-title":"Nat Method"},{"key":"2022071906210285700_ref30","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/s13059-019-1850-9","article-title":"A benchmark of batch-effect correction methods for single-cell RNA sequencing data","volume":"21","author":"Tran","year":"2020","journal-title":"Genome Biol"},{"key":"2022071906210285700_ref31","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1101\/gr.212720.116","article-title":"Single-cell transcriptomes identify human islet cell signatures and reveal cell-type\u2013specific expression changes in type 2 diabetes","volume":"27","author":"Lawlor","year":"2017","journal-title":"Genome Res"},{"key":"2022071906210285700_ref32","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1016\/j.cmet.2016.08.020","article-title":"Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes","volume":"24","author":"Segerstolpe","year":"2016","journal-title":"Cell Metab"},{"key":"2022071906210285700_ref33","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1016\/j.stem.2016.05.010","article-title":"De novo prediction of stem cell identity using single-cell transcriptome data","volume":"19","author":"Gr\u00fcn","year":"2016","journal-title":"Cell Stem Cell"},{"key":"2022071906210285700_ref34","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1016\/j.cels.2016.09.002","article-title":"A single-cell transcriptome atlas of the human pancreas","volume":"3","author":"Muraro","year":"2016","journal-title":"Cell Systems"},{"key":"2022071906210285700_ref35","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cels.2016.08.011","article-title":"A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure","volume":"3","author":"Baron","year":"2016","journal-title":"Cell Systems"},{"key":"2022071906210285700_ref36","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1038\/s41587-020-0465-8","article-title":"Systematic comparison of single-cell and single-nucleus RNA-sequencing methods","volume":"38","author":"Ding","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2022071906210285700_ref37","doi-asserted-by":"crossref","first-page":"1222","DOI":"10.1016\/j.cell.2019.01.004","article-title":"Molecular classification and comparative Taxonomics of foveal and peripheral cells in primate retina","volume":"176","author":"Peng","year":"2019","journal-title":"Cell"},{"key":"2022071906210285700_ref38","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/4\/bbac275\/45016395\/bbac275.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/4\/bbac275\/45016395\/bbac275.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,19]],"date-time":"2022-07-19T06:24:18Z","timestamp":1658211858000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac275\/6640334"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,12]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,7,18]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac275","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2022,7,18]]},"published":{"date-parts":[[2022,7,12]]},"article-number":"bbac275"}}