{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:30Z","timestamp":1772138070635,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:00:00Z","timestamp":1740182400000},"content-version":"vor","delay-in-days":21,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["NSF-2233887"],"award-info":[{"award-number":["NSF-2233887"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,2,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Integrative analysis of large-scale single-cell data collected from diverse cell populations promises an improved understanding of complex biological systems. While several algorithms have been developed for single-cell RNA-sequencing data integration, many lack the scalability to handle large numbers of datasets and\/or millions of cells due to their memory and run time requirements. The few tools that can handle large data do so by reducing the computational burden through strategies such as subsampling of the data or selecting a reference dataset to improve computational efficiency and scalability. Such shortcuts, however, hamper the accuracy of downstream analyses, especially those requiring quantitative gene expression information.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present SCEMENT, a SCalablE and Memory-Efficient iNTegration method, to overcome these limitations. Our new parallel algorithm builds upon and extends the linear regression model previously applied in ComBat to an unsupervised sparse matrix setting to enable accurate integration of diverse and large collections of single-cell RNA-sequencing data. Using tens to hundreds of real single-cell RNA-seq datasets, we show that SCEMENT outperforms ComBat as well as FastIntegration and Scanorama in runtime (upto 214\u00d7 faster) and memory usage (upto 17.5\u00d7 less). It not only performs batch correction and integration of millions of cells in under 25\u2009min, but also facilitates the discovery of new rare cell types and more robust reconstruction of gene regulatory networks with full quantitative gene expression information.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Source code freely available for download at https:\/\/github.com\/AluruLab\/scement, implemented in C++ and supported on Linux.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf057","type":"journal-article","created":{"date-parts":[[2025,2,20]],"date-time":"2025-02-20T15:24:29Z","timestamp":1740065069000},"source":"Crossref","is-referenced-by-count":1,"title":["SCEMENT: scalable and memory efficient integration of large-scale single-cell RNA-sequencing data"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1358-7691","authenticated-orcid":false,"given":"Sriram P","family":"Chockalingam","sequence":"first","affiliation":[{"name":"Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA-30332,","place":["United States"]}]},{"given":"Maneesha","family":"Aluru","sequence":"additional","affiliation":[{"name":"School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA-30332,","place":["United States"]}]},{"given":"Srinivas","family":"Aluru","sequence":"additional","affiliation":[{"name":"School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA-30332,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,2,22]]},"reference":[{"key":"2025042214242007200_btaf057-B1","doi-asserted-by":"crossref","first-page":"i484","DOI":"10.1093\/bioinformatics\/btad269","article-title":"Clarify: cell\u2013cell interaction and gene regulatory network refinement from spatially resolved transcriptomics","volume":"39","author":"Bafna","year":"2023","journal-title":"Bioinformatics"},{"key":"2025042214242007200_btaf057-B2","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1038\/msb4100120","article-title":"How to infer gene networks from expression profiles","volume":"3","author":"Bansal","year":"2007","journal-title":"Mol Syst Biol"},{"key":"2025042214242007200_btaf057-B3","doi-asserted-by":"crossref","first-page":"8677","DOI":"10.1093\/nar\/gkr593","article-title":"Transcriptional gene network inference from a massive dataset elucidates transcriptome organization and gene function","volume":"39","author":"Belcastro","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2025042214242007200_btaf057-B4","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2025042214242007200_btaf057-B5","doi-asserted-by":"crossref","first-page":"eaba7721","DOI":"10.1126\/science.aba7721","article-title":"A human cell atlas of fetal gene expression","volume":"370","author":"Cao","year":"2020","journal-title":"Science"},{"key":"2025042214242007200_btaf057-B6","doi-asserted-by":"crossref","first-page":"4616","DOI":"10.1038\/s41467-022-32097-3","article-title":"Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data","volume":"13","author":"Dhapola","year":"2022","journal-title":"Nat Commun"},{"key":"2025042214242007200_btaf057-B7","doi-asserted-by":"crossref","first-page":"896","DOI":"10.1093\/bioinformatics\/btx677","article-title":"netreg: network-regularized linear models for biological association studies","volume":"34","author":"Dirmeier","year":"2018","journal-title":"Bioinformatics"},{"key":"2025042214242007200_btaf057-B8","doi-asserted-by":"crossref","first-page":"8","DOI":"10.3389\/fgene.2012.00008","article-title":"Statistical inference and reverse engineering of gene regulatory networks from observational expression data","volume":"3","author":"Emmert-Streib","year":"2012","journal-title":"Front Genet"},{"key":"2025042214242007200_btaf057-B9","doi-asserted-by":"crossref","first-page":"2197","DOI":"10.1093\/plcell\/koab101","article-title":"A single-cell view of the transcriptome during lateral root initiation in Arabidopsis thaliana","volume":"33","author":"Gala","year":"2021","journal-title":"Plant Cell"},{"key":"2025042214242007200_btaf057-B10","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1038\/nbt.4091","article-title":"Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors","volume":"36","author":"Haghverdi","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2025042214242007200_btaf057-B11","doi-asserted-by":"crossref","first-page":"D380","DOI":"10.1093\/nar\/gkx1013","article-title":"Trrust v2: an expanded reference database of human and mouse transcriptional regulatory interactions","volume":"46","author":"Han","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2025042214242007200_btaf057-B12","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1038\/s41587-023-01767-y","article-title":"Dictionary learning for integrative, multimodal and scalable single-cell analysis","volume":"42","author":"Hao","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2025042214242007200_btaf057-B13","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1038\/s41587-019-0113-3","article-title":"Efficient integration of heterogeneous single-cell transcriptomes using scanorama","volume":"37","author":"Hie","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2025042214242007200_btaf057-B14","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1105\/tpc.18.00785","article-title":"Dynamics of gene expression in single root cells of Arabidopsis thaliana","volume":"31","author":"Jean-Baptiste","year":"2019","journal-title":"Plant Cell"},{"key":"2025042214242007200_btaf057-B15","doi-asserted-by":"crossref","first-page":"4719","DOI":"10.1038\/s41467-018-07234-6","article-title":"Discovery of rare cells from voluminous single cell expression data","volume":"9","author":"Jindal","year":"2018","journal-title":"Nat Commun"},{"key":"2025042214242007200_btaf057-B16","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2025042214242007200_btaf057-B17","doi-asserted-by":"crossref","first-page":"891","DOI":"10.1089\/106652703322756131","article-title":"Linear models for microarray data analysis: hidden similarities and differences","volume":"10","author":"Kerr","year":"2003","journal-title":"J Comput Biol"},{"key":"2025042214242007200_btaf057-B18","doi-asserted-by":"crossref","first-page":"1289","DOI":"10.1038\/s41592-019-0619-0","article-title":"Fast, sensitive and accurate integration of single-cell data with harmony","volume":"16","author":"Korsunsky","year":"2019","journal-title":"Nat Methods"},{"key":"2025042214242007200_btaf057-B19","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1007\/978-1-0716-1534-8_10","article-title":"Inference of gene regulatory network from single-cell transcriptomic data using pyscenic","author":"Kumar","year":"2021","journal-title":"Modeling Transcriptional Regulation: Methods and Protocols"},{"key":"2025042214242007200_btaf057-B20","author":"Li","year":"2022"},{"key":"2025042214242007200_btaf057-B21","doi-asserted-by":"crossref","first-page":"e689","DOI":"10.1002\/ctm2.689","article-title":"Molecular mechanisms governing circulating immune cell heterogeneity across different species revealed by single-cell sequencing","volume":"12","author":"Li","year":"2022","journal-title":"Clin Transl Med"},{"key":"2025042214242007200_btaf057-B22","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/s41592-021-01336-8","article-title":"Benchmarking atlas-level data integration in single-cell genomics","volume":"19","author":"Luecken","year":"2022","journal-title":"Nat Methods"},{"key":"2025042214242007200_btaf057-B23","doi-asserted-by":"crossref","first-page":"3176","DOI":"10.3389\/fimmu.2018.03176","article-title":"Human dendritic cells: their heterogeneity and clinical application potential in cancer immunotherapy","volume":"9","author":"Patente","year":"2019","journal-title":"Front Immunol"},{"key":"2025042214242007200_btaf057-B24","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1186\/s13059-015-0805-z","article-title":"Zifa: dimensionality reduction for zero-inflated single-cell gene expression analysis","volume":"16","author":"Pierson","year":"2015","journal-title":"Genome Biol"},{"key":"2025042214242007200_btaf057-B25","doi-asserted-by":"crossref","first-page":"964","DOI":"10.1093\/bioinformatics\/btz625","article-title":"Bbknn: fast batch alignment of single cell transcriptomes","volume":"36","author":"Pola\u0144ski","year":"2020","journal-title":"Bioinformatics"},{"key":"2025042214242007200_btaf057-B26","doi-asserted-by":"crossref","first-page":"692","DOI":"10.3390\/biom13040692","article-title":"Large-scale integration of single-cell RNA-seq data reveals astrocyte diversity and transcriptomic modules across six central nervous system disorders","volume":"13","author":"Qian","year":"2023","journal-title":"Biomolecules"},{"key":"2025042214242007200_btaf057-B27","doi-asserted-by":"crossref","first-page":"1895","DOI":"10.1016\/j.cell.2021.01.053","article-title":"Covid-19 immune features revealed by a large-scale single-cell transcriptome atlas","volume":"184","author":"Ren","year":"2021","journal-title":"Cell"},{"key":"2025042214242007200_btaf057-B28","first-page":"1964","author":"Wang","year":"2021"},{"key":"2025042214242007200_btaf057-B29","doi-asserted-by":"crossref","first-page":"e9620","DOI":"10.15252\/msb.20209620","article-title":"Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models","volume":"17","author":"Xu","year":"2021","journal-title":"Mol Syst Biol"},{"key":"2025042214242007200_btaf057-B30","doi-asserted-by":"crossref","first-page":"2910","DOI":"10.1161\/ATVBAHA.120.314789","article-title":"Cell-type transcriptome atlas of human aortic valves reveal cell heterogeneity and endothelial to mesenchymal transition involved in calcific aortic valve disease","volume":"40","author":"Xu","year":"2020","journal-title":"Arterioscler Thromb Vasc Biol"},{"key":"2025042214242007200_btaf057-B31","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1038\/s41421-019-0114-x","article-title":"A novel approach to remove the batch effect of single-cell data","volume":"5","author":"Zhang","year":"2019","journal-title":"Cell Discov"},{"key":"2025042214242007200_btaf057-B32","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1016\/j.gpb.2019.09.006","article-title":"htftarget: a comprehensive database for regulations of human transcription factors and their targets","volume":"18","author":"Zhang","year":"2020","journal-title":"Genomics Proteomics Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf057\/62052355\/btaf057.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/2\/btaf057\/62052355\/btaf057.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/2\/btaf057\/62052355\/btaf057.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,22]],"date-time":"2025-04-22T14:24:34Z","timestamp":1745331874000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf057\/8030215"}},"subtitle":[],"editor":[{"given":"Yann","family":"Ponty","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,2]]},"references-count":32,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf057","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.06.27.601027","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,2]]},"published":{"date-parts":[[2025,2]]},"article-number":"btaf057"}}