{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T06:39:41Z","timestamp":1765435181943,"version":"3.46.0"},"reference-count":22,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T00:00:00Z","timestamp":1765411200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>Single-cell RNA sequencing (scRNA-seq) has generated a rapidly expanding collection of public datasets that provide insight into development, disease, and therapy. However, researchers lack an end-to-end solution for seamlessly retrieving, preprocessing, integrating, and analyzing these data because existing tools address only isolated steps and require manual curation of accessions, metadata, and technical variability, known as batch effects. In this study, we developed Celline, a Python package that executes an entire workflow using a single-line commands per step. Celline automatically gathers raw single-cell RNA-seq data from multiple public repositories and extracts metadata using large language models. It then wraps established tools, including Scrublet for doublet removal, Seurat and Scanpy for quality control and cell-type annotation, Harmony and scVI for batch correction, and Slingshot for trajectory inference, into one-line commands, enabling seamless integrative analyses. To validate Celline-acquired data quality and the integrated framework\u2019s practical utility, we applied it to 2 mouse brain cortex datasets from embryonic days 14.5 and 18. Technical validation demonstrated that Celline successfully retrieved data, standardized metadata, and enabled standard analyses that removed low-quality cells, annotated 11 major cell types, improved integration quality (scIB score +0.22), and completed trajectory analysis. Thus, Celline transforms scattered public scRNA-seq resources into unified, analysis-ready datasets with minimal effort. Its modular design allows pipeline extension, encourages community-driven advances, and accelerates the discovery of single-cell data.<\/jats:p>","DOI":"10.3389\/fbinf.2025.1684227","type":"journal-article","created":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T06:35:52Z","timestamp":1765434952000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Celline: a flexible tool for one-step retrieval and integrative analysis of public single-cell RNA sequencing data"],"prefix":"10.3389","volume":"5","author":[{"given":"Yuya","family":"Sato","sequence":"first","affiliation":[]},{"given":"Toru","family":"Asahi","sequence":"additional","affiliation":[]},{"given":"Kosuke","family":"Kataoka","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,12,11]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"264","DOI":"10.1186\/s13059-019-1862-5","article-title":"scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data","volume":"20","author":"Alquicira-Hernandez","year":"2019","journal-title":"Genome Biol."},{"key":"B2","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1186\/s44342-025-00044-5","article-title":"Navigating single-cell RNA-sequencing: protocols, tools, databases, and applications","volume":"23","author":"Arya","year":"2025","journal-title":"Genomics Inf."},{"key":"B3","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1038\/s41592-018-0254-1","article-title":"A test metric for assessing single-cell RNA-seq batch correction","volume":"16","author":"B\u00fcttner","year":"2019","journal-title":"Nat. Methods"},{"key":"B4","doi-asserted-by":"publisher","first-page":"1497","DOI":"10.1007\/s12033-023-00777-0","article-title":"Single-cell RNA sequencing: technological progress and biomedical application in cancer research","volume":"66","author":"Chang","year":"2024","journal-title":"Mol. Biotechnol."},{"key":"B5","doi-asserted-by":"publisher","first-page":"554","DOI":"10.1038\/s41586-021-03670-5","article-title":"Molecular logic of cellular diversification in the mouse cerebral cortex","volume":"595","author":"Di Bella","year":"2021","journal-title":"Nature"},{"key":"B6","doi-asserted-by":"publisher","DOI":"10.1101\/2021.05.05.442755","article-title":"STARsolo: accurate, fast and versatile mapping\/quantification of single-cell and single-nucleus RNA-seq data","author":"Kaminow","year":"2021","journal-title":"bioRxiv"},{"key":"B7","doi-asserted-by":"publisher","first-page":"1289","DOI":"10.1038\/s41592-019-0619-0","article-title":"Fast, sensitive and accurate integration of single-cell data with Harmony","volume":"16","author":"Korsunsky","year":"2019","journal-title":"Nat. Methods"},{"key":"B8","doi-asserted-by":"publisher","first-page":"D596","DOI":"10.1093\/nar\/gkab1020","article-title":"DISCO: a database of deeply Integrated human single-Cell omics data","volume":"50","author":"Li","year":"2022","journal-title":"Nucleic Acids Res."},{"key":"B9","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat. Methods"},{"key":"B10","doi-asserted-by":"publisher","first-page":"e8746","DOI":"10.15252\/msb.20188746","article-title":"Current best practices in single-cell RNA-Seq analysis: a tutorial","volume":"15","author":"Luecken","year":"2019","journal-title":"Mol. Syst. Biol."},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.1101\/2020.12.31.425022","article-title":"Transcriptomics data availability and reusability in the transition from microarray to next-generation sequencing","author":"Rustici","year":"2021","journal-title":"bioRxiv"},{"key":"B12","doi-asserted-by":"publisher","first-page":"106","DOI":"10.14348\/molcells.2023.0009","article-title":"Integration of single-cell RNA-seq datasets: a review of computational methods","volume":"46","author":"Ryu","year":"2023","journal-title":"Mol. Cells"},{"key":"B13","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1186\/s12915-023-01711-1","article-title":"Integrative single-cell RNA-seq analysis of vascularized cerebral organoids","volume":"21","author":"Sato","year":"2023","journal-title":"BMC Biol."},{"key":"B14","doi-asserted-by":"publisher","first-page":"100842","DOI":"10.1016\/j.xgen.2025.100842","article-title":"Single-cell meta-analysis of T cells reveals clonal dynamics of response to checkpoint immunotherapy","volume":"5","author":"Shorer","year":"2025","journal-title":"Cell Genom"},{"key":"B15","doi-asserted-by":"publisher","DOI":"10.1101\/2023.11.18.567507","article-title":"GEfetch2R: fetching single-cell\/bulk RNA-seq data from public repositories to R and benchmarking the subsequent format conversion tools","author":"Song","year":"2023","journal-title":"bioRxiv"},{"key":"B16","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1186\/s12864-018-4772-0","article-title":"Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics","volume":"19","author":"Street","year":"2018","journal-title":"BMC Genomics"},{"key":"B17","doi-asserted-by":"publisher","first-page":"496","DOI":"10.1038\/s41573-023-00688-4","article-title":"Applications of single-cell RNA sequencing in drug discovery and development","volume":"22","author":"Van de Sande","year":"2023","journal-title":"Nat. Rev. Drug Discov."},{"key":"B18","doi-asserted-by":"publisher","first-page":"e2500870","DOI":"10.1002\/advs.202500870","article-title":"ScCompass: an integrated multi-species scRNA-seq database for AI-ready","volume":"12","author":"Wang","year":"2025","journal-title":"Adv. Sci. (Weinh.)"},{"key":"B19","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1016\/j.cels.2018.11.005","article-title":"Scrublet: computational identification of cell doublets in Single-cell transcriptomic data","volume":"8","author":"Wolock","year":"2019","journal-title":"Cell Syst."},{"key":"B20","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1186\/s13059-025-03639-x","article-title":"scExtract: leveraging large language models for fully automated single-cell RNA-seq data annotation and prior-informed multi-dataset integration","volume":"26","author":"Wu","year":"2025","journal-title":"Genome Biol."},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.1101\/2025.02.27.640494","article-title":"scBaseCount: an AI agent-curated, uniformly processed, and autonomously updated single cell data repository","author":"Youngblut","year":"2025","journal-title":"bioRxiv"},{"key":"B22","doi-asserted-by":"publisher","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1684227\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T06:35:54Z","timestamp":1765434954000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1684227\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,11]]},"references-count":22,"alternative-id":["10.3389\/fbinf.2025.1684227"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2025.1684227","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,11]]},"article-number":"1684227"}}