{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T08:00:36Z","timestamp":1776240036072,"version":"3.50.1"},"reference-count":58,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T00:00:00Z","timestamp":1724284800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["2021YFF1200901"],"award-info":[{"award-number":["2021YFF1200901"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62250005"],"award-info":[{"award-number":["62250005"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61721003"],"award-info":[{"award-number":["61721003"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62373210"],"award-info":[{"award-number":["62373210"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at single-cell level. 
However, it is still challenging to obtain enough high-quality scRNA-seq data. To mitigate the limited availability of data, generative models have been proposed to computationally generate synthetic scRNA-seq data. Nevertheless, the data generated with current models are not very realistic yet, especially when we need to generate data with controlled conditions. In the meantime, diffusion models have shown their power in generating data with high fidelity, providing a new opportunity for scRNA-seq generation.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this study, we developed scDiffusion, a generative model combining the diffusion model and foundation model to generate high-quality scRNA-seq data with controlled conditions. We designed multiple classifiers to guide the diffusion process simultaneously, enabling scDiffusion to generate data under multiple condition combinations. We also proposed a new control strategy called Gradient Interpolation. This strategy allows the model to generate continuous trajectories of cell development from a given cell state. Experiments showed that scDiffusion could generate single-cell gene expression data closely resembling real scRNA-seq data. Also, scDiffusion can conditionally produce data on specific cell types including rare cell types. Furthermore, we could use the multiple-condition generation of scDiffusion to generate cell type that was out of the training data. Leveraging the Gradient Interpolation strategy, we generated a continuous developmental trajectory of mouse embryonic cells. 
These experiments demonstrate that scDiffusion is a powerful tool for augmenting the real scRNA-seq data and can provide insights into cell fate research.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>scDiffusion is openly available at the GitHub repository https:\/\/github.com\/EperLuo\/scDiffusion or Zenodo https:\/\/zenodo.org\/doi\/10.5281\/zenodo.13268742.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae518","type":"journal-article","created":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T23:13:53Z","timestamp":1724368433000},"source":"Crossref","is-referenced-by-count":32,"title":["scDiffusion: conditional generation of high-quality single-cell data using diffusion model"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-4087-6027","authenticated-orcid":false,"given":"Erpai","family":"Luo","sequence":"first","affiliation":[{"name":"MOE Key Lab of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University , Beijing 100084, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6749-5659","authenticated-orcid":false,"given":"Minsheng","family":"Hao","sequence":"additional","affiliation":[{"name":"MOE Key Lab of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University , Beijing 100084, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1546-6458","authenticated-orcid":false,"given":"Lei","family":"Wei","sequence":"additional","affiliation":[{"name":"MOE Key Lab of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University , Beijing 100084, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9684-5643","authenticated-orcid":false,"given":"Xuegong","family":"Zhang","sequence":"additional","affiliation":[{"name":"MOE Key Lab of Bioinformatics and 
Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University , Beijing 100084, China"},{"name":"School of Life Sciences and School of Medicine, Center for Synthetic and Systems Biology, Tsinghua University , Beijing 100084, China"}]}],"member":"286","published-online":{"date-parts":[[2024,8,22]]},"reference":[{"key":"2024090221350494900_btae518-B1","doi-asserted-by":"crossref","first-page":"1468","DOI":"10.1093\/bioinformatics\/btz752","article-title":"Sparsim single cell: a count data simulator for scRNA-seq data","volume":"36","author":"Baruzzo","year":"2020","journal-title":"Bioinformatics"},{"key":"2024090221350494900_btae518-B2","doi-asserted-by":"crossref","first-page":"20201329","DOI":"10.1084\/jem.20201329","article-title":"STARTRAC analyses of scRNA-seq data from tumor models reveal T cell dynamics and therapeutic targets","volume":"218","author":"Bhatt","year":"2021","journal-title":"J Exp Med"},{"key":"2024090221350494900_btae518-B3","author":"Bian","year":"2024"},{"key":"2024090221350494900_btae518-B4","doi-asserted-by":"crossref","first-page":"7327","DOI":"10.1109\/TPAMI.2021.3116668","article-title":"Deep generative modelling: a comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models","volume":"44","author":"Bond-Taylor","year":"2021","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2024090221350494900_btae518-B5","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1038\/s41592-020-00979-3","article-title":"Mars: discovering novel cell types across heterogeneous single-cell experiments","volume":"17","author":"Brbic","year":"2020","journal-title":"Nat Methods"},{"key":"2024090221350494900_btae518-B6","first-page":"1133","article-title":"Immunodetection of aldose reductase in normal and diseased human liver","volume":"22","author":"Brown","year":"2005","journal-title":"Histol Histopathol"},{"key":"2024090221350494900_btae518-B7","first-page":"2814","volume-title":"IEEE Trans 
Knowl Data Eng"},{"key":"2024090221350494900_btae518-B8","author":"Charlier","year":"2022"},{"key":"2024090221350494900_btae518-B9","doi-asserted-by":"crossref","first-page":"10850","DOI":"10.1109\/TPAMI.2023.3261988","article-title":"Diffusion models in vision: a survey","volume":"45","author":"Croitoru","year":"2023","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2024090221350494900_btae518-B10","doi-asserted-by":"crossref","first-page":"1470","DOI":"10.1038\/s41592-024-02201-0","article-title":"scGPT: toward building a foundation model for single-cell multi-omics using generative AI","volume":"21","author":"Cui","year":"2024","journal-title":"Nat Methods"},{"key":"2024090221350494900_btae518-B11","first-page":"45","author":"de Masson","year":"2014"},{"key":"2024090221350494900_btae518-B12","first-page":"8780","article-title":"Diffusion models beat GANs on image synthesis","volume":"34","author":"Dhariwal","year":"2021","journal-title":"Adv Neural Inf Process Syst"},{"key":"2024090221350494900_btae518-B13","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1016\/j.cels.2020.08.003","article-title":"Sergio: a single-cell expression simulator guided by gene regulatory networks","volume":"11","author":"Dibaeinia","year":"2020","journal-title":"Cell Syst"},{"key":"2024090221350494900_btae518-B14","doi-asserted-by":"crossref","first-page":"eabl5197","DOI":"10.1126\/science.abl5197","article-title":"Cross-tissue immune cell analysis reveals tissue-specific features in humans","volume":"376","author":"Dom\u00ednguez Conde","year":"2022","journal-title":"Science"},{"key":"2024090221350494900_btae518-B15","doi-asserted-by":"crossref","first-page":"567342","DOI":"10.3389\/fimmu.2020.567342","article-title":"Single cell transcriptomics implicate novel monocyte and T cell immune dysregulation in sarcoidosis","volume":"11","author":"Garman","year":"2020","journal-title":"Front 
Immunol"},{"key":"2024090221350494900_btae518-B16","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1038\/s41571-020-00449-x","article-title":"Applying high-dimensional single-cell technologies to the analysis of cancer immunotherapy","volume":"18","author":"Gohil","year":"2021","journal-title":"Nat Rev Clin Oncol"},{"key":"2024090221350494900_btae518-B17","author":"Greene","year":"1994"},{"key":"2024090221350494900_btae518-B18","first-page":"723","article-title":"A kernel two-sample test","volume":"13","author":"Gretton","year":"2012","journal-title":"J Mach Learn Res"},{"key":"2024090221350494900_btae518-B19","doi-asserted-by":"crossref","first-page":"eaba1972","DOI":"10.1126\/sciadv.aba1972","article-title":"Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis","volume":"6","author":"Habermann","year":"2020","journal-title":"Sci Adv"},{"key":"2024090221350494900_btae518-B20","doi-asserted-by":"crossref","first-page":"845","DOI":"10.1038\/nmeth.3971","article-title":"Diffusion pseudotime robustly reconstructs lineage branching","volume":"13","author":"Haghverdi","year":"2016","journal-title":"Nat Methods"},{"key":"2024090221350494900_btae518-B21","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1038\/nbt.4091","article-title":"Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors","volume":"36","author":"Haghverdi","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2024090221350494900_btae518-B22","doi-asserted-by":"crossref","first-page":"1481","DOI":"10.1038\/s41592-024-02305-7","article-title":"Large-scale foundation model on single-cell transcriptomics","volume":"21","author":"Hao","year":"2024","journal-title":"Nat 
Methods"},{"key":"2024090221350494900_btae518-B23","author":"Heimberg","year":"2023"},{"key":"2024090221350494900_btae518-B24","doi-asserted-by":"crossref","first-page":"e3000528","DOI":"10.1371\/journal.pbio.3000528","article-title":"Single-cell transcriptomics of the naked mole-rat reveals unexpected features of mammalian immunity","volume":"17","author":"Hilton","year":"2019","journal-title":"PLoS Biol"},{"key":"2024090221350494900_btae518-B25","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho","year":"2020","journal-title":"Adv Neural Inf Process Syst"},{"key":"2024090221350494900_btae518-B26","doi-asserted-by":"crossref","first-page":"D870","DOI":"10.1093\/nar\/gkac947","article-title":"Cellmarker 2.0: an updated database of manually curated cell markers in human\/mouse and web tools based on scRNA-seq data","volume":"51","author":"Hu","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2024090221350494900_btae518-B27","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1038\/s41568-022-00502-0","article-title":"Big data in basic and translational cancer research","volume":"22","author":"Jiang","year":"2022","journal-title":"Nat Rev Cancer"},{"key":"2024090221350494900_btae518-B28","doi-asserted-by":"crossref","first-page":"e694","DOI":"10.1002\/ctm2.694","article-title":"Single-cell RNA sequencing technologies and applications: a brief overview","volume":"12","author":"Jovic","year":"2022","journal-title":"Clin Transl Med"},{"key":"2024090221350494900_btae518-B29","doi-asserted-by":"crossref","first-page":"e2200084","DOI":"10.1002\/bies.202200084","article-title":"Single cell RNA-sequencing: a powerful yet still challenging technology to study cellular 
heterogeneity","volume":"44","author":"Ke","year":"2022","journal-title":"Bioessays"},{"key":"2024090221350494900_btae518-B30","author":"Kingma","year":"2013"},{"key":"2024090221350494900_btae518-B31","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1038\/s42003-022-03473-y","article-title":"LSH-GAN enables in-silico generation of cells for small sample high dimensional scRNA-seq data","volume":"5","author":"Lall","year":"2022","journal-title":"Commun Biol"},{"key":"2024090221350494900_btae518-B32","doi-asserted-by":"crossref","first-page":"i41","DOI":"10.1093\/bioinformatics\/btz321","article-title":"A statistical simulator scDesign for rational scRNA-seq experimental design","volume":"35","author":"Li","year":"2019","journal-title":"Bioinformatics"},{"key":"2024090221350494900_btae518-B33","author":"Lindenbaum"},{"key":"2024090221350494900_btae518-B34","doi-asserted-by":"crossref","first-page":"e9198","DOI":"10.15252\/msb.20199198","article-title":"Enhancing scientific discoveries in molecular biology with deep generative models","volume":"16","author":"Lopez","year":"2020","journal-title":"Mol Syst Biol"},{"key":"2024090221350494900_btae518-B35","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat Methods"},{"key":"2024090221350494900_btae518-B36","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1038\/s41587-021-01001-7","article-title":"Mapping single-cell data to reference atlases by transfer learning","volume":"40","author":"Lotfollahi","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2024090221350494900_btae518-B37","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/s41592-021-01336-8","article-title":"Benchmarking atlas-level data integration in single-cell genomics","volume":"19","author":"Luecken","year":"2022","journal-title":"Nat 
Methods"},{"key":"2024090221350494900_btae518-B38","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1038\/s41467-019-14018-z","article-title":"Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks","volume":"11","author":"Marouf","year":"2020","journal-title":"Nat Commun"},{"key":"2024090221350494900_btae518-B39","author":"McInnes","year":"2018"},{"key":"2024090221350494900_btae518-B40","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1186\/s13059-021-02548-z","article-title":"genebasis: an iterative approach for unsupervised selection of targeted gene panels from scRNA-seq","volume":"22","author":"Missarova","year":"2021","journal-title":"Genome Biol"},{"key":"2024090221350494900_btae518-B41","doi-asserted-by":"crossref","first-page":"1913","DOI":"10.1101\/gr.273300.120","article-title":"A single-cell tumor immune atlas for precision oncology","volume":"31","author":"Nieto","year":"2021","journal-title":"Genome Res"},{"key":"2024090221350494900_btae518-B42","doi-asserted-by":"crossref","first-page":"758","DOI":"10.1016\/j.cellsig.2011.11.011","article-title":"Unexpected diversity in shisa-like proteins suggests the importance of their roles as transmembrane adaptors","volume":"24","author":"Pei","year":"2012","journal-title":"Cell Signal"},{"key":"2024090221350494900_btae518-B43","doi-asserted-by":"crossref","first-page":"1304","DOI":"10.1093\/bioinformatics\/btab824","article-title":"Scrip: an accurate simulator for single-cell RNA sequencing data","volume":"38","author":"Qin","year":"2022","journal-title":"Bioinformatics"},{"key":"2024090221350494900_btae518-B44","author":"Radford"},{"key":"2024090221350494900_btae518-B45","author":"Rombach"},{"key":"2024090221350494900_btae518-B46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3446374","article-title":"Generative adversarial networks (GANs) challenges, solutions, and future 
directions","volume":"54","author":"Saxena","year":"2021","journal-title":"ACM Comput Surv"},{"key":"2024090221350494900_btae518-B47","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1038\/s41586-018-0590-4","article-title":"Single-cell transcriptomics of 20 mouse organs creates a tabula muris: the Tabula Muris Consortium","volume":"562","author":"Schaum","year":"2018","journal-title":"Nature"},{"key":"2024090221350494900_btae518-B48","doi-asserted-by":"crossref","first-page":"928","DOI":"10.1016\/j.cell.2019.01.006","article-title":"Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming","volume":"176","author":"Schiebinger","year":"2019","journal-title":"Cell"},{"key":"2024090221350494900_btae518-B49","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1038\/s41587-023-01772-1","article-title":"scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics","volume":"42","author":"Song","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2024090221350494900_btae518-B50","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1016\/j.molcel.2019.05.003","article-title":"Single-cell RNA sequencing in cancer: lessons learned and emerging challenges","volume":"75","author":"Suv\u00e0","year":"2019","journal-title":"Mol Cell"},{"key":"2024090221350494900_btae518-B51","doi-asserted-by":"crossref","first-page":"616","DOI":"10.1038\/s41586-023-06139-9","article-title":"Transfer learning enables predictions in network biology","volume":"618","author":"Theodoris","year":"2023","journal-title":"Nature"},{"key":"2024090221350494900_btae518-B52","doi-asserted-by":"crossref","first-page":"eabl4896","DOI":"10.1126\/science.abl4896","article-title":"The Tabula Sapiens: a multiple-organ. 
Single-cell transcriptomic atlas of humans","volume":"376","author":"TTS Consortium*, Jones RC, Karkanias J","year":"2022","journal-title":"Science"},{"key":"2024090221350494900_btae518-B53","doi-asserted-by":"crossref","first-page":"e85","DOI":"10.1093\/nar\/gkaa506","article-title":"scIGANS: single-cell RNA-seq imputation using generative adversarial networks","volume":"48","author":"Xu","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2024090221350494900_btae518-B54","first-page":"1","author":"Yang","year":"2022"},{"key":"2024090221350494900_btae518-B55","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"},{"key":"2024090221350494900_btae518-B56","author":"Zhang","year":"2023"},{"key":"2024090221350494900_btae518-B57","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1038\/s41421-020-0157-z","article-title":"Single-cell RNA sequencing reveals the heterogeneity of liver-resident immune cells in human","volume":"6","author":"Zhao","year":"2020","journal-title":"Cell Discov"},{"key":"2024090221350494900_btae518-B58","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat 
Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae518\/58889831\/btae518.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/9\/btae518\/58998007\/btae518.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/9\/btae518\/58998007\/btae518.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,2]],"date-time":"2024-09-02T21:35:43Z","timestamp":1725312943000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae518\/7738782"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,8,22]]},"references-count":58,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2024,9,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae518","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,9]]},"published":{"date-parts":[[2024,8,22]]},"article-number":"btae518"}}