{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,10]],"date-time":"2026-05-10T10:16:31Z","timestamp":1778408191981,"version":"3.51.4"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2022,2,18]],"date-time":"2022-02-18T00:00:00Z","timestamp":1645142400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R15-HL146779"],"award-info":[{"award-number":["R15-HL146779"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01-GM126548"],"award-info":[{"award-number":["R01-GM126548"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DMS-1840265"],"award-info":[{"award-number":["DMS-1840265"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"University of California Office of the President and University of California Merced COVID-19"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,4,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Single-cell RNA sequencing (scRNAseq) technologies allow for measurements of gene expression at a single-cell resolution. This provides researchers with a tremendous advantage for detecting heterogeneity, delineating cellular maps or identifying rare subpopulations. However, a critical complication remains: the low number of single-cell observations due to limitations by rarity of subpopulation, tissue degradation or cost. This absence of sufficient data may cause inaccuracy or irreproducibility of downstream analysis. In this work, we present Automated Cell-Type-informed Introspective Variational Autoencoder (ACTIVA): a novel framework for generating realistic synthetic data using a single-stream adversarial variational autoencoder conditioned with cell-type information. Within a single framework, ACTIVA can enlarge existing datasets and generate specific subpopulations on demand, as opposed to two separate models [such as single-cell GAN (scGAN) and conditional scGAN (cscGAN)]. Data generation and augmentation with ACTIVA can enhance scRNAseq pipelines and analysis, such as benchmarking new algorithms, studying the accuracy of classifiers and detecting marker genes. ACTIVA will facilitate analysis of smaller datasets, potentially reducing the number of patients and animals necessary in initial studies.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We train and evaluate models on multiple public scRNAseq datasets. In comparison to GAN-based models (scGAN and cscGAN), we demonstrate that ACTIVA generates cells that are more realistic and harder for classifiers to identify as synthetic which also have better pair-wise correlation between genes. Data augmentation with ACTIVA significantly improves classification of rare subtypes (more than 45% improvement compared with not augmenting and 4% better than cscGAN) all while reducing run-time by an order of magnitude in comparison to both models.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The codes and datasets are hosted on Zenodo (https:\/\/doi.org\/10.5281\/zenodo.5879639). Tutorials are available at https:\/\/github.com\/SindiLab\/ACTIVA.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac095","type":"journal-article","created":{"date-parts":[[2022,2,15]],"date-time":"2022-02-15T15:26:09Z","timestamp":1644938769000},"page":"2194-2201","source":"Crossref","is-referenced-by-count":34,"title":["<i>ACTIVA<\/i>\n                    : realistic single-cell RNA-seq generation with automatic cell-type identification using introspective variational autoencoders"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9141-3809","authenticated-orcid":false,"given":"A Ali","family":"Heydari","sequence":"first","affiliation":[{"name":"Department of Applied Mathematics, University of California , Merced, CA 95343, USA"},{"name":"Health Sciences Research Institute, University of California , Merced, CA 95343, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Oscar A","family":"Davalos","sequence":"additional","affiliation":[{"name":"Health Sciences Research Institute, University of California , Merced, CA 95343, USA"},{"name":"Quantitative and Systems Biology Graduate Program, University of California , Merced, CA 95343, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5985-0614","authenticated-orcid":false,"given":"Lihong","family":"Zhao","sequence":"additional","affiliation":[{"name":"Department of Applied Mathematics, University of California , Merced, CA 95343, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Katrina K","family":"Hoyer","sequence":"additional","affiliation":[{"name":"Health Sciences Research Institute, University of California , Merced, CA 95343, USA"},{"name":"Department of Molecular and Cell Biology, University of California , Merced, CA 95343, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2742-4332","authenticated-orcid":false,"given":"Suzanne S","family":"Sindi","sequence":"additional","affiliation":[{"name":"Department of Applied Mathematics, University of California , Merced, CA 95343, USA"},{"name":"Health Sciences Research Institute, University of California , Merced, CA 95343, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,2,18]]},"reference":[{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1186\/s13059-019-1795-z","article-title":"A comparison of automatic cell identification methods for single-cell RNA sequencing data","volume":"20","author":"Abdelaal","year":"2019","journal-title":"Genome Biol"},{"key":"2023021801091263800_","article-title":"Towards principled methods for training generative adversarial networks","author":"Arjovsky","year":"2017","journal-title":"arXiv"},{"key":"2023021801091263800_","first-page":"214","article-title":"Wasserstein generative adversarial networks","volume":"70","author":"Arjovsky","year":"2017","journal-title":"Proc. Mach. Learn. Res"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"3276","DOI":"10.1093\/bioinformatics\/btaa105","article-title":"SPsimSeq: semi-parametric simulation of bulk and single-cell RNA-sequencing data","volume":"36","author":"Assefa","year":"2020","journal-title":"Bioinformatics"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"2131","DOI":"10.1093\/bioinformatics\/btv124","article-title":"SimSeq: a nonparametric approach to simulation of RNA-sequence datasets","volume":"31","author":"Benidt","year":"2015","journal-title":"Bioinformatics"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1038\/nrn3475","article-title":"Power failure: why small sample size undermines the reliability of neuroscience","volume":"14","author":"Button","year":"2013","journal-title":"Nat. Rev. Neurosci"},{"key":"2023021801091263800_","article-title":"Training generative neural networks via maximum mean discrepancy optimization","author":"Dziugaite","year":"2015"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"2778","DOI":"10.1093\/bioinformatics\/btv272","article-title":"Polyester: simulating RNA-seq datasets with differential transcript expression","volume":"31","author":"Frazee","year":"2015","journal-title":"Bioinformatics"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1186\/s12859-020-3450-9","article-title":"Data-based RNA-seq simulations by binomial thinning","volume":"21","author":"Gerard","year":"2020","journal-title":"BMC Bioinformatics"},{"key":"2023021801091263800_","first-page":"2672","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"2023021801091263800_","first-page":"723","article-title":"A kernel two-sample test","volume":"13","author":"Gretton","year":"2012","journal-title":"J. Mach. Learn. Res"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1016\/j.cell.2018.02.001","article-title":"Mapping the mouse cell atlas by Microwell-seq","volume":"172","author":"Han","year":"2018","journal-title":"Cell"},{"key":"2023021801091263800_","author":"He","year":"2019"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1016\/j.immuni.2020.12.011","article-title":"Neurological manifestations of COVID-19 feature T-cell exhaustion and dedifferentiated monocytes in cerebrospinal fluid","volume":"54","author":"Heming","year":"2021","journal-title":"Immunity"},{"key":"2023021801091263800_","article-title":"SoftAdapt: techniques for adaptive loss weighting of neural networks with multi-part loss functions","author":"Heydari","year":"2019","journal-title":"CoRR"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","DOI":"10.1117\/12.2559808","article-title":"SRVAE: super resolution using variational autoencoders","author":"Heydari","year":"2020"},{"key":"2023021801091263800_","article-title":"IntroVAE: introspective variational autoencoders for photographic image synthesis","volume":"31","author":"Huang","year":"2018"},{"key":"2023021801091263800_","article-title":"Auto-encoding variational Bayes","author":"Kingma","year":"2013"},{"key":"2023021801091263800_","first-page":"1400","volume-title":"Advances in Neural Information Processing Systems","author":"Lindenbaum","year":"2018"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"i99","DOI":"10.1093\/bioinformatics\/btz317","article-title":"hicGAN infers super resolution Hi-C data with generative adversarial networks","volume":"35","author":"Liu","year":"2019","journal-title":"Bioinformatics"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat. Methods"},{"key":"2023021801091263800_","first-page":"698","volume-title":"Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS\u201918","author":"Lucic","year":"2018"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1093\/bioinformatics\/btz592","article-title":"ACTINN: automated identification of cell types in single cell RNA sequencing","volume":"36","author":"Ma","year":"2020","journal-title":"Bioinformatics"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1038\/s41467-019-14018-z","article-title":"Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks","volume":"11","author":"Marouf","year":"2020","journal-title":"Nat. Commun"},{"key":"2023021801091263800_","article-title":"Unrolled generative adversarial networks","author":"Metz","year":"2016"},{"key":"2023021801091263800_","article-title":"cGANs with projection discriminator","author":"Miyato","year":"2018"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"e27041","DOI":"10.7554\/eLife.27041","article-title":"The human cell atlas","volume":"6","author":"Regev","year":"2017","journal-title":"Elife"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1186\/s40537-019-0197-0","article-title":"A survey on image data augmentation for deep learning","volume":"6","author":"Shorten","year":"2019","journal-title":"J. Big Data"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1038\/nmeth.1315","article-title":"mRNA-seq whole-transcriptome analysis of a single cell","volume":"6","author":"Tang","year":"2009","journal-title":"Nat. Methods"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1186\/s13578-019-0314-y","article-title":"The single-cell sequencing: new developments and medical applications","volume":"9","author":"Tang","year":"2019","journal-title":"Cell Biosci"},{"key":"2023021801091263800_","author":"Theis","year":"2016"},{"key":"2023021801091263800_","article-title":"Generative adversarial networks in computer vision: a survey and taxonomy","author":"Wang","year":"2019"},{"key":"2023021801091263800_","first-page":"435","article-title":"Cost-sensitive learning by cost-proportionate example weighting","author":"Zadrozny","year":"2003"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"2611","DOI":"10.1038\/s41467-019-10500-w","article-title":"Simulating multiple faceted variability in single cell RNA sequencing","volume":"10","author":"Zhang","year":"2019","journal-title":"Nat. Commun"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun"},{"key":"2023021801091263800_","doi-asserted-by":"crossref","first-page":"153905","DOI":"10.1109\/ACCESS.2020.3018228","article-title":"Conditional introspective variational autoencoder for image synthesis","volume":"8","author":"Zheng","year":"2020","journal-title":"IEEE Access"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac095\/42628775\/btac095.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/8\/2194\/49010211\/btac095.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/8\/2194\/49010211\/btac095.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,17]],"date-time":"2023-11-17T09:52:24Z","timestamp":1700214744000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/8\/2194\/6531957"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2022,2,18]]},"references-count":37,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2022,4,12]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac095","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.01.28.428725","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,4,15]]},"published":{"date-parts":[[2022,2,18]]}}}