{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T23:17:18Z","timestamp":1773875838003,"version":"3.50.1"},"reference-count":57,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:00:00Z","timestamp":1688083200000},"content-version":"vor","delay-in-days":29,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Labex DigiCosme"},{"name":"University Paris-Saclay"},{"DOI":"10.13039\/501100001665","name":"French National Research Agency","doi-asserted-by":"publisher","award":["ANR-20-THIA-0013-01"],"award-info":[{"award-number":["ANR-20-THIA-0013-01"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Transcriptomics data are becoming more accessible due to high-throughput and less costly sequencing methods. However, data scarcity prevents exploiting deep learning models\u2019 full predictive power for phenotypes prediction. Artificially enhancing the training sets, namely data augmentation, is suggested as a regularization strategy. Data augmentation corresponds to label-invariant transformations of the training set (e.g. geometric transformations on images and syntax parsing on text data). Such transformations are, unfortunately, unknown in the transcriptomic field. Therefore, deep generative models such as generative adversarial networks (GANs) have been proposed to generate additional samples. 
In this article, we analyze GAN-based data augmentation strategies with respect to performance indicators and the classification of cancer phenotypes.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>This work highlights a significant boost in binary and multiclass classification performances due to augmentation strategies. Without augmentation, training a classifier on only 50 RNA-seq samples yields an accuracy of, respectively, 94% and 70% for binary and tissue classification. In comparison, we achieved 98% and 94% accuracy when adding 1000 augmented samples. Richer architectures and more expensive training of the GAN return better augmentation performances and generated data quality overall. Further analysis of the generated data shows that several performance indicators are needed to assess its quality correctly.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>All data used for this research are publicly available and come from The Cancer Genome Atlas. Reproducible code is available on the GitLab repository: https:\/\/forge.ibisc.univ-evry.fr\/alacan\/GANs-for-transcriptomics<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad239","type":"journal-article","created":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T08:19:26Z","timestamp":1688113166000},"page":"i111-i120","source":"Crossref","is-referenced-by-count":30,"title":["GAN-based data augmentation for transcriptomics: survey and comparative assessment"],"prefix":"10.1093","volume":"39","author":[{"given":"Alice","family":"Lacan","sequence":"first","affiliation":[{"name":"IBISC, University Paris-Saclay (Univ. 
Evry) , Evry 91000, France"}]},{"given":"Mich\u00e8le","family":"Sebag","sequence":"additional","affiliation":[{"name":"TAU, CNRS-INRIA-LISN, University Paris-Saclay , Gif-sur-Yvette 91190, France"}]},{"given":"Blaise","family":"Hanczar","sequence":"additional","affiliation":[{"name":"IBISC, University Paris-Saclay (Univ. Evry) , Evry 91000, France"}]}],"member":"286","published-online":{"date-parts":[[2023,6,30]]},"reference":[{"key":"2023063008162384300_btad239-B1","author":"Akiba","year":"2019"},{"key":"2023063008162384300_btad239-B2","author":"Arjovsky","year":"2017"},{"key":"2023063008162384300_btad239-B3","author":"Arjovsky","year":"2017"},{"key":"2023063008162384300_btad239-B4","author":"Chen","year":"2020"},{"key":"2023063008162384300_btad239-B5","author":"Cubuk","year":"2019"},{"key":"2023063008162384300_btad239-B6","first-page":"219","article-title":"A deep learning approach for cancer detection and relevant gene identification","volume":"22","author":"Danaee","year":"2017","journal-title":"Pacific Symp Biocomput"},{"key":"2023063008162384300_btad239-B7","author":"Dao","year":"2018"},{"key":"2023063008162384300_btad239-B8","author":"Das","year":"2022"},{"key":"2023063008162384300_btad239-B9","article-title":"Improved regularization of convolutional neural networks with cutout","author":"Devries","year":"2017","journal-title":"CoRR"},{"key":"2023063008162384300_btad239-B10","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-018-07931-2","article-title":"Single-cell RNA-seq denoising using a deep count autoencoder","volume":"10","author":"Eraslan","year":"2019","journal-title":"Nat 
Commun"},{"key":"2023063008162384300_btad239-B11","author":"Feng","year":"2021"},{"key":"2023063008162384300_btad239-B12","author":"Ghahramani","year":"2018"},{"key":"2023063008162384300_btad239-B13","author":"Goodfellow","year":"2014"},{"key":"2023063008162384300_btad239-B14","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1162\/neco.1997.9.5.1093","article-title":"Noise injection: theoretical prospects","volume":"9","author":"Grandvalet","year":"1997","journal-title":"Neural Comput"},{"key":"2023063008162384300_btad239-B15","doi-asserted-by":"crossref","first-page":"4415","DOI":"10.1093\/bioinformatics\/btaa293","article-title":"scVAE: variational auto-encoders for single-cell gene expression data","volume":"36","author":"Gr\u00f8nbech","year":"2020","journal-title":"Bioinformatics"},{"key":"2023063008162384300_btad239-B16","author":"Guo","year":"2017"},{"key":"2023063008162384300_btad239-B17","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MIS.2009.36","article-title":"The unreasonable effectiveness of data","volume":"24","author":"Halevy","year":"2009","journal-title":"IEEE Intell Syst"},{"key":"2023063008162384300_btad239-B18","doi-asserted-by":"crossref","first-page":"125","DOI":"10.17706\/ijbbb.2018.8.2.125-131","article-title":"Phenotypes prediction from gene expression data with deep multilayer perceptron and unsupervised pre-training","volume":"8","author":"Hanczar","year":"2018","journal-title":"IJBBB"},{"key":"2023063008162384300_btad239-B19","author":"Hawthorne","year":"2022."},{"key":"2023063008162384300_btad239-B20","author":"Hendrycks","year":"2020"},{"key":"2023063008162384300_btad239-B21","author":"Heusel","year":"2017"},{"key":"2023063008162384300_btad239-B22","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1186\/s13045-020-01005-x","article-title":"RNA sequencing: new technologies and applications in cancer research","volume":"13","author":"Hong","year":"2020","journal-title":"J Hematol 
Oncol"},{"key":"2023063008162384300_btad239-B23","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1080\/23808993.2016.1157686","article-title":"The path from big data to precision medicine","volume":"1","author":"Huang","year":"2016","journal-title":"Expert Rev Precision Med Drug Dev"},{"key":"2023063008162384300_btad239-B24","doi-asserted-by":"crossref","DOI":"10.1186\/s12874-018-0482-1","article-title":"DeepSurv: personalized treatment recommender system using a cox proportional hazards deep neural network","volume":"18","author":"Katzman","year":"2018","journal-title":"BMC Med Res Methodol"},{"key":"2023063008162384300_btad239-B25","doi-asserted-by":"crossref","first-page":"i389","DOI":"10.1093\/bioinformatics\/btaa462","article-title":"Improved survival analysis by learning shared genomic information from pan-cancer data","volume":"36","author":"Kim","year":"2020","journal-title":"Bioinformatics"},{"key":"2023063008162384300_btad239-B26","doi-asserted-by":"crossref","first-page":"1466","DOI":"10.1016\/j.csbj.2020.06.017","article-title":"Deep learning models in genomics; are we there yet?","volume":"18","author":"Koumakis","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"key":"2023063008162384300_btad239-B27","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1016\/j.csbj.2014.11.005","article-title":"Machine learning applications in cancer prognosis and prediction","volume":"13","author":"Kourou","year":"2015","journal-title":"Comput Struct Biotechnol J"},{"key":"2023063008162384300_btad239-B28","author":"Kynk\u00e4\u00e4nniemi","year":"2019"},{"key":"2023063008162384300_btad239-B29","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proceedings of the 
IEEE"},{"key":"2023063008162384300_btad239-B30","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1038\/nrg3920","article-title":"Machine learning applications in genetics and genomics","volume":"16","author":"Libbrecht","year":"2015","journal-title":"Nat Rev Genet"},{"key":"2023063008162384300_btad239-B31","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1038\/s42256-021-00333-y","article-title":"Simultaneous deep generative modelling and clustering of single-cell genomic data","volume":"3","author":"Liu","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2023063008162384300_btad239-B32","author":"Lopes","year":"2021"},{"key":"2023063008162384300_btad239-B33","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat Methods"},{"key":"2023063008162384300_btad239-B34","doi-asserted-by":"crossref","first-page":"S7","DOI":"10.1038\/d41586-020-02676-9","article-title":"The RNA and protein landscape that could bring precision medicine to more people","volume":"585","author":"Makin","year":"2020","journal-title":"Nature"},{"key":"2023063008162384300_btad239-B35","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-019-14018-z","article-title":"Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks","volume":"11","author":"Marouf","year":"2020","journal-title":"Nat Commun"},{"key":"2023063008162384300_btad239-B36","author":"Mirza","year":"2014"},{"key":"2023063008162384300_btad239-B37","author":"Mounsaveng","year":"2020"},{"key":"2023063008162384300_btad239-B38","first-page":"8152","author":"Ni","year":"2021"},{"key":"2023063008162384300_btad239-B39","doi-asserted-by":"crossref","first-page":"e1008099","DOI":"10.1371\/journal.pcbi.1008099","article-title":"A practical application of generative adversarial networks for RNA-seq 
analysis to predict the molecular progress of Alzheimer\u2019s disease","volume":"16","author":"Park","year":"2020","journal-title":"PLoS Comput Biol"},{"key":"2023063008162384300_btad239-B40","author":"Radford","year":"2016"},{"key":"2023063008162384300_btad239-B41","author":"Salimans","year":"2016"},{"key":"2023063008162384300_btad239-B42","author":"Shao","year":"2022"},{"key":"2023063008162384300_btad239-B43","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1186\/s40537-019-0197-0","article-title":"A survey on image data augmentation for deep learning","volume":"6","author":"Shorten","year":"2019","journal-title":"J Big Data"},{"key":"2023063008162384300_btad239-B44","author":"Uddin","year":"2020"},{"key":"2023063008162384300_btad239-B45","doi-asserted-by":"crossref","first-page":"730","DOI":"10.1093\/bioinformatics\/btab035","article-title":"Adversarial generation of gene expression data","volume":"38","author":"Vi\u00f1as","year":"2022","journal-title":"Bioinformatics"},{"key":"2023063008162384300_btad239-B46","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1016\/j.gpb.2018.08.003","article-title":"Vasc: dimension reduction and visualization of single-cell RNA-seq data by deep variational autoencoder","volume":"16","author":"Wang","year":"2018","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"2023063008162384300_btad239-B47","author":"Wang","year":"2020"},{"key":"2023063008162384300_btad239-B48","first-page":"80","article-title":"Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders","volume":"23","author":"Way","year":"2018","journal-title":"Pac Symp Biocomput"},{"key":"2023063008162384300_btad239-B49","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1038\/ng.2764","article-title":"The cancer genome atlas pan-cancer analysis project","volume":"45","author":"Weinstein","year":"2013","journal-title":"Nat 
Genet"},{"key":"2023063008162384300_btad239-B50","author":"Welling","year":"2014"},{"key":"2023063008162384300_btad239-B51","first-page":"4653","author":"Wen","year":"2021"},{"key":"2023063008162384300_btad239-B52","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1016\/j.neucom.2019.12.136","article-title":"Generation and evaluation of privacy preserving synthetic health data","volume":"416","author":"Yale","year":"2020","journal-title":"Neurocomputing"},{"key":"2023063008162384300_btad239-B53","doi-asserted-by":"crossref","first-page":"e1009303","DOI":"10.1371\/journal.pgen.1009303","article-title":"Creating artificial human genomes using generative neural networks","volume":"17","author":"Yelmen","year":"2021","journal-title":"PLoS Genet"},{"key":"2023063008162384300_btad239-B54","author":"Yun","year":"2019"},{"key":"2023063008162384300_btad239-B55","first-page":"7354","author":"Zhang","year":"2019"},{"key":"2023063008162384300_btad239-B56","author":"Zhao","year":"2020"},{"key":"2023063008162384300_btad239-B57","author":"Zhu","year":"2017"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/Supplement_1\/i111\/50741877\/btad239.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/Supplement_1\/i111\/50741877\/btad239.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T08:19:51Z","timestamp":1688113191000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/39\/Supplement_1\/i111\/7210506"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,1]]},"references-count":57,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2023,6,30]]}},"URL":"https:\/\/doi.o
rg\/10.1093\/bioinformatics\/btad239","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,6,1]]},"published":{"date-parts":[[2023,6,1]]}}}