{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:25:10Z","timestamp":1772173510298,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1013392","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T00:00:00Z","timestamp":1757462400000}}],"reference-count":34,"publisher":"Public Library of Science (PLoS)","issue":"9","license":[{"start":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T00:00:00Z","timestamp":1757030400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000065","name":"National Institute of Neurological Disorders and Stroke","doi-asserted-by":"publisher","award":["5R01NS048471"],"award-info":[{"award-number":["5R01NS048471"]}],"id":[{"id":"10.13039\/100000065","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006108","name":"National Center for Advancing Translational Sciences","doi-asserted-by":"publisher","award":["5UL1TR000003"],"award-info":[{"award-number":["5UL1TR000003"]}],"id":[{"id":"10.13039\/100006108","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Simulation of realistic omics data is a key input for benchmarking studies that help users obtain optimal computational pipelines. Omics data involves large numbers of measured features on each sample and these measures are generally correlated with each other. However, simulation too often ignores these correlations, perhaps due to computational and statistical hurdles of doing so. To alleviate this, we describe three approaches for generating omics-scale data with correlated measures which mimic real datasets. 
These approaches are all based on a Gaussian copula approach with a covariance matrix that decomposes into a diagonal part and a low-rank part. This decomposition allows for extremely efficient simulation, overcoming a hurdle for adoption of past methods. We use these approaches to demonstrate the importance of including correlation in two benchmarking applications. First, we show that variance of results from the popular DESeq2 method increases when dependence is included. Second, we demonstrate that CYCLOPS, a method for inferring circadian time of collection from transcriptomics, improves in performance when given gene-gene dependencies in some circumstances. We provide an R package, dependentsimr, that has efficient implementations of these methods and can generate dependent data with arbitrary marginal distributions, including discrete (binary, ordered categorical, Poisson, negative binomial), continuous (normal), or with an empirical distribution.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1013392","type":"journal-article","created":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T18:00:32Z","timestamp":1757095232000},"page":"e1013392","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":0,"title":["Generating correlated data for omics simulation"],"prefix":"10.1371","volume":"21","author":[{"given":"Jianing","family":"Yang","sequence":"first","affiliation":[]},{"given":"Gregory R.","family":"Grant","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6980-0079","authenticated-orcid":true,"given":"Thomas G.","family":"Brooks","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2025,9,5]]},"reference":[{"issue":"5","key":"pcbi.1013392.ref001","doi-asserted-by":"crossref","first-page":"326","DOI":"10.1038\/s41576-023-00679-6","article-title":"Challenges and best practices in omics benchmarking","volume":"25","author":"TG 
Brooks","year":"2024","journal-title":"Nat Rev Genet."},{"key":"pcbi.1013392.ref002","doi-asserted-by":"crossref","unstructured":"Nelsen R. An introduction to copulas. New York (NY): Springer. 1998.","DOI":"10.1007\/978-1-4757-3076-0"},{"key":"pcbi.1013392.ref003","unstructured":"Cario CM, Nelson BL. Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. 1997."},{"key":"pcbi.1013392.ref004","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1252","article-title":"Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach","volume":"6","author":"R Opgen-Rhein","year":"2007","journal-title":"Stat Appl Genet Mol Biol."},{"key":"pcbi.1013392.ref005","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1175","article-title":"A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics","volume":"4","author":"J Sch\u00e4fer","year":"2005","journal-title":"Stat Appl Genet Mol Biol."},{"issue":"494","key":"pcbi.1013392.ref006","doi-asserted-by":"crossref","first-page":"672","DOI":"10.1198\/jasa.2011.tm10560","article-title":"Adaptive Thresholding for Sparse Covariance Matrix Estimation","volume":"106","author":"T Cai","year":"2011","journal-title":"J Am Stat Assoc."},{"issue":"3","key":"pcbi.1013392.ref007","doi-asserted-by":"crossref","DOI":"10.1214\/009053606000000281","article-title":"High-dimensional graphs and variable selection with the Lasso","volume":"34","author":"N Meinshausen","year":"2006","journal-title":"Ann Statist."},{"issue":"6","key":"pcbi.1013392.ref008","article-title":"Random variables, joint distribution functions, and copulas","volume":"9","author":"ABE Sklar","year":"1973","journal-title":"Kybernetika."},{"issue":"3","key":"pcbi.1013392.ref009","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1177\/0306312713517157","article-title":"\u201cThe formula that killed Wall Street\u201d: 
the Gaussian copula and modelling practices in investment banking","volume":"44","author":"D MacKenzie","year":"2014","journal-title":"Soc Stud Sci."},{"issue":"30","key":"pcbi.1013392.ref010","doi-asserted-by":"crossref","first-page":"4869","DOI":"10.1002\/sim.8758","article-title":"High-dimensional integrative copula discriminant analysis for multiomics data","volume":"39","author":"Y He","year":"2020","journal-title":"Stat Med."},{"issue":"2","key":"pcbi.1013392.ref011","doi-asserted-by":"crossref","first-page":"1559","DOI":"10.1111\/biom.13701","article-title":"Flexible copula model for integrating correlated multi-omics data from single-cell experiments","volume":"79","author":"Z Ma","year":"2023","journal-title":"Biometrics."},{"issue":"7","key":"pcbi.1013392.ref012","article-title":"Inference of microbial covariation networks using copula models with mixture margins","volume":"39","author":"RA Deek","year":"2023","journal-title":"Bioinformatics."},{"issue":"10","key":"pcbi.1013392.ref013","doi-asserted-by":"crossref","first-page":"3276","DOI":"10.1093\/bioinformatics\/btaa105","article-title":"SPsimSeq: semi-parametric simulation of bulk and single-cell RNA-sequencing data","volume":"36","author":"AT Assefa","year":"2020","journal-title":"Bioinformatics."},{"issue":"1","key":"pcbi.1013392.ref014","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1186\/s13059-021-02367-2","article-title":"scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured","volume":"22","author":"T Sun","year":"2021","journal-title":"Genome Biol."},{"key":"pcbi.1013392.ref015","unstructured":"Fuetterer C, Schollmeyer G, Augustin T. Constructing simulation data with dependency structure for unreliable single-cell RNA-sequencing data using copulas. In: International Symposium on Imprecise Probabilities: Theories and Applications. 2019, p. 
216\u201324."},{"issue":"2","key":"pcbi.1013392.ref016","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1038\/s41587-023-01772-1","article-title":"scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics","volume":"42","author":"D Song","year":"2024","journal-title":"Nat Biotechnol."},{"issue":"12","key":"pcbi.1013392.ref017","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v098.i12","article-title":"SeqNet: An R Package for Generating Gene-Gene Networks and Simulating RNA-Seq Data","volume":"98","author":"T Grimes","year":"2021","journal-title":"J Stat Softw."},{"issue":"12","key":"pcbi.1013392.ref018","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"MI Love","year":"2014","journal-title":"Genome Biol."},{"issue":"20","key":"pcbi.1013392.ref019","doi-asserted-by":"crossref","first-page":"5312","DOI":"10.1073\/pnas.1619320114","article-title":"CYCLOPS reveals human transcriptional rhythms in health and disease","volume":"114","author":"RC Anafi","year":"2017","journal-title":"Proc Natl Acad Sci U S A."},{"issue":"20","key":"pcbi.1013392.ref020","doi-asserted-by":"crossref","DOI":"10.1016\/j.neuron.2022.09.028","article-title":"Mapping brain gene coexpression in daytime transcriptomes unveils diurnal molecular networks and deciphers perturbation gene signatures","volume":"110","author":"N Wang","year":"2022","journal-title":"Neuron."},{"issue":"21","key":"pcbi.1013392.ref021","doi-asserted-by":"crossref","first-page":"9546","DOI":"10.1073\/pnas.0914005107","article-title":"Independent filtering increases detection power for high-throughput experiments","volume":"107","author":"R Bourgon","year":"2010","journal-title":"Proc Natl Acad Sci U S 
A."},{"issue":"1","key":"pcbi.1013392.ref022","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing","volume":"57","author":"Y Benjamini","year":"1995","journal-title":"J R Stat Soc Ser B Stat Method."},{"issue":"4","key":"pcbi.1013392.ref023","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1214\/aos\/1013699998","article-title":"The control of the false discovery rate in multiple testing under dependency","volume":"29","author":"Y Benjamini","year":"2001","journal-title":"Ann Stat."},{"issue":"7","key":"pcbi.1013392.ref024","article-title":"Tumor circadian clock strength influences metastatic potential and predicts patient prognosis in luminal A breast cancer","volume":"121","author":"S-Y Li","year":"2024","journal-title":"Proc Natl Acad Sci U S A."},{"issue":"45","key":"pcbi.1013392.ref025","doi-asserted-by":"crossref","first-page":"16219","DOI":"10.1073\/pnas.1408886111","article-title":"A circadian gene expression atlas in mammals: implications for biology and medicine","volume":"111","author":"R Zhang","year":"2014","journal-title":"Proc Natl Acad Sci U S A."},{"issue":"2","key":"pcbi.1013392.ref026","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1093\/biomet\/70.2.327","article-title":"A correlation coefficient for circular data","volume":"70","author":"NI Fisher","year":"1983","journal-title":"Biometrika."},{"key":"pcbi.1013392.ref027","unstructured":"Brooks TG. Sampling spiked Wishart eigenvalues. 2024. 
Available from: https:\/\/arxiv.org\/abs\/2410.05280"},{"issue":"1","key":"pcbi.1013392.ref028","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1186\/s13059-019-1716-1","article-title":"A practical guide to methods controlling false discoveries in computational biology","volume":"20","author":"K Korthauer","year":"2019","journal-title":"Genome Biol."},{"issue":"3","key":"pcbi.1013392.ref029","article-title":"Copulae: An overview and recent developments","volume":"14","author":"J Gr\u00f6\u00dfer","year":"2021","journal-title":"WIREs Computat Stats."},{"key":"pcbi.1013392.ref030","unstructured":"Fasiolo M. An Introduction to Mvnfast. R Package Version 0.1.6. 2016. Available from: https:\/\/CRAN.R-project.org\/package=mvnfast."},{"issue":"521","key":"pcbi.1013392.ref031","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1080\/01621459.2016.1247002","article-title":"Block-Diagonal Covariance Selection for High-Dimensional Gaussian Graphical Models","volume":"113","author":"E Devijver","year":"2017","journal-title":"J Am Stat Assoc."},{"issue":"4","key":"pcbi.1013392.ref032","doi-asserted-by":"crossref","first-page":"807","DOI":"10.1093\/biomet\/asr054","article-title":"Sparse estimation of a covariance matrix","volume":"98","author":"J Bien","year":"2011","journal-title":"Biometrika."},{"issue":"6","key":"pcbi.1013392.ref033","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1214\/08-AOS600","article-title":"Covariance regularization by thresholding.","volume":"36","author":"PJ Bickel","year":"2008","journal-title":"Ann Statist."},{"issue":"2","key":"pcbi.1013392.ref034","doi-asserted-by":"crossref","DOI":"10.1137\/130920587","article-title":"Preconditioned Krylov Subspace Methods for Sampling Multivariate Gaussian Distributions","volume":"36","author":"E Chow","year":"2014","journal-title":"SIAM J Sci Comput."}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1013392","type":"new_version","label":"New 
version","source":"publisher","updated":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T00:00:00Z","timestamp":1757462400000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013392","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T18:00:20Z","timestamp":1757527220000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013392"}},"subtitle":[],"editor":[{"given":"Jie","family":"Liu","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,9,5]]},"references-count":34,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9,5]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1013392","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2025.01.31.634335","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,5]]}}}