{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,4]],"date-time":"2025-06-04T10:49:00Z","timestamp":1749034140704,"version":"3.37.3"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"Supplement_2","license":[{"start":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T00:00:00Z","timestamp":1725408000000},"content-version":"vor","delay-in-days":3,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004281","name":"National Science Centre Poland","doi-asserted-by":"crossref","award":["UMO-2020\/38\/E\/NZ2\/00598"],"award-info":[{"award-number":["UMO-2020\/38\/E\/NZ2\/00598"]}],"id":[{"id":"10.13039\/501100004281","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Analysis of the omics data with the help of machine learning (ML) methods is limited by small sample sizes and a large number of variables. One possible approach to deal with such data is using algorithms for feature selection and reducing the dataset to include only those variables that are related to the studied phenomena. Existing simulators of the omics data were mostly developed with the goal of improving the methods for generations of high-quality data, that correspond with the highest possible fidelity to the real level of molecular markers in the biological materials. The current study aims to simulate the data on a higher level of generalization. Such datasets can then be used to perform tests of the feature selection and ML algorithms on systems that have structures mimicking those of real data, but where the ground truth may be implanted by design. 
They can also be used to generate contrast variables with the desired correlation structure for the feature selection.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We proposed an algorithm for the reconstruction of an omics dataset that preserves, with high fidelity, the correlation structure of the original data with a reduced number of parameters. It is based on the hierarchical clustering of variables and uses principal components of the clusters. It reproduces the topological descriptors of the correlation structure well. The correlation structure of the principal components of the clusters is then used to obtain datasets with correlation structures similar to the original data but not correlated with the original variables.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The code and data are available at: https:\/\/github.com\/p100mma\/hcrs_omics.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae392","type":"journal-article","created":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T07:37:17Z","timestamp":1725521837000},"page":"ii98-ii104","source":"Crossref","is-referenced-by-count":1,"title":["HCS\u2014hierarchical algorithm for simulation of omics datasets"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2603-8205","authenticated-orcid":false,"given":"Piotr","family":"Stomma","sequence":"first","affiliation":[{"name":"Faculty of Computer Science, University of Bia\u0142ystok , Bia\u0142ystok 15-245, Poland"},{"name":"Computational Centre, University of Bia\u0142ystok , Bia\u0142ystok 15-245, Poland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7928-4944","authenticated-orcid":false,"given":"Witold R","family":"Rudnicki","sequence":"additional","affiliation":[{"name":"Faculty of 
Computer Science, University of Bia\u0142ystok , Bia\u0142ystok 15-245, Poland"},{"name":"Computational Centre, University of Bia\u0142ystok , Bia\u0142ystok 15-245, Poland"}]}],"member":"286","published-online":{"date-parts":[[2024,9,4]]},"reference":[{"key":"2024090413582825600_btae392-B1","doi-asserted-by":"crossref","first-page":"1499","DOI":"10.1038\/nbt1205-1499","article-title":"How does gene expression clustering work?","volume":"23","author":"D\u2019haeseleer","year":"2005","journal-title":"Nat Biotechnol"},{"year":"2021","author":"Faber","key":"2024090413582825600_btae392-B2"},{"key":"2024090413582825600_btae392-B3","doi-asserted-by":"crossref","first-page":"951582","DOI":"10.3389\/fimmu.2022.951582","article-title":"Integrative analysis from multicenter studies identifies a WGCNA-derived cancer-associated fibroblast signature for ovarian cancer","volume":"13","author":"Feng","year":"2022","journal-title":"Front Immunol"},{"key":"2024090413582825600_btae392-B4","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.physrep.2009.11.002","article-title":"Community detection in graphs","volume":"486","author":"Fortunato","year":"2010","journal-title":"Phys Rep"},{"key":"2024090413582825600_btae392-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.physrep.2016.09.002","article-title":"Community detection in networks: a user guide","volume":"659","author":"Fortunato","year":"2016","journal-title":"Phys Rep"},{"key":"2024090413582825600_btae392-B6","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1137\/090771806","article-title":"Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions","volume":"53","author":"Halko","year":"2011","journal-title":"SIAM Rev"},{"key":"2024090413582825600_btae392-B7","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1093\/imanum\/22.3.329","article-title":"Computing the nearest correlation matrix\u2013a problem from 
finance","volume":"22","author":"Higham","year":"2002","journal-title":"IMA J Numer Anal"},{"key":"2024090413582825600_btae392-B8","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1002\/wics.18","article-title":"Cholesky factorization","volume":"1","author":"Higham","year":"2009","journal-title":"WIREs Comput Stats"},{"key":"2024090413582825600_btae392-B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/nar\/gkn923","article-title":"Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists","volume":"37","author":"Huang","year":"2009","journal-title":"Nucleic Acids Res"},{"volume-title":"Principal Component Analysis","year":"2002","author":"Jolliffe","key":"2024090413582825600_btae392-B10"},{"key":"2024090413582825600_btae392-B11","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1287\/deca.2016.0338","article-title":"The metalog distributions","volume":"13","author":"Keelin","year":"2016","journal-title":"Dec Anal"},{"key":"2024090413582825600_btae392-B12","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1186\/1471-2105-9-559","article-title":"WGCNA: an R package for weighted correlation network analysis","volume":"9","author":"Langfelder","year":"2009","journal-title":"BMC Bioinform"},{"year":"2016","author":"Langfelder","key":"2024090413582825600_btae392-B13"},{"key":"2024090413582825600_btae392-B14","doi-asserted-by":"crossref","first-page":"11479","DOI":"10.1038\/ncomms11479","article-title":"The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes","volume":"7","author":"Pereira","year":"2016","journal-title":"Nat Commun"},{"key":"2024090413582825600_btae392-B15","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1007\/s10916-021-01718-7","article-title":"Robust data integration method for classification of biomedical data","volume":"45","author":"Polewko-Klim","year":"2021","journal-title":"J Med 
Syst"},{"volume-title":"Numerical Recipes 3rd Edition: The Art of Scientific Computing","year":"2007","author":"Press","key":"2024090413582825600_btae392-B16"},{"key":"2024090413582825600_btae392-B17","doi-asserted-by":"crossref","first-page":"1118","DOI":"10.1073\/pnas.0706851105","article-title":"Maps of random walks on complex networks reveal community structure","volume":"105","author":"Rosvall","year":"2008","journal-title":"Proc Natl Acad Sci"},{"key":"2024090413582825600_btae392-B18","doi-asserted-by":"crossref","first-page":"1090","DOI":"10.1038\/s41467-018-03424-4","article-title":"A comprehensive evaluation of module detection methods for gene expression data","volume":"9","author":"Saelens","year":"2018","journal-title":"Nat Commun"},{"key":"2024090413582825600_btae392-B19","doi-asserted-by":"crossref","first-page":"i473","DOI":"10.1093\/bioinformatics\/bts370","article-title":"Identifying functional modules in interaction networks through overlapping Markov clustering","volume":"28","author":"Shih","year":"2012","journal-title":"Bioinformatics"},{"key":"2024090413582825600_btae392-B20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-13-328","article-title":"Comparison of co-expression measures: mutual information, correlation, and model based indices","volume":"13","author":"Song","year":"2012","journal-title":"BMC Bioinform"},{"key":"2024090413582825600_btae392-B21","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"key":"2024090413582825600_btae392-B22","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1186\/s13059-021-02367-2","article-title":"scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured","volume":"22","author":"Sun","year":"2021","journal-title":"Genome 
Biol"},{"key":"2024090413582825600_btae392-B23","doi-asserted-by":"crossref","first-page":"6805","DOI":"10.2147\/OTT.S258439","article-title":"Identification of important modules and biomarkers in breast cancer based on WGCNA","volume":"13","author":"Tian","year":"2020","journal-title":"OncoTargets Ther"},{"key":"2024090413582825600_btae392-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2164-13-535","article-title":"Genefriends: an online co-expression analysis tool to identify novel gene targets for aging and complex diseases","volume":"13","author":"van Dam","year":"2012","journal-title":"BMC Genomics"},{"key":"2024090413582825600_btae392-B25","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1137\/040608635","article-title":"Graph clustering via a discrete uncoupling process","volume":"30","author":"Van Dongen","year":"2008","journal-title":"SIAM J Matrix Anal Appl"},{"key":"2024090413582825600_btae392-B26","doi-asserted-by":"crossref","first-page":"1974","DOI":"10.1093\/bioinformatics\/btv088","article-title":"Identification of cell types from single-cell transcriptomes using a novel clustering method","volume":"31","author":"Xu","year":"2015","journal-title":"Bioinformatics"},{"key":"2024090413582825600_btae392-B27","first-page":"17","article-title":"A general framework for weighted gene co-expression network analysis","volume":"4","author":"Zhang","year":"2005","journal-title":"Stat Appl Gen Mol Biol"},{"key":"2024090413582825600_btae392-B28","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1198\/106186006X113430","article-title":"Sparse principal component analysis","volume":"15","author":"Zou","year":"2006","journal-title":"J Comput Graph 
Stat"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_2\/ii98\/59016914\/btae392.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_2\/ii98\/59016914\/btae392.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T07:37:36Z","timestamp":1725521856000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/40\/Supplement_2\/ii98\/7749068"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,1]]},"references-count":28,"journal-issue":{"issue":"Supplement_2","published-print":{"date-parts":[[2024,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae392","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2024,9]]},"published":{"date-parts":[[2024,9,1]]}}}