{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,20]],"date-time":"2025-11-20T12:38:14Z","timestamp":1763642294773,"version":"3.37.3"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2017,10,12]],"date-time":"2017-10-12T00:00:00Z","timestamp":1507766400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Microarray technology can be used to study the expression of thousands of genes across a number of different experimental conditions, usually hundreds. The underlying principle is that genes sharing similar expression patterns, across different samples, can be part of the same co-expression system, or they may share the same biological functions. Groups of genes are usually identified based on cluster analysis. Clustering methods rely on the similarity matrix between genes. A common choice to measure similarity is to compute the sample correlation matrix. Dimensionality reduction is another popular data analysis task which is also based on covariance\/correlation matrix estimates. Unfortunately, covariance\/correlation matrix estimation suffers from the intrinsic noise present in high-dimensional data. 
Sources of noise are: sampling variations, the presence of outlying sample units, and the fact that in most cases the number of units is much smaller than the number of genes.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this paper, we propose a robust correlation matrix estimator that is regularized based on adaptive thresholding. The resulting method jointly tames the effects of high dimensionality and data contamination. Computations are easy to implement and do not require manual tuning. Both simulated and real data are analyzed. A Monte Carlo experiment shows that the proposed method is capable of remarkable performance. Our correlation metric is more robust to outliers than the existing alternatives in two gene expression datasets. It is also shown how the regularization makes it possible to automatically detect and filter spurious correlations. The same regularization is also extended to other less robust correlation measures. Finally, we apply the ARACNE algorithm on the SyNTreN gene expression data. Sensitivity and specificity of the reconstructed network are compared with the gold standard. 
We show that ARACNE performs better when it takes the proposed correlation matrix estimator as input.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The R software is available at https:\/\/github.com\/angy89\/RobustSparseCorrelation.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx642","type":"journal-article","created":{"date-parts":[[2017,10,10]],"date-time":"2017-10-10T11:10:27Z","timestamp":1507633827000},"page":"625-634","source":"Crossref","is-referenced-by-count":25,"title":["Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data"],"prefix":"10.1093","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3374-1492","authenticated-orcid":false,"given":"Angela","family":"Serra","sequence":"first","affiliation":[{"name":"NeuRoNeLab, Department of Management and Innovation Systems, University of Salerno, Fisciano (Sa), Italy"}]},{"given":"Pietro","family":"Coretto","sequence":"additional","affiliation":[{"name":"STATLAB, Department of Economics and Statistics, University of Salerno, Fisciano (Sa), Italy"}]},{"given":"Michele","family":"Fratello","sequence":"additional","affiliation":[{"name":"Department of Medical, Surgical, Neurological, Metabolic and Ageing Sciences, Second University of Napoli, Piazza Luigi Miraglia, 2 Napoli, Italy"}]},{"given":"Roberto","family":"Tagliaferri","sequence":"additional","affiliation":[{"name":"NeuRoNeLab, Department of Management and Innovation Systems, University of Salerno, Fisciano (Sa), 
Italy"}]}],"member":"286","published-online":{"date-parts":[[2017,10,12]]},"reference":[{"key":"2023012712322036500_btx642-B1","doi-asserted-by":"crossref","first-page":"455.","DOI":"10.2307\/2349088","article-title":"On a robust correlation coefficient","volume":"39","author":"Abdullah","year":"1990","journal-title":"Statistician"},{"key":"2023012712322036500_btx642-B2","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1016\/0024-3795(85)90049-7","article-title":"Maximum-likelihood estimation of the parameters of a multivariate normal distribution","volume":"70","author":"Anderson","year":"1985","journal-title":"Linear Algebra Appl"},{"key":"2023012712322036500_btx642-B3","doi-asserted-by":"crossref","first-page":"e9.","DOI":"10.1371\/journal.pbio.0020009","article-title":"Similarities and differences in genome-wide expression data of six organisms","volume":"2","author":"Bergmann","year":"2004","journal-title":"PLoS Biol"},{"key":"2023012712322036500_btx642-B4","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1093\/bioinformatics\/btg092","article-title":"Robust cluster analysis of microarray gene expression data with the number of clusters determined biologically","volume":"19","author":"Bickel","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012712322036500_btx642-B5","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1214\/08-AOS600","article-title":"Covariance regularization by thresholding","volume":"36","author":"Bickel","year":"2008","journal-title":"Ann. Stat"},{"key":"2023012712322036500_btx642-B6","doi-asserted-by":"crossref","first-page":"672","DOI":"10.1198\/jasa.2011.tm10560","article-title":"Adaptive thresholding for sparse covariance matrix estimation","volume":"106","author":"Cai","year":"2011","journal-title":"J. Am. Stat. 
Assoc"},{"key":"2023012712322036500_btx642-B7","doi-asserted-by":"crossref","first-page":"594","DOI":"10.1198\/jasa.2011.tm10155","article-title":"A constrained l1 minimization approach to sparse precision matrix estimation","volume":"106","author":"Cai","year":"2011","journal-title":"J. Am. Stat. Assoc"},{"key":"2023012712322036500_btx642-B8","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1137\/060670985","article-title":"First-order methods for sparse covariance selection","volume":"30","author":"D\u2019aspremont","year":"2008","journal-title":"SIAM J. Matrix Anal. Appl"},{"key":"2023012712322036500_btx642-B9","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1007\/978-1-4615-5345-8_22","volume-title":"Information Processing in Cells and Tissues","author":"D\u2019haeseleer","year":"1998"},{"key":"2023012712322036500_btx642-B10","first-page":"2717","article-title":"Operator norm consistent estimation of large-dimensional sparse covariance matrices","volume":"36","author":"El Karoui","year":"2008","journal-title":"Ann. Stat"},{"key":"2023012712322036500_btx642-B11","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1038\/ng.3173","article-title":"Gene expression analysis identifies global gene dosage sensitivity in cancer","volume":"47","author":"Fehrmann","year":"2015","journal-title":"Nat. 
Genet"},{"key":"2023012712322036500_btx642-B12","doi-asserted-by":"crossref","first-page":"432","DOI":"10.1093\/biostatistics\/kxm045","article-title":"Sparse inverse covariance estimation with the graphical lasso","volume":"9","author":"Friedman","year":"2008","journal-title":"Biostatistics"},{"key":"2023012712322036500_btx642-B13","doi-asserted-by":"crossref","first-page":"81","DOI":"10.2307\/2528963","article-title":"Robust estimates, residuals, and outlier detection with multiresponse data","volume":"28","author":"Gnanadesikan","year":"1972","journal-title":"Biometrics"},{"key":"2023012712322036500_btx642-B14","doi-asserted-by":"crossref","first-page":"220.","DOI":"10.1186\/1471-2105-8-220","article-title":"A robust measure of correlation between two genes on a microarray","volume":"8","author":"Hardin","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023012712322036500_btx642-B15","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1093\/imanum\/22.3.329","article-title":"Computing the nearest correlation matrix \u2013 a problem from finance","volume":"22","author":"Higham","year":"2002","journal-title":"IMA J. Numer. Anal"},{"key":"2023012712322036500_btx642-B16","first-page":"1248","volume-title":"Robust statistics. International Encyclopedia of Statistical Science","author":"Huber","year":"2011"},{"key":"2023012712322036500_btx642-B17","doi-asserted-by":"crossref","first-page":"454","DOI":"10.1016\/S0140-6736(05)17878-7","article-title":"Microarrays and molecular research: noise discovery?","volume":"365","author":"Ioannidis","year":"2005","journal-title":"Lancet"},{"key":"2023012712322036500_btx642-B18","doi-asserted-by":"crossref","first-page":"2492","DOI":"10.1016\/j.spl.2013.07.008","article-title":"Covariance selection by thresholding the sample correlation matrix","volume":"83","author":"Jiang","year":"2013","journal-title":"Stat. Probab. 
Lett"},{"key":"2023012712322036500_btx642-B19","doi-asserted-by":"crossref","first-page":"1370","DOI":"10.1109\/TKDE.2004.68","article-title":"Cluster analysis for gene expression data: a survey","volume":"16","author":"Jiang","year":"2004","journal-title":"IEEE Trans. Knowl. Data Eng"},{"key":"2023012712322036500_btx642-B20","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1093\/biomet\/30.1-2.81","article-title":"A new measure of rank correlation","volume":"30","author":"Kendall","year":"1938","journal-title":"Biometrika"},{"key":"2023012712322036500_btx642-B21","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1016\/S0047-259X(03)00096-4","article-title":"A well-conditioned estimator for large-dimensional covariance matrices","volume":"88","author":"Ledoit","year":"2004","journal-title":"J. Multivar. Anal"},{"key":"2023012712322036500_btx642-B22","doi-asserted-by":"crossref","first-page":"S7.","DOI":"10.1186\/1471-2105-7-S1-S7","article-title":"Aracne: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context","volume":"7","author":"Margolin","year":"2006","journal-title":"BMC Bioinformatics"},{"year":"2006","author":"Maronna","key":"2023012712322036500_btx642-B23"},{"key":"2023012712322036500_btx642-B24","doi-asserted-by":"crossref","first-page":"630","DOI":"10.1126\/science.306.5696.630","article-title":"Getting the noise out of gene arrays","volume":"306","author":"Marshall","year":"2004","journal-title":"Science"},{"key":"2023012712322036500_btx642-B25","doi-asserted-by":"crossref","first-page":"31","DOI":"10.2469\/faj.v45.n1.31","article-title":"The markowitz optimization enigma: is optimized optimal?","volume":"45","author":"Michaud","year":"1989","journal-title":"Finan. Anal. 
J"},{"key":"2023012712322036500_btx642-B26","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1007\/978-3-319-22404-6_19","volume-title":"Modern Nonparametric, Robust and Multivariate Methods","author":"\u00d6llerer","year":"2015"},{"key":"2023012712322036500_btx642-B27","doi-asserted-by":"crossref","first-page":"2489","DOI":"10.1016\/j.patrec.2010.08.003","article-title":"Improved direct LDA and its application to DNA microarray gene expression data","volume":"31","author":"Paliwal","year":"2010","journal-title":"Pattern Recogn. Lett"},{"key":"2023012712322036500_btx642-B28","first-page":"332","article-title":"Robust methods of estimation of correlation-coefficient","volume":"48","author":"Pasman","year":"1987","journal-title":"Automation Remote Control"},{"key":"2023012712322036500_btx642-B29","doi-asserted-by":"crossref","DOI":"10.1002\/9781118573617","volume-title":"High-Dimensional Covariance Estimation: With High-Dimensional Data (Wiley Series in Probability and Statistics)","author":"Pourahmadi","year":"2013"},{"key":"2023012712322036500_btx642-B30","doi-asserted-by":"crossref","first-page":"2097","DOI":"10.1093\/bioinformatics\/btg288","article-title":"Kernel hierarchical gene clustering from microarray expression data","volume":"19","author":"Qin","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012712322036500_btx642-B31","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1214\/08-EJS176","article-title":"Sparse permutation invariant covariance estimation","volume":"2","author":"Rothman","year":"2008","journal-title":"Electronic J. Stat"},{"key":"2023012712322036500_btx642-B32","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1198\/jasa.2009.0101","article-title":"Generalized thresholding of large covariance matrices","volume":"104","author":"Rothman","year":"2009","journal-title":"J. Am. Stat. 
Assoc"},{"key":"2023012712322036500_btx642-B33","first-page":"1","volume-title":"2015 International Joint Conference on Neural Networks (IJCNN)","author":"Serra","year":"2015"},{"key":"2023012712322036500_btx642-B34","first-page":"147","article-title":"Robust estimation of the correlation coefficient: an attempt of survey","volume":"40","author":"Shevlyakov","year":"2011","journal-title":"Austrian J. Stat"},{"key":"2023012712322036500_btx642-B35","doi-asserted-by":"crossref","first-page":"201","DOI":"10.2307\/1412107","article-title":"\u201cgeneral intelligence,\u201d objectively determined and measured","volume":"15","author":"Spearman","year":"1904","journal-title":"Am. J. Psychol"},{"key":"2023012712322036500_btx642-B36","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1016\/j.csda.2015.02.005","article-title":"Robust estimation of precision matrices under cellwise contamination","volume":"93","author":"Tarr","year":"2016","journal-title":"Comput. Stat. Data Anal"},{"key":"2023012712322036500_btx642-B37","article-title":"Coexpnetviz: comparative co-expression networks construction and visualization tool","volume":"6","author":"Tzfadia","year":"2015","journal-title":"Front. 
Plant Sci"},{"key":"2023012712322036500_btx642-B38","doi-asserted-by":"crossref","first-page":"43.","DOI":"10.1186\/1471-2105-7-43","article-title":"Syntren: a generator of synthetic gene expression data for design and analysis of structure learning algorithms","volume":"7","author":"Van den Bulcke","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012712322036500_btx642-B39","doi-asserted-by":"crossref","first-page":"71.","DOI":"10.1186\/1471-2164-6-71","article-title":"A study of inter-lab and inter-platform agreement of DNA microarray data","volume":"6","author":"Wang","year":"2005","journal-title":"BMC Genomics"},{"key":"2023012712322036500_btx642-B40","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1111\/j.1467-9868.2012.01049.x","article-title":"Condition-number-regularized covariance estimation","volume":"75","author":"Won","year":"2013","journal-title":"J. R. Stat. Soc. Ser. B Stat. Methodol"},{"key":"2023012712322036500_btx642-B41","doi-asserted-by":"crossref","first-page":"15e","DOI":"10.1093\/nar\/30.4.e15","article-title":"Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation","volume":"30","author":"Yang","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023012712322036500_btx642-B42","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1093\/biomet\/asm018","article-title":"Model selection and estimation in the Gaussian graphical model","volume":"94","author":"Yuan","year":"2007","journal-title":"Biometrika"},{"key":"2023012712322036500_btx642-B43","doi-asserted-by":"crossref","first-page":"1128","DOI":"10.2202\/1544-6115.1128","article-title":"A general framework for weighted gene co-expression network analysis","volume":"4","author":"Zhang","year":"2005","journal-title":"Stat. Appl. Genet. Mol. 
Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/4\/625\/48913355\/bioinformatics_34_4_625.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/4\/625\/48913355\/bioinformatics_34_4_625.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T13:20:13Z","timestamp":1674825613000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/4\/625\/4470355"}},"subtitle":[],"editor":[{"given":"Oliver","family":"Stegle","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,10,12]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx642","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2018,2,15]]},"published":{"date-parts":[[2017,10,12]]}}}