{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T18:42:34Z","timestamp":1771353754603,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2021,11,26]],"date-time":"2021-11-26T00:00:00Z","timestamp":1637884800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,27]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>It is more and more common to perform multi-omics analyses to explore the genome at diverse levels and not only at a single level. Through integrative statistical methods, multi-omics data have the power to reveal new biological processes, potential biomarkers and subgroups in a cohort. Matrix factorization (MF) is an unsupervised statistical method that allows a clustering of individuals, but also reveals relevant omics variables from the various blocks.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Here, we present PIntMF (Penalized Integrative Matrix Factorization), an MF model with sparsity, positivity and equality constraints. To induce sparsity in the model, we used a classical Lasso penalization on variable and individual matrices. For the matrix of samples, sparsity helps in the clustering, while normalization (matching an equality constraint) of inferred coefficients is added to improve interpretation. Moreover, we added an automatic tuning of the sparsity parameters using the famous glmnet package. We also proposed three criteria to help the user to choose the number of latent variables. PIntMF was compared with other state-of-the-art integrative methods including feature selection techniques in both synthetic and real data. PIntMF succeeds in finding relevant clusters as well as variables in two types of simulated data (correlated and uncorrelated). Next, PIntMF was applied to two real datasets (Diet and cancer), and it revealed interpretable clusters linked to available clinical data. Our method outperforms the existing ones on two criteria (clustering and variable selection). We show that PIntMF is an easy, fast and powerful tool to extract patterns and cluster samples from multi-omics data.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>An R package is available at https:\/\/github.com\/mpierrejean\/pintmf.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab786","type":"journal-article","created":{"date-parts":[[2021,11,20]],"date-time":"2021-11-20T04:54:11Z","timestamp":1637384051000},"page":"900-907","source":"Crossref","is-referenced-by-count":15,"title":["PIntMF: Penalized Integrative Matrix Factorization method for multi-omics data"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9133-780X","authenticated-orcid":false,"given":"Morgane","family":"Pierre-Jean","sequence":"first","affiliation":[{"name":"Centre National de Recherche en G\u00e9nomique Humaine, CEA, Universit\u00e9 de Paris-Saclay , Evry, France"}]},{"given":"Florence","family":"Mauger","sequence":"additional","affiliation":[{"name":"Centre National de Recherche en G\u00e9nomique Humaine, CEA, Universit\u00e9 de Paris-Saclay , Evry, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5358-4463","authenticated-orcid":false,"given":"Jean-Fran\u00e7ois","family":"Deleuze","sequence":"additional","affiliation":[{"name":"Centre National de Recherche en G\u00e9nomique Humaine, CEA, Universit\u00e9 de Paris-Saclay , Evry, France"}]},{"given":"Edith","family":"Le Floch","sequence":"additional","affiliation":[{"name":"Centre National de Recherche en G\u00e9nomique Humaine, CEA, Universit\u00e9 de Paris-Saclay , Evry, France"}]}],"member":"286","published-online":{"date-parts":[[2021,11,26]]},"reference":[{"key":"2023020108520052800_btab786-B1","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s12859-015-0857-9","article-title":"Methods for the integration of multi-omics data: mathematical aspects","volume":"17","author":"Bersanelli","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023020108520052800_btab786-B2","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1016\/j.tibtech.2016.04.004","article-title":"Multi-omics of single cells: strategies and applications","volume":"34","author":"Bock","year":"2016","journal-title":"Trends Biotechnol"},{"key":"2023020108520052800_btab786-B3","doi-asserted-by":"crossref","first-page":"4164","DOI":"10.1073\/pnas.0308531101","article-title":"Metagenes and molecular pattern discovery using matrix factorization","volume":"101","author":"Brunet","year":"2004","journal-title":"Proc. Nat. Acad. Sci. USA"},{"key":"2023020108520052800_btab786-B4","doi-asserted-by":"crossref","first-page":"1688","DOI":"10.1158\/1078-0432.CCR-14-0432","article-title":"Comprehensive genomic analysis identifies novel subtypes and targets of triple-negative breast cancer","volume":"21","author":"Burstein","year":"2015","journal-title":"Clin. Cancer Res"},{"key":"2023020108520052800_btab786-B5","first-page":"124","article-title":"Benchmarking joint multi-omics dimensionality reduction approaches for cancer study","volume":"2","author":"Cantini","year":"2020","journal-title":"Nat. Commun"},{"key":"2023020108520052800_btab786-B6","doi-asserted-by":"crossref","first-page":"e0176278","DOI":"10.1371\/journal.pone.0176278","article-title":"Integrative clustering of multi-level omic data based on non-negative matrix factorization algorithm","volume":"12","author":"Chalise","year":"2017","journal-title":"PLoS One"},{"key":"2023020108520052800_btab786-B7","first-page":"202","article-title":"Integrative clustering methods for high-dimensional molecular data","volume":"3","author":"Chalise","year":"2014","journal-title":"Transl. Cancer Res"},{"key":"2023020108520052800_btab786-B8","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1093\/bib\/bbz015","article-title":"Evaluation of integrative clustering methods for the analysis of multi-omics data","volume":"21","author":"Chauvel","year":"2020","journal-title":"Brief. Bioinf"},{"key":"2023020108520052800_btab786-B9","doi-asserted-by":"crossref","first-page":"5967","DOI":"10.1093\/nar\/gky440","article-title":"Discovery of two-level modular organization from matched genomic data via joint matrix tri-factorization","volume":"46","author":"Chen","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023020108520052800_btab786-B10","doi-asserted-by":"crossref","first-page":"giz045","DOI":"10.1093\/gigascience\/giz045","article-title":"A multi-omics data simulator for complex disease studies and its application to evaluate multi-omics data analysis methods for disease classification","volume":"8","author":"Chung","year":"2019","journal-title":"GigaScience"},{"key":"2023020108520052800_btab786-B11","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1186\/1471-2105-11-367","article-title":"A flexible r package for nonnegative matrix factorization","volume":"11","author":"Gaujoux","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023020108520052800_btab786-B12","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1142\/S0218339009002831","article-title":"Highlighting relationships between heterogeneous biological data through graphical displays based on regularized canonical correlation analysis","volume":"17","author":"Gonz\u00e1lez","year":"2009","journal-title":"J. Biol. Syst"},{"key":"2023020108520052800_btab786-B13","doi-asserted-by":"crossref","first-page":"84","DOI":"10.3389\/fgene.2017.00084","article-title":"More is better: recent progress in multi-omics data integration methods","volume":"8","author":"Huang","year":"2017","journal-title":"Front. Genet"},{"key":"2023020108520052800_btab786-B14","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/j.foodqual.2013.04.013","article-title":"Handling missing values in multiple factor analysis","volume":"30","author":"Husson","year":"2013","journal-title":"Food Qual. Preference"},{"key":"2023020108520052800_btab786-B15","doi-asserted-by":"crossref","first-page":"e0246159","DOI":"10.1371\/journal.pone.0246159","article-title":"Hdsi: high dimensional selection with interactions algorithm on feature selection and testing","volume":"16","author":"Jain","year":"2021","journal-title":"PLoS One"},{"key":"2023020108520052800_btab786-B16","first-page":"1","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Jerome","year":"2010","journal-title":"J. Stat. Softw"},{"key":"2023020108520052800_btab786-B17","doi-asserted-by":"crossref","first-page":"788","DOI":"10.1038\/44565","article-title":"Learning the parts of objects by non-negative matrix factorization","volume":"401","author":"Lee","year":"1999","journal-title":"Nature"},{"key":"2023020108520052800_btab786-B18","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1021\/acs.jproteome.5b00824","article-title":"mocluster: identifying joint patterns across multiple omics data sets","volume":"15","author":"Meng","year":"2016","journal-title":"J. Proteome Res"},{"key":"2023020108520052800_btab786-B19","author":"Mo","year":"2018"},{"key":"2023020108520052800_btab786-B20","doi-asserted-by":"crossref","first-page":"4245","DOI":"10.1073\/pnas.1208949110","article-title":"Pattern discovery and cancer gene identification in integrated cancer genomic data","volume":"110","author":"Mo","year":"2013","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020108520052800_btab786-B21","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nature11412","article-title":"Comprehensive molecular portraits of human breast tumours","volume":"490","author":"Network","year":"2012","journal-title":"Nature"},{"key":"2023020108520052800_btab786-B22","doi-asserted-by":"crossref","first-page":"776","DOI":"10.1093\/biostatistics\/kxr012","article-title":"A fused lasso latent feature model for analyzing multi-sample ACGH data","volume":"12","author":"Nowak","year":"2011","journal-title":"Biostatistics"},{"key":"2023020108520052800_btab786-B23","doi-asserted-by":"crossref","first-page":"2011","DOI":"10.1093\/bib\/bbz138","article-title":"Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration","volume":"21","author":"Pierre-Jean","year":"2020","journal-title":"Brief. Bioinf"},{"key":"2023020108520052800_btab786-B24","doi-asserted-by":"crossref","first-page":"4453","DOI":"10.1038\/s41467-018-06921-8","article-title":"Multi-omic tumor data reveal diversity of molecular mechanisms that correlate with survival","volume":"9","author":"Ramazzotti","year":"2018","journal-title":"Nat. Commun"},{"key":"2023020108520052800_btab786-B25","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020108520052800_btab786-B26","doi-asserted-by":"crossref","first-page":"2845","DOI":"10.1182\/bloodadvances.2019000192","article-title":"DNA methylation identifies genetically and prognostically distinct subtypes of myelodysplastic syndromes","volume":"3","author":"Reilly","year":"2019","journal-title":"Blood Adv"},{"key":"2023020108520052800_btab786-B27","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1038\/nrg3868","article-title":"Methods of integrating data to uncover genotype\u2013phenotype interactions","volume":"16","author":"Ritchie","year":"2015","journal-title":"Nat. Rev. Genet"},{"key":"2023020108520052800_btab786-B28","doi-asserted-by":"crossref","first-page":"4616","DOI":"10.1093\/bioinformatics\/btaa530","article-title":"Integrating multi-omics data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study","volume":"36","author":"Rodosthenous","year":"2020","journal-title":"Bioinformatics"},{"key":"2023020108520052800_btab786-B29","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1152\/physiolgenomics.00024.2014","article-title":"Multi-omic integrated networks connect DNA methylation and miRNA with skeletal muscle plasticity to chronic exercise in type 2 diabetic obesity","volume":"46","author":"Rowlands","year":"2014","journal-title":"Physiol. Genomics"},{"key":"2023020108520052800_btab786-B30","author":"Sastry","year":"2020"},{"key":"2023020108520052800_btab786-B31","doi-asserted-by":"crossref","first-page":"2906","DOI":"10.1093\/bioinformatics\/btp543","article-title":"Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis","volume":"25","author":"Shen","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020108520052800_btab786-B32","doi-asserted-by":"crossref","first-page":"e35236","DOI":"10.1371\/journal.pone.0035236","article-title":"Integrative subtype discovery in glioblastoma using icluster","volume":"7","author":"Shen","year":"2012","journal-title":"PLoS One"},{"key":"2023020108520052800_btab786-B33","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1080\/10618600.2012.681250","article-title":"A sparse-group lasso","volume":"22","author":"Simon","year":"2013","journal-title":"J. Comput. Graph. Stat"},{"key":"2023020108520052800_btab786-B34","author":"Sneath","year":"1973"},{"key":"2023020108520052800_btab786-B35","doi-asserted-by":"crossref","first-page":"33","DOI":"10.2307\/1217208","article-title":"The comparison of dendrograms by objective methods","volume":"11","author":"Sokal","year":"1962","journal-title":"Taxon"},{"key":"2023020108520052800_btab786-B36","doi-asserted-by":"crossref","first-page":"570255","DOI":"10.3389\/fgene.2020.570255","article-title":"A review of integrative imputation for multi-omics datasets","volume":"11","author":"Song","year":"2020","journal-title":"Front. Genet"},{"key":"2023020108520052800_btab786-B37","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1007\/s11336-011-9206-8","article-title":"Regularized generalized canonical correlation analysis","volume":"76","author":"Tenenhaus","year":"2011","journal-title":"Psychometrika"},{"key":"2023020108520052800_btab786-B38","doi-asserted-by":"crossref","first-page":"569","DOI":"10.1093\/biostatistics\/kxu001","article-title":"Variable selection for generalized canonical correlation analysis","volume":"15","author":"Tenenhaus","year":"2014","journal-title":"Biostatistics"},{"key":"2023020108520052800_btab786-B39","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023020108520052800_btab786-B40","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1093\/bib\/bbx167","article-title":"Multi-omics integration - a comparison of unsupervised clustering methodologies","volume":"20","author":"Tini","year":"2019","journal-title":"Brief. Bioinf"},{"key":"2023020108520052800_btab786-B41","doi-asserted-by":"crossref","first-page":"D956","DOI":"10.1093\/nar\/gkx1090","article-title":"Linkedomics: analyzing multi-omics data within and across 32 cancer types","volume":"46","author":"Vasaikar","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023020108520052800_btab786-B42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-016-1273-5","article-title":"Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework","volume":"17","author":"Voillet","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023020108520052800_btab786-B43","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nmeth.2810","article-title":"Similarity network fusion for aggregating data types on a genomic scale","volume":"11","author":"Wang","year":"2014","journal-title":"Nat. Methods"},{"key":"2023020108520052800_btab786-B44","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1080\/01621459.1963.10500845","article-title":"Hierarchical grouping to optimize an objective function","volume":"58","author":"Ward Jr","year":"1963","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020108520052800_btab786-B45","doi-asserted-by":"crossref","first-page":"aad0189","DOI":"10.1126\/science.aad0189","article-title":"Systems proteomics of liver mitochondria function","volume":"352","author":"Williams","year":"2016","journal-title":"Science"},{"key":"2023020108520052800_btab786-B46","doi-asserted-by":"crossref","first-page":"718","DOI":"10.1016\/j.molmed.2020.04.006","article-title":"Multitissue multiomics systems biology to dissect complex diseases","volume":"26","author":"Yang","year":"2020","journal-title":"Trends Mol. Med"},{"key":"2023020108520052800_btab786-B47","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1016\/j.tibtech.2015.12.013","article-title":"Trans-omics: how to reconstruct biochemical networks across multiple omic layers","volume":"34","author":"Yugi","year":"2016","journal-title":"Trends Biotechnol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab786\/41478294\/btab786.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/4\/900\/49008210\/btab786.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/4\/900\/49008210\/btab786.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,12]],"date-time":"2024-09-12T18:41:25Z","timestamp":1726166485000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/4\/900\/6443074"}},"subtitle":[],"editor":[{"given":"Tobias","family":"Marschall","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,11,26]]},"references-count":47,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,1,27]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab786","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,2,15]]},"published":{"date-parts":[[2021,11,26]]}}}