{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T13:54:02Z","timestamp":1764251642946,"version":"3.37.3"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2020,7,13]],"date-time":"2020-07-13T00:00:00Z","timestamp":1594598400000},"content-version":"vor","delay-in-days":12,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100007631","name":"CIFAR","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007631","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010785","name":"Canada First Research Excellence Fund","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100010785","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Frederick Banting and Charles Best Canada Graduate Scholarships Doctoral Award"},{"DOI":"10.13039\/501100000024","name":"Canadian Institute for Health Research","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000024","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100000024","name":"CIHR","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000024","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The recent development of sequencing technologies revolutionized our understanding of the inner workings of the cell as well as the way disease is treated. A single RNA sequencing (RNA-Seq) experiment, however, measures tens of thousands of parameters simultaneously. While the results are information rich, data analysis provides a challenge. 
Dimensionality reduction methods help with this task by extracting patterns from the data by compressing it into compact vector representations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present the factorized embeddings (FE) model, a self-supervised deep learning algorithm that learns simultaneously, by tensor factorization, gene and sample representation spaces. We ran the model on RNA-Seq data from two large-scale cohorts and observed that the sample representation captures information on single gene and global gene expression patterns. Moreover, we found that the gene representation space was organized such that tissue-specific genes, highly correlated genes as well as genes participating in the same GO terms were grouped. Finally, we compared the vector representation of samples learned by the FE model to other similar models on 49 regression tasks. We report that the representations trained with FE rank first or second in all of the tasks, surpassing, sometimes by a considerable margin, other representations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>A toy example in the form of a Jupyter Notebook as well as the code and trained embeddings for this project can be found at: https:\/\/github.com\/TrofimovAssya\/FactorizedEmbeddings.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               
<\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa488","type":"journal-article","created":{"date-parts":[[2020,5,6]],"date-time":"2020-05-06T19:18:10Z","timestamp":1588792690000},"page":"i417-i426","source":"Crossref","is-referenced-by-count":5,"title":["Factorized embeddings learns rich and biologically meaningful embedding spaces using factorized tensor decomposition"],"prefix":"10.1093","volume":"36","author":[{"given":"Assya","family":"Trofimov","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Montreal, Qu\u00e9bec, Canada"},{"name":"Institute for Research in Immunology and Cancer, University of Montreal, Qu\u00e9bec, Canada"},{"name":"University of Montreal Mila, Qu\u00e9bec, Canada"}]},{"given":"Joseph Paul","family":"Cohen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Montreal, Qu\u00e9bec, Canada"},{"name":"University of Montreal Mila, Qu\u00e9bec, Canada"}]},{"given":"Yoshua","family":"Bengio","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Montreal, Qu\u00e9bec, Canada"},{"name":"University of Montreal Mila, Qu\u00e9bec, Canada"}]},{"given":"Claude","family":"Perreault","sequence":"additional","affiliation":[{"name":"Institute for Research in Immunology and Cancer, University of Montreal, Qu\u00e9bec, Canada"},{"name":"Department of Medicine, University of Montreal, Qu\u00e9bec, Canada"}]},{"given":"S\u00e9bastien","family":"Lemieux","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Montreal, Qu\u00e9bec, Canada"},{"name":"Institute for Research in Immunology and Cancer, University of Montreal, Qu\u00e9bec, Canada"},{"name":"Department of Biochemistry and Molecular Medicine, University of Montreal, Qu\u00e9bec, 
Canada"}]}],"member":"286","published-online":{"date-parts":[[2020,7,13]]},"reference":[{"key":"2024021913374429100_btaa488-B1","doi-asserted-by":"crossref","first-page":"e0141287","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PLoS One"},{"key":"2024021913374429100_btaa488-B2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet"},{"key":"2024021913374429100_btaa488-B3","doi-asserted-by":"publisher","first-page":"e201900336","DOI":"10.26508\/lsa.201900336","article-title":"Targeted variant detection using unaligned RNA-Seq reads","volume":"2","author":"Audemard","year":"2019","journal-title":"Life Science Alliance"},{"year":"2015","author":"Bolotin","key":"2024021913374429100_btaa488-B4"},{"key":"2024021913374429100_btaa488-B5","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1038\/nbt.3519","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2024021913374429100_btaa488-B6","doi-asserted-by":"crossref","first-page":"4164","DOI":"10.1073\/pnas.0308531101","article-title":"Metagenes and molecular pattern discovery using matrix factorization","volume":"101","author":"Brunet","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2024021913374429100_btaa488-B7","doi-asserted-by":"crossref","first-page":"682","DOI":"10.3389\/fgene.2018.00682","article-title":"Embedding of genes using cancer gene expression data: biological relevance and potential application on biomarker discovery","volume":"9","author":"Choy","year":"2019","journal-title":"Front. 
Genet"},{"key":"2024021913374429100_btaa488-B8","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1186\/s12864-018-5370-x","article-title":"Gene2vec: distributed representation of genes based on co-expression","volume":"20","author":"Du","year":"2019","journal-title":"BMC Genomics"},{"key":"2024021913374429100_btaa488-B9","doi-asserted-by":"crossref","first-page":"1574","DOI":"10.1101\/gr.397002","article-title":"Judging the quality of gene expression-based clustering methods using gene annotation","volume":"12","author":"Gibbons","year":"2002","journal-title":"Genome Res"},{"year":"2020","author":"Goldman","key":"2024021913374429100_btaa488-B10","doi-asserted-by":"publisher","DOI":"10.1038\/s41587-020-0546-8"},{"key":"2024021913374429100_btaa488-B11","first-page":"19","article-title":"Statistical aspects of gene signatures and molecular targets","volume":"3","author":"G\u00f6nen","year":"2009","journal-title":"Gastroint. Cancer Res"},{"key":"2024021913374429100_btaa488-B12","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1080\/00437956.1954.11659520","article-title":"Distributional structure","volume":"10","author":"Harris","year":"1954","journal-title":"WORD"},{"key":"2024021913374429100_btaa488-B13","doi-asserted-by":"crossref","first-page":"djw144","DOI":"10.1093\/jnci\/djw144","article-title":"Genomic analysis of immune cell infiltrates across 11 tumor types","volume":"108","author":"Iglesia","year":"2016","journal-title":"J. Natl. Cancer Inst"},{"key":"2024021913374429100_btaa488-B14","article-title":"Pan-cancer analysis of somatic mutations and transcriptomes reveals common functional gene clusters shared by multiple cancer types","volume":"8","author":"Kim","year":"2018","journal-title":"Sci. 
Rep"},{"key":"2024021913374429100_btaa488-B15","doi-asserted-by":"crossref","first-page":"e122","DOI":"10.1093\/nar\/gkx338","article-title":"MiSTIC, an integrated platform for the analysis of heterogeneity in large tumour transcriptome datasets","volume":"45","author":"Lemieux","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2024021913374429100_btaa488-B16","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1093\/bib\/bbt002","article-title":"Gene set analysis methods: statistical models and methodological differences","volume":"15","author":"Maciejewski","year":"2014","journal-title":"Brief. Bioinformatics"},{"year":"2018","author":"McInnes","key":"2024021913374429100_btaa488-B17"},{"year":"2013","author":"Mikolov","key":"2024021913374429100_btaa488-B18"},{"key":"2024021913374429100_btaa488-B19","doi-asserted-by":"crossref","first-page":"1482","DOI":"10.1038\/s41587-019-0336-3","article-title":"Visualizing structure and transitions in high-dimensional biological data","volume":"37","author":"Moon","year":"2019","journal-title":"Nat. Biotechnol"},{"volume-title":"Machine Learning: A Probabilistic Perspective","year":"2012","author":"Murphy","key":"2024021913374429100_btaa488-B20"},{"key":"2024021913374429100_btaa488-B21","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1038\/nmeth.3337","article-title":"Robust enumeration of cell subsets from tissue expression profiles","volume":"12","author":"Newman","year":"2015","journal-title":"Nat. 
Methods"},{"year":"2017","author":"Ng","key":"2024021913374429100_btaa488-B22"},{"key":"2024021913374429100_btaa488-B23","first-page":"8024","volume-title":"Advances in Neural Information Processing Systems","author":"Paszke","year":"2019"},{"key":"2024021913374429100_btaa488-B24","first-page":"1532","article-title":"GloVe: global vectors for word representation","author":"Pennington","year":"2014","journal-title":"Empirical Methods in Natural Language Processing (EMNLP)"},{"key":"2024021913374429100_btaa488-B25","doi-asserted-by":"crossref","first-page":"10546","DOI":"10.1093\/nar\/gky889","article-title":"Multi-omic and multi-view clustering algorithms: review and cancer benchmark","volume":"46","author":"Rappoport","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2024021913374429100_btaa488-B26","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.27041","article-title":"The human cell atlas","volume":"6","author":"Regev","year":"2017","journal-title":"eLife"},{"year":"2000","author":"Roweis","key":"2024021913374429100_btaa488-B27"},{"year":"2020","author":"Schreiber","key":"2024021913374429100_btaa488-B28"},{"key":"2024021913374429100_btaa488-B29","doi-asserted-by":"crossref","first-page":"812","DOI":"10.1016\/j.immuni.2018.03.023","article-title":"The immune landscape of","volume":"48","author":"Thorsson","year":"2018","journal-title":"Immunity"},{"year":"2018","author":"Trofimov","key":"2024021913374429100_btaa488-B30"},{"key":"2024021913374429100_btaa488-B31","first-page":"1","article-title":"Dimensionality reduction: a comparative review","volume":"10","author":"Van Der Maaten","year":"2009","journal-title":"J. Mach. Learn. Res"},{"key":"2024021913374429100_btaa488-B32","doi-asserted-by":"crossref","first-page":"1145","DOI":"10.1038\/nbt.3711","article-title":"Revealing the vectors of cellular identity with single-cell genomics","volume":"34","author":"Wagner","year":"2016","journal-title":"Nat. 
Biotechnol"},{"key":"2024021913374429100_btaa488-B33","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1093\/bioinformatics\/bti042","article-title":"Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification","volume":"21","author":"Yanai","year":"2005","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_1\/i417\/56702846\/bioinformatics_36_supplement1_i417.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_1\/i417\/56702846\/bioinformatics_36_supplement1_i417.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T13:49:09Z","timestamp":1708350549000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/Supplement_1\/i417\/5870511"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,1]]},"references-count":33,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2020,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa488","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2020,7]]},"published":{"date-parts":[[2020,7,1]]}}}