{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T18:25:52Z","timestamp":1772562352848,"version":"3.50.1"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2019,2,15]],"date-time":"2019-02-15T00:00:00Z","timestamp":1550188800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100006112","name":"Microsoft Research","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006112","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,7,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Intra-tumor heterogeneity is one of the key confounding factors in deciphering tumor evolution. Malignant cells exhibit variations in their gene expression, copy numbers and mutation even when originating from a single progenitor cell. Single cell sequencing of tumor cells has recently emerged as a viable option for unmasking the underlying tumor heterogeneity. 
However, extracting features from single cell genomic data in order to infer their evolutionary trajectory remains computationally challenging due to the extremely noisy and sparse nature of the data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here we describe \u2018Dhaka\u2019, a variational autoencoder method which transforms single cell genomic data to a reduced dimension feature space that is more efficient in differentiating between (hidden) tumor subpopulations. Our method is general and can be applied to several different types of genomic data including copy number variation from scDNA-Seq and gene expression from scRNA-Seq experiments. We tested the method on synthetic and six single cell cancer datasets where the number of cells ranges from 250 to 6000 for each sample. Analysis of the resulting feature space revealed subpopulations of cells and their marker genes. The features are also able to infer the lineage and\/or differentiation trajectory between cells greatly improving upon prior methods suggested for feature extraction and dimensionality reduction of such data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>All the datasets used in the paper are publicly available and developed software package and supporting info is available on Github https:\/\/github.com\/MicrosoftGenomics\/Dhaka.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  
<\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz095","type":"journal-article","created":{"date-parts":[[2019,2,15]],"date-time":"2019-02-15T09:11:39Z","timestamp":1550221899000},"page":"1535-1543","source":"Crossref","is-referenced-by-count":41,"title":["Dhaka: variational autoencoder for unmasking tumor heterogeneity from single cell genomic data"],"prefix":"10.1093","volume":"37","author":[{"given":"Sabrina","family":"Rashid","sequence":"first","affiliation":[{"name":"Computational Biology Department, Carnegie Mellon University , Pittsburgh, PA 15232, USA"}]},{"given":"Sohrab","family":"Shah","sequence":"additional","affiliation":[{"name":"Department of Computer Science"},{"name":"Department of Pathology and Laboratory Medicine, University of British Columbia , Vancouver, BC V6T 1Z4, Canada"},{"name":"Department of Molecular Oncology, BC Cancer Agency , Vancouver, BC V5Z 4E6, Canada"}]},{"given":"Ziv","family":"Bar-Joseph","sequence":"additional","affiliation":[{"name":"Computational Biology Department, Carnegie Mellon University , Pittsburgh, PA 15232, USA"},{"name":"Machine Learning Department, Carnegie Mellon University , Pittsburgh, PA 15232, USA"}]},{"given":"Ravi","family":"Pandya","sequence":"additional","affiliation":[{"name":"Microsoft Research , Redmond, WA 98052, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,2,15]]},"reference":[{"key":"2024041009313984500_btz095-B1","doi-asserted-by":"crossref","first-page":"8865","DOI":"10.1074\/jbc.M113.506790","article-title":"Characterizing WW domain interactions of tumor suppressor WWOX reveals its association with multiprotein networks","volume":"289","author":"Abu-Odeh","year":"2014","journal-title":"J. Biol. Chem"},{"key":"2024041009313984500_btz095-B2","doi-asserted-by":"crossref","first-page":"105.","DOI":"10.1038\/nm.3984","article-title":"Pan-cancer analysis of the extent and consequences of intra-tumor heterogeneity","volume":"22","author":"Andor","year":"2016","journal-title":"Nat. 
Med"},{"key":"2024041009313984500_btz095-B3","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1002\/pros.20054","article-title":"Translationally controlled tumor protein (TCTP) in the human prostate and prostate cancer cells: expression, distribution, and calcium binding activity","volume":"60","author":"Arcuri","year":"2004","journal-title":"Prostate"},{"key":"2024041009313984500_btz095-B4","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1002\/ijc.24791","article-title":"Extraribosomal function of metallopanstimulin-1: reducing paxillin in head and neck squamous cell carcinoma and inhibiting tumor growth","volume":"126","author":"Dai","year":"2010","journal-title":"Int. J. Cancer"},{"key":"2024041009313984500_btz095-B5","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1126\/science.1253462","article-title":"Spatial and temporal diversity in genomic instability processes defines lung cancer evolution","volume":"346","author":"de Bruin","year":"2014","journal-title":"Science"},{"key":"2024041009313984500_btz095-B6","doi-asserted-by":"crossref","first-page":"315.","DOI":"10.1186\/s12859-016-1176-5","article-title":"FastProject: a tool for low-dimensional analysis of single-cell RNA-Seq data","volume":"17","author":"DeTomaso","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2024041009313984500_btz095-B7","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1038\/nmeth.3734","article-title":"Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis","volume":"13","author":"Fan","year":"2016","journal-title":"Nat. Methods"},{"key":"2024041009313984500_btz095-B8","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1006\/bbrc.2001.4912","article-title":"Tumor autocrine motility factor is an angiogenic factor that stimulates endothelial cell motility","volume":"284","author":"Funasaka","year":"2001","journal-title":"Biochem. Biophys. Res. 
Commun"},{"key":"2024041009313984500_btz095-B9","doi-asserted-by":"crossref","first-page":"175.","DOI":"10.1038\/nrg.2015.16","article-title":"Single-cell genome sequencing: current state of the science","volume":"17","author":"Gawad","year":"2016","journal-title":"Nat. Rev. Genet"},{"key":"2024041009313984500_btz095-B10","doi-asserted-by":"crossref","first-page":"718","DOI":"10.1016\/S0006-291X(03)00028-7","article-title":"Human trophoblast noncoding RNA suppresses CIITA promoter III activity in murine B-lymphocytes","volume":"301","author":"Geirsson","year":"2003","journal-title":"Biochem. Biophys. Res. Commun"},{"key":"2024041009313984500_btz095-B11","doi-asserted-by":"crossref","first-page":"692","DOI":"10.1038\/nm.4336","article-title":"Single-cell transcriptomics uncovers distinct molecular signatures of stem cells in chronic myeloid leukemia","volume":"23","author":"Giustacchini","year":"2017","journal-title":"Nat. Med"},{"key":"2024041009313984500_btz095-B12","first-page":"1328","author":"Gupta","year":"2015"},{"key":"2024041009313984500_btz095-B13","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1126\/science.1127647","article-title":"Reducing the dimensionality of data with neural networks","volume":"313","author":"Hinton","year":"2006","journal-title":"Science"},{"key":"2024041009313984500_btz095-B14","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1016\/j.ajhg.2012.07.014","article-title":"The TRK-fused gene is mutated in hereditary motor and sensory neuropathy with proximal dominant involvement","volume":"91","author":"Ishiura","year":"2012","journal-title":"Am. J. Hum. 
Genet"},{"key":"2024041009313984500_btz095-B15","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/978-1-4757-1904-8_7","volume-title":"Principal Component Analysis","author":"Jolliffe","year":"1986"},{"key":"2024041009313984500_btz095-B16","doi-asserted-by":"crossref","first-page":"720","DOI":"10.1007\/978-3-642-04898-2_327","volume-title":"International Encyclopedia of Statistical Science","author":"Joyce","year":"2011"},{"key":"2024041009313984500_btz095-B17","doi-asserted-by":"crossref","first-page":"1089","DOI":"10.1016\/j.jprot.2011.10.005","article-title":"Clinical proteomics identified ATP-dependent RNA helicase DDX39 as a novel biomarker to predict poor prognosis of patients with gastrointestinal stromal tumor","volume":"75","author":"Kikuta","year":"2012","journal-title":"J. Proteomics"},{"key":"2024041009313984500_btz095-B18","first-page":"6114","article-title":"Auto-encoding variational Bayes","volume":"1312","author":"Kingma","year":"2013","journal-title":"arXiv"},{"key":"2024041009313984500_btz095-B19","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2024041009313984500_btz095-B20","first-page":"556","author":"Lee","year":"2001"},{"key":"2024041009313984500_btz095-B21","doi-asserted-by":"crossref","first-page":"e166.","DOI":"10.1093\/nar\/gkx750","article-title":"Network embedding-based representation learning for single cell RNA-seq data","volume":"45","author":"Li","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2024041009313984500_btz095-B22","doi-asserted-by":"crossref","first-page":"e156","DOI":"10.1093\/nar\/gkx681","article-title":"Using neural networks for reducing the dimensions of single-cell RNA-seq data","volume":"45","author":"Lin","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2024041009313984500_btz095-B23","first-page":"05086.","article-title":"A deep generative 
model for single-cell RNA sequencing with application to detecting differentially expressed genes","author":"Lopez","year":"2017","journal-title":"arXiv"},{"key":"2024041009313984500_btz095-B24","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J. Mach. Learn. Res"},{"key":"2024041009313984500_btz095-B25","doi-asserted-by":"crossref","first-page":"e0135817.","DOI":"10.1371\/journal.pone.0135817","article-title":"Identification of Distinct Tumor Subpopulations in Lung Adenocarcinoma via Single-Cell RNA-seq","volume":"10","author":"Min","year":"2015","journal-title":"PLoS One"},{"key":"2024041009313984500_btz095-B26","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1016\/j.molonc.2010.04.010","article-title":"Tracing the tumor lineage","volume":"4","author":"Navin","year":"2010","journal-title":"Mol. Oncol"},{"key":"2024041009313984500_btz095-B27","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1038\/gene.2009.111","article-title":"RGMA and IL21R show association with experimental inflammation and multiple sclerosis","volume":"11","author":"Nohra","year":"2010","journal-title":"Genes Immun"},{"key":"2024041009313984500_btz095-B28","doi-asserted-by":"crossref","first-page":"1396","DOI":"10.1126\/science.1254257","article-title":"Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma","volume":"344","author":"Patel","year":"2014","journal-title":"Science"},{"key":"2024041009313984500_btz095-B29","doi-asserted-by":"crossref","first-page":"241.","DOI":"10.1186\/s13059-015-0805-z","article-title":"ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis","volume":"16","author":"Pierson","year":"2015","journal-title":"Genome Biol"},{"key":"2024041009313984500_btz095-B30","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1016\/S0925-4439(96)00044-0","article-title":"Significance of two point mutations present in 
each HEXB allele of patients with adult GM2 gangliosidosis (sandhoff disease) homozygosity for the Ile207 Val substitution is not associated with a clinical or biochemical phenotype","volume":"1317","author":"Redonnet-Vernhet","year":"1996","journal-title":"Biochim. Biophys. Acta"},{"key":"2024041009313984500_btz095-B31","doi-asserted-by":"crossref","first-page":"2323","DOI":"10.1126\/science.290.5500.2323","article-title":"Nonlinear dimensionality reduction by locally linear embedding","volume":"290","author":"Roweis","year":"2000","journal-title":"Science"},{"key":"2024041009313984500_btz095-B32","doi-asserted-by":"crossref","first-page":"3810.","DOI":"10.1172\/JCI57088","article-title":"Insight into the heterogeneity of breast cancer through next-generation sequencing","volume":"121","author":"Russnes","year":"2011","journal-title":"J. Clin. Invest"},{"key":"2024041009313984500_btz095-B33","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1038\/nbt.3569","article-title":"Wishbone identifies bifurcating developmental trajectories from single-cell data","volume":"34","author":"Setty","year":"2016","journal-title":"Nat. 
Biotechnol"},{"key":"2024041009313984500_btz095-B34","first-page":"26","volume-title":"COURSERA: Neural Networks for Machine Learning","author":"Tieleman","year":"2012"},{"key":"2024041009313984500_btz095-B35","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1126\/science.aad0501","article-title":"Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq","volume":"352","author":"Tirosh","year":"2016","journal-title":"Science"},{"key":"2024041009313984500_btz095-B36","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1038\/nature20123","article-title":"Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma","volume":"539","author":"Tirosh","year":"2016","journal-title":"Nature"},{"key":"2024041009313984500_btz095-B37","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1038\/nbt.2859","article-title":"The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells","volume":"32","author":"Trapnell","year":"2014","journal-title":"Nat. 
Biotechnol"},{"key":"2024041009313984500_btz095-B38","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.oraloncology.2008.05.003","article-title":"Gene polymorphisms related to angiogenesis, inflammation and thrombosis that influence risk for oral cancer","volume":"45","author":"Vairaktaris","year":"2009","journal-title":"Oral Oncol"},{"key":"2024041009313984500_btz095-B39","first-page":"111591","article-title":"MAGIC: a diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data","author":"van Dijk","year":"2017","journal-title":"BioRxiv"},{"key":"2024041009313984500_btz095-B40","doi-asserted-by":"crossref","first-page":"eaai8478.","DOI":"10.1126\/science.aai8478","article-title":"Decoupling genetics, lineages, and microenvironment in IDH-mutant gliomas by single-cell RNA-seq","volume":"355","author":"Venteicher","year":"2017","journal-title":"Science"},{"key":"2024041009313984500_btz095-B41","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat. 
Methods"},{"key":"2024041009313984500_btz095-B42","doi-asserted-by":"crossref","first-page":"1665","DOI":"10.1101\/gr.6861907","article-title":"PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data","volume":"17","author":"Wang","year":"2007","journal-title":"Genome Res"},{"key":"2024041009313984500_btz095-B43","doi-asserted-by":"crossref","first-page":"1974","DOI":"10.1093\/bioinformatics\/btv088","article-title":"Identification of cell types from single-cell transcriptomes using a novel clustering method","volume":"31","author":"Xu","year":"2015","journal-title":"Bioinformatics"},{"key":"2024041009313984500_btz095-B44","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1038\/nmeth.4140","article-title":"Scalable whole-genome single-cell library preparation without preamplification","volume":"14","author":"Zahn","year":"2017","journal-title":"Nat. Methods"},{"key":"2024041009313984500_btz095-B45","volume-title":"Encyclopedia of Biostatistics","author":"Zar","year":"1998"},{"key":"2024041009313984500_btz095-B46","doi-asserted-by":"crossref","first-page":"1622","DOI":"10.1126\/science.1229164","article-title":"Genome-wide detection of single-nucleotide and copy-number variations of a single human 
cell","volume":"338","author":"Zong","year":"2012","journal-title":"Science"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/11\/1535\/57196125\/btz095.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/11\/1535\/57196125\/btz095.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T05:39:26Z","timestamp":1712727566000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/11\/1535\/5320558"}},"subtitle":[],"editor":[{"given":"Janet","family":"Kelso","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,2,15]]},"references-count":46,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2021,7,12]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz095","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/183863","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,6,1]]},"published":{"date-parts":[[2019,2,15]]}}}