{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T00:31:04Z","timestamp":1774571464147,"version":"3.50.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T00:00:00Z","timestamp":1570752000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Cancer subtype classification has the potential to significantly improve disease prognosis and develop individualized patient management. Existing methods are limited by their ability to handle extremely high-dimensional data and by the influence of misleading, irrelevant factors, resulting in ambiguous and overlapping subtypes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>To address the above issues, we proposed a novel approach to disentangling and eliminating irrelevant factors by leveraging the power of deep learning. Specifically, we designed a deep-learning framework, referred to as DeepType, that performs joint supervised classification, unsupervised clustering and dimensionality reduction to learn cancer-relevant data representation with cluster structure. 
We applied DeepType to the METABRIC breast cancer dataset and compared its performance to state-of-the-art methods. DeepType significantly outperformed the existing methods, identifying more robust subtypes while using fewer genes. The new approach provides a framework for the derivation of more accurate and robust molecular cancer subtypes by using increasingly complex, multi-source data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>An open-source software package for the proposed method is freely available at http:\/\/www.acsu.buffalo.edu\/~yijunsun\/lab\/DeepType.html.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz769","type":"journal-article","created":{"date-parts":[[2019,10,8]],"date-time":"2019-10-08T15:33:06Z","timestamp":1570548786000},"page":"1476-1483","source":"Crossref","is-referenced-by-count":124,"title":["Deep-learning approach to identifying cancer subtypes using high-dimensional genomic data"],"prefix":"10.1093","volume":"36","author":[{"given":"Runpu","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, University at Buffalo, The State University of New York , Buffalo, NY 14214, USA"}]},{"given":"Le","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University at Buffalo, The State University of New York , Buffalo, NY 14214, USA"}]},{"given":"Steve","family":"Goodison","sequence":"additional","affiliation":[{"name":"Department of Health Sciences Research , Mayo Clinic, Jacksonville, FL 32224, 
USA"}]},{"given":"Yijun","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University at Buffalo, The State University of New York , Buffalo, NY 14214, USA"},{"name":"Department of Microbiology and Immunology"},{"name":"Department of Biostatistics, University at Buffalo, The State University of New York , Buffalo, NY 14214, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,10,11]]},"reference":[{"key":"2023060910385003000_btz769-B1","doi-asserted-by":"crossref","first-page":"1011","DOI":"10.1016\/j.cell.2015.10.025","article-title":"The molecular taxonomy of primary prostate cancer","volume":"163","author":"Abeshouse","year":"2015","journal-title":"Cell"},{"key":"2023060910385003000_btz769-B2","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nature11412","article-title":"Comprehensive molecular portraits of human breast tumours","volume":"490","year":"2012","journal-title":"Nature"},{"key":"2023060910385003000_btz769-B3","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1038\/nature10983","article-title":"The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups","volume":"486","author":"Curtis","year":"2012","journal-title":"Nature"},{"key":"2023060910385003000_btz769-B4","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1109\/TPAMI.1979.4766909","article-title":"A cluster separation measure","volume":"1","author":"Davies","year":"1979","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023060910385003000_btz769-B5","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1093\/jnci\/djr545","article-title":"A three-gene model to robustly identify breast cancer molecular subtypes","volume":"104","author":"Haibe-Kains","year":"2012","journal-title":"J. Natl. 
Cancer Inst"},{"key":"2023060910385003000_btz769-B6","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1023\/A:1012801612483","article-title":"On clustering validation techniques","volume":"17","author":"Halkidi","year":"2001","journal-title":"J. Intell. Inform. Syst"},{"key":"2023060910385003000_btz769-B7","doi-asserted-by":"crossref","first-page":"646","DOI":"10.1016\/j.cell.2011.02.013","article-title":"Hallmarks of cancer: the next generation","volume":"144","author":"Hanahan","year":"2011","journal-title":"Cell"},{"key":"2023060910385003000_btz769-B8","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning","author":"Hastie","year":"2009"},{"key":"2023060910385003000_btz769-B9","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1038\/bjc.1982.62","article-title":"A prognostic index in primary breast cancer","volume":"45","author":"Haybittle","year":"1982","journal-title":"Br. J. Cancer"},{"key":"2023060910385003000_btz769-B10","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2023060910385003000_btz769-B11","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1093\/biostatistics\/kxj029","article-title":"Are clusters found in one dataset present in another dataset?","volume":"8","author":"Kapp","year":"2007","journal-title":"Biostatistics"},{"key":"2023060910385003000_btz769-B12","first-page":"1","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2014","journal-title":"International Conference on Learning Representations, San Diego, USA"},{"key":"2023060910385003000_btz769-B13","doi-asserted-by":"crossref","first-page":"1327","DOI":"10.1214\/11-AOAS533","article-title":"Integrative model-based clustering of microarray 
methylation and expression data","volume":"6","author":"Kormaksson","year":"2012","journal-title":"Ann. Appl. Statist"},{"key":"2023060910385003000_btz769-B14","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2023060910385003000_btz769-B15","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","article-title":"Least squares quantization in PCM","volume":"28","author":"Lloyd","year":"1982","journal-title":"IEEE Trans. Inform. Theory"},{"key":"2023060910385003000_btz769-B16","doi-asserted-by":"crossref","first-page":"662","DOI":"10.1093\/jnci\/djr071","article-title":"Microarray-based class discovery for molecular classification of breast cancer: analysis of interobserver agreement","volume":"103","author":"Mackay","year":"2011","journal-title":"J. Natl. Cancer Inst"},{"key":"2023060910385003000_btz769-B17","first-page":"1813","article-title":"Efficient and robust feature selection via joint \u21132,1-norms minimization","author":"Nie","year":"2010","journal-title":"Advances in Neural Information Processing Systems, Vancouver, Canada,"},{"key":"2023060910385003000_btz769-B18","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1200\/JCO.2008.18.1370","article-title":"Supervised risk predictor of breast cancer based on intrinsic subtypes","volume":"27","author":"Parker","year":"2009","journal-title":"J. Clin. 
Oncol"},{"key":"2023060910385003000_btz769-B19","doi-asserted-by":"crossref","first-page":"2906","DOI":"10.1093\/bioinformatics\/btp543","article-title":"Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis","volume":"25","author":"Shen","year":"2009","journal-title":"Bioinformatics"},{"key":"2023060910385003000_btz769-B20","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1214\/12-AOAS578","article-title":"Sparse integrative clustering of multiple omics data sets","volume":"7","author":"Shen","year":"2013","journal-title":"Ann. Appl. Statist"},{"key":"2023060910385003000_btz769-B21","doi-asserted-by":"crossref","first-page":"10869","DOI":"10.1073\/pnas.191367098","article-title":"Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications","volume":"98","author":"S\u00f8rlie","year":"2001","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023060910385003000_btz769-B22","doi-asserted-by":"crossref","first-page":"8418","DOI":"10.1073\/pnas.0932692100","article-title":"Repeated observation of breast tumor subtypes in independent gene expression data sets","volume":"100","author":"S\u00f8rlie","year":"2003","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023060910385003000_btz769-B23","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1093\/jnci\/djj052","article-title":"Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis","volume":"98","author":"Sotiriou","year":"2006","journal-title":"J. Natl. Cancer Inst"},{"key":"2023060910385003000_btz769-B24","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1056\/NEJMoa1804710","article-title":"Adjuvant chemotherapy guided by a 21-gene expression assay in breast cancer","volume":"379","author":"Sparano","year":"2018","journal-title":"N. Engl. J. 
Med"},{"key":"2023060910385003000_btz769-B25","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1186\/s13059-014-0440-0","article-title":"Cancer progression modeling using static sample data","volume":"15","author":"Sun","year":"2014","journal-title":"Genome Biol"},{"key":"2023060910385003000_btz769-B26","first-page":"e69","article-title":"Computational approach for deriving cancer progression roadmaps from static sample data","volume":"45","author":"Sun","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023060910385003000_btz769-B27","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J. R. Statist. Soc. Ser. B Statist. Methodol"},{"key":"2023060910385003000_btz769-B28","doi-asserted-by":"crossref","first-page":"6567","DOI":"10.1073\/pnas.082099299","article-title":"Diagnosis of multiple cancer types by shrunken centroids of gene expression","volume":"99","author":"Tibshirani","year":"2002","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023060910385003000_btz769-B29","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten","year":"2008","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023060910385003000_btz769-B30","first-page":"1083","article-title":"On deep multi-view representation learning","author":"Wang","year":"2015","journal-title":"International Conference on Machine Learning, Lille, France,"},{"key":"2023060910385003000_btz769-B31","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1016\/S1470-2045(10)70008-5","article-title":"Breast cancer molecular profiling with single sample predictors: a retrospective analysis","volume":"11","author":"Weigelt","year":"2010","journal-title":"Lancet Oncol"},{"key":"2023060910385003000_btz769-B32","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1198\/jasa.2010.tm09415","article-title":"A framework for feature selection in clustering","volume":"105","author":"Witten","year":"2010","journal-title":"J. Am. Statist. Assoc"},{"key":"2023060910385003000_btz769-B33","doi-asserted-by":"crossref","first-page":"1033","DOI":"10.1038\/nmeth.3583","article-title":"Comparing the performance of biomedical clustering methods","volume":"12","author":"Wiwie","year":"2015","journal-title":"Nat. 
Methods"},{"key":"2023060910385003000_btz769-B34","first-page":"478","article-title":"Unsupervised deep embedding for clustering analysis","author":"Xie","year":"2016","journal-title":"International Conference on Machine Learning, New York, USA,"},{"key":"2023060910385003000_btz769-B35","doi-asserted-by":"crossref","first-page":"9379","DOI":"10.1093\/nar\/gks725","article-title":"Discovery of multi-dimensional modules by integrative analysis of cancer genomic data","volume":"40","author":"Zhang","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023060910385003000_btz769-B36","doi-asserted-by":"crossref","first-page":"1820","DOI":"10.1093\/bioinformatics\/bty887","article-title":"SENSE: Siamese neural network for sequence embedding and alignment-free comparison","volume":"35","author":"Zheng","year":"2019","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz769\/30296242\/btz769.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/5\/1476\/50553154\/btz769.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/5\/1476\/50553154\/btz769.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T06:40:11Z","timestamp":1686292811000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/5\/1476\/5585742"}},"subtitle":[],"editor":[{"given":"Janet","family":"Kelso","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,10,11]]},"references-count":36,"journal-issue":{"issue":"5","publish
ed-print":{"date-parts":[[2020,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz769","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/629865","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,3]]},"published":{"date-parts":[[2019,10,11]]}}}