{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T03:52:41Z","timestamp":1768794761349,"version":"3.49.0"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2021,5,5]],"date-time":"2021-05-05T00:00:00Z","timestamp":1620172800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2018R1C1B6005304"],"award-info":[{"award-number":["NRF-2018R1C1B6005304"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2021R1C1C1008307"],"award-info":[{"award-number":["NRF-2021R1C1C1008307"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Bulk tumor samples used for high-throughput molecular profiling are often an admixture of cancer cells and non-cancerous cells, which include immune and stromal cells. The mixed composition can confound the analysis and affect the biological interpretation of the results, and thus, accurate prediction of tumor purity is critical. Although several methods have been proposed to predict tumor purity using high-throughput molecular data, there has been no comprehensive study on machine learning-based methods for the estimation of tumor purity.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We applied various machine learning models to estimate tumor purity. Overall, the models predicted the tumor purity accurately and showed a high correlation with well-established gold standard methods. In addition, we identified a small group of genes and demonstrated that they could predict tumor purity well. Finally, we confirmed that these genes were mainly involved in the immune system.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability<\/jats:title><jats:p>The machine learning models constructed for this study are available at\u2009https:\/\/github.com\/BonilKoo\/ML_purity.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bib\/bbab163","type":"journal-article","created":{"date-parts":[[2021,4,8]],"date-time":"2021-04-08T11:11:07Z","timestamp":1617880267000},"source":"Crossref","is-referenced-by-count":7,"title":["Prediction of tumor purity from gene expression data using machine learning"],"prefix":"10.1093","volume":"22","author":[{"given":"Bonil","family":"Koo","sequence":"first","affiliation":[{"name":"School of Systems Biomedical Science, Soongsil University, Seoul, Korea"},{"name":"Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Je-Keun","family":"Rhee","sequence":"additional","affiliation":[{"name":"School of Systems Biomedical Science, Soongsil University, Seoul, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2021,5,5]]},"reference":[{"issue":"3","key":"2021110814400348400_ref1","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1038\/nrg3833","article-title":"Computational and analytical challenges in single-cell transcriptomics","volume":"16","author":"Stegle","year":"2015","journal-title":"Nat Rev Genet"},{"issue":"1","key":"2021110814400348400_ref2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/ncomms9971","article-title":"Systematic pan-cancer analysis of tumour purity","volume":"6","author":"Aran","year":"2015","journal-title":"Nat Commun"},{"issue":"1","key":"2021110814400348400_ref3","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1158\/2326-6066.CIR-17-0201","article-title":"Impact of tumor purity on immune gene expression and clustering analyses across multiple cancer types","volume":"6","author":"Rhee","year":"2018","journal-title":"Cancer Immunol Res"},{"issue":"4","key":"2021110814400348400_ref4","doi-asserted-by":"crossref","first-page":"964","DOI":"10.1016\/j.cell.2019.10.007","article-title":"Integrated proteogenomic characterization of clear cell renal cell carcinoma","volume":"179","author":"Clark","year":"2019","journal-title":"Cell"},{"key":"2021110814400348400_ref5","doi-asserted-by":"crossref","first-page":"321","DOI":"10.3389\/fgene.2018.00321","article-title":"Dectp: Calling differential gene expression between cancer and normal samples by integrating tumor purity information","volume":"9","author":"Zhang","year":"2018","journal-title":"Front Genet"},{"key":"2021110814400348400_ref6","doi-asserted-by":"crossref","first-page":"995","DOI":"10.1200\/PO.20.00016","article-title":"Systematic assessment of tumor purity and its clinical implications","volume":"4","author":"Haider","year":"2020","journal-title":"JCO Precis Oncol"},{"issue":"39","key":"2021110814400348400_ref7","doi-asserted-by":"crossref","first-page":"16910","DOI":"10.1073\/pnas.1009843107","article-title":"Allele-specific copy number analysis of tumors","volume":"107","author":"Van Loo","year":"2010","journal-title":"Proc Natl Acad Sci"},{"issue":"1","key":"2021110814400348400_ref8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/ncomms3612","article-title":"Inferring tumour purity and stromal and immune cell admixture from expression data","volume":"4","author":"Yoshihara","year":"2013","journal-title":"Nat Commun"},{"issue":"1","key":"2021110814400348400_ref9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12864-019-6412-8","article-title":"Putative biomarkers for predicting tumor sample purity based on gene expression data","volume":"20","author":"Li","year":"2019","journal-title":"BMC Genomics"},{"issue":"1","key":"2021110814400348400_ref10","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/j.gendis.2018.02.003","article-title":"Infiniumpurify: an r package for estimating and accounting for tumor purity in cancer methylation research","volume":"5","author":"Qin","year":"2018","journal-title":"Genes Dis"},{"issue":"10","key":"2021110814400348400_ref11","doi-asserted-by":"crossref","first-page":"1642","DOI":"10.1093\/bioinformatics\/bty011","article-title":"Tumor purity quantification by clonal DNA methylation signatures","volume":"34","author":"Benelli","year":"2018","journal-title":"Bioinformatics"},{"issue":"1","key":"2021110814400348400_ref12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-019-3014-z","article-title":"Rf_purify: a novel tool for comprehensive analysis of tumor-purity in methylation array data based on random forest regression","volume":"20","author":"Johann","year":"2019","journal-title":"BMC Bioinfor"},{"issue":"22","key":"2021110814400348400_ref13","doi-asserted-by":"crossref","first-page":"714","DOI":"10.1186\/s12859-019-3227-1","article-title":"Peis: a novel approach of tumor purity estimation by identifying information sites through integrating signal based on dna methylation data","volume":"20","author":"Wang","year":"2019","journal-title":"BMC Bioinfo"},{"issue":"5","key":"2021110814400348400_ref14","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1038\/nbt.2203","article-title":"Absolute quantification of somatic dna alterations in human cancer","volume":"30","author":"Carter","year":"2012","journal-title":"Nat Biotechnol"},{"issue":"6","key":"2021110814400348400_ref15","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1038\/s41587-020-0546-8","article-title":"Visualizing and interpreting cancer genomics data via the xena platform","volume":"38","author":"Goldman","year":"2020","journal-title":"Nat Biotechnol"},{"issue":"W1","key":"2021110814400348400_ref16","doi-asserted-by":"crossref","first-page":"W90","DOI":"10.1093\/nar\/gkw377","article-title":"Enrichr: a comprehensive gene set enrichment analysis web server 2016 update","volume":"44","author":"Kuleshov","year":"2016","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2021110814400348400_ref17","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J R Stat Soc B Methodol"},{"key":"2021110814400348400_ref18","doi-asserted-by":"crossref","DOI":"10.7717\/peerj.9530","article-title":"Analysis of prognostic genes in the tumor microenvironment of lung adenocarcinoma","volume":"8","author":"Xu","year":"2020","journal-title":"PeerJ"},{"issue":"6","key":"2021110814400348400_ref19","doi-asserted-by":"crossref","first-page":"4757","DOI":"10.18632\/aging.102871","article-title":"Prognostic value of immune-related genes in the tumor microenvironment of lung adenocarcinoma and lung squamous cell carcinoma","volume":"12","author":"Qu","year":"2020","journal-title":"Aging (Albany NY)"},{"issue":"5","key":"2021110814400348400_ref20","doi-asserted-by":"crossref","first-page":"638","DOI":"10.1016\/j.ccell.2014.09.007","article-title":"Dissecting the tumor myeloid compartment reveals rare activating antigen-presenting cells critical for t cell immunity","volume":"26","author":"Broz","year":"2014","journal-title":"Cancer Cell"},{"issue":"6","key":"2021110814400348400_ref21","doi-asserted-by":"crossref","first-page":"1031","DOI":"10.1016\/j.immuni.2012.03.027","article-title":"GM-CSF controls nonlymphoid tissue dendritic cell homeostasis but is dispensable for the differentiation of inflammatory dendritic cells","volume":"36","author":"Greter","year":"2012","journal-title":"Immunity"},{"key":"2021110814400348400_ref22","doi-asserted-by":"crossref","first-page":"194","DOI":"10.3389\/fonc.2012.00194","article-title":"Identification of cell surface proteins as potential immunotherapy targets in 12 pediatric cancers","volume":"2","author":"Orentas","year":"2012","journal-title":"Front Oncol"},{"key":"2021110814400348400_ref23","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1016\/j.cyto.2018.04.030","article-title":"The expression of il10ra in colorectal cancer and its correlation with the proliferation index and the clinical stage of the disease","volume":"110","author":"Zadka","year":"2018","journal-title":"Cytokine"},{"key":"2021110814400348400_ref24","first-page":"4237\u201353","article-title":"Statistical challenges of high-dimensional data","volume-title":"Phil. Trans. R. Soc. A.","author":"Johnstone","year":"2009"},{"issue":"7","key":"2021110814400348400_ref25","doi-asserted-by":"crossref","first-page":"878","DOI":"10.15252\/msb.20156651","article-title":"Deep learning for computational biology","volume":"12","author":"Angermueller","year":"2016","journal-title":"Mol Syst Biol"},{"issue":"14","key":"2021110814400348400_ref26","doi-asserted-by":"crossref","first-page":"8845","DOI":"10.1093\/nar\/gku555","article-title":"Single-cell RNA-seq: advances and future challenges","volume":"42","author":"Saliba","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2021110814400348400_ref27","doi-asserted-by":"crossref","first-page":"317","DOI":"10.3389\/fgene.2019.00317","article-title":"Single-cell RNA-seq technologies and related computational data analysis","volume":"10","author":"Chen","year":"2019","journal-title":"Front Genet"},{"issue":"8","key":"2021110814400348400_ref28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s12276-018-0071-8","article-title":"Single-cell RNA sequencing technologies and bioinformatics pipelines","volume":"50","author":"Hwang","year":"2018","journal-title":"Exp Mol Med"},{"issue":"8","key":"2021110814400348400_ref29","doi-asserted-by":"crossref","first-page":"1277","DOI":"10.1038\/s41591-018-0096-5","article-title":"Phenotype molding of stromal cells in the lung tumor microenvironment","volume":"24","author":"Lambrechts","year":"2018","journal-title":"Nat Med"},{"issue":"7","key":"2021110814400348400_ref30","doi-asserted-by":"crossref","first-page":"978","DOI":"10.1038\/s41591-018-0045-3","article-title":"Global characterization of t cells in non-small-cell lung cancer by single-cell sequencing","volume":"24","author":"Guo","year":"2018","journal-title":"Nat Med"},{"issue":"1","key":"2021110814400348400_ref31","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1038\/s41591-020-1125-8","article-title":"Single-cell dissection of intratumoral heterogeneity and lineage diversity in metastatic gastric adenocarcinoma","volume":"27","author":"Wang","year":"2021","journal-title":"Nat Med"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab163\/41087699\/bbab163.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab163\/41087699\/bbab163.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,27]],"date-time":"2024-08-27T21:59:47Z","timestamp":1724795987000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab163\/6265216"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,5]]},"references-count":31,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab163","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"published":{"date-parts":[[2021,5,5]]},"article-number":"bbab163"}}