{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T22:42:50Z","timestamp":1776379370236,"version":"3.51.2"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"14","funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["DGE-1632976"],"award-info":[{"award-number":["DGE-1632976"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Len Blavatnik and the Blavatnik Family foundation"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Somatic mutations result from processes related to DNA replication or environmental\/lifestyle exposures. Knowing the activity of mutational processes in a tumor can inform personalized therapies, early detection, and understanding of tumorigenesis. Computational methods have revealed 30 validated signatures of mutational processes active in human cancers, where each signature is a pattern of single base substitutions. However, half of these signatures have no known etiology, and some similar signatures have distinct etiologies, making patterns of mutation signature activity hard to interpret. Existing mutation signature detection methods do not consider tumor-level clinical\/demographic (e.g. smoking history) or molecular features (e.g. inactivations to DNA damage repair genes).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>To begin to address these challenges, we present the Tumor Covariate Signature Model (TCSM), the first method to directly model the effect of observed tumor-level covariates on mutation signatures. To this end, our model uses methods from Bayesian topic modeling to change the prior distribution on signature exposure conditioned on a tumor\u2019s observed covariates. We also introduce methods for imputing covariates in held-out data and for evaluating the statistical significance of signature-covariate associations. On simulated and real data, we find that TCSM outperforms both non-negative matrix factorization and topic modeling-based approaches, particularly in recovering the ground truth exposure to similar signatures. We then use TCSM to discover five mutation signatures in breast cancer and predict homologous recombination repair deficiency in held-out tumors. We also discover four signatures in a combined melanoma and lung cancer cohort\u2014using cancer type as a covariate\u2014and provide statistical evidence to support earlier claims that three lung cancers from The Cancer Genome Atlas are misdiagnosed metastatic melanomas.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>TCSM is implemented in Python 3 and available at https:\/\/github.com\/lrgr\/tcsm, along with a data workflow for reproducing the experiments in the paper.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz340","type":"journal-article","created":{"date-parts":[[2019,5,9]],"date-time":"2019-05-09T19:21:53Z","timestamp":1557429713000},"page":"i492-i500","source":"Crossref","is-referenced-by-count":20,"title":["Modeling clinical and molecular covariates of mutational process activity in cancer"],"prefix":"10.1093","volume":"35","author":[{"given":"Welles","family":"Robinson","sequence":"first","affiliation":[{"name":"Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA"},{"name":"Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Roded","family":"Sharan","sequence":"additional","affiliation":[{"name":"Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1034-4363","authenticated-orcid":false,"given":"Mark D M","family":"Leiserson","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2019,7,5]]},"reference":[{"key":"2023062712361576000_btz340-B1","author":"Alexandrov","year":"2018"},{"key":"2023062712361576000_btz340-B2","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1038\/nature12477","article-title":"Signatures of mutational processes in human cancer","volume":"500","author":"Alexandrov","year":"2013","journal-title":"Nature"},{"key":"2023062712361576000_btz340-B3","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1016\/j.celrep.2012.12.008","article-title":"Deciphering signatures of mutational processes operative in human cancer","volume":"3","author":"Alexandrov","year":"2013","journal-title":"Cell Rep"},{"key":"2023062712361576000_btz340-B4","doi-asserted-by":"crossref","first-page":"1402","DOI":"10.1038\/ng.3441","article-title":"Clock-like mutational processes in human somatic cells","volume":"47","author":"Alexandrov","year":"2015","journal-title":"Nat. Genet"},{"key":"2023062712361576000_btz340-B5","doi-asserted-by":"crossref","first-page":"618","DOI":"10.1126\/science.aag0299","article-title":"Mutational signatures associated with tobacco smoking in human cancer","volume":"354","author":"Alexandrov","year":"2016","journal-title":"Science"},{"key":"2023062712361576000_btz340-B6","first-page":"289","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc"},{"key":"2023062712361576000_btz340-B7","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1145\/2133806.2133826","article-title":"Probabilistic topic models","volume":"55","author":"Blei","year":"2012","journal-title":"Commun. ACM"},{"key":"2023062712361576000_btz340-B8","first-page":"147","volume-title":"Proceedings of the 18th International Conference on Neural Information Processing Systems, NIPS\u201905","author":"Blei","year":"2005"},{"key":"2023062712361576000_btz340-B9","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res"},{"key":"2023062712361576000_btz340-B10","doi-asserted-by":"crossref","DOI":"10.1016\/j.cell.2017.09.048","article-title":"Comprehensive analysis of hypermutation in human cancer","author":"Campbell","year":"2017","journal-title":"Cell"},{"key":"2023062712361576000_btz340-B11","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1038\/ng.3564","article-title":"Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas","volume":"48","author":"Campbell","year":"2016","journal-title":"Nat. Genet"},{"key":"2023062712361576000_btz340-B12","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.1700759114","article-title":"Mutational spectra of aflatoxin B1 in vivo establish biomarkers of exposure for human hepatocellular carcinoma","author":"Chawanthayatham","year":"2017","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062712361576000_btz340-B13","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1038\/nm.4292","article-title":"HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures","volume":"23","author":"Davies","year":"2017","journal-title":"Nat. Med"},{"key":"2023062712361576000_btz340-B14","author":"Eisenstein","year":"2010"},{"key":"2023062712361576000_btz340-B15","author":"Eisenstein","year":"2011"},{"key":"2023062712361576000_btz340-B16","doi-asserted-by":"crossref","first-page":"917","DOI":"10.1038\/nature03445","article-title":"Targeting the DNA repair defect in BRCA mutant cells as a therapeutic strategy","volume":"434","author":"Farmer","year":"2005","journal-title":"Nature"},{"key":"2023062712361576000_btz340-B17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2013-14-4-r39","article-title":"EMu: probabilistic inference of mutational processes and their localization in the cancer genome","volume":"14","author":"Fischer","year":"2013","journal-title":"Genome Biol"},{"key":"2023062712361576000_btz340-B18","doi-asserted-by":"crossref","first-page":"D777","DOI":"10.1093\/nar\/gkw1121","article-title":"COSMIC: somatic cancer genetics at high-resolution","volume":"45","author":"Forbes","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023062712361576000_btz340-B19","article-title":"Integrated single-nucleotide and structural variation signatures of DNA-repair deficient human cancers","author":"Funnell","year":"2018","journal-title":"bioRxiv"},{"key":"2023062712361576000_btz340-B20","first-page":"3673","article-title":"SomaticSignatures: inferring mutational signatures from single-nucleotide variants","volume":"31","author":"Gehring","year":"2015","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023062712361576000_btz340-B21","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1056\/NEJMoa043331","article-title":"MGMT gene silencing and benefit from temozolomide in glioblastoma","volume":"352","author":"Hegi","year":"2005","journal-title":"New Eng. J. Med"},{"key":"2023062712361576000_btz340-B22","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1016\/j.cell.2018.03.022","article-title":"Cell-of-origin patterns dominate the molecular classification of 10,\u2009000 tumors from 33 types of cancer","volume":"173","author":"Hoadley","year":"2018","journal-title":"Cell"},{"key":"2023062712361576000_btz340-B23","author":"Huang","year":"2017"},{"key":"2023062712361576000_btz340-B24","doi-asserted-by":"crossref","first-page":"8866","DOI":"10.1038\/ncomms9866","article-title":"Whole-genome sequencing reveals activation-induced cytidine deaminase signatures during indolent chronic lymphocytic leukaemia evolution","volume":"6","author":"Kasar","year":"2015","journal-title":"Nat. Commun"},{"key":"2023062712361576000_btz340-B25","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1038\/ng.3557","article-title":"Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors","volume":"48","author":"Kim","year":"2016","journal-title":"Nat. Genet"},{"key":"2023062712361576000_btz340-B26","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1016\/j.celrep.2018.03.076","article-title":"Genomic and molecular landscape of DNA damage repair deficiency across The Cancer Genome Atlas","volume":"23","author":"Knijnenburg","year":"2018","journal-title":"Cell Rep"},{"key":"2023062712361576000_btz340-B27","first-page":"2520","article-title":"Snakemake\u2014a scalable bioinformatics workflow engine","volume":"28","author":"K\u00f6ster","year":"2012","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023062712361576000_btz340-B28","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1126\/science.aan6733","article-title":"Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade","volume":"357","author":"Le","year":"2017","journal-title":"Science"},{"key":"2023062712361576000_btz340-B29","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/0022-2836(85)90026-9","article-title":"Mutagenic specificity of ultraviolet light","volume":"182","author":"Miller","year":"1985","journal-title":"J. Mol. Biol"},{"key":"2023062712361576000_btz340-B30","first-page":"411","author":"Mimno","year":"2008"},{"key":"2023062712361576000_btz340-B31","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1038\/nature17676","article-title":"Landscape of somatic mutations in 560 breast cancer whole-genome sequences","volume":"534","author":"Nik-Zainal","year":"2016","journal-title":"Nature"},{"key":"2023062712361576000_btz340-B32","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1186\/gm175","article-title":"Environmental exposures and mutational patterns of cancer genomes","volume":"2","author":"Pfeifer","year":"2010","journal-title":"Genome Med"},{"key":"2023062712361576000_btz340-B33","doi-asserted-by":"crossref","DOI":"10.1038\/ng.3934","article-title":"A mutational signature reveals alterations underlying deficient homologous recombination repair in breast cancer","volume":"49","author":"Polak","year":"2017","journal-title":"Nat. Genet"},{"key":"2023062712361576000_btz340-B34","first-page":"248","author":"Ramage","year":"2009"},{"key":"2023062712361576000_btz340-B35","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-017-00921-w","article-title":"Pan-cancer analysis of bi-allelic alterations in homologous recombination DNA repair genes","volume":"8","author":"Riaz","year":"2017","journal-title":"Nat. Commun"},{"key":"2023062712361576000_btz340-B36","doi-asserted-by":"crossref","first-page":"5454","DOI":"10.1158\/0008-5472.CAN-12-1470","article-title":"Ploidy and large-scale genomic instability consistently identify basal-like breast carcinomas with BRCA1\/2 inactivation","volume":"72","author":"Rieunier","year":"2012","journal-title":"Cancer Res"},{"key":"2023062712361576000_btz340-B37","author":"Roberts","year":"2018"},{"key":"2023062712361576000_btz340-B38","author":"Roberts","year":"2013"},{"key":"2023062712361576000_btz340-B39","first-page":"51","volume-title":"Navigating the Local Modes of Big Data: The Case of Topic Models","author":"Roberts","year":"2016"},{"key":"2023062712361576000_btz340-B40","doi-asserted-by":"crossref","DOI":"10.1080\/01621459.2016.1141684","article-title":"A model of text for experimentation in the social sciences","author":"Roberts","year":"2016","journal-title":"J. Am. Stat. Assoc"},{"key":"2023062712361576000_btz340-B41","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1186\/s13059-016-0893-4","article-title":"deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution","volume":"17","author":"Rosenthal","year":"2016","journal-title":"Genome Biol"},{"key":"2023062712361576000_btz340-B42","first-page":"8","article-title":"signeR: an empirical Bayesian approach to mutational signature discovery","volume":"33","author":"Rosales","year":"2017","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023062712361576000_btz340-B43","doi-asserted-by":"crossref","first-page":"e1005657.","DOI":"10.1371\/journal.pgen.1005657","article-title":"A simple model-based approach to inferring and visualizing cancer mutation signatures","volume":"11","author":"Shiraishi","year":"2015","journal-title":"PLoS Genet"},{"key":"2023062712361576000_btz340-B44","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1186\/s13059-016-0963-7","article-title":"A comprehensive survey of the mutagenic impact of common cancer cytotoxics","volume":"17","author":"Szikriszt","year":"2016","journal-title":"Genome Biol"},{"key":"2023062712361576000_btz340-B45","first-page":"1385","article-title":"Sharing clusters among related groups: Hierarchical Dirichlet processes","author":"Teh","year":"2005","journal-title":"Advances in neural information processing systems"},{"key":"2023062712361576000_btz340-B46","doi-asserted-by":"crossref","first-page":"1330","DOI":"10.1126\/science.aaf9011","article-title":"Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention","volume":"355","author":"Tomasetti","year":"2017","journal-title":"Science"},{"key":"2023062712361576000_btz340-B47","article-title":"Ultraviolet radiation-induced DNA damage is prognostic for outcome in melanoma","author":"Trucco","year":"2018","journal-title":"Nat. Med"},{"key":"2023062712361576000_btz340-B48","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1016\/j.cell.2017.01.002","article-title":"Endogenous DNA damage as a source of genomic instability in cancer","volume":"168","author":"Tubbs","year":"2017","journal-title":"Cell"},{"key":"2023062712361576000_btz340-B49","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1101\/gr.120477.111","article-title":"De novo discovery of mutated driver pathways in cancer","volume":"22","author":"Vandin","year":"2012","journal-title":"Genome Res"},{"key":"2023062712361576000_btz340-B50","first-page":"1105","author":"Wallach","year":"2009"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/14\/i492\/50721431\/bioinformatics_35_14_i492.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/14\/i492\/50721431\/bioinformatics_35_14_i492.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T12:37:06Z","timestamp":1687869426000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/14\/i492\/5529117"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7]]},"references-count":50,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2019,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz340","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,7]]},"published":{"date-parts":[[2019,7]]}}}