{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,26]],"date-time":"2025-10-26T14:59:51Z","timestamp":1761490791228,"version":"3.37.3"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"22","license":[{"start":{"date-parts":[[2019,4,17]],"date-time":"2019-04-17T00:00:00Z","timestamp":1555459200000},"content-version":"vor","delay-in-days":1,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001700","name":"Ministry of Education, Culture, Sports, Science and Technology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001700","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001700","name":"MEXT","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001700","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"KAKENHI","doi-asserted-by":"crossref","award":["JP18KT0016","JP17K20032","JP16H05879","JP16H01318","JP16H02484"],"award-info":[{"award-number":["JP18KT0016","JP17K20032","JP16H05879","JP16H01318","JP16H02484"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"crossref"}]},{"name":"JST CREST","award":["JPMJCR1881"],"award-info":[{"award-number":["JPMJCR1881"]}]},{"name":"Waseda University Grant for Special Research Projects"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. It is known that each mutational process leads to characteristic mutations, and when a mutational process has preferences for mutations, this situation is called a \u2018mutation signature.\u2019 Identification of mutation signatures is an important task for elucidation of carcinogenic mechanisms. In previous studies, analyses with statistical approaches (e.g. non-negative matrix factorization and latent Dirichlet allocation) revealed a number of mutation signatures. Nonetheless, strictly speaking, these existing approaches employ an ad hoc method or incorrect approximation to estimate the number of mutation signatures, and the whole picture of mutation signatures is unclear.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this study, we present a novel method for estimating the number of mutation signatures\u2014latent Dirichlet allocation with variational Bayes inference (VB-LDA)\u2014where variational lower bounds are utilized for finding a plausible number of mutation patterns. In addition, we performed cluster analyses for estimated mutation signatures to extract novel mutation signatures that appear in multiple primary lesions. In a simulation with artificial data, we confirmed that our method estimated the correct number of mutation signatures. Furthermore, applying our method in combination with clustering procedures for real mutation data revealed many interesting mutation signatures that have not been previously reported.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>All the predicted mutation signatures with clustering results are freely available at http:\/\/www.f.waseda.jp\/mhamada\/MS\/index.html. All the C++ source code and python scripts utilized in this study can be downloaded on the Internet (https:\/\/github.com\/qkirikigaku\/MS_LDA).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz266","type":"journal-article","created":{"date-parts":[[2019,4,11]],"date-time":"2019-04-11T17:47:08Z","timestamp":1555004828000},"page":"4543-4552","source":"Crossref","is-referenced-by-count":13,"title":["Discovering novel mutation signatures by latent Dirichlet allocation with variational Bayes inference"],"prefix":"10.1093","volume":"35","author":[{"given":"Taro","family":"Matsutani","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University , Tokyo, Japan"},{"name":"AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL) , Tokyo, Japan"}]},{"given":"Yuki","family":"Ueno","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University , Tokyo, Japan"},{"name":"AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL) , Tokyo, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4442-6049","authenticated-orcid":false,"given":"Tsukasa","family":"Fukunaga","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University , Tokyo, Japan"},{"name":"Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo , Tokyo, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9466-1034","authenticated-orcid":false,"given":"Michiaki","family":"Hamada","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University , Tokyo, Japan"},{"name":"AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL) , Tokyo, Japan"},{"name":"Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST) , Tokyo, Japan"},{"name":"Graduate School of Medicine, Nippon Medical School , Tokyo, Japan"}]}],"member":"286","published-online":{"date-parts":[[2019,4,16]]},"reference":[{"key":"2023013108331217400_btz266-B1","first-page":"322859","article-title":"The repertoire of mutational signatures in human cancer","author":"Alexandrov","year":"2018","journal-title":"bioRxiv"},{"key":"2023013108331217400_btz266-B2","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1016\/j.celrep.2012.12.008","article-title":"Deciphering signatures of mutational processes operative in human cancer","volume":"3","author":"Alexandrov","year":"2013","journal-title":"Cell Rep"},{"key":"2023013108331217400_btz266-B3","doi-asserted-by":"crossref","first-page":"415.","DOI":"10.1038\/nature12477","article-title":"Signatures of mutational processes in human cancer","volume":"500","author":"Alexandrov","year":"2013","journal-title":"Nature"},{"key":"2023013108331217400_btz266-B4","doi-asserted-by":"crossref","first-page":"1402.","DOI":"10.1038\/ng.3441","article-title":"Clock-like mutational processes in human somatic cells","volume":"47","author":"Alexandrov","year":"2015","journal-title":"Nat. Genet"},{"key":"2023013108331217400_btz266-B5","doi-asserted-by":"crossref","first-page":"106","DOI":"10.3390\/v6010106","article-title":"Historical perspective, development and applications of next-generation sequencing in plant virology","volume":"6","author":"Barba","year":"2014","journal-title":"Viruses"},{"key":"2023013108331217400_btz266-B6","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res"},{"key":"2023013108331217400_btz266-B7","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1002\/path.4185","article-title":"Germline and somatic polymerase \u03f5 and \u03b4 mutations define a new class of hypermutated colorectal and endometrial cancers","volume":"230","author":"Briggs","year":"2013","journal-title":"J. Pathol"},{"key":"2023013108331217400_btz266-B8","first-page":"27","volume":"2001","author":"Corduneanu","year":"2001","journal-title":"Artificial intelligence and Statistics"},{"key":"2023013108331217400_btz266-B9","doi-asserted-by":"crossref","first-page":"R39.","DOI":"10.1186\/gb-2013-14-4-r39","article-title":"Emu: probabilistic inference of mutational processes and their localization in the cancer genome","volume":"14","author":"Fischer","year":"2013","journal-title":"Genome Biol"},{"key":"2023013108331217400_btz266-B10","doi-asserted-by":"crossref","first-page":"3286","DOI":"10.1093\/bioinformatics\/bti515","article-title":"A latent variable model for chemogenomic profiling","volume":"21","author":"Flaherty","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013108331217400_btz266-B11","doi-asserted-by":"crossref","first-page":"D805","DOI":"10.1093\/nar\/gku1075","article-title":"Cosmic: exploring the world\u2019s knowledge of somatic mutations in human cancer","volume":"43","author":"Forbes","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023013108331217400_btz266-B12","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1038\/nature05610","article-title":"Patterns of somatic mutation in human cancer genomes","volume":"446","author":"Greenman","year":"2007","journal-title":"Nature"},{"key":"2023013108331217400_btz266-B13","doi-asserted-by":"crossref","first-page":"87.","DOI":"10.1186\/gm490","article-title":"Cancer mutation signatures, dna damage mechanisms, and potential clinical implications","volume":"5","author":"Harris","year":"2013","journal-title":"Genome Med"},{"key":"2023013108331217400_btz266-B14","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1145\/312624.312649","volume-title":"Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Hofmann","year":"1999"},{"first-page":"556","year":"2001","author":"Lee","key":"2023013108331217400_btz266-B15"},{"key":"2023013108331217400_btz266-B16","doi-asserted-by":"crossref","first-page":"3105","DOI":"10.1093\/bioinformatics\/btq576","article-title":"Identifying functional mirna\u2013mrna regulatory modules with correspondence latent dirichlet allocation","volume":"26","author":"Liu","year":"2010","journal-title":"Bioinformatics"},{"key":"2023013108331217400_btz266-B17","doi-asserted-by":"crossref","first-page":"979","DOI":"10.1016\/j.cell.2012.04.024","article-title":"Mutational processes molding the genomes of 21 breast cancers","volume":"149","author":"Nik-Zainal","year":"2012","journal-title":"Cell"},{"key":"2023013108331217400_btz266-B18","doi-asserted-by":"crossref","first-page":"47.","DOI":"10.1038\/nature17676","article-title":"Landscape of somatic mutations in 560 breast cancer whole-genome sequences","volume":"534","author":"Nik-Zainal","year":"2016","journal-title":"Nature"},{"key":"2023013108331217400_btz266-B19","doi-asserted-by":"crossref","first-page":"136.","DOI":"10.1038\/ng.2503","article-title":"Germline mutations affecting the proofreading domains of pole and pold1 predispose to colorectal adenomas and carcinomas","volume":"45","author":"Palles","year":"2013","journal-title":"Nat. Genet"},{"volume-title":"DNA Methylation: Basic Mechanisms","year":"2006","author":"Pfeifer","key":"2023013108331217400_btz266-B20"},{"year":"2018","author":"Ramazzotti","key":"2023013108331217400_btz266-B21"},{"key":"2023013108331217400_btz266-B22","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1093\/bioinformatics\/btw572","article-title":"Signer: an empirical bayesian approach to mutational signature discovery","volume":"33","author":"Rosales","year":"2017","journal-title":"Bioinformatics"},{"key":"2023013108331217400_btz266-B23","doi-asserted-by":"crossref","first-page":"21766","DOI":"10.1073\/pnas.0912499106","article-title":"Mutation patterns in cancer genomes","volume":"106","author":"Rubin","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023013108331217400_btz266-B24","doi-asserted-by":"crossref","first-page":"e1005657.","DOI":"10.1371\/journal.pgen.1005657","article-title":"A simple model-based approach to inferring and visualizing cancer mutation signatures","volume":"11","author":"Shiraishi","year":"2015","journal-title":"PLoS Genet"},{"key":"2023013108331217400_btz266-B25","doi-asserted-by":"crossref","first-page":"702.","DOI":"10.1093\/embo-reports\/kvf164","article-title":"Informatics and hypothesis-driven research","volume":"3","author":"Smalheiser","year":"2002","journal-title":"EMBO Rep"},{"key":"2023013108331217400_btz266-B26","doi-asserted-by":"crossref","first-page":"1553","DOI":"10.1126\/science.1204040","article-title":"Exploring the genomes of cancer cells: progress and promise","volume":"331","author":"Stratton","year":"2011","journal-title":"Science"},{"key":"2023013108331217400_btz266-B27","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1038\/nature07943","article-title":"The cancer genome","volume":"458","author":"Stratton","year":"2009","journal-title":"Nature"},{"key":"2023013108331217400_btz266-B28","doi-asserted-by":"crossref","first-page":"1857.","DOI":"10.1038\/s41467-018-04208-6","article-title":"The effects of mutational processes and selection on driver mutations across cancer types","volume":"9","author":"Temko","year":"2018","journal-title":"Nat. Commun"},{"key":"2023013108331217400_btz266-B29","first-page":"68","article-title":"The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge","volume":"19","author":"Tomczak","year":"2015","journal-title":"Contemp. Oncol. (Pozn)"},{"key":"2023013108331217400_btz266-B30","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1002\/humu.10177","article-title":"The tp53 gene, tobacco exposure, and lung cancer","volume":"21","author":"Toyooka","year":"2003","journal-title":"Hum. Mutat"},{"key":"2023013108331217400_btz266-B31","doi-asserted-by":"crossref","first-page":"1612","DOI":"10.1038\/leu.2015.22","article-title":"Analysis of mutational signatures in exomes from B-cell lymphoma cell lines suggest APOBEC3 family members to be involved in the pathogenesis of primary effusion lymphoma","volume":"29","author":"Wagener","year":"2015","journal-title":"Leukemia"},{"key":"2023013108331217400_btz266-B32","doi-asserted-by":"crossref","first-page":"2147","DOI":"10.1093\/bioinformatics\/btr357","article-title":"Chasm and snvbox: toolkit for detecting biologically important single nucleotide mutations in cancer","volume":"27","author":"Wong","year":"2011","journal-title":"Bioinformatics"},{"key":"2023013108331217400_btz266-B33","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1016\/j.neucom.2005.02.014","article-title":"Algebraic geometry and stochastic complexity of hidden markov models","volume":"69","author":"Yamazaki","year":"2005","journal-title":"Neurocomputing"},{"key":"2023013108331217400_btz266-B34","doi-asserted-by":"crossref","first-page":"1744.","DOI":"10.1038\/s41467-018-04052-8","article-title":"Validating the concept of mutational signatures with isogenic cell models","volume":"9","author":"Zou","year":"2018","journal-title":"Nat. Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz266\/28642313\/btz266.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/22\/4543\/48979697\/bioinformatics_35_22_4543.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/22\/4543\/48979697\/bioinformatics_35_22_4543.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T17:43:28Z","timestamp":1675187008000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/22\/4543\/5472341"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,4,16]]},"references-count":34,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2019,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz266","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2019,11,15]]},"published":{"date-parts":[[2019,4,16]]}}}