{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T00:43:33Z","timestamp":1778633013627,"version":"3.51.4"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2016,12,3]],"date-time":"2016-12-03T00:00:00Z","timestamp":1480723200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2016,12,3]],"date-time":"2016-12-03T00:00:00Z","timestamp":1480723200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003710","name":"Korea Health Industry Development Institute","doi-asserted-by":"publisher","award":["HI13C2143"],"award-info":[{"award-number":["HI13C2143"]}],"id":[{"id":"10.13039\/501100003710","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003621","name":"Ministry of Science, ICT and Future Planning","doi-asserted-by":"publisher","award":["2013M3A9C4078139"],"award-info":[{"award-number":["2013M3A9C4078139"]}],"id":[{"id":"10.13039\/501100003621","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003621","name":"Ministry of Science, ICT and Future Planning","doi-asserted-by":"publisher","award":["NRF-2015M3C9A4053251"],"award-info":[{"award-number":["NRF-2015M3C9A4053251"]}],"id":[{"id":"10.13039\/501100003621","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>One of the greatest challenges in cancer genomics is to distinguish driver mutations from passenger mutations. 
Whereas recurrence is a hallmark of driver mutations, it is difficult to observe recurring noncoding mutations owing to a limited amount of whole-genome sequenced samples. Hence, it is required to develop a method to predict potentially recurrent mutations.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In this work, we developed a random forest classifier that predicts regulatory mutations that may recur based on the features of the mutations repeatedly appearing in a given cohort. With breast cancer as a model, we profiled 35 quantitative features describing genetic and epigenetic signals at the mutation site, transcription factors whose binding motif was disrupted by the mutation, and genes targeted by long-range chromatin interactions. A true set of mutations for machine learning was generated by interrogating publicly available pan-cancer genomes based on our statistical model of mutation recurrence. The performance of our random forest classifier was evaluated by cross validations. The variable importance of each feature in the classification of mutations was investigated. Our statistical recurrence model for the random forest classifier showed an area under the curve (AUC) of ~0.78 in predicting recurrent mutations. 
Chromatin accessibility at the mutation sites, the distance from the mutations to known cancer risk loci, and the role of the target genes in the regulatory or protein interaction network were among the most important variables.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Our methods enable to characterize recurrent regulatory mutations using a limited number of whole-genome samples, and based on the characterization, to predict potential driver mutations whose recurrence is not found in the given samples but likely to be observed with additional samples.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-016-1385-y","type":"journal-article","created":{"date-parts":[[2016,12,3]],"date-time":"2016-12-03T01:38:28Z","timestamp":1480729108000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Predicting the recurrence of noncoding regulatory mutations in cancer"],"prefix":"10.1186","volume":"17","author":[{"given":"Woojin","family":"Yang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hyoeun","family":"Bang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kiwon","family":"Jang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Min Kyung","family":"Sung","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jung Kyoon","family":"Choi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2016,12,3]]},"reference":[{"key":"1385_CR1","doi-asserted-by":"publisher","first-page":"1546","DOI":"10.1126\/science.1235122","volume":"339","author":"B Vogelstein","year":"2013","unstructured":"Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz Jr LA, Kinzler KW. 
Cancer genome landscapes. Science. 2013;339:1546\u201358.","journal-title":"Science"},{"key":"1385_CR2","doi-asserted-by":"publisher","first-page":"710","DOI":"10.1038\/ng.3332","volume":"47","author":"C Melton","year":"2015","unstructured":"Melton C, Reuter JA, Spacek DV, Snyder M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat Genet. 2015;47:710\u20136. Available: http:\/\/www.nature.com\/doifinder\/10.1038\/ng.3332.","journal-title":"Nat Genet"},{"key":"1385_CR3","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1038\/nature11273","volume":"488","author":"B Schuster-B\u00f6ckler","year":"2012","unstructured":"Schuster-B\u00f6ckler B, Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature. 2012;488:504\u20137. doi:10.1038\/nature11273.","journal-title":"Nature"},{"key":"1385_CR4","doi-asserted-by":"publisher","first-page":"1004","DOI":"10.1038\/ncomms1982","volume":"3","author":"YH Woo","year":"2012","unstructured":"Woo YH, Li W-H. DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. Nat Commun. 2012;3:1004. doi:10.1038\/ncomms1982.","journal-title":"Nat Commun"},{"key":"1385_CR5","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1038\/nature12213","volume":"499","author":"MS Lawrence","year":"2013","unstructured":"Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214\u20138. doi:10.1038\/nature12213.","journal-title":"Nature"},{"key":"1385_CR6","doi-asserted-by":"publisher","first-page":"360","DOI":"10.1038\/nature14221","volume":"518","author":"P Polak","year":"2015","unstructured":"Polak P, Karli\u0107 R, Koren A, Thurman R, Sandstrom R, Lawrence MS, et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature. 2015;518:360\u20134. 
doi:10.1038\/nature14221.","journal-title":"Nature"},{"key":"1385_CR7","doi-asserted-by":"publisher","first-page":"495","DOI":"10.1038\/nature12912","volume":"505","author":"MS Lawrence","year":"2014","unstructured":"Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA, Golub TR, et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 2014;505:495\u2013501. doi:10.1038\/nature12912.","journal-title":"Nature"},{"key":"1385_CR8","doi-asserted-by":"publisher","first-page":"499","DOI":"10.1038\/nature12753","volume":"502","author":"W de Laat","year":"2013","unstructured":"de Laat W, Duboule D. Topology of mammalian developmental enhancers and their regulatory landscapes. Nature. 2013;502:499\u2013506. doi:10.1038\/nature12753.","journal-title":"Nature"},{"key":"1385_CR9","doi-asserted-by":"publisher","first-page":"e1004590","DOI":"10.1371\/journal.pcbi.1004590","volume":"11","author":"D Svetlichnyy","year":"2015","unstructured":"Svetlichnyy D, Imrichova H, Fiers M, Kalender Atak Z, Aerts S. Identification of high-impact cis-regulatory mutations using transcription factor specific random forest models. PLoS Comput Biol. 2015;11:e1004590. doi:10.1371\/journal.pcbi.1004590.","journal-title":"PLoS Comput Biol"},{"key":"1385_CR10","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1038\/nature12477","volume":"500","author":"LB Alexandrov","year":"2013","unstructured":"Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013;500:415\u201321. doi:10.1038\/nature12477.","journal-title":"Nature"},{"key":"1385_CR11","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1038\/nature11632","volume":"491","author":"The 1000 Genomes Project Consortium","year":"2012","unstructured":"The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56\u201365. 
doi:10.1038\/nature11632.","journal-title":"Nature"},{"key":"1385_CR12","doi-asserted-by":"publisher","first-page":"E2191","DOI":"10.1073\/pnas.1320308111","volume":"111","author":"B He","year":"2014","unstructured":"He B, Chen C, Teng L, Tan K. Global view of enhancer-promoter interactome in human cells. Proc Natl Acad Sci U S A. 2014;111:E2191\u20139. doi:10.1073\/pnas.1320308111.","journal-title":"Proc Natl Acad Sci U S A"},{"key":"1385_CR13","doi-asserted-by":"publisher","first-page":"1190","DOI":"10.1126\/science.1222794","volume":"337","author":"MT Maurano","year":"2012","unstructured":"Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190. doi:10.1126\/science.1222794.","journal-title":"Science"},{"key":"1385_CR14","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1093\/nar\/gkg108","volume":"31","author":"V Matys","year":"2003","unstructured":"Matys V, Fricke E, Geffers R, G\u00f6ssling E, Haubrock M, Hehl R, et al. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 2003;31:374\u20138.","journal-title":"Nucleic Acids Res"},{"key":"1385_CR15","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1093\/bioinformatics\/btr064","volume":"27","author":"CE Grant","year":"2011","unstructured":"Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017\u20138.","journal-title":"Bioinformatics"},{"key":"1385_CR16","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1038\/nature11247","volume":"489","author":"I Dunham","year":"2012","unstructured":"Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57\u201374. 
doi:10.1038\/nature11247.","journal-title":"Nature"},{"key":"1385_CR17","doi-asserted-by":"publisher","first-page":"5716","DOI":"10.1093\/nar\/gkv532","volume":"43","author":"K Kim","year":"2015","unstructured":"Kim K, Yang W, Lee KS, Bang H, Jang K, Kim SC, et al. Global transcription network incorporating distal regulator binding reveals selective cooperation of cancer drivers and risk genes. Nucleic Acids Res. 2015;43:5716\u201329. doi:10.1093\/nar\/gkv532.","journal-title":"Nucleic Acids Res"},{"key":"1385_CR18","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1038\/nrc1299","volume":"4","author":"PA Futreal","year":"2004","unstructured":"Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nat Rev Cancer. 2004;4:177\u201383. doi:10.1038\/nrc1299.","journal-title":"Nat Rev Cancer"},{"key":"1385_CR19","doi-asserted-by":"publisher","first-page":"1212","DOI":"10.1016\/j.cell.2014.10.050","volume":"159","author":"T Rolland","year":"2014","unstructured":"Rolland T, Ta\u015fan M, Charloteaux B, Pevzner SJ, Zhong Q, Sahni N, et al. A proteome-scale map of the human interactome network. Cell. 2014;159:1212\u201326. doi:10.1016\/j.cell.2014.10.050.","journal-title":"Cell"},{"key":"1385_CR20","doi-asserted-by":"publisher","first-page":"478","DOI":"10.1038\/nmeth.1597","volume":"8","author":"H Yu","year":"2011","unstructured":"Yu H, Tardivo L, Tam S, Weiner E, Gebreab F, Fan C, et al. Next-generation sequencing to generate interactome datasets. Nat Methods. 2011;8:478\u201380. doi:10.1038\/nmeth.1597.","journal-title":"Nat Methods"},{"key":"1385_CR21","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1186\/1752-0509-6-92","volume":"6","author":"J Das","year":"2012","unstructured":"Das J, Yu H. HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst Biol. 2012;6:92. 
doi:10.1186\/1752-0509-6-92.","journal-title":"BMC Syst Biol"},{"key":"1385_CR22","doi-asserted-by":"publisher","first-page":"1109","DOI":"10.1101\/gr.118992.110","volume":"21","author":"I Lee","year":"2011","unstructured":"Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 2011;21:1109\u201321. doi:10.1101\/gr.118992.110.","journal-title":"Genome Res"},{"key":"1385_CR23","doi-asserted-by":"publisher","first-page":"D1001","DOI":"10.1093\/nar\/gkt1229","volume":"42","author":"D Welter","year":"2014","unstructured":"Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001\u20136. doi:10.1093\/nar\/gkt1229.","journal-title":"Nucleic Acids Res"},{"key":"1385_CR24","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1073\/pnas.0912402107","volume":"107","author":"RS Hansen","year":"2010","unstructured":"Hansen RS, Thomas S, Sandstrom R, Canfield TK, Thurman RE, Weaver M, et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc Natl Acad Sci U S A. 2010;107:139\u201344. doi:10.1073\/pnas.0912402107.","journal-title":"Proc Natl Acad Sci U S A"},{"key":"1385_CR25","doi-asserted-by":"crossref","unstructured":"Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034\u201350. Available: http:\/\/genome.cshlp.org\/content\/15\/8\/1034.full.","DOI":"10.1101\/gr.3715005"},{"key":"1385_CR26","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1080\/10618600.1996.10474713","volume":"5","author":"R Ihaka","year":"1996","unstructured":"Ihaka R. 
R: A language for data analysis and graphics. J Comput Graph Stat. 1996;5:299\u2013314.","journal-title":"J Comput Graph Stat"},{"key":"1385_CR27","first-page":"18","volume":"2","author":"A Liaw","year":"2002","unstructured":"Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2:18\u201322.","journal-title":"R News"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-1385-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-016-1385-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-1385-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T23:57:36Z","timestamp":1718927856000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-016-1385-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,12,3]]},"references-count":27,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2016,12]]}},"alternative-id":["1385"],"URL":"https:\/\/doi.org\/10.1186\/s12859-016-1385-y","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,12,3]]},"assertion":[{"value":"24 June 2016","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 November 2016","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 December 2016","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"492"}}