{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T09:35:09Z","timestamp":1762508109077,"version":"3.37.3"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,6,4]],"date-time":"2020-06-04T00:00:00Z","timestamp":1591228800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,6,4]],"date-time":"2020-06-04T00:00:00Z","timestamp":1591228800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["T32 GM007205"],"award-info":[{"award-number":["T32 GM007205"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n<jats:title>Background<\/jats:title>\n<jats:p>Mutations arise in the human genome in two major settings: the germline and the soma. These settings involve different inheritance patterns, time scales, chromatin structures, and environmental exposures, all of which impact the resulting distribution of substitutions. Nonetheless, many of the same single nucleotide variants (SNVs) are shared between germline and somatic mutation databases, such as between the gnomAD database of 120,000 germline exomes and the TCGA database of 10,000 somatic exomes. Here, we sought to explain this overlap.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>After strict filtering to exclude common germline polymorphisms and sites with poor coverage or mappability, we found 336,987 variants shared between the somatic and germline databases. A uniform statistical model explains 34% of these shared variants; a model that incorporates the varying mutation rates of the basic mutation types explains another 50% of shared variants; and a model that includes extended nucleotide contexts (e.g. surrounding 3 bases on either side) explains an additional 4% of shared variants. Analysis of read depth finds mixed evidence that up to 4% of the shared variants may represent germline variants leaked into somatic call sets. 9% of the shared variants are not explained by any model. Sequencing errors and convergent evolution did not account for these. We surveyed other factors as well: Cancers driven by endogenous mutational processes share a greater fraction of variants with the germline, and recently derived germline variants were more likely to be somatically shared than were ancient germline ones.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Conclusions<\/jats:title>\n<jats:p>Overall, we find that shared variants largely represent bona fide biological occurrences of the same variant in the germline and somatic setting and arise primarily because DNA has some of the same basic chemical vulnerabilities in either setting. Moreover, we find mixed evidence that somatic call-sets leak appreciable numbers of germline variants, which is relevant to genomic privacy regulations. In future studies, the similar chemical vulnerability of DNA between the somatic and germline settings might be used to help identify disease-related genes by guiding the development of background-mutation models that are informed by both somatic and germline patterns of variation.<\/jats:p>\n<\/jats:sec>","DOI":"10.1186\/s12859-020-3508-8","type":"journal-article","created":{"date-parts":[[2020,6,4]],"date-time":"2020-06-04T17:03:25Z","timestamp":1591290205000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Origins and characterization of variants shared between databases of somatic and germline human mutations"],"prefix":"10.1186","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1757-1451","authenticated-orcid":false,"given":"William","family":"Meyerson","sequence":"first","affiliation":[]},{"given":"John","family":"Leisman","sequence":"additional","affiliation":[]},{"given":"Fabio C. P.","family":"Navarro","sequence":"additional","affiliation":[]},{"given":"Mark","family":"Gerstein","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,6,4]]},"reference":[{"issue":"7","key":"3508_CR1","doi-asserted-by":"publisher","first-page":"e101093","DOI":"10.1371\/journal.pone.0101093","volume":"9","author":"KA Ross","year":"2014","unstructured":"Ross KA. Coherent somatic mutation in autoimmune disease. PLoS One. 2014;9(7):e101093..","journal-title":"PLoS One"},{"issue":"4","key":"3508_CR2","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1038\/nm.3824","volume":"21","author":"JS Lim","year":"2015","unstructured":"Lim JS, Kim WI, Kang HC, et al. Brain somatic mutations in MTOR cause focal cortical dysplasia type II leading to intractable epilepsy. Nat Med. 2015;21(4):395\u2013400.","journal-title":"Nat Med"},{"issue":"7","key":"3508_CR3","doi-asserted-by":"publisher","first-page":"1177","DOI":"10.1016\/j.cell.2017.05.038","volume":"169","author":"EA Boyle","year":"2017","unstructured":"Boyle EA, Li YI, Pritchard JK. An expanded view of complex traits: from polygenic to Omnigenic. Cell. 2017;169(7):1177\u201386.","journal-title":"Cell"},{"issue":"7","key":"3508_CR4","doi-asserted-by":"publisher","first-page":"702","DOI":"10.1038\/ng.3285","volume":"47","author":"TJ Polderman","year":"2015","unstructured":"Polderman TJ, Benyamin B, De Leeuw CA, et al. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet. 2015;47(7):702\u20139.","journal-title":"Nat Genet"},{"key":"3508_CR5","doi-asserted-by":"publisher","unstructured":"Karczewski KJ, Francioli LC, Tiao G, et al. Variation across 141, 456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv. 2019. https:\/\/doi.org\/10.1101\/531210.","DOI":"10.1101\/531210"},{"issue":"10","key":"3508_CR6","doi-asserted-by":"publisher","first-page":"1113","DOI":"10.1038\/ng.2764","volume":"45","author":"JN Weinstein","year":"2013","unstructured":"Weinstein JN, Collisson EA, Mills GB, et al. The Cancer genome atlas pan-Cancer analysis project. Nat Genet. 2013;45(10):1113\u201320.","journal-title":"Nat Genet"},{"issue":"3","key":"3508_CR7","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1016\/j.cels.2018.03.002","volume":"6","author":"K Ellrott","year":"2018","unstructured":"Ellrott K, Bailey MH, Saksena G, et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 2018;6(3):271\u2013281.e7.","journal-title":"Cell Syst"},{"issue":"3","key":"3508_CR8","doi-asserted-by":"publisher","first-page":"499","DOI":"10.1016\/0022-2836(67)90317-8","volume":"26","author":"WM Fitch","year":"1967","unstructured":"Fitch WM. Evidence suggesting a non-random character to nucleotide replacements in naturally occurring mutations. J Mol Biol. 1967;26(3):499\u2013507.","journal-title":"J Mol Biol"},{"issue":"2","key":"3508_CR9","doi-asserted-by":"crossref","first-page":"695","DOI":"10.1093\/genetics\/165.2.695","volume":"165","author":"Z Yang","year":"2003","unstructured":"Yang Z, Ro S, Rannala B. Likelihood models of somatic mutation and codon substitution in cancer genes. Genetics. 2003;165(2):695\u2013705.","journal-title":"Genetics"},{"issue":"10","key":"3508_CR10","doi-asserted-by":"publisher","first-page":"1127","DOI":"10.1038\/ng.2762","volume":"45","author":"G Ciriello","year":"2013","unstructured":"Ciriello G, Miller ML, Aksoy BA, Senbabaoglu Y, Schultz N, Sander C. Emerging landscape of oncogenic signatures across human cancers. Nat Genet. 2013;45(10):1127\u201333.","journal-title":"Nat Genet"},{"issue":"7539","key":"3508_CR11","doi-asserted-by":"publisher","first-page":"360","DOI":"10.1038\/nature14221","volume":"518","author":"P Polak","year":"2015","unstructured":"Polak P, Karli\u0107 R, Koren A, et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature. 2015;518(7539):360\u20134.","journal-title":"Nature"},{"issue":"6","key":"3508_CR12","doi-asserted-by":"publisher","first-page":"519","DOI":"10.1038\/nbt.2926","volume":"32","author":"D Milius","year":"2014","unstructured":"Milius D, Dove ES, Chalmers D, et al. The International Cancer Genome Consortium\u2019s evolving data-protection policies. Nat Biotechnol. 2014;32(6):519\u201323.","journal-title":"Nat Biotechnol"},{"key":"3508_CR13","doi-asserted-by":"publisher","first-page":"273","DOI":"10.21900\/j.inhs.v7.407","volume":"7","author":"SA Forbes","year":"1907","unstructured":"Forbes SA. On the distribution of certain Illinois fishes: an essay in statistical ecology. Bull Illinois State Lab Nat History. 1907;7:273\u2013303.","journal-title":"Bull Illinois State Lab Nat History"},{"key":"3508_CR14","first-page":"259","volume":"301","author":"GP Pfeifer","year":"2006","unstructured":"Pfeifer GP. Mutagenesis at methylated CpG sequences. Curr Top Microbiol Immunol. 2006;301:259\u201381.","journal-title":"Curr Top Microbiol Immunol"},{"issue":"11","key":"3508_CR15","doi-asserted-by":"publisher","first-page":"1887","DOI":"10.1093\/molbev\/msg204","volume":"20","author":"PF Arndt","year":"2003","unstructured":"Arndt PF, Petrov DA, Hwa T. Distinct changes of genomic biases in nucleotide substitution at the time of mammalian radiation. Mol Biol Evol. 2003;20(11):1887\u201396.","journal-title":"Mol Biol Evol"},{"issue":"7","key":"3508_CR16","doi-asserted-by":"publisher","first-page":"1823","DOI":"10.1016\/j.cell.2018.06.001","volume":"173","author":"I Martincorena","year":"2018","unstructured":"Martincorena I, Raine KM, Gerstung M, et al. Universal patterns of selection in Cancer and somatic tissues. Cell. 2018;173(7):1823.","journal-title":"Cell"},{"issue":"4","key":"3508_CR17","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1038\/ng.3511","volume":"48","author":"V Aggarwala","year":"2016","unstructured":"Aggarwala V, Voight BF. An expanded sequence context model broadly explains variability in polymorphism levels across the human genome. Nat Genet. 2016;48(4):349\u201355.","journal-title":"Nat Genet"},{"issue":"12","key":"3508_CR18","doi-asserted-by":"publisher","first-page":"1785","DOI":"10.1038\/ng.3987","volume":"49","author":"D Weghorn","year":"2017","unstructured":"Weghorn D, Sunyaev S. Bayesian inference of negative and positive selection in human cancers. Nat Genet. 2017;49(12):1785\u20138.","journal-title":"Nat Genet"},{"issue":"3","key":"3508_CR19","doi-asserted-by":"publisher","first-page":"310","DOI":"10.1038\/ng.2892","volume":"46","author":"M Kircher","year":"2014","unstructured":"Kircher M, Witten DM, Jain P, O'roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310\u20135.","journal-title":"Nat Genet"},{"issue":"1","key":"3508_CR20","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1186\/s12859-018-2046-0","volume":"19","author":"DH Sendorek","year":"2018","unstructured":"Sendorek DH, Caloian C, Ellrott K, et al. Germline contamination and leakage in whole genome somatic single nucleotide variant detection. BMC Bioinformatics. 2018;19(1):28.","journal-title":"BMC Bioinformatics"},{"issue":"10","key":"3508_CR21","doi-asserted-by":"publisher","first-page":"908","DOI":"10.1038\/nbt.1975","volume":"29","author":"MJ Clark","year":"2011","unstructured":"Clark MJ, Chen R, Lam HY, et al. Performance comparison of exome DNA sequencing technologies. Nat Biotechnol. 2011;29(10):908\u201314.","journal-title":"Nat Biotechnol"},{"issue":"7447","key":"3508_CR22","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1038\/nature12113","volume":"497","author":"C Kandoth","year":"2013","unstructured":"Kandoth C, Schultz N, Cherniack AD, et al. Integrated genomic characterization of endometrial carcinoma. Nature. 2013;497(7447):67\u201373.","journal-title":"Nature"},{"issue":"7463","key":"3508_CR23","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1038\/nature12477","volume":"500","author":"LB Alexandrov","year":"2013","unstructured":"Alexandrov LB, Nik-zainal S, Wedge DC, et al. Signatures of mutational processes in human cancer. Nature. 2013;500(7463):415\u201321.","journal-title":"Nature"},{"issue":"2","key":"3508_CR24","doi-asserted-by":"publisher","first-page":"126","DOI":"10.1038\/ng.3469","volume":"48","author":"R Rahbari","year":"2016","unstructured":"Rahbari R, Wuster A, Lindsay SJ, et al. Timing, rates and spectra of human germline mutation. Nat Genet. 2016;48(2):126\u201333.","journal-title":"Nat Genet"},{"issue":"D1","key":"3508_CR25","doi-asserted-by":"publisher","first-page":"D964","DOI":"10.1093\/nar\/gkx1133","volume":"46","author":"PJ Huang","year":"2018","unstructured":"Huang PJ, Chiu LY, Lee CC, et al. mSignatureDB: a database for deciphering mutational signatures in human cancers. Nucleic Acids Res. 2018;46(D1):D964\u201370.","journal-title":"Nucleic Acids Res"},{"issue":"10","key":"3508_CR26","doi-asserted-by":"publisher","first-page":"1045","DOI":"10.1038\/nbt1010-1045","volume":"28","author":"BE Bernstein","year":"2010","unstructured":"Bernstein BE, Stamatoyannopoulos JA, Costello JF, et al. The NIH Roadmap Epigenomics mapping consortium. Nat Biotechnol. 2010;28(10):1045\u20138.","journal-title":"Nat Biotechnol"},{"issue":"6235","key":"3508_CR27","doi-asserted-by":"publisher","first-page":"648","DOI":"10.1126\/science.1262110","volume":"348","author":"Human genomics","year":"2015","unstructured":"Human genomics. The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348(6235):648\u201360.","journal-title":"Science"},{"issue":"6217","key":"3508_CR28","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1126\/science.1260825","volume":"347","author":"C Tomasetti","year":"2015","unstructured":"Tomasetti C, Vogelstein B. Cancer etiology. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science. 2015;347(6217):78\u201381.","journal-title":"Science"},{"issue":"12","key":"3508_CR29","doi-asserted-by":"publisher","first-page":"1445","DOI":"10.1038\/bjc.2016.354","volume":"115","author":"SR Llaguno","year":"2016","unstructured":"Llaguno SR, Parada LF. Cell of origin of glioma: biological and clinical implications. Br J Cancer. 2016;115(12):1445\u201350.","journal-title":"Br J Cancer"},{"issue":"6","key":"3508_CR30","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1016\/j.tig.2007.03.011","volume":"23","author":"N Galtier","year":"2007","unstructured":"Galtier N, Duret L. Adaptation or biased gene conversion? Extending the null hypothesis of molecular evolution. Trends Genet. 2007;23(6):273\u20137.","journal-title":"Trends Genet"},{"issue":"1","key":"3508_CR31","doi-asserted-by":"publisher","first-page":"e3000586","DOI":"10.1371\/journal.pbio.3000586","volume":"18","author":"PK Albers","year":"2020","unstructured":"Albers PK, Mcvean G. Dating genomic variants and shared ancestry in population-scale sequencing data. PLoS Biol. 2020;18(1):e3000586.","journal-title":"PLoS Biol"},{"issue":"D1","key":"3508_CR32","doi-asserted-by":"publisher","first-page":"D804","DOI":"10.1093\/nar\/gkw865","volume":"45","author":"TN Turner","year":"2017","unstructured":"Turner TN, Yi Q, Krumm N, et al. denovo-db: a compendium of human de novo variants. Nucleic Acids Res. 2017;45(D1):D804\u201311.","journal-title":"Nucleic Acids Res"},{"issue":"6169","key":"3508_CR33","doi-asserted-by":"publisher","first-page":"437","DOI":"10.1126\/science.1247167","volume":"343","author":"EP Murchison","year":"2014","unstructured":"Murchison EP, Wedge DC, Alexandrov LB, et al. Transmissible dog cancer genome reveals the origin and history of an ancient cell lineage. Science. 2014;343(6169):437\u201340.","journal-title":"Science"},{"issue":"9","key":"3508_CR34","doi-asserted-by":"publisher","first-page":"1297","DOI":"10.1101\/gr.107524.110","volume":"20","author":"A Mckenna","year":"2010","unstructured":"Mckenna A, Hanna M, Banks E, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297\u2013303.","journal-title":"Genome Res"},{"issue":"1","key":"3508_CR35","doi-asserted-by":"publisher","first-page":"e30377","DOI":"10.1371\/journal.pone.0030377","volume":"7","author":"T Derrien","year":"2012","unstructured":"Derrien T, Estelle J, Marco Sola S, Knowles DG, Raineri E, Guigo R, Ribeca P. Fast computation and applications of genome mappability. PLoS One. 2012;7(1):e30377.","journal-title":"PLoS One"},{"key":"3508_CR36","volume-title":"RepeatMasker. Open-3.0.","author":"A Smit","year":"1996","unstructured":"Smit A, et al. RepeatMasker. Open-3.0., 1996. Available at http:\/\/www.repeatmaske."},{"key":"3508_CR37","volume-title":"Rsamtools: Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import. R package version1.32.0","author":"M Morgan","year":"2018","unstructured":"Morgan M, Pag\u00e8s H, Obenchain V, Hayden N (2018). Rsamtools: Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import. R package version1.32.0, http:\/\/bioconductor.org\/packages\/release\/bioc\/html\/Rsamtools.html."},{"key":"3508_CR38","doi-asserted-by":"publisher","unstructured":"Lawrence M, Huber W, Pag\u00e8s H, Aboyoun P, Carlson M, Gentleman R, Morgan M, Carey V. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9. https:\/\/doi.org\/10.1371\/journal.pcbi.1003118\nhttp:\/\/www.ploscompbiol.org\/article\/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118.","DOI":"10.1371\/journal.pcbi.1003118"},{"issue":"5","key":"3508_CR39","doi-asserted-by":"publisher","first-page":"e1000471","DOI":"10.1371\/journal.pgen.1000471","volume":"5","author":"G Mcvicker","year":"2009","unstructured":"Mcvicker G, Gordon D, Davis C, Green P. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 2009;5(5):e1000471.","journal-title":"PLoS Genet"},{"issue":"7414","key":"3508_CR40","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1038\/nature11247","volume":"489","author":"ENCODE Project Consortium","year":"2012","unstructured":"ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57\u201374.","journal-title":"Nature"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3508-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-020-3508-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3508-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,6,3]],"date-time":"2021-06-03T23:44:22Z","timestamp":1622763862000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-3508-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,4]]},"references-count":40,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["3508"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-3508-8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2020,6,4]]},"assertion":[{"value":"24 February 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 April 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 June 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"227"}}