{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T09:14:32Z","timestamp":1775294072699,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2024,7,30]],"date-time":"2024-07-30T00:00:00Z","timestamp":1722297600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["1R01HG010149"],"award-info":[{"award-number":["1R01HG010149"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,8,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Somatic mosaicism has been implicated in several developmental disorders, cancers, and other diseases. Short tandem repeats (STRs) consist of repeated sequences of 1\u20136\u00a0bp and comprise &gt;1 million loci in the human genome. Somatic mosaicism at STRs is known to play a key role in the pathogenicity of loci implicated in repeat expansion disorders and is highly prevalent in cancers exhibiting microsatellite instability. While a variety of tools have been developed to genotype germline variation at STRs, a method for systematically identifying mosaic STRs is lacking.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We introduce prancSTR, a novel method for detecting mosaic STRs from individual high-throughput sequencing datasets. 
prancSTR is designed to detect loci characterized by a single high-frequency mosaic allele, but can also detect loci with multiple mosaic alleles. Unlike many existing mosaicism detection methods for other variant types, prancSTR does not require a matched control sample as input. We show that prancSTR accurately identifies mosaic STRs in simulated data, demonstrate its feasibility by identifying candidate mosaic STRs in Illumina whole genome sequencing data derived from lymphoblastoid cell lines for individuals sequenced by the 1000 Genomes Project, and evaluate the use of prancSTR on Element and PacBio data. In addition to prancSTR, we present simTR, a novel simulation framework which simulates raw sequencing reads with realistic error profiles at STRs.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>prancSTR and simTR are freely available at https:\/\/github.com\/gymrek-lab\/trtools. 
Detailed documentation is available at https:\/\/trtools.readthedocs.io\/.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae485","type":"journal-article","created":{"date-parts":[[2024,7,30]],"date-time":"2024-07-30T15:52:45Z","timestamp":1722354765000},"source":"Crossref","is-referenced-by-count":10,"title":["Genome-wide detection of somatic mosaicism at short tandem repeats"],"prefix":"10.1093","volume":"40","author":[{"given":"Aarushi","family":"Sehgal","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, University of California San Diego , 9500 Gilman Drive , La Jolla, CA, 92093, United States"}]},{"given":"Helyaneh","family":"Ziaei Jam","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of California San Diego , 9500 Gilman Drive , La Jolla, CA, 92093, United States"}]},{"given":"Andrew","family":"Shen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of California San Diego , 9500 Gilman Drive , La Jolla, CA, 92093, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6086-3903","authenticated-orcid":false,"given":"Melissa","family":"Gymrek","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of California San Diego , 9500 Gilman Drive , La Jolla, CA, 92093, United States"},{"name":"Department of Medicine, University of California San Diego , 9500 Gilman Drive , La Jolla, CA, 92093, United States"}]}],"member":"286","published-online":{"date-parts":[[2024,7,30]]},"reference":[{"key":"2024081305561526700_btae485-B1","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1038\/s41587-023-01750-7","article-title":"Sequencing by avidity enables high accuracy with low reagent consumption","volume":"42","author":"Arslan","year":"2024","journal-title":"Nat 
Biotechnol"},{"key":"2024081305561526700_btae485-B2","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J R Stat Soc Ser B (Methodological)"},{"key":"2024081305561526700_btae485-B3","doi-asserted-by":"crossref","first-page":"2073","DOI":"10.1053\/j.gastro.2009.12.064","article-title":"Microsatellite instability in colorectal cancer","volume":"138","author":"Boland","year":"2010","journal-title":"Gastroenterology"},{"key":"2024081305561526700_btae485-B4","doi-asserted-by":"crossref","first-page":"689","DOI":"10.1038\/s41586-022-04602-7","article-title":"Somatic mosaicism reveals clonal distributions of neocortical development","volume":"604","author":"Breuss","year":"2022","journal-title":"Nature"},{"key":"2024081305561526700_btae485-B5","doi-asserted-by":"crossref","first-page":"3426","DOI":"10.1016\/j.cell.2022.08.004","article-title":"High-coverage whole-genome sequencing of the expanded 1000 genomes project cohort including 602 trios","volume":"185","author":"Byrska-Bishop","year":"2022","journal-title":"Cell"},{"key":"2024081305561526700_btae485-B6","doi-asserted-by":"crossref","first-page":"428","DOI":"10.1073\/pnas.63.2.428","article-title":"Xeroderma pigmentosum: a human disease in which an initial stage of DNA repair is defective","volume":"63","author":"Cleaver","year":"1969","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024081305561526700_btae485-B7","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1002\/ajmg.1320470514","article-title":"Proteus syndrome: clinical evidence for somatic mosaicism and selective review","volume":"47","author":"Cohen","year":"1993","journal-title":"Am J Med 
Genet"},{"key":"2024081305561526700_btae485-B8","doi-asserted-by":"crossref","first-page":"1895","DOI":"10.1101\/gr.225672.117","article-title":"Detection of long repeat expansions from PCR-free whole-genome sequence data","volume":"27","author":"Dolzhenko","year":"2017","journal-title":"Genome Res"},{"key":"2024081305561526700_btae485-B9","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1016\/j.tig.2018.04.003","article-title":"Detecting somatic mutations in normal cells","volume":"34","author":"Dou","year":"2018","journal-title":"Trends Genet"},{"key":"2024081305561526700_btae485-B10","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1038\/s41587-019-0368-8","article-title":"Accurate detection of mosaic variants in sequencing data without matched controls","volume":"38","author":"Dou","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2024081305561526700_btae485-B11","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1038\/nrc.2015.1","article-title":"Somatic mosaicism: on the road to cancer","volume":"16","author":"Fern\u00e1ndez","year":"2016","journal-title":"Nat Rev Cancer"},{"key":"2024081305561526700_btae485-B12","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1101\/gr.255026.119","article-title":"Comprehensive analysis of indels in whole-genome microsatellite regions and microsatellite instability across 21 cancer types","volume":"30","author":"Fujimoto","year":"2020","journal-title":"Genome Res"},{"key":"2024081305561526700_btae485-B13","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1186\/s12864-021-07395-7","article-title":"MONTAGE: a new tool for high-throughput detection of mosaic copy number variation","volume":"22","author":"Glessner","year":"2021","journal-title":"BMC Genomics"},{"key":"2024081305561526700_btae485-B14","doi-asserted-by":"crossref","first-page":"1342","DOI":"10.1038\/nm.4191","article-title":"Classification and characterization of microsatellite instability across 18 cancer 
types","volume":"22","author":"Hause","year":"2016","journal-title":"Nat Med"},{"key":"2024081305561526700_btae485-B15","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","article-title":"ART: a next-generation sequencing read simulator","volume":"28","author":"Huang","year":"2012","journal-title":"Bioinformatics"},{"key":"2024081305561526700_btae485-B16","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1101\/gr.229102","article-title":"The human genome browser at UCSC","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res"},{"key":"2024081305561526700_btae485-B17","doi-asserted-by":"crossref","first-page":"858","DOI":"10.1016\/j.cell.2013.10.015","article-title":"The landscape of microsatellite instability in colorectal and endometrial cancer genomes","volume":"155","author":"Kim","year":"2013","journal-title":"Cell"},{"key":"2024081305561526700_btae485-B18","doi-asserted-by":"crossref","first-page":"1704","DOI":"10.1101\/gr.212373.116","article-title":"Detection of structural mosaicism from targeted and whole-genome sequencing data","volume":"27","author":"King","year":"2017","journal-title":"Genome Res"},{"key":"2024081305561526700_btae485-B19","volume-title":"A Software Package for Sequential Quadratic Programming","author":"Kraft","year":"1988"},{"key":"2024081305561526700_btae485-B20","doi-asserted-by":"crossref","first-page":"2269","DOI":"10.1093\/bioinformatics\/btz913","article-title":"popSTR2 enables clinical and population-scale genotyping of microsatellites","volume":"36","author":"Kristmundsdottir","year":"2020","journal-title":"Bioinformatics"},{"key":"2024081305561526700_btae485-B21","doi-asserted-by":"crossref","first-page":"1108","DOI":"10.1016\/j.ajhg.2012.05.006","article-title":"Somatic mosaic activating mutations in PIK3CA cause CLOVES syndrome","volume":"90","author":"Kurek","year":"2012","journal-title":"Am J Hum 
Genet"},{"key":"2024081305561526700_btae485-B22","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/35057062","article-title":"Initial sequencing and analysis of the human genome","volume":"409","author":"Lander","year":"2001","journal-title":"Nature"},{"key":"2024081305561526700_btae485-B23","author":"Li","year":"2013"},{"key":"2024081305561526700_btae485-B24","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2024081305561526700_btae485-B25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.1399-0004.2009.01230.x","article-title":"Review of the Lynch syndrome: history, molecular genetics, screening, differential diagnosis, and medicolegal ramifications","volume":"76","author":"Lynch","year":"2009","journal-title":"Clin Genet"},{"key":"2024081305561526700_btae485-B26","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.bbagen.2003.10.014","article-title":"Genetic instability in EBV-transformed lymphoblastoid cell lines","volume":"1670","author":"Mohyuddin","year":"2004","journal-title":"Biochim Biophys Acta"},{"key":"2024081305561526700_btae485-B27","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1093\/bioinformatics\/btaa736","article-title":"TRTools: a toolkit for genome-wide analysis of tandem repeats","volume":"37","author":"Mousavi","year":"2021","journal-title":"Bioinformatics"},{"key":"2024081305561526700_btae485-B28","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1093\/genetics\/158.2.885","article-title":"Distinguishing migration from isolation: a Markov chain Monte Carlo approach","volume":"158","author":"Nielsen","year":"2001","journal-title":"Genetics"},{"key":"2024081305561526700_btae485-B29","doi-asserted-by":"crossref","first-page":"100129","DOI":"10.1016\/j.xgen.2022.100129","article-title":"PrecisionFDA 
truth challenge V2: calling variants from short and long reads in difficult-to-map regions","volume":"2","author":"Olson","year":"2022","journal-title":"Cell Genom"},{"key":"2024081305561526700_btae485-B30","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1093\/bioinformatics\/btq033","article-title":"BEDTools: a flexible suite of utilities for comparing genomic features","volume":"26","author":"Quinlan","year":"2010","journal-title":"Bioinformatics"},{"key":"2024081305561526700_btae485-B31","doi-asserted-by":"crossref","first-page":"2436","DOI":"10.1093\/nar\/gky1318","article-title":"Short tandem repeat stutter model inferred from direct measurement of in vitro stutter noise","volume":"47","author":"Raz","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024081305561526700_btae485-B32","doi-asserted-by":"crossref","first-page":"1433","DOI":"10.1212\/WNL.56.11.1433","article-title":"The clinical and diagnostic implications of mosaicism in the neurofibromatoses","volume":"56","author":"Ruggieri","year":"2001","journal-title":"Neurology"},{"key":"2024081305561526700_btae485-B33","doi-asserted-by":"crossref","first-page":"1192","DOI":"10.1373\/clinchem.2014.223677","article-title":"Microsatellite instability detection by next generation sequencing","volume":"60","author":"Salipante","year":"2014","journal-title":"Clin Chem"},{"key":"2024081305561526700_btae485-B34","doi-asserted-by":"crossref","first-page":"558","DOI":"10.1186\/s13104-018-3664-3","article-title":"Genetic and genomic stability across lymphoblastoid cell line expansions","volume":"11","author":"Scheinfeldt","year":"2018","journal-title":"BMC Res Notes"},{"key":"2024081305561526700_btae485-B35","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1038\/nature07943","article-title":"The cancer 
genome","volume":"458","author":"Stratton","year":"2009","journal-title":"Nature"},{"key":"2024081305561526700_btae485-B36","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.1038\/ng.2398","article-title":"A direct characterization of human mutation based on microsatellites","volume":"44","author":"Sun","year":"2012","journal-title":"Nat Genet"},{"key":"2024081305561526700_btae485-B37","doi-asserted-by":"crossref","first-page":"3039","DOI":"10.1093\/hmg\/ddp242","article-title":"Somatic expansion of the Huntington\u2019s disease CAG repeat in the brain is associated with an earlier age of disease onset","volume":"18","author":"Swami","year":"2009","journal-title":"Hum Mol Genet"},{"key":"2024081305561526700_btae485-B38","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1038\/ng0494-409","article-title":"Somatic and gonadal mosaicism of the Huntington disease gene CAG repeat in brain and sperm","volume":"6","author":"Telenius","year":"1994","journal-title":"Nat Genet"},{"key":"2024081305561526700_btae485-B39","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1093\/bib\/bbs017","article-title":"Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration","volume":"14","author":"Thorvaldsd\u00f3ttir","year":"2013","journal-title":"Brief Bioinform"},{"key":"2024081305561526700_btae485-B40","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1038\/s41587-019-0217-9","article-title":"Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome","volume":"37","author":"Wenger","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2024081305561526700_btae485-B41","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1038\/nmeth.4267","article-title":"Genome-wide profiling of heritable and de novo STR variations","volume":"14","author":"Willems","year":"2017","journal-title":"Nat 
Methods"},{"key":"2024081305561526700_btae485-B42","doi-asserted-by":"crossref","first-page":"870","DOI":"10.1038\/s41587-022-01559-w","article-title":"Control-independent mosaic single nucleotide variant detection with DeepMosaic","volume":"41","author":"Yang","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2024081305561526700_btae485-B43","doi-asserted-by":"crossref","first-page":"748","DOI":"10.1038\/nrg906","article-title":"Mechanisms and consequences of somatic mosaicism in humans","volume":"3","author":"Youssoufian","year":"2002","journal-title":"Nat Rev Genet"},{"key":"2024081305561526700_btae485-B44","doi-asserted-by":"crossref","first-page":"6711","DOI":"10.1038\/s41467-023-42278-3","article-title":"A deep population reference panel of tandem repeat variation","volume":"14","author":"Ziaei Jam","year":"2023","journal-title":"Nat Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae485\/58689571\/btae485.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/8\/btae485\/58802712\/btae485.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/8\/btae485\/58802712\/btae485.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,13]],"date-time":"2024-08-13T02:41:16Z","timestamp":1723516876000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae485\/7723996"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,7,30]]},"references-count"
:44,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,8,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae485","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.11.22.568371","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,8]]},"published":{"date-parts":[[2024,7,30]]},"article-number":"btae485"}}