{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T22:15:05Z","timestamp":1775772905495,"version":"3.50.1"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2020,10,8]],"date-time":"2020-10-08T00:00:00Z","timestamp":1602115200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Grant-in-Aid for Scientific Research on Innovative Areas","award":["16H06279"],"award-info":[{"award-number":["16H06279"]}]},{"DOI":"10.13039\/100009619","name":"Japan Agency for Medical Research and Development","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100009619","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Long tandem repeat expansions of more than 1000\u2009nt have been suggested to be associated with diseases, but remain largely unexplored in individual human genomes because read lengths have been too short. However, new long-read sequencing technologies can produce single reads of 10\u00a0000\u2009nt or more that can span such repeat expansions, although these long reads have high error rates, of 10\u201320%, which complicates the detection of repetitive elements. 
Moreover, most traditional algorithms for finding tandem repeats are designed to find short tandem repeats (&lt;1000 nt) and cannot effectively handle the high error rate of long reads in a reasonable amount of time.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we report an efficient algorithm for solving this problem that takes advantage of the length of the repeat. Namely, a long tandem repeat has hundreds or thousands of approximate copies of the repeated unit, so despite the error rate, many short k-mers will be error-free in many copies of the unit. We exploited this characteristic to develop a method for first estimating regions that could contain a tandem repeat, by analyzing the k-mer frequency distributions of fixed-size windows across the target read, followed by an algorithm that assembles the k-mers of a putative region into the consensus repeat unit by greedily traversing a de Bruijn graph. 
Experimental results indicated that the proposed algorithm largely outperformed Tandem Repeats Finder, a widely used program for finding tandem repeats, in terms of sensitivity.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/morisUtokyo\/mTR.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa865","type":"journal-article","created":{"date-parts":[[2020,9,23]],"date-time":"2020-09-23T11:54:15Z","timestamp":1600862055000},"page":"612-621","source":"Crossref","is-referenced-by-count":13,"title":["Finding long tandem repeats in long noisy reads"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6201-8885","authenticated-orcid":false,"given":"Shinichi","family":"Morishita","sequence":"first","affiliation":[{"name":"Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo , Chiba 277-8562, Japan"}]},{"given":"Kazuki","family":"Ichikawa","sequence":"additional","affiliation":[{"name":"Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo , Chiba 277-8562, Japan"}]},{"given":"Eugene W","family":"Myers","sequence":"additional","affiliation":[{"name":"Max Planck Institute of Molecular Cell Biology and Genetics , Dresden, Saxony 01307, Germany"},{"name":"Center for Systems Biology Dresden , Dresden, Saxony 01307, Germany"}]}],"member":"286","published-online":{"date-parts":[[2020,10,8]]},"reference":[{"key":"2023051704113224100_btaa865-B2","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/nar\/27.2.573","article-title":"Tandem repeats finder: a program to analyze dna sequences","volume":"27","author":"Benson","year":"1999","journal-title":"Nucleic Acids 
Res"},{"key":"2023051704113224100_btaa865-B3","doi-asserted-by":"crossref","first-page":"1869","DOI":"10.1038\/s41467-019-09637-5","article-title":"Sequencing of human genomes with nanopore technology","volume":"10","author":"Bowden","year":"2019","journal-title":"Nat. Commun"},{"key":"2023051704113224100_btaa865-B4","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1093\/bioinformatics\/btl674","article-title":"Quaternionic periodicity transform: an algebraic solution to the tandem repeat detection problem","volume":"23","author":"Brodzik","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051704113224100_btaa865-B5","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1016\/0092-8674(92)90154-5","article-title":"Molecular basis of myotonic dystrophy: expansion of a trinucleotide (ctg) repeat at the 3\u2032 end of a transcript encoding a protein kinase family member","volume":"68","author":"Brook","year":"1992","journal-title":"Cell"},{"key":"2023051704113224100_btaa865-B6","doi-asserted-by":"crossref","first-page":"2280","DOI":"10.1109\/TSP.2003.815396","article-title":"Detection and visualization of tandem repeats in DNA sequences","volume":"51","author":"Buchner","year":"2003","journal-title":"IEEE Trans. Signal Process"},{"key":"2023051704113224100_btaa865-B7","first-page":"51","article-title":"Better filtering with gapped q-grams","volume":"56","author":"Burkhardt","year":"2003","journal-title":"Fundam. 
Inf"},{"key":"2023051704113224100_btaa865-B8","doi-asserted-by":"crossref","first-page":"608","DOI":"10.1038\/nature13907","article-title":"Resolving the complexity of the human genome using single-molecule sequencing","volume":"517","author":"Chaisson","year":"2015","journal-title":"Nature"},{"key":"2023051704113224100_btaa865-B9","doi-asserted-by":"crossref","first-page":"1050","DOI":"10.1038\/nmeth.4035","article-title":"Phased diploid genome assembly with single-molecule real-time sequencing","volume":"13","author":"Chin","year":"2016","journal-title":"Nat. Methods"},{"key":"2023051704113224100_btaa865-B10","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1038\/nbt.2023","article-title":"How to apply de Bruijn graphs to genome assembly","volume":"29","author":"Compeau","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023051704113224100_btaa865-B11","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/j.neuron.2011.09.011","article-title":"Expanded GGGGCC hexanucleotide repeat in noncoding region of C9ORF72 causes chromosome 9p-linked FTD and ALS","volume":"72","author":"DeJesus-Hernandez","year":"2011","journal-title":"Neuron"},{"key":"2023051704113224100_btaa865-B12","doi-asserted-by":"crossref","first-page":"815","DOI":"10.1093\/bioinformatics\/btt647","article-title":"Rapid detection of expanded short tandem repeats in personal genomics using hybrid sequencing","volume":"30","author":"Doi","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051704113224100_btaa865-B13","doi-asserted-by":"crossref","first-page":"873","DOI":"10.1089\/cmb.2007.0018","article-title":"A novel approach to the detection of genomic approximate tandem repeats in the Levenshtein metric","volume":"14","author":"Domani\u00e7","year":"2007","journal-title":"J. Comput. 
Biol."},{"key":"2023051704113224100_btaa865-B14","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/0020-0190(93)90245-5","article-title":"Identifying periodic occurrences of a template with applications to protein structure","volume":"45","author":"Fischetti","year":"1993","journal-title":"Inf. Process. Lett"},{"key":"2023051704113224100_btaa865-B15","author":"Floratos","year":"2002"},{"key":"2023051704113224100_btaa865-B16","doi-asserted-by":"crossref","first-page":"i200","DOI":"10.1093\/bioinformatics\/btz376","article-title":"TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain","volume":"35","author":"Gao","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051704113224100_btaa865-B17","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1093\/bioinformatics\/bty747","article-title":"Dot2dot: accurate whole-genome tandem repeats discovery","volume":"35","author":"Genovese","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051704113224100_btaa865-B18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2007\/43596","article-title":"A novel signal processing measure to identify exact and inexact tandem repeat patterns in DNA sequences","volume":"2007","author":"Gupta","year":"2007","journal-title":"EURASIP J. Bioinf. Syst. Biol"},{"key":"2023051704113224100_btaa865-B19","doi-asserted-by":"crossref","first-page":"1154","DOI":"10.1101\/gr.135780.111","article-title":"lobSTR: a short tandem repeat profiler for personal genomes","volume":"22","author":"Gymrek","year":"2012","journal-title":"Genome Res"},{"key":"2023051704113224100_btaa865-B20","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1038\/s41588-018-0067-2","article-title":"Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy","volume":"50","author":"Ishiura","year":"2018","journal-title":"Nat. 
Genet"},{"key":"2023051704113224100_btaa865-B21","doi-asserted-by":"crossref","first-page":"1222","DOI":"10.1038\/s41588-019-0458-z","article-title":"Noncoding CGG repeat expansions in neuronal intranuclear inclusion disease, oculopharyngodistal myopathy and an overlapping disease","volume":"51","author":"Ishiura","year":"2019","journal-title":"Nat. Genet"},{"key":"2023051704113224100_btaa865-B22","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1038\/nbt.4109","article-title":"Linear assembly of a human centromere on the y chromosome","volume":"36","author":"Jain","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023051704113224100_btaa865-B23","first-page":"596","author":"Kolpakov","year":"1999"},{"key":"2023051704113224100_btaa865-B24","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1101\/gr.215087.116","article-title":"Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation","volume":"27","author":"Koren","year":"2017","journal-title":"Genome Res"},{"key":"2023051704113224100_btaa865-B25","doi-asserted-by":"crossref","first-page":"1711","DOI":"10.1126\/science.1675488","article-title":"Mapping of DNA instability at the fragile x to a trinucleotide repeat sequence p(CCG)n","volume":"252","author":"Kremer","year":"1991","journal-title":"Science"},{"key":"2023051704113224100_btaa865-B26","doi-asserted-by":"crossref","first-page":"2103","DOI":"10.1093\/bioinformatics\/btw152","article-title":"Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051704113224100_btaa865-B27","doi-asserted-by":"crossref","first-page":"864","DOI":"10.1126\/science.1062125","article-title":"Myotonic dystrophy type 2 caused by a CCTG expansion in intron 1 of 
ZNF9","volume":"293","author":"Liquori","year":"2001","journal-title":"Science"},{"key":"2023051704113224100_btaa865-B28","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1101\/gr.141705.112","article-title":"Sequencing the unsequenceable: expanded CGG-repeat alleles of the fragile x gene","volume":"23","author":"Loomis","year":"2013","journal-title":"Genome Res"},{"key":"2023051704113224100_btaa865-B29","doi-asserted-by":"crossref","first-page":"971","DOI":"10.1016\/0092-8674(93)90585-E","article-title":"A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington\u2019s disease chromosomes","volume":"72","author":"MacDonald","year":"1993","journal-title":"Cell"},{"key":"2023051704113224100_btaa865-B30","doi-asserted-by":"crossref","first-page":"1253","DOI":"10.1126\/science.1546325","article-title":"Myotonic dystrophy mutation: an unstable ctg repeat in the 3\u2032untranslated region of the gene","volume":"255","author":"Mahadevan","year":"1992","journal-title":"Science"},{"key":"2023051704113224100_btaa865-B31","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/S0092-8240(88)80016-8","article-title":"Sequence comparison with concave weighting functions","volume":"50","author":"Miller","year":"1988","journal-title":"Bull. Math. 
Biol"},{"key":"2023051704113224100_btaa865-B32","doi-asserted-by":"crossref","first-page":"932","DOI":"10.1038\/nature05977","article-title":"Expandable DNA repeats and human disease","volume":"447","author":"Mirkin","year":"2007","journal-title":"Nature"},{"key":"2023051704113224100_btaa865-B33","first-page":"38","author":"Myers","year":"1995"},{"key":"2023051704113224100_btaa865-B34","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1016\/j.neuron.2011.10.001","article-title":"FTD and ALS: genetic ties that bind","volume":"72","author":"Orr","year":"2011","journal-title":"Neuron"},{"key":"2023051704113224100_btaa865-B35","doi-asserted-by":"crossref","first-page":"i358","DOI":"10.1093\/bioinformatics\/btq209","article-title":"Trstalker: an efficient heuristic for finding fuzzy tandem repeats","volume":"26","author":"Pellegrini","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051704113224100_btaa865-B36","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023051704113224100_btaa865-B37","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1016\/j.neuron.2011.09.010","article-title":"A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD","volume":"72","author":"Renton","year":"2011","journal-title":"Neuron"},{"key":"2023051704113224100_btaa865-B38","doi-asserted-by":"crossref","first-page":"1405","DOI":"10.1093\/bioinformatics\/bth103","article-title":"Spectral repeat finder (SRF): identification of repetitive sequences using Fourier transformation","volume":"20","author":"Sharma","year":"2004","journal-title":"Bioinformatics"},{"key":"2023051704113224100_btaa865-B39","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1007\/BF00291644","article-title":"Further segregation analysis of the fragile x syndrome with special reference to transmitting males","volume":"69","author":"Sherman","year":"1985","journal-title":"Hum. Genet"},{"key":"2023051704113224100_btaa865-B40","first-page":"1","article-title":"Non hybrid long read consensus using local de Bruijn graph assembly","author":"Tischler","year":"2017","journal-title":"bioRxiv"},{"key":"2023051704113224100_btaa865-B41","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1016\/0304-3975(92)90143-4","article-title":"Approximate string-matching with q-grams and maximal matches","volume":"92","author":"Ukkonen","year":"1992","journal-title":"Theor. Comput. 
Sci"},{"key":"2023051704113224100_btaa865-B42","doi-asserted-by":"crossref","first-page":"905","DOI":"10.1016\/0092-8674(91)90397-H","article-title":"Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile x syndrome","volume":"65","author":"Verkerk","year":"1991","journal-title":"Cell"},{"key":"2023051704113224100_btaa865-B43","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1111\/ahg.12364","article-title":"Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads","volume":"84","author":"Vollger","year":"2020","journal-title":"Ann. Hum. Genet"},{"key":"2023051704113224100_btaa865-B44","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-018-2208-0","article-title":"NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model","volume":"19","author":"Wei","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023051704113224100_btaa865-B45","doi-asserted-by":"crossref","first-page":"100","DOI":"10.12688\/f1000research.10571.2","article-title":"Comprehensive comparison of pacific biosciences and oxford nanopore technologies and their applications to transcriptome analysis","volume":"6","author":"Weirather","year":"2017","journal-title":"F1000Research"},{"key":"2023051704113224100_btaa865-B1"},{"key":"2023051704113224100_btaa865-B46","doi-asserted-by":"crossref","first-page":"928","DOI":"10.1089\/cmb.2005.12.928","article-title":"Finding approximate tandem repeats in genomic sequences","volume":"12","author":"Wexler","year":"2005","journal-title":"J. Comput. Biol"},{"key":"2023051704113224100_btaa865-B47","doi-asserted-by":"crossref","first-page":"1316","DOI":"10.21105\/joss.01316","article-title":"Badread: simulation of error-prone long reads","volume":"4","author":"Wick","year":"2019","journal-title":"J. 
Open Source Softw"},{"key":"2023051704113224100_btaa865-B48","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1101\/gr.244830.118","article-title":"Recompleting the Caenorhabditis elegans genome","volume":"29","author":"Yoshimura","year":"2019","journal-title":"Genome Res"},{"key":"2023051704113224100_btaa865-B49","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa865\/36613355\/btaa865.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/5\/612\/50357658\/btaa865.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/5\/612\/50357658\/btaa865.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T04:12:35Z","timestamp":1684296755000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/5\/612\/5919583"}},"subtitle":[],"editor":[{"given":"Robinson","family":"Peter","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,10,8]]},"references-count":49,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,5,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa865","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts"
:[[2021,3,1]]},"published":{"date-parts":[[2020,10,8]]}}}