{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:43:00Z","timestamp":1753875780991,"version":"3.41.2"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T00:00:00Z","timestamp":1744156800000},"content-version":"vor","delay-in-days":11,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100009619","name":"Japan Agency for Medical Research and Development","doi-asserted-by":"publisher","award":["24tm0424219h0004"],"award-info":[{"award-number":["24tm0424219h0004"]}],"id":[{"id":"10.13039\/100009619","id-type":"DOI","asserted-by":"publisher"}]},{"name":"JSPS KAKENHI","award":["JP22H04925"],"award-info":[{"award-number":["JP22H04925"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Extended tandem repeats (TRs) have been associated with 60 or more diseases over the past 30\u2009years. Although most TRs have single repeat units (or motifs), complex TRs with different units have recently been correlated with some brain disorders. Of note, a population-scale analysis shows that complex TRs at one locus can be divergent, and different units are often expanded between individuals. To understand the evolution of high TR diversity, it is informative to visualize a phylogenetic tree. To do this, we need to measure the edit distance between pairs of complex TRs by considering duplication and contraction of units created by replication slippage. 
However, traditional rigorous algorithms for this purpose are computationally expensive.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We here propose an efficient heuristic algorithm to estimate the edit distance with duplication and contraction of units (EDDC, for short). We select a set of frequent units that occur in given complex TRs, encode each unit as a single symbol, compress a TR into an optimal series of unit symbols that partially matches the original TR with the minimum Levenshtein distance, and estimate the EDDC between a pair of complex TRs from their compressed forms. Using substantial synthetic benchmark datasets, we demonstrate that the estimated EDDC is highly correlated with the accurate EDDC, with a Pearson correlation coefficient of &gt;0.983, while the heuristic algorithm achieves orders of magnitude performance speedup.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The software program hEDDC that implements the proposed algorithm is available at https:\/\/github.com\/Ricky-pon\/hEDDC (DOI: 10.5281\/zenodo.14732958)<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf155","type":"journal-article","created":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T17:45:56Z","timestamp":1744220756000},"source":"Crossref","is-referenced-by-count":0,"title":["Approximating edit distances between complex tandem repeats efficiently"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9633-9812","authenticated-orcid":false,"given":"Riki","family":"Kawahara","sequence":"first","affiliation":[{"name":"Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo , Chiba 
277-8562,","place":["Japan"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6201-8885","authenticated-orcid":false,"given":"Shinichi","family":"Morishita","sequence":"additional","affiliation":[{"name":"Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo , Chiba 277-8562,","place":["Japan"]}]}],"member":"286","published-online":{"date-parts":[[2025,4,9]]},"reference":[{"key":"2025042217472906100_btaf155-B1","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/nar\/27.2.573","article-title":"Tandem repeats finder: a program to analyze DNA sequences","volume":"27","author":"Benson","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2025042217472906100_btaf155-B2","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1089\/10665270360688066","article-title":"Comparison of minisatellites","volume":"10","author":"B\u00e9rard","year":"2003","journal-title":"J Comput Biol"},{"key":"2025042217472906100_btaf155-B3","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/j.tcs.2004.12.030","article-title":"A survey on tree edit distance and related problems","volume":"337","author":"Bille","year":"2005","journal-title":"Theor Comput Sci"},{"key":"2025042217472906100_btaf155-B4","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1038\/368455a0","article-title":"High resolution of human evolutionary trees with polymorphic microsatellites","volume":"368","author":"Bowcock","year":"1994","journal-title":"Nature"},{"key":"2025042217472906100_btaf155-B5","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1038\/s41588-019-0372-4","article-title":"Biallelic expansion of an intronic repeat in RFC1 is a common cause of late-onset ataxia","volume":"51","author":"Cortese","year":"2019","journal-title":"Nat Genet"},{"key":"2025042217472906100_btaf155-B6","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1016\/j.ajhg.2020.07.004","article-title":"Evolution of a 
human-specific tandem repeat associated with ALS","volume":"107","author":"Course","year":"2020","journal-title":"Am J Hum Genet"},{"key":"2025042217472906100_btaf155-B7","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1186\/s13059-018-1505-2","article-title":"STRetch: detecting and discovering pathogenic short tandem repeat expansions","volume":"19","author":"Dashnow","year":"2018","journal-title":"Genome Biol"},{"key":"2025042217472906100_btaf155-B8","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1007\/s00401-018-1841-z","article-title":"An intronic VNTR affects splicing of ABCA7 and increases risk of Alzheimer\u2019s disease","volume":"135","author":"De Roeck","year":"2018","journal-title":"Acta Neuropathol"},{"key":"2025042217472906100_btaf155-B9","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1016\/j.ajhg.2021.03.011","article-title":"30 Years of repeat expansion disorders: what have we learned and what are the remaining challenges?","volume":"108","author":"Depienne","year":"2021","journal-title":"Am J Hum Genet"},{"key":"2025042217472906100_btaf155-B10","doi-asserted-by":"crossref","first-page":"4754","DOI":"10.1093\/bioinformatics\/btz431","article-title":"ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions","volume":"35","author":"Dolzhenko","year":"2019","journal-title":"Bioinformatics"},{"key":"2025042217472906100_btaf155-B11","doi-asserted-by":"crossref","first-page":"i93","DOI":"10.1093\/bioinformatics\/btaa454","article-title":"The string decomposition problem and its applications to centromere analysis and assembly","volume":"36","author":"Dvorkina","year":"2020","journal-title":"Bioinformatics"},{"key":"2025042217472906100_btaf155-B12","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1038\/298396a0","article-title":"Potential Z-DNA forming sequences are highly dispersed in the human 
genome","volume":"298","author":"Hamada","year":"1982","journal-title":"Nature"},{"key":"2025042217472906100_btaf155-B13","doi-asserted-by":"crossref","first-page":"5530","DOI":"10.1038\/s41467-023-41262-1","article-title":"A landscape of complex tandem repeats within individual human genomes","volume":"14","author":"Ichikawa","year":"2023","journal-title":"Nat Commun"},{"key":"2025042217472906100_btaf155-B14","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1038\/s41588-018-0067-2","article-title":"Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy","volume":"50","author":"Ishiura","year":"2018","journal-title":"Nat Genet"},{"key":"2025042217472906100_btaf155-B15","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1038\/314067a0","article-title":"Hypervariable \u2019minisatellite\u2019 regions in human DNA","volume":"314","author":"Jeffreys","year":"1985","journal-title":"Nature"},{"key":"2025042217472906100_btaf155-B16","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1038\/7710","article-title":"An untranslated CTG expansion causes a novel form of spinocerebellar ataxia (SCA8)","volume":"21","author":"Koob","year":"1999","journal-title":"Nat Genet"},{"key":"2025042217472906100_btaf155-B17","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein","year":"1966","journal-title":"Soviet Physics Doklady"},{"key":"2025042217472906100_btaf155-B18","doi-asserted-by":"crossref","first-page":"864","DOI":"10.1126\/science.1062125","article-title":"Myotonic dystrophy type 2 caused by a CCTG expansion in intron I of ZNF9","volume":"293","author":"Liquori","year":"2001","journal-title":"Science"},{"key":"2025042217472906100_btaf155-B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/bioinformatics\/btad185","article-title":"Decomposing mosaic tandem repeats accurately from long 
reads","volume":"39","author":"Masutani","year":"2023","journal-title":"Bioinformatics"},{"key":"2025042217472906100_btaf155-B20","doi-asserted-by":"crossref","first-page":"5931","DOI":"10.1093\/nar\/9.22.5931","article-title":"A member of a new repeated sequence family which is conserved throughout eucaryotic evolution is found between the human delta and beta globin genes","volume":"9","author":"Miesfeld","year":"1981","journal-title":"Nucleic Acids Res"},{"key":"2025042217472906100_btaf155-B21","doi-asserted-by":"crossref","first-page":"e90","DOI":"10.1093\/nar\/gkz501","article-title":"Profiling the genome-wide landscape of tandem repeat expansions","volume":"47","author":"Mousavi","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025042217472906100_btaf155-B22","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/S0092-8240(89)80046-1","article-title":"Approximate matching of regular expressions","volume":"51","author":"Myers","year":"1989","journal-title":"Bull Math Biol"},{"key":"2025042217472906100_btaf155-B23","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1186\/1748-7188-8-27","article-title":"Efficient edit distance with duplications and contractions","volume":"8","author":"Pinhas","year":"2013","journal-title":"Algorithms Mol Biol"},{"key":"2025042217472906100_btaf155-B24","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1038\/s41576-024-00696-z","article-title":"Sequence composition changes in short tandem repeats: heterogeneity, detection, mechanisms and clinical implications","volume":"25","author":"Rajan-Babu","year":"2024","journal-title":"Nat Rev Genet"},{"key":"2025042217472906100_btaf155-B25","doi-asserted-by":"crossref","first-page":"528","DOI":"10.1126\/science.1251186","article-title":"Evolution of repeated DNA sequences by unequal 
crossover","volume":"191","author":"Smith","year":"1976","journal-title":"Science"},{"key":"2025042217472906100_btaf155-B26","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1016\/j.ajhg.2018.07.011","article-title":"Characterization of a human-specific tandem repeat associated with bipolar disorder and schizophrenia","volume":"103","author":"Song","year":"2018","journal-title":"Am J Hum Genet"},{"key":"2025042217472906100_btaf155-B27","doi-asserted-by":"crossref","first-page":"5037","DOI":"10.1093\/nar\/9.19.5037","article-title":"Duplication\/deletion polymorphism 5\u2019- to the human \u03b2 globin gene","volume":"9","author":"Spritz","year":"1981","journal-title":"Nucleic Acids Res"},{"key":"2025042217472906100_btaf155-B28","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1038\/322652a0","article-title":"Cryptic simplicity in DNA is a major source of genetic variation","volume":"322","author":"Tautz","year":"1986","journal-title":"Nature"},{"key":"2025042217472906100_btaf155-B29","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1038\/s41586-020-2579-z","article-title":"Genome-wide detection of tandem DNA repeats that are expanded in autism","volume":"586","author":"Trost","year":"2020","journal-title":"Nature"},{"key":"2025042217472906100_btaf155-B30","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1093\/hmg\/2.8.1123","article-title":"Mutation of human short tandem repeats","volume":"2","author":"Weber","year":"1993","journal-title":"Hum Mol 
Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf155\/62899145\/btaf155.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf155\/62899145\/btaf155.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf155\/62899145\/btaf155.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,22]],"date-time":"2025-04-22T21:47:37Z","timestamp":1745358457000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf155\/8109433"}},"subtitle":[],"editor":[{"given":"Janet","family":"Kelso","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,3,29]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,3,29]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf155","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2025,4]]},"published":{"date-parts":[[2025,3,29]]},"article-number":"btaf155"}}