{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T20:37:20Z","timestamp":1765485440157,"version":"3.37.3"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2315,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that are originated via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While for TRs with a low or medium level of divergence the current methods are rather effective, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is propaedeutic to enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events.<\/jats:p><jats:p>Results: We have developed an algorithm (christened TRStalker) with the aim of detecting efficiently TRs that are hard to detect because of their inherent fuzziness, due to high levels of base substitutions, insertions and deletions. To attain this goal, we developed heuristics to solve a Steiner version of the problem for which the fuzziness is measured with respect to a motif string not necessarily present in the input string. This problem is akin to the \u2018generalized median string\u2019 that is known to be an NP-hard problem. Experiments with both synthetic and biological sequences demonstrate that our method performs better than current state of the art for fuzzy TRs and that the fuzzy TRs of the type we detect are indeed present in important biological sequences.<\/jats:p><jats:p>Availability: TRStalker will be integrated in the web-based TRs Discovery Service (TReaDS) at bioalgo.iit.cnr.it.<\/jats:p><jats:p>Contact: \u00a0marco.pellegrini@iit.cnr.it<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq209","type":"journal-article","created":{"date-parts":[[2010,6,7]],"date-time":"2010-06-07T07:28:13Z","timestamp":1275895693000},"page":"i358-i366","source":"Crossref","is-referenced-by-count":38,"title":["TRStalker: an efficient heuristic for finding fuzzy tandem repeats"],"prefix":"10.1093","volume":"26","author":[{"given":"Marco","family":"Pellegrini","sequence":"first","affiliation":[{"name":"1 CNR, Istituto di Informatica e Telematica, Via Moruzzi 1, 56124 Pisa and 2 Univ. di Pisa, Dip. Ingegneria dell'Informazione, Via Diotisalvi 2, 56122 Pisa (Italy)"}]},{"given":"M. Elena","family":"Renda","sequence":"additional","affiliation":[{"name":"1 CNR, Istituto di Informatica e Telematica, Via Moruzzi 1, 56124 Pisa and 2 Univ. di Pisa, Dip. Ingegneria dell'Informazione, Via Diotisalvi 2, 56122 Pisa (Italy)"}]},{"given":"Alessio","family":"Vecchio","sequence":"additional","affiliation":[{"name":"1 CNR, Istituto di Informatica e Telematica, Via Moruzzi 1, 56124 Pisa and 2 Univ. di Pisa, Dip. Ingegneria dell'Informazione, Via Diotisalvi 2, 56122 Pisa (Italy)"}]}],"member":"286","published-online":{"date-parts":[[2010,6,1]]},"reference":[{"key":"2023012508073304500_B1","doi-asserted-by":"crossref","first-page":"1693","DOI":"10.1534\/genetics.108.087882","article-title":"Comparative analyses of human single- and multilocus tandem repeats","volume":"179","author":"Ames","year":"2008","journal-title":"Genetics"},{"key":"2023012508073304500_B2","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1145\/279069.279079","article-title":"An algorithm for finding tandem repeats of unspecified pattern size","volume-title":"Proceedings of the Second Annual international Conference on Computational Molecular Biology (New York, New York, United States, March 22\u201325, 1998)","author":"Benson","year":"1998"},{"key":"2023012508073304500_B3","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/nar\/27.2.573","article-title":"Tandem repeats finder: a program to analyze DNA sequences","volume":"27","author":"Benson","year":"1999","journal-title":"Nucleic Acids Res."},{"key":"2023012508073304500_B4","doi-asserted-by":"crossref","first-page":"676","DOI":"10.1093\/bioinformatics\/btk032","article-title":"Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression","volume":"22","author":"Boeva","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012508073304500_B5","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1093\/bioinformatics\/btl674","article-title":"Quaternionic periodicity transform: an algebraic solution to the tandem repeat detection problem","volume":"23","author":"Brodzik","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508073304500_B6","doi-asserted-by":"crossref","first-page":"2280","DOI":"10.1109\/TSP.2003.815396","article-title":"Detection and visualization of tandem repeats in DNA sequences","volume":"51","author":"Buchner","year":"2003","journal-title":"IEEE Trans. Signal Process"},{"key":"2023012508073304500_B7","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1007\/3-540-45452-7_19","article-title":"One-gapped q-grams filters for levenshtein distance","volume-title":"Combinatorial Pattern Matching, 13th Annual Symposium, CPM 2002, Fukuoka, Japan, July 3\u20135, 2002, Proceedings","author":"Burkhardt","year":"2002"},{"key":"2023012508073304500_B8","first-page":"51","article-title":"Better filtering with gapped q-grams","volume":"56","author":"Burkhardt","year":"2003","journal-title":"Fundam. Inform."},{"key":"2023012508073304500_B9","doi-asserted-by":"crossref","first-page":"3809","DOI":"10.1073\/pnas.92.9.3809","article-title":"The nucleotide sequence of chromosome I from Saccharomyces cerevisiae","volume":"92","author":"Bussey","year":"1995","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508073304500_B10","doi-asserted-by":"crossref","first-page":"1423","DOI":"10.1126\/science.271.5254.1423","article-title":"Friedreich's ataxia: autosomal recessive disease caused by an intronic GAA triplet repeat expansion","volume":"271","author":"Campuzano","year":"1996","journal-title":"Science"},{"key":"2023012508073304500_B11","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1016\/S0304-3975(97)00240-5","article-title":"Topology of strings: median string is np-complete","volume":"230","author":"de la Higuera","year":"2000","journal-title":"Theor. Comput. Sci."},{"key":"2023012508073304500_B12","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1016\/0168-9525(96)10027-5","article-title":"The yeast genome project: what did we learn?","volume":"12","author":"Dujon","year":"1996","journal-title":"Trends Genet."},{"key":"2023012508073304500_B13","first-page":"92","article-title":"An efficient and accurate distance based algorithm to reconstruct tandem duplication trees","volume-title":"Proceedings of the European Conference on Computational Biology (ECCB 2002)","author":"Elemento","year":"2002"},{"key":"2023012508073304500_B14","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/0020-0190(93)90245-5","article-title":"Identifying periodic occurrences of a template with applications to protein structure","volume":"45","author":"Fischetti","year":"1993","journal-title":"Inf. Process. Lett."},{"issue":"Database Issue","key":"2023012508073304500_B15","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1093\/nar\/gkl1013","article-title":"TRDB - the tandem repeats database","volume":"35","author":"Gelfand","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012508073304500_B16","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1016\/S1074-7613(01)00200-X","article-title":"Comparative genomics of the human and mouse T cell receptor loci","volume":"15","author":"Glusman","year":"2001","journal-title":"Immunity"},{"key":"2023012508073304500_B17","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1186\/1471-2105-8-172","article-title":"The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats","volume":"8","author":"Grissa","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023012508073304500_B18","doi-asserted-by":"crossref","DOI":"10.1155\/2007\/43596","article-title":"A novel signal processing measure to identify exact and inexact tandem repeat patterns in DNA sequences","author":"Gupta","year":"2007","journal-title":"EURASIP. J. Bioinform. Syst. Biol."},{"key":"2023012508073304500_B19","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511574931","volume-title":"Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology.","author":"Gusfield","year":"1997"},{"key":"2023012508073304500_B20","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1016\/j.jcss.2004.03.004","article-title":"Linear time algorithms for finding and representing all the tandem repeats in a string","volume":"69","author":"Gusfield","year":"2004","journal-title":"J. Comput. Syst. Sci."},{"key":"2023012508073304500_B21","first-page":"31","article-title":"Beyond tandem repeats: complex pattern structures and distant regions of similarity","volume-title":"Proceedings of the Tenth International Conference on Intelligent Systems for Molecular Biology","author":"Hauth","year":"2002"},{"key":"2023012508073304500_B22","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1007\/s10044-002-0184-4","article-title":"Dynamic computation of generalised median strings","volume":"6","author":"Jiang","year":"2003","journal-title":"Pattern Anal. Appl."},{"key":"2023012508073304500_B23","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1159\/000084979","article-title":"Repbase update, a database of eukaryotic repetitive elements","volume":"110","author":"Jurka","year":"2005","journal-title":"Cytogenet. Genome Res."},{"key":"2023012508073304500_B24","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1101\/gr.7113408","article-title":"The genome-wide determinants of human and chimpanzee microsatellite evolution","volume":"18","author":"Kelkar","year":"2008","journal-title":"Genome Res."},{"key":"2023012508073304500_B25","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/S0304-3975(02)00448-6","article-title":"Finding approximate repetitions under Hamming distance","volume":"303","author":"Kolpakov","year":"2003","journal-title":"Theor. Comput. Sci."},{"key":"2023012508073304500_B26","doi-asserted-by":"crossref","first-page":"3672","DOI":"10.1093\/nar\/gkg617","article-title":"mreps: efficient and flexible detection of tandem repeats in DNA","volume":"31","author":"Kolpakov","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023012508073304500_B27","first-page":"596","article-title":"Finding maximal repetitions in a word in linear time","volume-title":"Proceedings of the 40th Annual Symposium on Foundations of Computer Science (October 17\u201318, 1999). FOCS","author":"Kolpakov","year":"1999"},{"key":"2023012508073304500_B28","doi-asserted-by":"crossref","first-page":"2702","DOI":"10.1093\/bioinformatics\/bth311","article-title":"Exhaustive whole-genome tandem repeats search","volume":"20","author":"Krishnan","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012508073304500_B29","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1093\/bioinformatics\/15.5.426","article-title":"Reputer: fast computation of maximal repeats in complete genomes","volume":"15","author":"Kurtz","year":"1999","journal-title":"Bioinformatics"},{"key":"2023012508073304500_B30","doi-asserted-by":"crossref","first-page":"4633","DOI":"10.1093\/nar\/29.22.4633","article-title":"Reputer: the manifold applications of repeat analysis on a genomic scale","volume":"29","author":"Kurtz","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012508073304500_B31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1089\/106652701300099038","article-title":"An algorithm for approximate tandem repeats","volume":"8","author":"Landau","year":"2001","journal-title":"J. Comput. Biol."},{"key":"2023012508073304500_B32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-8-125","article-title":"Detecting microsatellites within genomes: significant variation among algorithms","volume":"8","author":"Leclercq","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023012508073304500_B33","doi-asserted-by":"crossref","first-page":"1787","DOI":"10.1101\/gr.6554007","article-title":"Sequence-based estimation of minisatellite and microsatellite repeat variability","volume":"17","author":"Legendre","year":"2007","journal-title":"Genome Res."},{"key":"2023012508073304500_B34","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511814075","volume-title":"Randomized Algorithms.","author":"Motwani","year":"1995"},{"key":"2023012508073304500_B35","doi-asserted-by":"crossref","first-page":"1181","DOI":"10.1093\/bioinformatics\/btm097","article-title":"Imex: Imperfect microsatellite extractor","volume":"23","author":"Mudunuri","year":"2007","journal-title":"Bioinformatics"},{"volume-title":"Computational Geometry, an Introduction through Randomized Algorithms.","year":"1993","author":"Mulmuley","key":"2023012508073304500_B36"},{"key":"2023012508073304500_B37","doi-asserted-by":"crossref","first-page":"R69","DOI":"10.1186\/gb-2005-6-8-r69","article-title":"Tandem repeat copy-number variation in protein-coding regions of human genes","volume":"6","author":"O'Dushlaine","year":"2005","journal-title":"Genome Biology"},{"key":"2023012508073304500_B38","doi-asserted-by":"crossref","first-page":"1733","DOI":"10.1093\/bioinformatics\/btg268","article-title":"String: finding tandem repeats in DNA sequences","volume":"19","author":"Parisi","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508073304500_B39","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1748-7188-4-3","article-title":"Lossless filter for multiple repeats with bounded edit distance","volume":"4","author":"Peterlongo","year":"2009","journal-title":"Algorithms Mol. Biol."},{"key":"2023012508073304500_B40","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1142\/S012905410400239X","article-title":"A survey on algorithmic aspects of tandem repeats evolution","volume":"15","author":"Rivals","year":"2004","journal-title":"Int. J. Found. Comput. Sci."},{"key":"2023012508073304500_B41","first-page":"131","article-title":"Detection of significant patterns by compression algorithms: the case of approximate tandem repeats in DNA sequences","volume":"13","author":"Rivals","year":"1997","journal-title":"Comput. Appl. Biosci."},{"key":"2023012508073304500_B42","doi-asserted-by":"crossref","first-page":"1755","DOI":"10.1126\/science.272.5269.1755","article-title":"The complete 685-kilobase DNA sequence of the human beta T Cell Receptor Locus","volume":"272","author":"Rowen","year":"1996","journal-title":"Science"},{"key":"2023012508073304500_B43","doi-asserted-by":"crossref","first-page":"2284","DOI":"10.1093\/nar\/gkn064","article-title":"Empirical comparison of ab initio repeat finding programs","volume":"36","author":"Saha","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012508073304500_B44","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1109\/TCBB.2006.46","article-title":"Comparing tandem repeats with duplications and excisions of variable degree","volume":"3","author":"Sammeth","year":"2006","journal-title":"IEEE\/ACM Trans. Comput. Biology Bioinform."},{"key":"2023012508073304500_B45","doi-asserted-by":"crossref","first-page":"1405","DOI":"10.1093\/bioinformatics\/bth103","article-title":"Spectral repeat finder (SRF): identification of repetitive sequences using fourier transformation","volume":"20","author":"Sharma","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012508073304500_B46","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1016\/S1570-8667(03)00011-X","article-title":"The consensus string problem for a metric is np-complete","volume":"1","author":"Sim","year":"2003","journal-title":"J. Discrete Algorithms"},{"article-title":"Repeatmasker open-3.0.","year":"1996","author":"Smit","key":"2023012508073304500_B47"},{"key":"2023012508073304500_B48","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1093\/bioinformatics\/btl309","article-title":"Tandem repeats over the edit distance","volume":"23","author":"Sokol","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508073304500_B49","article-title":"Tandem repeat detection using pattern discovery with applications to the identification of yeast satellites","volume-title":"Technical Report RC21508","author":"Stolovitzky","year":"1999"},{"key":"2023012508073304500_B50","doi-asserted-by":"crossref","first-page":"3579","DOI":"10.1093\/hmg\/ddp306","article-title":"Rare pathogenic microdeletions and tandem duplications are microhomology-mediated and stimulated by local genomic architecture","volume":"18","author":"Vissers","year":"2009","journal-title":"Hum. Mol. Genet."},{"key":"2023012508073304500_B51","doi-asserted-by":"crossref","first-page":"4253","DOI":"10.1128\/JB.00001-06","article-title":"Effect of repeat copy number on variable-number tandem repeat mutations in Escherichia coli O157:H7","volume":"188","author":"Vogler","year":"2006","journal-title":"J. Bacteriol."},{"key":"2023012508073304500_B52","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1186\/1471-2164-9-533","article-title":"Analysis of the largest tandemly repeated DNA families in the human genome","volume":"9","author":"Warburton","year":"2008","journal-title":"BMC Genomics"},{"key":"2023012508073304500_B53","doi-asserted-by":"crossref","first-page":"1625","DOI":"10.1096\/fj.07-097857","article-title":"DNA triplexes and Friedreich ataxia","volume":"22","author":"Wells","year":"2008","journal-title":"FASEB J."},{"key":"2023012508073304500_B54","first-page":"223","article-title":"Finding approximate tandem repeats in genomic sequences","volume-title":"Proceedings of the Eighth Annual International Conference on Resaerch in Computational Molecular Biology (RECOMB 2004)","author":"Wexler","year":"2004"},{"key":"2023012508073304500_B55","doi-asserted-by":"crossref","first-page":"928","DOI":"10.1089\/cmb.2005.12.928","article-title":"Finding approximate tandem repeats in genomic sequences","volume":"12","author":"Wexler","year":"2005","journal-title":"J. Comput. Biol."},{"key":"2023012508073304500_B56","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1038\/ng0294-152","article-title":"Instability of short tandem repeats (microsatellites) in human cancers","volume":"6","author":"Wooster","year":"1994","journal-title":"Nat. Genet."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/12\/i358\/48854855\/bioinformatics_26_12_i358.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/12\/i358\/48854855\/bioinformatics_26_12_i358.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T18:29:07Z","timestamp":1740162547000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/12\/i358\/285178"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,6,1]]},"references-count":56,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2010,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq209","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2010,6,15]]},"published":{"date-parts":[[2010,6,1]]}}}