{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T15:16:04Z","timestamp":1764688564774},"reference-count":15,"publisher":"Oxford University Press (OUP)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Summary: Chromosomes or other long DNA sequences contain many highly similar repeated sub-sequences. While there are efficient methods for detecting strict repeats or detecting already characterized repeats, there is no software available for detecting approximate repeats in large DNA sequences allowing for weighted substitutions and indels in a coherent statistical framework. Here, we present an implementation of a two-steps method (seed detection followed by their extension) that detects those approximate repeats. Our method is computationally efficient enough to handle large sequences and is flexible enough to account for influencing factors, such as sequence-composition biases both at the seed detection and alignment levels.<\/jats:p><jats:p>Availability: \u00a0<\/jats:p><jats:p>Contact: \u00a0achaz@abi.snv.jussieu.fr \u00a0<\/jats:p>","DOI":"10.1093\/bioinformatics\/btl519","type":"journal-article","created":{"date-parts":[[2006,10,13]],"date-time":"2006-10-13T07:47:15Z","timestamp":1160725635000},"page":"119-121","source":"Crossref","is-referenced-by-count":70,"title":["Repseek, a tool to retrieve approximate repeats from large DNA sequences"],"prefix":"10.1093","volume":"23","author":[{"given":"Guillaume","family":"Achaz","sequence":"first","affiliation":[{"name":"Atelier de Bioinformatique, Universit\u00e9 Pierre et Marie Curie-Paris 6 1 \u00a0 1 \u00a0 \u00a0 12, rue Cuvier, 75005 Paris, France"},{"name":"UMR 7138 Syst\u00e9matique, Adaptation, Evolution, Universit\u00e9 Pierre et Marie Curie-Paris 6, B\u00e2timent A 2 \u00a0 2 \u00a0 \u00a0 7, quai St Bernard, 75252 Paris Cedex 05, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fr\u00e9d\u00e9ric","family":"Boyer","sequence":"additional","affiliation":[{"name":"INRIA-Rh\u00f4ne Alpes projet HELIX, 655, avenue de l'Europe, Montbonnot 3 \u00a0 3 \u00a0 \u00a0 38334 Saint Ismier Cedex, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eduardo P. C.","family":"Rocha","sequence":"additional","affiliation":[{"name":"Atelier de Bioinformatique, Universit\u00e9 Pierre et Marie Curie-Paris 6 1 \u00a0 1 \u00a0 \u00a0 12, rue Cuvier, 75005 Paris, France"},{"name":"Unit\u00e9 G\u00e9n\u00e9tique des G\u00e9nomes Bact\u00e9riens, Institut Pasteur 4 \u00a0 4 \u00a0 \u00a0 28, rue du Dr Roux, 75724 Paris Cedex 15, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alain","family":"Viari","sequence":"additional","affiliation":[{"name":"INRIA-Rh\u00f4ne Alpes projet HELIX, 655, avenue de l'Europe, Montbonnot 3 \u00a0 3 \u00a0 \u00a0 38334 Saint Ismier Cedex, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eric","family":"Coissac","sequence":"additional","affiliation":[{"name":"INRIA-Rh\u00f4ne Alpes projet HELIX, 655, avenue de l'Europe, Montbonnot 3 \u00a0 3 \u00a0 \u00a0 38334 Saint Ismier Cedex, France"},{"name":"UMR 5163 LAPM, Universit\u00e9 Joseph Fourier, BP 53 5 \u00a0 5 \u00a0 \u00a0 38041 Grenoble Cedex 9, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2006,10,11]]},"reference":[{"key":"2023041105093193300_","first-page":"449","article-title":"The enhanced suffix array and its applications to genome analysis","author":"Abouelhoda","year":"2002"},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"1279","DOI":"10.1093\/genetics\/164.4.1279","article-title":"Associations between inverted repeats and the structural evolution of bacterial genomes","volume":"164","author":"Achaz","year":"2003","journal-title":"Genetics"},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped blast and psi-blast: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1101\/gr.88502","article-title":"Automated de novo identification of repeat sequence families in sequenced genomes","volume":"12","author":"Bao","year":"2002","journal-title":"Genome Res."},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"5873","DOI":"10.1073\/pnas.90.12.5873","article-title":"Applications and statistics for multiple high-scoring segments in molecular sequences","volume":"90","author":"Karlin","year":"1993","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041105093193300_","first-page":"225","article-title":"Maximal segmental match length among random sequences from a finite alphabet","volume-title":"Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer","author":"Karlin","year":"1985"},{"key":"2023041105093193300_","first-page":"125","article-title":"Rapid identification of repeated patterns in strings, trees and array","author":"Karp","year":"1972"},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1093\/bioinformatics\/15.5.426","article-title":"Reputer: fast computation of maximal repeats in complete genomes","volume":"15","author":"Kurtz","year":"1999","journal-title":"Bioinformatics"},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"1786","DOI":"10.1101\/gr.2395204","article-title":"De novo repeat classification and fragment assembly","volume":"14","author":"Pevzner","year":"2004","journal-title":"Genome Res."},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"i351","DOI":"10.1093\/bioinformatics\/bti1018","article-title":"De novo identification of repeat families in large genomes","volume":"21","author":"Price","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol."},{"key":"2023041105093193300_","author":"Smit","year":"1996"},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"715","DOI":"10.1093\/bioinformatics\/14.8.715","article-title":"A strategy for finding regions of similarity in complete genome sequences","volume":"14","author":"Vincens","year":"1998","journal-title":"Bioinformatics"},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1016\/0022-2836(87)90478-5","article-title":"A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons","volume":"197","author":"Waterman","year":"1987","journal-title":"J. Mol. Biol."},{"key":"2023041105093193300_","doi-asserted-by":"crossref","first-page":"4625","DOI":"10.1073\/pnas.91.11.4625","article-title":"Rapid and accurate estimates of statistical significance for sequence data base searches","volume":"91","author":"Waterman","year":"1994","journal-title":"Proc. Natl Acad. Sci. USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/1\/119\/49816208\/bioinformatics_23_1_119.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/1\/119\/49816208\/bioinformatics_23_1_119.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,9]],"date-time":"2023-05-09T09:56:04Z","timestamp":1683626164000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/1\/119\/188897"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,10,11]]},"references-count":15,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btl519","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,1,1]]},"published":{"date-parts":[[2006,10,11]]}}}