{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T03:26:57Z","timestamp":1764905217729},"reference-count":16,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: DNA repeats are a common feature of most genomic sequences. Their de novo identification is still difficult despite being a crucial step in genomic analysis and oligonucleotides design. Several efficient algorithms based on word counting are available, but too short words decrease specificity while long words decrease sensitivity, particularly in degenerated repeats.<\/jats:p><jats:p>Results: The Repeat Analysis Program (RAP) is based on a new word-counting algorithm optimized for high resolution repeat identification using gapped words. Many different overlapping gapped words can be counted at the same genomic position, thus producing a better signal than the single ungapped word. This results in better specificity both in terms of low-frequency detection, being able to identify sequences repeated only once, and highly divergent detection, producing a generally high score in most intron sequences.<\/jats:p><jats:p>Availability: The program is freely available for non-profit organizations, upon request to the authors.<\/jats:p><jats:p>Contact: \u00a0giorgio.valle@unipd.it<\/jats:p><jats:p>Supplementary information: The program has been tested on the Caenorhabditis elegans genome using word lengths of 12, 14 and 16 bases. 
The full analysis has been implemented in the UCSC Genome Browser and is accessible at http:\/\/genome.cribi.unipd.it.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti039","type":"journal-article","created":{"date-parts":[[2004,9,17]],"date-time":"2004-09-17T00:13:37Z","timestamp":1095380017000},"page":"582-588","source":"Crossref","is-referenced-by-count":37,"title":["RAP: a new computer program for de novo identification of repeated sequences in whole genomes"],"prefix":"10.1093","volume":"21","author":[{"given":"Davide","family":"Campagna","sequence":"first","affiliation":[{"name":"CRIBI, Universit\u00e0 degli Studi di Padova via Ugo Bassi 58b, I-35121 Padova, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chiara","family":"Romualdi","sequence":"additional","affiliation":[{"name":"CRIBI, Universit\u00e0 degli Studi di Padova via Ugo Bassi 58b, I-35121 Padova, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nicola","family":"Vitulo","sequence":"additional","affiliation":[{"name":"CRIBI, Universit\u00e0 degli Studi di Padova via Ugo Bassi 58b, I-35121 Padova, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Micky","family":"Del Favero","sequence":"additional","affiliation":[{"name":"CRIBI, Universit\u00e0 degli Studi di Padova via Ugo Bassi 58b, I-35121 Padova, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matej","family":"Lexa","sequence":"additional","affiliation":[{"name":"CRIBI, Universit\u00e0 degli Studi di Padova via Ugo Bassi 58b, I-35121 Padova, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nicola","family":"Cannata","sequence":"additional","affiliation":[{"name":"CRIBI, Universit\u00e0 degli Studi di Padova via Ugo Bassi 58b, I-35121 Padova, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Giorgio","family":"Valle","sequence":"additional","affiliation":[{"name":"CRIBI, Universit\u00e0 degli Studi di Padova via Ugo 
Bassi 58b, I-35121 Padova, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2004,9,16]]},"reference":[{"key":"2023013107212244900_B1","doi-asserted-by":"crossref","unstructured":"Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. 1990Basic local alignment search tool. J. Mol. Biol.215403\u2013410","DOI":"10.1016\/S0022-2836(05)80360-2"},{"key":"2023013107212244900_B2","doi-asserted-by":"crossref","unstructured":"Bao, Z. and Eddy, S.R. 2002Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res.121269\u20131276","DOI":"10.1101\/gr.88502"},{"key":"2023013107212244900_B3","unstructured":"Bedell, J.A., Korf, I., Gish, W. 2000MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics161040\u20131041"},{"key":"2023013107212244900_B4","doi-asserted-by":"crossref","unstructured":"Benson, G. 1999Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res.27573\u2013580","DOI":"10.1093\/nar\/27.2.573"},{"key":"2023013107212244900_B5","unstructured":"Technical Report 124. Burrows, M. and Wheeler, D.J. 1994A block sorting lossless data compression algorithm. , Palo Alto, CA Digital Equipment Corporation"},{"key":"2023013107212244900_B6","doi-asserted-by":"crossref","unstructured":"Healy, J., Thomas, E.E., Schwartz, J.T., Wiegler, M. 2003Annotating large genomes with exact word matches. Genome Res.132306\u20132315","DOI":"10.1101\/gr.1350803"},{"key":"2023013107212244900_B7","doi-asserted-by":"crossref","unstructured":"Jurka, J., Walichiewicz, J., Milosavljevic, A. 1992Prototypic sequences for human repetitive DNA. J. Mol. Evol.35286\u2013291","DOI":"10.1007\/BF00161166"},{"key":"2023013107212244900_B8","doi-asserted-by":"crossref","unstructured":"Jurka, J., Klonowski, P., Dagman, V., Pelton, P. 1996CENSOR\u2014a program for identification and elimination of repetitive elements from DNA sequences. Comput. 
Chem.20119\u2013122","DOI":"10.1016\/S0097-8485(96)80013-1"},{"key":"2023013107212244900_B9","doi-asserted-by":"crossref","unstructured":"Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., Haussler, D. 2002The human genome browser at UCSC. Genome Res.12996\u20131006","DOI":"10.1101\/gr.229102"},{"key":"2023013107212244900_B10","unstructured":"Kurtz, S. 1999Reducing the space requirement for suffix trees. Software Pract. Experience291149\u20131171"},{"key":"2023013107212244900_B11","unstructured":"Kurtz, S., Choudhuri, J.V., Ohlebusch, E., Schleiermacher, C., Stoye, J., Giegerich, R. 2001REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res.294633\u20134642"},{"key":"2023013107212244900_B12","unstructured":"Lefebvre, A., Lecroq, T., Dauchel, H., Alexandre, J. 2002FORRepeats: detects repeats on entire chromosomes and between genomes. Bioinformatics19319\u2013326"},{"key":"2023013107212244900_B13","unstructured":"Manber, U. and Myers, E.W. 1993Suffix array: a new method for on-line string searches. SIAM Journal of Computing22935\u2013948"},{"key":"2023013107212244900_B14","doi-asserted-by":"crossref","unstructured":"McCreight, E.M. 1976A space-economical suffix tree construction algorithm. J. Algorithms23262\u2013272","DOI":"10.1145\/321941.321946"},{"key":"2023013107212244900_B15","doi-asserted-by":"crossref","unstructured":"Valle, G. 1993Discover 1: a new program to search for unusually represented DNA motifs. Nucleic Acids Res.215152\u20135156","DOI":"10.1093\/nar\/21.22.5152"},{"key":"2023013107212244900_B16","doi-asserted-by":"crossref","unstructured":"Volfovsky, N., Haas, B.J., Salzberg, S.L. 2001A clustering method for repeat analysis in DNA sequences. 
Genome Biol.2RESEARCH0027","DOI":"10.1186\/gb-2001-2-8-research0027"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/5\/582\/48962411\/bioinformatics_21_5_582.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/5\/582\/48962411\/bioinformatics_21_5_582.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,29]],"date-time":"2023-04-29T09:48:48Z","timestamp":1682761728000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/5\/582\/219976"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,9,16]]},"references-count":16,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2005,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti039","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2005,3,1]]},"published":{"date-parts":[[2004,9,16]]}}}