{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T04:32:31Z","timestamp":1775190751181,"version":"3.50.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2006,1,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Matches to repetitive sequences are usually undesirable in the output of DNA database searches. Repetitive sequences need not be matched to a query, if they can be masked in the database. RepeatMasker\/Maskeraid (RM), currently the most widely used software for DNA sequence masking, is slow and requires a library of repetitive template sequences, such as a manually curated RepBase library, that may not exist for newly sequenced genomes.<\/jats:p>\n               <jats:p>Results: We have developed a software tool called WindowMasker (WM) that identifies and masks highly repetitive DNA sequences in a genome, using only the sequence of the genome itself. WM is orders of magnitude faster than RM because WM uses a few linear-time scans of the genome sequence, rather than local alignment methods that compare each library sequence with each piece of the genome. We validate WM by comparing BLAST outputs from large sets of queries applied to two versions of the same genome, one masked by WM, and the other masked by RM. Even for genomes such as the human genome, where a good RepBase library is available, searching the database as masked with WM yields more matches that are apparently non-repetitive and fewer matches to repetitive sequences. We show that these results hold for transcribed regions as well. 
WM also performs well on genomes for which much of the sequence was in draft form at the time of the analysis.<\/jats:p>\n               <jats:p>Availability: WM is included in the NCBI C++ toolkit. The source code for the entire toolkit is available at . Once the toolkit source is unpacked, the instructions for building WindowMasker application in the UNIX environment can be found in file src\/app\/winmasker\/README.build.<\/jats:p>\n               <jats:p>Contact: \u00a0richa@helix.nih.gov<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at .<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti774","type":"journal-article","created":{"date-parts":[[2005,11,16]],"date-time":"2005-11-16T03:08:21Z","timestamp":1132110501000},"page":"134-141","source":"Crossref","is-referenced-by-count":291,"title":["WindowMasker: window-based masker for sequenced genomes"],"prefix":"10.1093","volume":"22","author":[{"given":"Aleksandr","family":"Morgulis","sequence":"first","affiliation":[{"name":"National Center for Biotechnology Information, National Institutes of Health, Department of Health and Human Services \u00a0 Building 38A, Room 1003N, 8600 Rockville Pike, Bethesda, MD 20894, USA"}]},{"given":"E. 
Michael","family":"Gertz","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Institutes of Health, Department of Health and Human Services \u00a0 Building 38A, Room 1003N, 8600 Rockville Pike, Bethesda, MD 20894, USA"}]},{"given":"Alejandro A.","family":"Sch\u00e4ffer","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Institutes of Health, Department of Health and Human Services \u00a0 Building 38A, Room 1003N, 8600 Rockville Pike, Bethesda, MD 20894, USA"}]},{"given":"Richa","family":"Agarwala","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Institutes of Health, Department of Health and Human Services \u00a0 Building 38A, Room 1003N, 8600 Rockville Pike, Bethesda, MD 20894, USA"}]}],"member":"286","published-online":{"date-parts":[[2005,11,15]]},"reference":[{"key":"2023012408301935200_b1","article-title":"Sputnik","author":"Abajian","year":"1994"},{"key":"2023012408301935200_b2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST\u2014a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012408301935200_b3","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1101\/gr.88502","article-title":"Automated de novo identification of repeat sequence families in sequenced genomes","volume":"12","author":"Bao","year":"2002","journal-title":"Genome Res."},{"key":"2023012408301935200_b4","doi-asserted-by":"crossref","first-page":"1040","DOI":"10.1093\/bioinformatics\/16.11.1040","article-title":"Maskeraid: a performance enhancement to 
repeatmasker","volume":"16","author":"Bedell","year":"2000","journal-title":"Bioinformatics"},{"key":"2023012408301935200_b5","doi-asserted-by":"crossref","first-page":"D23","DOI":"10.1093\/nar\/gkh045","article-title":"Genbank: update","volume":"32","author":"Benson","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012408301935200_b6","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/nar\/27.2.573","article-title":"Tandem Repeats Finder: a program to analyze DNA sequences","volume":"27","author":"Benson","year":"1999","journal-title":"Nucleic Acids Res."},{"key":"2023012408301935200_b7","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1186\/1471-2105-4-22","article-title":"Poly: a quantitative analysis tool for simple sequence repeat (SSR) tracts in DNA","volume":"4","author":"Bizzaro","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023012408301935200_b8","doi-asserted-by":"crossref","first-page":"582","DOI":"10.1093\/bioinformatics\/bti039","article-title":"RAP: a new computer program for de novo identification of repeated sequences in whole genomes","volume":"21","author":"Campagna","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012408301935200_b9","doi-asserted-by":"crossref","first-page":"2812","DOI":"10.1093\/bioinformatics\/bth335","article-title":"Star: an algorithm to search for tandem and approximate repeats","volume":"20","author":"Delgrange","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012408301935200_b10","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1023\/A:1016028332006","article-title":"Genome size and the accumulation of simple sequence repeats: implications of new data from sequencing projects","volume":"115","author":"Hancock","year":"2002","journal-title":"Genetica"},{"key":"2023012408301935200_b11","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1101\/gr.2264004","article-title":"The Atlas genome assembly 
system","volume":"14","author":"Havlak","year":"2004","journal-title":"Genome Res."},{"key":"2023012408301935200_b12","doi-asserted-by":"crossref","first-page":"2306","DOI":"10.1101\/gr.1350803","article-title":"Annotating large genomes with exact word matches","volume":"13","author":"Healy","year":"2003","journal-title":"Genome Res."},{"key":"2023012408301935200_b13","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1016\/S0168-9525(00)02093-X","article-title":"Repbase update: a database and an electronic journal of repetitive elements","volume":"16","author":"Jurka","year":"2000","journal-title":"Trends Genet."},{"key":"2023012408301935200_b14","article-title":"Genome assembly and annotation process","volume-title":"The NCBI Handbook","author":"Kitts","year":"2003"},{"key":"2023012408301935200_b15","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1016\/0092-8674(88)90383-2","article-title":"The complete sequence of dystrophin predicts a rod-shaped cytoskeletal protein","volume":"53","author":"Koenig","year":"1988","journal-title":"Cell"},{"key":"2023012408301935200_b16","doi-asserted-by":"crossref","first-page":"4633","DOI":"10.1093\/nar\/29.22.4633","article-title":"Reputer: the manifold applications of repeat analysis on a genomic scale","volume":"29","author":"Kurtz","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012408301935200_b17","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1093\/bioinformatics\/btf843","article-title":"Forrepeats: detects repeats on entire chromosomes and between genomes","volume":"19","author":"Lefebvre","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012408301935200_b18","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1093\/bioinformatics\/18.3.440","article-title":"Patternhunter: faster and more sensitive homology 
search","volume":"18","author":"Ma","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012408301935200_b19","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1038\/ng822","article-title":"Microsatellites are preferentially associated with non-repetitive DNA in plant genomes","volume":"30","author":"Morgante","year":"2002","journal-title":"Nat. Genet."},{"key":"2023012408301935200_b20","doi-asserted-by":"crossref","first-page":"1786","DOI":"10.1101\/gr.2395204","article-title":"De novo repeat classification and fragment assembly","volume":"14","author":"Pevzner","year":"2004","journal-title":"Genome Res."},{"key":"2023012408301935200_b21","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1126\/science.6281889","article-title":"The Alu family of dispersed repetitive sequences","volume":"216","author":"Schmid","year":"1982","journal-title":"Science"},{"key":"2023012408301935200_b22","article-title":"Repeatmasker","author":"Smit","year":"1996"},{"key":"2023012408301935200_b23","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1006\/jmbi.1994.0095","article-title":"Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences","volume":"246","author":"Smit","year":"1995","journal-title":"J. Mol. Biol."},{"key":"2023012408301935200_b24","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1101\/gr.165102","article-title":"RePs: a sequence assembler that masks exact repeats identified from shotgun data","volume":"12","author":"Wang","year":"2002","journal-title":"Genome Res."},{"key":"2023012408301935200_b25","first-page":"388","article-title":"Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction","volume":"44","author":"Weber","year":"1989","journal-title":"Am. J. Hum. 
Genet."},{"key":"2023012408301935200_b26","doi-asserted-by":"crossref","first-page":"928","DOI":"10.1089\/cmb.2005.12.928","article-title":"Finding approximate tandem repeats in genomic sequences","volume":"12","author":"Wexler","year":"2005","journal-title":"J. Comp. Biol."},{"key":"2023012408301935200_b27","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1089\/10665270050081478","article-title":"A greedy algorithm for aligning DNA sequences","volume":"7","author":"Zhang","year":"2000","journal-title":"J. Comp. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/2\/134\/48838048\/bioinformatics_22_2_134.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/2\/134\/48838048\/bioinformatics_22_2_134.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T08:34:03Z","timestamp":1674549243000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/22\/2\/134\/424703"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,11,15]]},"references-count":27,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2006,1,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti774","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2006,1,15]]},"published":{"date-parts":[[2005,11,15]]}}}