{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T08:26:53Z","timestamp":1772180813509,"version":"3.50.1"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"17","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free, approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next generation sequencing technologies.<\/jats:p>\n               <jats:p>Results: \u00a0k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences, including 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and the whole genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. Reconstructed consensus sequences as well as the assessment of overall divergence of k-mer spectra revealed high similarity of the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. 
On the other hand, considerable differences were revealed when these methods were used for comparison of CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.<\/jats:p>\n               <jats:p>Contact: \u00a0macas@umbr.cas.cz<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq343","type":"journal-article","created":{"date-parts":[[2010,7,11]],"date-time":"2010-07-11T20:55:14Z","timestamp":1278881714000},"page":"2101-2108","source":"Crossref","is-referenced-by-count":36,"title":["Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data"],"prefix":"10.1093","volume":"26","author":[{"given":"Ji\u0159\u00ed","family":"Macas","sequence":"first","affiliation":[{"name":"Institute of Plant Molecular Biology, Biology Centre ASCR 1 \u00a0 , Branisovska 31, CZ-37005, Ceske Budejovice, Czech Republic and 2 Department of Horticulture, University of Wisconsin-Madison, WI 53706, USA"}]},{"given":"Pavel","family":"Neumann","sequence":"additional","affiliation":[{"name":"Institute of Plant Molecular Biology, Biology Centre ASCR 1 \u00a0 , Branisovska 31, CZ-37005, Ceske Budejovice, Czech Republic and 2 Department of Horticulture, University of Wisconsin-Madison, WI 53706, USA"}]},{"given":"Petr","family":"Nov\u00e1k","sequence":"additional","affiliation":[{"name":"Institute of Plant Molecular Biology, Biology Centre ASCR 1 \u00a0 , Branisovska 31, CZ-37005, Ceske Budejovice, Czech Republic and 2 Department of Horticulture, University of Wisconsin-Madison, WI 53706, USA"}]},{"given":"Jiming","family":"Jiang","sequence":"additional","affiliation":[{"name":"Institute 
of Plant Molecular Biology, Biology Centre ASCR 1 \u00a0 , Branisovska 31, CZ-37005, Ceske Budejovice, Czech Republic and 2 Department of Horticulture, University of Wisconsin-Madison, WI 53706, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,7,8]]},"reference":[{"key":"2023012508000976100_B1","doi-asserted-by":"crossref","first-page":"1807","DOI":"10.1371\/journal.pcbi.0030181","article-title":"Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data","volume":"3","author":"Alkan","year":"2007","journal-title":"PLoS Comput. Biol."},{"key":"2023012508000976100_B2","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1038\/nrg2466","article-title":"Epigenetic regulation of centromeric chromatin: old dogs, new tricks?","volume":"9","author":"Allshire","year":"2008","journal-title":"Nat. Rev. Genet."},{"key":"2023012508000976100_B3","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012508000976100_B4","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1007\/s00412-008-0181-5","article-title":"A new class of retroviral and satellite encoded small RNAs emanates from mammalian centromeres","volume":"118","author":"Carone","year":"2009","journal-title":"Chromosoma"},{"key":"2023012508000976100_B5","doi-asserted-by":"crossref","first-page":"1691","DOI":"10.1105\/tpc.003079","article-title":"Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon","volume":"14","author":"Cheng","year":"2002","journal-title":"Plant Cell"},{"key":"2023012508000976100_B6","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1086\/419073","article-title":"Concerted evolution of repetitive DNA-sequences in 
eukaryotes","volume":"70","author":"Elder","year":"1995","journal-title":"Q. Rev. Biol."},{"key":"2023012508000976100_B7","first-page":"543","article-title":"SEAVIEW and PHYLOWIN: two graphic tools for sequence alignment and molecular phylogeny","volume":"12","author":"Galtier","year":"1996","journal-title":"Comput. Appl. Biosci."},{"key":"2023012508000976100_B8","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1126\/science.1068275","article-title":"A draft sequence of the rice genome (Oryza sativa L. ssp. japonica)","volume":"296","author":"Goff","year":"2002","journal-title":"Science"},{"key":"2023012508000976100_B9","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1093\/nar\/1.4.559","article-title":"Fractionation and characterization of satellite DNAs of the kangaroo rat (Dipodomys ordii)","volume":"1","author":"Hatch","year":"1974","journal-title":"Nucleic Acids Res."},{"key":"2023012508000976100_B10","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1101\/gr.593403","article-title":"Centromere satellites from Arabidopsis populations: maintenance of conserved and variable domains","volume":"13","author":"Hall","year":"2003","journal-title":"Genome Res."},{"key":"2023012508000976100_B11","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1105\/tpc.11.1.31","article-title":"Polymorphisms and genomic organization of repetitive DNA from centromeric regions of Arabidopsis chromosomes","volume":"11","author":"Heslop-Harrison","year":"1999","journal-title":"Plant Cell"},{"key":"2023012508000976100_B12","doi-asserted-by":"crossref","first-page":"R143","DOI":"10.1186\/gb-2007-8-7-r143","article-title":"Accuracy and quality of massively parallel DNA pyrosequencing","volume":"8","author":"Huse","year":"2007","journal-title":"Genome Biol."},{"key":"2023012508000976100_B13","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1007\/BF00291993","article-title":"Origin of the main class of repetitive DNA within selected Pennisetum 
species","volume":"238","author":"Ingham","year":"1993","journal-title":"Mol. Gen. Genet."},{"key":"2023012508000976100_B14","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1093\/bioinformatics\/btm039","article-title":"Gepard: a rapid and sensitive tool for creating dotplots on genome scale","volume":"23","author":"Krumsiek","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508000976100_B15","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1186\/1471-2164-9-517","article-title":"A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes","volume":"9","author":"Kurtz","year":"2008","journal-title":"BMC Genomics"},{"key":"2023012508000976100_B16","doi-asserted-by":"crossref","first-page":"2505","DOI":"10.1093\/molbev\/msl127","article-title":"Transcription and evolutionary dynamics of the centromeric satellite repeat CentO in rice","volume":"23","author":"Lee","year":"2006","journal-title":"Mol. Biol. Evol."},{"key":"2023012508000976100_B17","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1371\/journal.pcbi.0010043","article-title":"ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun","volume":"1","author":"Li","year":"2005","journal-title":"PLoS Comput. 
Biol."},{"key":"2023012508000976100_B18","doi-asserted-by":"crossref","first-page":"1047","DOI":"10.1139\/g06-055","article-title":"Exact word matches in rice pseudomolecules","volume":"49","author":"Liu","year":"2006","journal-title":"Genome"},{"key":"2023012508000976100_B19","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1101\/gr.4583106","article-title":"Retrotransposon accumulation and satellite amplification mediated by segmental duplication facilitate centromere expansion in rice","volume":"16","author":"Ma","year":"2006","journal-title":"Genome Res."},{"key":"2023012508000976100_B20","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1007\/s004380000245","article-title":"Two new families of tandem repeats isolated from genus Vicia using genomic self-priming PCR","volume":"263","author":"Macas","year":"2000","journal-title":"Mol. Gen. Genet."},{"key":"2023012508000976100_B21","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1093\/bioinformatics\/18.1.28","article-title":"PlantSat: a specialized database for plant satellite repeats","volume":"18","author":"Macas","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012508000976100_B22","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1007\/s00412-003-0255-3","article-title":"Sequence subfamilies of satellite repeats related to rDNA intergenic spacer are differentially amplified on Vicia sativa chromosomes","volume":"112","author":"Macas","year":"2003","journal-title":"Chromosoma"},{"key":"2023012508000976100_B23","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1007\/s00412-006-0070-8","article-title":"Sequence homogenization and chromosomal localization of VicTR-B satellites differ between closely related Vicia species","volume":"115","author":"Macas","year":"2006","journal-title":"Chromosoma"},{"key":"2023012508000976100_B24","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1186\/1471-2164-8-427","article-title":"Repetitive DNA in the pea (Pisum 
sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula","volume":"8","author":"Macas","year":"2007","journal-title":"BMC Genomics"},{"key":"2023012508000976100_B25","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1104\/pp.109.142612","article-title":"Gene content and virtual gene order of barley chromosome 1H","volume":"151","author":"Mayer","year":"2009","journal-title":"Plant Physiol."},{"key":"2023012508000976100_B26","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1038\/nbt1291","article-title":"An expression atlas of rice mRNAs and small RNAs","volume":"25","author":"Nobuta","year":"2007","journal-title":"Nat. Biotechnol."},{"key":"2023012508000976100_B27","doi-asserted-by":"crossref","first-page":"2024","DOI":"10.1101\/gr.080200.108","article-title":"Sequencing of natural strains of Arabidopsis thaliana with short reads","volume":"18","author":"Ossowski","year":"2008","journal-title":"Genome Res."},{"key":"2023012508000976100_B28","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/S0378-1119(97)00402-2","article-title":"Conservation of satellite DNA in species of the genus Pimelia (Tenebrionidae, Coleoptera)","volume":"205","author":"Pons","year":"1997","journal-title":"Gene"},{"key":"2023012508000976100_B29","volume-title":"R: A Language and Environment for Statistical Computing.","author":"R Development Core Team","year":"2009"},{"key":"2023012508000976100_B30","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1007\/11780441_15","article-title":"Subsequence combinatorics and applications to microarray production, DNA sequencing and chaining algorithms","volume-title":"Combinatorial Pattern Matching.","author":"Rahmann","year":"2006"},{"key":"2023012508000976100_B31","doi-asserted-by":"crossref","first-page":"1815","DOI":"10.1101\/gr.451502","article-title":"Evidence for a fast, intrachromosomal conversion mechanism from mapping of nucleotide variants 
within a homogeneous alpha-satellite DNA array","volume":"12","author":"Schindelhauer","year":"2002","journal-title":"Genome Res."},{"key":"2023012508000976100_B32","doi-asserted-by":"crossref","first-page":"6097","DOI":"10.1093\/nar\/18.20.6097","article-title":"Sequence logos: a new way to display consensus sequences","volume":"18","author":"Schneider","year":"1990","journal-title":"Nucleic Acids Res."},{"key":"2023012508000976100_B33","doi-asserted-by":"crossref","first-page":"1135","DOI":"10.1038\/nbt1486","article-title":"Next-generation DNA sequencing","volume":"26","author":"Shendure","year":"2008","journal-title":"Nat. Biotechnol."},{"key":"2023012508000976100_B34","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1139\/g01-029","article-title":"Instability of bacterial artificial chromosome (BAC) clones containing tandemly repeated DNA sequences","volume":"44","author":"Song","year":"2001","journal-title":"Genome"},{"key":"2023012508000976100_B35","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1534\/genetics.105.041087","article-title":"Sobo, a recently amplified satellite repeat of potato, and its implications for the origin of tandemly repeated sequences","volume":"170","author":"Tek","year":"2005","journal-title":"Genetics"},{"key":"2023012508000976100_B36","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison\u2014a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508000976100_B37","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1186\/1471-2164-9-518","article-title":"Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats","volume":"9","author":"Wicker","year":"2008","journal-title":"BMC 
Genomics"},{"key":"2023012508000976100_B38","doi-asserted-by":"crossref","first-page":"e286","DOI":"10.1371\/journal.pbio.0060286","article-title":"Intergenic locations of rice centromeric chromatin","volume":"6","author":"Yan","year":"2008","journal-title":"PLoS Biol."},{"key":"2023012508000976100_B39","doi-asserted-by":"crossref","first-page":"e33","DOI":"10.1093\/nar\/gkn075","article-title":"Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction","volume":"36","author":"Yang","year":"2008","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/17\/2101\/48853021\/bioinformatics_26_17_2101.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/17\/2101\/48853021\/bioinformatics_26_17_2101.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:00:42Z","timestamp":1674633642000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/17\/2101\/198395"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,7,8]]},"references-count":39,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2010,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq343","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,9,1]]},"published":{"date-parts":[[2010,7,8]]}}}