{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T16:04:29Z","timestamp":1773936269687,"version":"3.50.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2010,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. 
The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, <jats:italic>Pisum sativum<\/jats:italic> and <jats:italic>Glycine max<\/jats:italic>, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, <jats:italic>SeqGrapheR<\/jats:italic>, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-11-378","type":"journal-article","created":{"date-parts":[[2010,7,15]],"date-time":"2010-07-15T18:14:18Z","timestamp":1279217658000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":404,"title":["Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data"],"prefix":"10.1186","volume":"11","author":[{"given":"Petr","family":"Nov\u00e1k","sequence":"first","affiliation":[]},{"given":"Pavel","family":"Neumann","sequence":"additional","affiliation":[]},{"given":"Ji\u0159\u00ed","family":"Macas","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,7,15]]},"reference":[{"key":"3835_CR1","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/j.nbt.2008.12.009","volume":"25","author":"WJ 
Ansorge","year":"2009","unstructured":"Ansorge WJ: Next-generation DNA sequencing techniques. New Biotechnol 2009, 25: 195\u2013203. 10.1016\/j.nbt.2008.12.009","journal-title":"New Biotechnol"},{"key":"3835_CR2","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1038\/nbt1486","volume":"26","author":"J Shendure","year":"2008","unstructured":"Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol 2008, 26: 1135\u20131145. 10.1038\/nbt1486","journal-title":"Nat Biotechnol"},{"key":"3835_CR3","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1038\/nmeth1156","volume":"5","author":"SC Schuster","year":"2008","unstructured":"Schuster SC: Next-generation sequencing transforms today's biology. Nat Methods 2008, 5: 16\u201318. 10.1038\/nmeth1156","journal-title":"Nat Methods"},{"key":"3835_CR4","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1007\/BF01792422","volume":"17","author":"MG Murray","year":"1981","unstructured":"Murray MG, Peters DL, Thompson WF: Ancient repeated sequences in the pea and mung bean genomes and implications for genome evolution. J Mol Evol 1981, 17: 31\u201342. 10.1007\/BF01792422","journal-title":"J Mol Evol"},{"key":"3835_CR5","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1007\/BF00485947","volume":"12","author":"RB Flavell","year":"1974","unstructured":"Flavell RB, Bennett MD, Smith JB, Smith DB: Genome size and the proportion of repeated nucleotide sequence DNA in plants. Biochem Genet 1974, 12: 257\u2013269. 10.1007\/BF00485947","journal-title":"Biochem Genet"},{"key":"3835_CR6","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1186\/1471-2164-8-427","volume":"8","author":"J Macas","year":"2007","unstructured":"Macas J, Neumann P, Navratilova A: Repetitive DNA in the pea ( Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula . BMC Genomics 2007, 8: 427. 
10.1186\/1471-2164-8-427","journal-title":"BMC Genomics"},{"key":"3835_CR7","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1186\/1471-2164-8-132","volume":"8","author":"K Swaminathan","year":"2007","unstructured":"Swaminathan K, Varala K, Hudson ME: Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey. BMC Genomics 2007, 8: 132. 10.1186\/1471-2164-8-132","journal-title":"BMC Genomics"},{"key":"3835_CR8","doi-asserted-by":"publisher","first-page":"518","DOI":"10.1186\/1471-2164-9-518","volume":"9","author":"T Wicker","year":"2008","unstructured":"Wicker T, Narechania A, Sabot F, Stein J, Vu GTH, Graner A, Ware D, Stein N: Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats. BMC Genomics 2008, 9: 518. 10.1186\/1471-2164-9-518","journal-title":"BMC Genomics"},{"key":"3835_CR9","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1093\/bioinformatics\/btg034","volume":"19","author":"G Pertea","year":"2003","unstructured":"Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, Tsai J, Quackenbush J: TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 2003, 19: 651\u2013652. 10.1093\/bioinformatics\/btg034","journal-title":"Bioinformatics"},{"key":"3835_CR10","volume-title":"R: A Language and Environment for Statistical Computing","author":"R Development Core Team","year":"2009","unstructured":"R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2009."},{"key":"3835_CR11","first-page":"1695","volume-title":"InterJournal","author":"G Csardi","year":"2006","unstructured":"Csardi G, Nepusz T: The igraph Software Package for Complex Network Research. InterJournal 2006, 1695. 
Complex Systems"},{"key":"3835_CR12","unstructured":"The R project for statistical computing[http:\/\/www.r-project.org]"},{"key":"3835_CR13","doi-asserted-by":"publisher","first-page":"066111","DOI":"10.1103\/PhysRevE.70.066111","volume":"70","author":"A Clauset","year":"2004","unstructured":"Clauset A, Newman MEJ, Moore C: Finding community structure in very large networks. Phys Rev E 2004, 70: 066111. 10.1103\/PhysRevE.70.066111","journal-title":"Phys Rev E"},{"key":"3835_CR14","doi-asserted-by":"publisher","first-page":"7821","DOI":"10.1073\/pnas.122653799","volume":"99","author":"M Girvan","year":"2002","unstructured":"Girvan M, Newman MEJ: Community structure in social and biological networks. P Natl Acad Sci USA 2002, 99: 7821\u20137826. 10.1073\/pnas.122653799","journal-title":"P Natl Acad Sci USA"},{"key":"3835_CR15","doi-asserted-by":"publisher","first-page":"026113","DOI":"10.1103\/PhysRevE.69.026113","volume":"69","author":"MEJ Newman","year":"2004","unstructured":"Newman MEJ, Girvan M: Finding and evaluating community structure in networks. Phys Rev E 2004, 69: 026113. 10.1103\/PhysRevE.69.026113","journal-title":"Phys Rev E"},{"key":"3835_CR16","doi-asserted-by":"publisher","first-page":"8577","DOI":"10.1073\/pnas.0601602103","volume":"103","author":"MEJ Newman","year":"2006","unstructured":"Newman MEJ: Modularity and community structure in networks. P Natl Acad Sci USA 2006, 103: 8577\u20138582. 10.1073\/pnas.0601602103","journal-title":"P Natl Acad Sci USA"},{"key":"3835_CR17","doi-asserted-by":"crossref","unstructured":"Reingold EM, Fruchterman TMJ: Graph drawing by force-directed placement. Software Pract Exper 21: 1129\u20131164.","DOI":"10.1002\/spe.4380211102"},{"key":"3835_CR18","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1007\/s00180-008-0115-y","volume":"24","author":"M Lawrence","year":"2009","unstructured":"Lawrence M, Wickham H, Cook D, Hofmann H, Swayne D: Extending the GGobi pipeline from R. 
Computation Stat 2009, 24: 195\u2013205. 10.1007\/s00180-008-0115-y","journal-title":"Computation Stat"},{"key":"3835_CR19","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1016\/S0167-9473(02)00286-4","volume":"43","author":"DF Swayne","year":"2003","unstructured":"Swayne DF, Lang DT, Buja A, Cook D: GGobi: evolving from XGobi into an extensible framework for interactive data visualization. Comput Stat Data An 2003, 43: 423\u2013444. 10.1016\/S0167-9473(02)00286-4","journal-title":"Comput Stat Data An"},{"key":"3835_CR20","unstructured":"RepeatMasker Open-3.0[http:\/\/www.repeatmasker.org]"},{"key":"3835_CR21","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1038\/hdy.2009.45","volume":"103","author":"P Smykal","year":"2009","unstructured":"Smykal P, Kalendar R, Ford R, Macas J, Griga M: Evolutionary conserved lineage of Angela-family retrotransposons as a genome-wide microsatellite repeat dispersal agent. Heredity 2009, 103: 157\u2013167. 10.1038\/hdy.2009.45","journal-title":"Heredity"},{"key":"3835_CR22","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1016\/S0168-9525(00)02093-X","volume":"16","author":"J Jurka","year":"2000","unstructured":"Jurka J: Repbase update: a database and an electronic journal of repetitive elements. Trends Genet 2000, 16: 418\u2013420. 10.1016\/S0168-9525(00)02093-X","journal-title":"Trends Genet"},{"key":"3835_CR23","doi-asserted-by":"publisher","first-page":"D46","DOI":"10.1093\/nar\/gkp1024","volume":"38","author":"DA Benson","year":"2010","unstructured":"Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucleic Acids Res 2010, 38: D46\u201351. 10.1093\/nar\/gkp1024","journal-title":"Nucleic Acids Res"},{"key":"3835_CR24","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1049\/iet-syb:20060038","volume":"1","author":"O Mason","year":"2007","unstructured":"Mason O, Verwoerd M: Graph theory and networks in Biology. IET Syst Biol 2007, 1: 89\u2013119. 
10.1049\/iet-syb:20060038","journal-title":"IET Syst Biol"},{"key":"3835_CR25","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1186\/1471-2105-11-21","volume":"11","author":"C Kingsford","year":"2010","unstructured":"Kingsford C, Schatz M, Pop M: Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 2010, 11: 21. 10.1186\/1471-2105-11-21","journal-title":"BMC Bioinformatics"},{"key":"3835_CR26","doi-asserted-by":"publisher","first-page":"1101","DOI":"10.1089\/cmb.2009.0047","volume":"16","author":"P Medvedev","year":"2009","unstructured":"Medvedev P, Brudno M: Maximum Likelihood Genome Assembly. J Comput Biol 2009, 16: 1101\u20131116. 10.1089\/cmb.2009.0047","journal-title":"J Comput Biol"},{"key":"3835_CR27","doi-asserted-by":"publisher","first-page":"821","DOI":"10.1101\/gr.074492.107","volume":"18","author":"D Zerbino","year":"2008","unstructured":"Zerbino D, Birney E: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008, 18: 821\u2013829. 10.1101\/gr.074492.107","journal-title":"Genome Res"},{"key":"3835_CR28","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1186\/1471-2105-9-235","volume":"9","author":"JD DeBarry","year":"2008","unstructured":"DeBarry JD, Liu R, Bennetzen JL: Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm. BMC Bioinformatics 2008, 9: 235. 10.1186\/1471-2105-9-235","journal-title":"BMC Bioinformatics"},{"key":"3835_CR29","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1186\/1471-2229-9-137","volume":"9","author":"S Tangphatsornruang","year":"2009","unstructured":"Tangphatsornruang S, Somta P, Uthaipaisanwong P, Chanprasert J, Sangsrakru D, Seehalak W, Sommanas W, Tragoonrung S, Srinives P: Characterization of microsatellites and gene contents from genome shotgun sequences of mungbean (Vigna radiata (L.) Wilczek). BMC Plant Biol 2009, 9: 137. 
10.1186\/1471-2229-9-137","journal-title":"BMC Plant Biol"},{"key":"3835_CR30","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1007\/BF02900361","volume":"5","author":"R Staden","year":"1996","unstructured":"Staden R: The Staden sequence analysis package. Mol Biotechnol 1996, 5: 233\u2013241. 10.1007\/BF02900361","journal-title":"Mol Biotechnol"},{"key":"3835_CR31","doi-asserted-by":"publisher","first-page":"1310","DOI":"10.1109\/TVCG.2007.70580","volume":"13","author":"Y Frishman","year":"2007","unstructured":"Frishman Y, Tal A: Multi-Level Graph Layout on the GPU. IEEE T Vis Comput Gr 2007, 13: 1310\u20131319. 10.1109\/TVCG.2007.70580","journal-title":"IEEE T Vis Comput Gr"},{"key":"3835_CR32","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1007\/978-3-642-00219-9_10","volume-title":"Graph Drawing","author":"A Godiyal","year":"2009","unstructured":"Godiyal A, Hoberock J, Garland M, Hart J: Rapid Multipole Graph Drawing on the GPU. In Graph Drawing. Volume 5417. Heidelberg: Springer Berlin; 2009:90\u2013101."},{"key":"3835_CR33","unstructured":"Cluster resources[http:\/\/www.clusterresources.com]"},{"key":"3835_CR34","unstructured":"BioPerl[http:\/\/www.bioperl.org]"},{"key":"3835_CR35","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","volume":"215","author":"S Altschul","year":"1990","unstructured":"Altschul S, Gish W, Miller W, Myers E, Lipman D: Basic local alignment search tool. 
J Mol Biol 1990, 215: 403\u2013410.","journal-title":"J Mol Biol"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-11-378.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T05:18:24Z","timestamp":1630473504000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-11-378"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,7,15]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["3835"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-11-378","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,7,15]]},"assertion":[{"value":"24 March 2010","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 July 2010","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 July 2010","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"378"}}