{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T21:45:14Z","timestamp":1776116714190,"version":"3.50.1"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2017,10,9]],"date-time":"2017-10-09T00:00:00Z","timestamp":1507507200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/501100004189","name":"Max Planck Society","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004189","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Analyzing k-mer frequencies in whole-genome sequencing data is becoming a common method for estimating genome size (GS). However, it remains uninvestigated how accurate the method is, especially if it can capture intra-species GS variation.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present findGSE, which fits skew normal distributions to k-mer frequencies to estimate GS. findGSE outperformed existing tools in an extensive simulation study. Estimating GSs of 89 Arabidopsis thaliana accessions, findGSE showed the highest capability in capturing GS variations. In an application with 71 female and 71 male human individuals, findGSE delivered an average of 3039 Mb as haploid human GS, while female genomes were on average 41 Mb larger than male genomes, in astonishing agreement with size difference of the X and Y chromosomes. 
Further analysis showed that human GS variations link to geographical patterns and significant differences between populations, which can be explained by variable abundances of LINE-1 retrotransposons.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>R package of findGSE is freely available at https:\/\/github.com\/schneebergerlab\/findGSE and supported on linux and Mac systems.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx637","type":"journal-article","created":{"date-parts":[[2017,10,6]],"date-time":"2017-10-06T19:12:01Z","timestamp":1507317121000},"page":"550-557","source":"Crossref","is-referenced-by-count":244,"title":["<i>findGSE<\/i>: estimating genome size variation within human and <i>Arabidopsis<\/i> using <i>k<\/i>-mer frequencies"],"prefix":"10.1093","volume":"34","author":[{"given":"Hequan","family":"Sun","sequence":"first","affiliation":[{"name":"Department of Plant Developmental Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany"}]},{"given":"Jia","family":"Ding","sequence":"additional","affiliation":[{"name":"Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, Germany"}]},{"given":"Mathieu","family":"Piedno\u00ebl","sequence":"additional","affiliation":[{"name":"Department of Plant Developmental Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany"}]},{"given":"Korbinian","family":"Schneeberger","sequence":"additional","affiliation":[{"name":"Department of Plant Developmental Biology, Max Planck Institute for Plant Breeding Research, Cologne, 
Germany"}]}],"member":"286","published-online":{"date-parts":[[2017,10,9]]},"reference":[{"key":"2023012712332841400_btx637-B1","first-page":"171","article-title":"A class of distributions which include the normal ones","volume":"12","author":"Azzalini","year":"1985","journal-title":"Scand. J. Stat"},{"key":"2023012712332841400_btx637-B2","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1111\/j.1467-9469.2005.00426.x","article-title":"The skew-normal distribution and related multivariate families","volume":"32","author":"Azzalini","year":"2005","journal-title":"Scand. J. Stat"},{"key":"2023012712332841400_btx637-B3","doi-asserted-by":"crossref","first-page":"6634","DOI":"10.1073\/pnas.97.12.6634","article-title":"Molecular evidence for a relationship between LINE-1 elements and X chromosome inactivation: The Lyon repeat hypothesis","volume":"97","author":"Bailey","year":"2000","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712332841400_btx637-B4","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol"},{"key":"2023012712332841400_btx637-B5","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1038\/nature10555","article-title":"Spontaneous epigenetic variation in the Arabidopsis thaliana methylome","volume":"480","author":"Becker","year":"2011","journal-title":"Nature"},{"key":"2023012712332841400_btx637-B6","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1093\/aob\/mcg057","article-title":"Comparisons with Caenorhabditis (\u223c100Mb) and Drosophila (\u223c175Mb) using flow cytometry show genome size in Arabidopsis to be \u223c157Mb and thus \u223c25% larger than the Arabidopsis genome initiative estimate of \u223c125Mb","volume":"91","author":"Bennett","year":"2003","journal-title":"Ann. 
Botany"},{"key":"2023012712332841400_btx637-B7","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1093\/bioinformatics\/btt310","article-title":"Informed and automated k-mer size selection for genome assembly","volume":"30","author":"Chikhi","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012712332841400_btx637-B8","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1093\/aob\/mci005","article-title":"Plant DNA flow cytometry and estimation of nuclear genome size","volume":"95","author":"Dole\u017eel","year":"2005","journal-title":"Ann. Bot"},{"key":"2023012712332841400_btx637-B9","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1002\/cyto.a.10013","article-title":"Nuclear DNA content and genome size of trout and human","volume":"51","author":"Dole\u017eel","year":"2003","journal-title":"Cytometry"},{"key":"2023012712332841400_btx637-B10","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1002\/cyto.a.20915","article-title":"Nuclear genome size: are we getting closer?","volume":"77","author":"Dole\u017eel","year":"2010","journal-title":"Cytometry Part A"},{"key":"2023012712332841400_btx637-B11","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1093\/oxfordjournals.aob.a010312","article-title":"Plant genome size estimation by flow cytometry: inter-laboratory comparison","volume":"82","author":"Dole\u017eel","year":"1998","journal-title":"Ann. Bot"},{"key":"2023012712332841400_btx637-B12","doi-asserted-by":"crossref","first-page":"2233","DOI":"10.1038\/nprot.2007.310","article-title":"Estimation of nuclear DNA content in plants using flow cytometry","volume":"2","author":"Dole\u017eel","year":"2007","journal-title":"Nat. 
Protoc"},{"key":"2023012712332841400_btx637-B13","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1073\/pnas.1017351108","article-title":"High-quality draft assemblies of mammalian genomes from massively parallel sequence data","volume":"108","author":"Gnerre","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712332841400_btx637-B14","doi-asserted-by":"crossref","first-page":"699","DOI":"10.1038\/nrg1674","article-title":"Synergy between sequence and size in large-scale genomics","volume":"6","author":"Gregory","year":"2005","journal-title":"Nat. Rev. Genet"},{"key":"2023012712332841400_btx637-B15","doi-asserted-by":"crossref","first-page":"D332","DOI":"10.1093\/nar\/gkl828","article-title":"Eukaryotic genome size databases","volume":"35","author":"Gregory","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023012712332841400_btx637-B16","doi-asserted-by":"crossref","first-page":"735","DOI":"10.1177\/002215540205000601","article-title":"From pixels to picograms: a beginners' guide to genome quantification by Feulgen image analysis densitometry","volume":"50","author":"Hardie","year":"2002","journal-title":"J. Histochem. 
Cytochem"},{"key":"2023012712332841400_btx637-B17","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1104\/pp.112.200311","article-title":"Fast isogenic mapping-by-sequencing of ethyl methanesulfonate-induced mutant bulks","volume":"160","author":"Hartwig","year":"2012","journal-title":"Plant Physiol"},{"key":"2023012712332841400_btx637-B18","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1093\/bioinformatics\/bts187","article-title":"pIRS: Profile-based Illumina pair-end reads simulator","volume":"28","author":"Hu","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712332841400_btx637-B19","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/35057062","article-title":"Initial sequencing and analysis of the human genome","volume":"409","author":"International Human Genome Sequencing Consortium","year":"2001","journal-title":"Nature"},{"key":"2023012712332841400_btx637-B20","doi-asserted-by":"crossref","first-page":"768","DOI":"10.1101\/gr.214346.116","article-title":"ABySS2.0: resource-efficient assembly of large genomes using Bloom filter","volume":"27","author":"Jackman","year":"2017","journal-title":"Genome Res"},{"key":"2023012712332841400_btx637-B21","doi-asserted-by":"crossref","first-page":"1821","DOI":"10.1101\/gr.177659.114","article-title":"Environmentally responsive genome-wide accumulation of de novo Arabidopsis thaliana mutations and epimutations","volume":"24","author":"Jiang","year":"2014","journal-title":"Genome Res"},{"key":"2023012712332841400_btx637-B22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-15-182","article-title":"Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads","volume":"15","author":"Jiang","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023012712332841400_btx637-B23","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment 
with Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012712332841400_btx637-B24","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012712332841400_btx637-B25","author":"Liu","year":"2013"},{"key":"2023012712332841400_btx637-B26","doi-asserted-by":"crossref","first-page":"1966","DOI":"10.1101\/gr.1251803","article-title":"Estimating the repeat structure and length of DNA sequences using l-tuples","volume":"13","author":"Li","year":"2003","journal-title":"Genome Res"},{"key":"2023012712332841400_btx637-B27","doi-asserted-by":"crossref","first-page":"884","DOI":"10.1038\/ng.2678","article-title":"Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden","volume":"45","author":"Long","year":"2013","journal-title":"Nat. 
Genet"},{"key":"2023012712332841400_btx637-B28","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1038\/nature18964","article-title":"The Simons Genome Diversity Project: 300 genomes from 142 diverse populations","volume":"538","author":"Mallick","year":"2016","journal-title":"Nature"},{"key":"2023012712332841400_btx637-B29","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012712332841400_btx637-B30","doi-asserted-by":"crossref","first-page":"2024","DOI":"10.1101\/gr.080200.108","article-title":"Sequencing of natural strains of Arabidopsis thaliana with short reads","volume":"18","author":"Ossowski","year":"2008","journal-title":"Genome Res"},{"key":"2023012712332841400_btx637-B31","doi-asserted-by":"crossref","first-page":"1201","DOI":"10.1534\/g3.117.040204","article-title":"Unstable inheritance of 45S rRNA genes in Arabidopsis thaliana","volume":"7","author":"Rabanal","year":"2017","journal-title":"G3"},{"key":"2023012712332841400_btx637-B32","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1093\/bioinformatics\/btt020","article-title":"DSK: k-mer counting with very low memory usage","volume":"29","author":"Rizk","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012712332841400_btx637-B33","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1038\/nature11968","article-title":"Patterns of population epigenomic diversity","volume":"495","author":"Schmitz","year":"2013","journal-title":"Nature"},{"key":"2023012712332841400_btx637-B34","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1093\/aob\/mch037","article-title":"Genome size variation among accessions of Arabidopsis thaliana","volume":"93","author":"Schmuths","year":"2004","journal-title":"Ann. 
Bot"},{"key":"2023012712332841400_btx637-B35","doi-asserted-by":"crossref","first-page":"e0130679","DOI":"10.1371\/journal.pone.0130679","article-title":"Re-evaluation of reportedly metal tolerant Arabidopsis thaliana accessions","volume":"11","author":"Silva-Guzman","year":"2016","journal-title":"PLoS One"},{"key":"2023012712332841400_btx637-B36","doi-asserted-by":"crossref","first-page":"1596","DOI":"10.3732\/ajb.90.11.1596","article-title":"Evolution of genome size in the angiosperms","volume":"90","author":"Soltis","year":"2003","journal-title":"Am. J. Bot"},{"key":"2023012712332841400_btx637-B37","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"The 1000 Genomes Project Consortium","year":"2015","journal-title":"Nature"},{"key":"2023012712332841400_btx637-B38","doi-asserted-by":"crossref","first-page":"2202","DOI":"10.1093\/bioinformatics\/btx153","article-title":"GenomeScope: Fast reference-free genome profiling from short reads","volume":"33","author":"Vurture","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012712332841400_btx637-B39","doi-asserted-by":"crossref","first-page":"e52249","DOI":"10.1371\/journal.pone.0052249","article-title":"FastUniq: A fast de novo duplicates removal tool for paired short reads","volume":"7","author":"Xu","year":"2012","journal-title":"PLoS One"},{"key":"2023012712332841400_btx637-B40","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1101\/gr.188573.114","article-title":"Organelle DNA rearrangement mapping reveals U-turn-like inversions as a major source of genomic instability in Arabidopsis and humans","volume":"25","author":"Zampini","year":"2015","journal-title":"Genome Res"},{"key":"2023012712332841400_btx637-B41","doi-asserted-by":"crossref","first-page":"E4052","DOI":"10.1073\/pnas.1607532113","article-title":"Chromosomal-level assembly of Arabidopsis thaliana Ler reveals the extent of 
translocation and inversion polymorphisms","volume":"113","author":"Zapata","year":"2016","journal-title":"Proc. Natl. Acad. Sci. USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/4\/550\/48913987\/bioinformatics_34_4_550.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/4\/550\/48913987\/bioinformatics_34_4_550.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T13:21:47Z","timestamp":1674825707000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/4\/550\/4386918"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,10,9]]},"references-count":41,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx637","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,2,15]]},"published":{"date-parts":[[2017,10,9]]}}}