{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T05:25:55Z","timestamp":1768713955516,"version":"3.49.0"},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T00:00:00Z","timestamp":1730246400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Zhejiang Provincial Natural Science Foundation of China","award":["LQ23F020018"],"award-info":[{"award-number":["LQ23F020018"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62303372"],"award-info":[{"award-number":["62303372"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Summary<\/jats:title><jats:p>Estimating genome size using k-mer frequencies, which plays a fundamental role in designing genome sequencing and analysis projects, has remained challenging for polyploid species, i.e., ploidy p\u2009&gt;\u20092. To address this, we introduce \u201cfindGSEP,\u201d which is designed based on iterative curve fitting of k-mer frequencies. Precisely, it first disentangles up to p normal distributions by analyzing k-mer frequencies in whole genome sequencing of the focal species. Second, it computes the sizes of genomic regions related to 1\u223cp (homologous) chromosome(s) using each respective curve fitting, from which it infers the full polyploid and average haploid genome size. 
\u201cfindGSEP\u201d can handle any level of ploidy p, and infer more accurate genome size than other well-known tools, as shown by tests using simulated and real genomic sequencing data of various species including octoploids.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>\u201cfindGSEP\u201d was implemented as a web server, which is freely available at http:\/\/146.56.237.198:3838\/findGSEP\/. Also, \u201cfindGSEP\u201d was implemented as an R package for parallel processing of multiple samples. Source code and tutorial on its installation and usage is available at https:\/\/github.com\/sperfu\/findGSEP.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae647","type":"journal-article","created":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T19:16:56Z","timestamp":1730315816000},"source":"Crossref","is-referenced-by-count":4,"title":["findGSEP: estimating genome size of polyploid species using <i>k<\/i>-mer frequencies"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9086-3982","authenticated-orcid":false,"given":"Laiyi","family":"Fu","sequence":"first","affiliation":[{"name":"School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi\u2019an Jiaotong University , Xi\u2019an 710049,","place":["China"]},{"name":"Research Institute of Xi\u2019an Jiaotong University , Zhejiang, Hangzhou 311200,","place":["China"]},{"name":"Sichuan Digital Economy Industry Development Research Institute , Chengdu 610036,","place":["China"]}]},{"given":"Yanxin","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi\u2019an Jiaotong University , Xi\u2019an 710049,","place":["China"]}]},{"given":"Shunkang","family":"Ling","sequence":"additional","affiliation":[{"name":"College of Mechanical and Electrical Engineering, Shihezi University , 
Shihezi 832000,","place":["China"]}]},{"given":"Ying","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi\u2019an Jiaotong University , Xi\u2019an 710049,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1165-2701","authenticated-orcid":false,"given":"Binzhong","family":"Wang","sequence":"additional","affiliation":[{"name":"Hubei Key Laboratory of Three Gorges Project for Conservation of Fishes , Yichang, Hubei 443100,","place":["China"]}]},{"given":"Hejun","family":"Du","sequence":"additional","affiliation":[{"name":"Hubei Key Laboratory of Three Gorges Project for Conservation of Fishes , Yichang, Hubei 443100,","place":["China"]}]},{"given":"Qinke","family":"Peng","sequence":"additional","affiliation":[{"name":"School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi\u2019an Jiaotong University , Xi\u2019an 710049,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2046-2109","authenticated-orcid":false,"given":"Hequan","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi\u2019an Jiaotong University , Xi\u2019an 710049,","place":["China"]},{"name":"Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research , Cologne 50829,","place":["Germany"]}]}],"member":"286","published-online":{"date-parts":[[2024,10,30]]},"reference":[{"key":"2024112610365832300_btae647-B1","doi-asserted-by":"crossref","first-page":"1362","DOI":"10.3390\/plants10071362","article-title":"Estimation of genome size in the endemic species Reseda pentagyna and the locally rare species Reseda lutea using comparative analyses of flow cytometry and K-mer 
approaches","volume":"10","author":"Al-Qurainy","year":"2021","journal-title":"Plants"},{"key":"2024112610365832300_btae647-B2","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1093\/bioinformatics\/btt310","article-title":"Informed and automated k-mer size selection for genome assembly","volume":"30","author":"Chikhi","year":"2014","journal-title":"Bioinformatics"},{"key":"2024112610365832300_btae647-B3","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1136\/jcp.30.10.907","article-title":"Feulgen microdensitometry and analysis of S-phase cells in cervical tumour biopsies","volume":"30","author":"Dixon","year":"1977","journal-title":"J Clin Pathol"},{"key":"2024112610365832300_btae647-B4","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1093\/oxfordjournals.aob.a010312","article-title":"Plant genome size estimation by flow cytometry: inter-laboratory comparison","volume":"82","author":"Dole\u017eel","year":"1998","journal-title":"Ann Bot"},{"key":"2024112610365832300_btae647-B5851283","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1093\/dnares\/dst049","article-title":"Dissection of the octoploid strawberry genome by deep sequencing of the genomes of fragaria species","volume":"21","author":"Hirakawa","year":"2014","journal-title":"DNA Res"},{"key":"2024112610365832300_btae647-B5"},{"key":"2024112610365832300_btae647-B6","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1093\/bioinformatics\/bts187","article-title":"pIRS: profile-based Illumina pair-end reads simulator","volume":"28","author":"Hu","year":"2012","journal-title":"Bioinformatics"},{"key":"2024112610365832300_btae647-B7","doi-asserted-by":"crossref","first-page":"2759","DOI":"10.1093\/bioinformatics\/btx304","article-title":"KMC 3: counting and manipulating k-mer 
statistics","volume":"33","author":"Kokot","year":"2017","journal-title":"Bioinformatics"},{"key":"2024112610365832300_btae647-B8","doi-asserted-by":"crossref","first-page":"1916","DOI":"10.1101\/gr.1251803","article-title":"Estimating the repeat structure and length of DNA sequences using L-tuples","volume":"13","author":"Li","year":"2003","journal-title":"Genome Res"},{"key":"2024112610365832300_btae647-B9","doi-asserted-by":"crossref","first-page":"982","DOI":"10.1038\/s41588-024-01715-9","article-title":"A pan-genome of 69 Arabidopsis thaliana accessions reveals a conserved genome structure throughout the global species range","volume":"56","author":"Lian","year":"2024","journal-title":"Nat Genet"},{"key":"2024112610365832300_btae647-B10","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2024112610365832300_btae647-B11","doi-asserted-by":"crossref","first-page":"5.1.1","DOI":"10.1002\/cpim.40","article-title":"Flow cytometry: an overview","volume":"120","author":"McKinnon","year":"2018","journal-title":"Curr Protoc Immunol"},{"key":"2024112610365832300_btae647-B12","doi-asserted-by":"crossref","first-page":"3047","DOI":"10.1534\/g3.120.401028","article-title":"Measuring genome sizes using read-depth, k-mers, and flow cytometry: methodological comparisons in beetles (Coleoptera)","volume":"10","author":"Pflug","year":"2020","journal-title":"G3 (Bethesda)"},{"key":"2024112610365832300_btae647-B13","doi-asserted-by":"crossref","first-page":"1432","DOI":"10.1038\/s41467-020-14998-3","article-title":"GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes","volume":"11","author":"Ranallo-Benavidez","year":"2020","journal-title":"Nat 
Commun"},{"key":"2024112610365832300_btae647-B14","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1093\/bioinformatics\/btx637","article-title":"findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies","volume":"34","author":"Sun","year":"2018","journal-title":"Bioinformatics"},{"key":"2024112610365832300_btae647-B15","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1038\/s41588-022-01015-0","article-title":"Chromosome-scale and haplotype-resolved genome assembly of a tetraploid potato cultivar","volume":"54","author":"Sun","year":"2022","journal-title":"Nat Genet"},{"key":"2024112610365832300_btae647-B16","doi-asserted-by":"crossref","first-page":"2202","DOI":"10.1093\/bioinformatics\/btx153","article-title":"GenomeScope: fast reference-free genome profiling from short reads","volume":"33","author":"Vurture","year":"2017","journal-title":"Bioinformatics"},{"key":"2024112610365832300_btae647-B17","volume-title":"Genom Proteom Bioinform","author":"Wan","year":"2024"},{"key":"2024112610365832300_btae647-B18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/gigascience\/gix097","article-title":"The first near-complete assembly of the hexaploid bread wheat genome, Triticum 
aestivum","volume":"6","author":"Zimin","year":"2017","journal-title":"Gigascience"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae647\/60248333\/btae647.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/11\/btae647\/60816233\/btae647.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/11\/btae647\/60816233\/btae647.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,30]],"date-time":"2024-11-30T12:44:56Z","timestamp":1732970696000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae647\/7852830"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,10,30]]},"references-count":19,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae647","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,11]]},"published":{"date-parts":[[2024,10,30]]},"article-number":"btae647"}}