{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T08:11:52Z","timestamp":1769760712960,"version":"3.49.0"},"reference-count":16,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T00:00:00Z","timestamp":1687564800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004052","name":"King Abdullah University of Science and Technology","doi-asserted-by":"publisher","award":["FCC\/1\/1976-44-01"],"award-info":[{"award-number":["FCC\/1\/1976-44-01"]}],"id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004052","name":"King Abdullah University of Science and Technology","doi-asserted-by":"publisher","award":["FCC\/1\/1976-45-01"],"award-info":[{"award-number":["FCC\/1\/1976-45-01"]}],"id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004052","name":"King Abdullah University of Science and Technology","doi-asserted-by":"publisher","award":["URF\/1\/4663-01-01"],"award-info":[{"award-number":["URF\/1\/4663-01-01"]}],"id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004052","name":"King Abdullah University of Science and Technology","doi-asserted-by":"publisher","award":["REI\/1\/5202-01-01"],"award-info":[{"award-number":["REI\/1\/5202-01-01"]}],"id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004052","name":"King Abdullah University of Science and Technology","doi-asserted-by":"publisher","award":["REI\/1\/4940-01-01"],"award-info":[{"award-number":["REI\/1\/4940-01-01"]}],"id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004052","name":"King Abdullah University of Science and Technology","doi-asserted-by":"publisher","award":["RGC\/3\/4816-01-01"],"award-info":[{"award-number":["RGC\/3\/4816-01-01"]}],"id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Sequencing coverage is among key determinants considered in the design of omics studies. To help estimate cost-effective sequencing coverage for specific downstream analysis, downsampling, a technique to sample subsets of reads with a specific size, is routinely used. However, as the size of sequencing becomes larger and larger, downsampling becomes computationally challenging.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we developed an approximate downsampling method called s-leaping that was designed to efficiently and accurately process large-size data. We compared the performance of s-leaping with state-of-the-art downsampling methods in a range of practical omics-study downsampling settings and found s-leaping to be up to 39% faster than the second-fastest method, with comparable accuracy to the exact downsampling methods. To apply s-leaping on FASTQ data, we developed a light-weight tool called fadso in C. Using whole-genome sequencing data with 208 million reads, we compared fadso\u2019s performance with that of a commonly used FASTQ tool with the same downsampling feature and found fadso to be up to 12% faster with 21% lower memory usage, suggesting fadso to have up to 40% higher throughput in a parallel computing setting.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The C source code for s-leaping, as well as the fadso package is freely available at https:\/\/github.com\/hkuwahara\/sleaping.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad399","type":"journal-article","created":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T20:42:30Z","timestamp":1687639350000},"source":"Crossref","is-referenced-by-count":2,"title":["S-leaping: an efficient downsampling method for large high-throughput sequencing data"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5333-6729","authenticated-orcid":false,"given":"Hiroyuki","family":"Kuwahara","sequence":"first","affiliation":[{"name":"Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST) , Thuwal 23955-6900, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7108-3574","authenticated-orcid":false,"given":"Xin","family":"Gao","sequence":"additional","affiliation":[{"name":"Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST) , Thuwal 23955-6900, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2023,6,24]]},"reference":[{"key":"2023070406462096000_btad399-B1","doi-asserted-by":"crossref","first-page":"1104","DOI":"10.1038\/s41588-021-00877-0","article-title":"Rapid genotype imputation from sequence with reference panels","volume":"53","author":"Davies","year":"2021","journal-title":"Nat Genet"},{"key":"2023070406462096000_btad399-B2","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1101\/gr.210500.116","article-title":"A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree","volume":"27","author":"Eberle","year":"2017","journal-title":"Genome Res"},{"key":"2023070406462096000_btad399-B3","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/0021-9991(76)90041-3","article-title":"A general method for numerically simulating the stochastic time evolution of coupled chemical reactions","volume":"22","author":"Gillespie","year":"1976","journal-title":"J Comput Phys"},{"key":"2023070406462096000_btad399-B4","doi-asserted-by":"crossref","first-page":"1716","DOI":"10.1063\/1.1378322","article-title":"Approximate accelerated stochastic simulation of chemically reacting systems","volume":"115","author":"Gillespie","year":"2001","journal-title":"J Chem Phys"},{"key":"2023070406462096000_btad399-B5","first-page":"2555","article-title":"Very low-depth whole-genome sequencing in complex trait association studies","volume":"35","author":"Gilly","year":"2019","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023070406462096000_btad399-B6","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1186\/s13073-016-0269-0","article-title":"Medical implications of technical accuracy in genome sequencing","volume":"8","author":"Goldfeder","year":"2016","journal-title":"Genome Med"},{"key":"2023070406462096000_btad399-B7","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1186\/s13073-019-0682-2","article-title":"Low coverage whole genome sequencing enables accurate assessment of common variants and calculation of genome-wide polygenic scores","volume":"11","author":"Homburger","year":"2019","journal-title":"Genome Med"},{"key":"2023070406462096000_btad399-B8","volume-title":"The Art of Computer Programming, Vol. 2: Seminumerical Algorithms","author":"Knuth","year":"1997","edition":"3rd edn."},{"key":"2023070406462096000_btad399-B9","first-page":"2078","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023070406462096000_btad399-B10","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1101\/gr.266486.120","article-title":"Low-pass sequencing increases the power of GWAS and decreases measurement error of polygenic risk scores compared to genotyping arrays","volume":"31","author":"Li","year":"2021","journal-title":"Genome Res"},{"key":"2023070406462096000_btad399-B11","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1145\/198429.198435","article-title":"Reservoir-sampling algorithms of time complexity","volume":"20","author":"Li","year":"1994","journal-title":"ACM Trans Math Softw"},{"key":"2023070406462096000_btad399-B12","doi-asserted-by":"crossref","first-page":"5966","DOI":"10.1111\/mec.16077","article-title":"A beginner\u2019s guide to low-coverage whole genome sequencing for population genomics","volume":"30","author":"Lou","year":"2021","journal-title":"Mol Ecol"},{"key":"2023070406462096000_btad399-B13","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023070406462096000_btad399-B14","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1038\/s41588-020-00756-0","article-title":"Efficient phasing and imputation of low-coverage sequencing data using large reference panels","volume":"53","author":"Rubinacci","year":"2021","journal-title":"Nat Genet"},{"key":"2023070406462096000_btad399-B15","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1186\/s12920-021-00948-5","article-title":"Characterizing sensitivity and coverage of clinical wgs as a diagnostic test for genetic disorders","volume":"14","author":"Sun","year":"2021","journal-title":"BMC Med Genomics"},{"key":"2023070406462096000_btad399-B16","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1145\/3147.3165","article-title":"Random sampling with a reservoir","volume":"11","author":"Vitter","year":"1985","journal-title":"ACM Trans Math Softw"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad399\/50696354\/btad399.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/7\/btad399\/50791673\/btad399.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/7\/btad399\/50791673\/btad399.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,4]],"date-time":"2023-07-04T06:46:34Z","timestamp":1688453194000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad399\/7206878"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2023,6,24]]},"references-count":16,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2023,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad399","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,7,1]]},"published":{"date-parts":[[2023,6,24]]},"article-number":"btad399"}}