{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T11:51:30Z","timestamp":1773229890121,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"21","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The deluge of current sequenced data has exceeded Moore\u2019s Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error prone high throughput NGS reads and genomic repeats, the assembly graph contains massive amount of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in a computer memory.<\/jats:p>\n               <jats:p>Results: LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache oblivious Bloom filters, one holding a uniform sample of g -spaced sequenced k -mers and the other holding k -mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves comparable assembly accuracy and contiguity to other competing tools. Our method reduces the memory usage by 50% compared to the resource-efficient assemblers using benchmark datasets from GAGE and Assemblathon projects. While LightAssembler can be considered as a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage.<\/jats:p>\n               <jats:p>Availability and implementation: \u00a0https:\/\/github.com\/SaraEl-Metwally\/LightAssembler<\/jats:p>\n               <jats:p>Contact: \u00a0sarah_almetwally4@mans.edu.eg<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw470","type":"journal-article","created":{"date-parts":[[2016,7,14]],"date-time":"2016-07-14T06:18:47Z","timestamp":1468477127000},"page":"3215-3223","source":"Crossref","is-referenced-by-count":16,"title":["LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads"],"prefix":"10.1093","volume":"32","author":[{"given":"Sara","family":"El-Metwally","sequence":"first","affiliation":[{"name":"1 Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA"},{"name":"2 Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt"}]},{"given":"Magdi","family":"Zakaria","sequence":"additional","affiliation":[{"name":"2 Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt"}]},{"given":"Taher","family":"Hamza","sequence":"additional","affiliation":[{"name":"2 Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt"}]}],"member":"286","published-online":{"date-parts":[[2016,7,13]]},"reference":[{"key":"2023020113515485900_btw470-B1","doi-asserted-by":"crossref","first-page":"3515","DOI":"10.1093\/bioinformatics\/btu578","article-title":"String graph construction using incremental hashing","volume":"30","author":"Ben-Bassat","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B2","doi-asserted-by":"crossref","first-page":"422.","DOI":"10.1145\/362686.362692","article-title":"Space\/Time Trade\/Offs in hash coding with allowable errors","volume":"13","author":"Bloom","year":"1970","journal-title":"Commun. ACM"},{"key":"2023020113515485900_btw470-B3","doi-asserted-by":"crossref","first-page":"578","DOI":"10.1093\/bioinformatics\/btq683","article-title":"Scaffolding pre-assembled contigs using SSPACE","volume":"27","author":"Boetzer","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B4","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1007\/978-3-642-33122-0_18","volume-title":"Algorithms in Bioinformatics","author":"Bowe","year":"2012"},{"key":"2023020113515485900_btw470-B5","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1186\/2047-217X-2-10","article-title":"Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species","volume":"2","author":"Bradnam","year":"2013","journal-title":"Gigascience"},{"key":"2023020113515485900_btw470-B6","doi-asserted-by":"crossref","first-page":"2067","DOI":"10.1093\/bioinformatics\/bth205","article-title":"Fragment assembly with short reads","volume":"20","author":"Chaisson","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B7","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1101\/gr.079053.108","article-title":"De novo fragment assembly with short mate-paired reads: does the read length matter?","volume":"19","author":"Chaisson","year":"2009","journal-title":"Genome Res"},{"key":"2023020113515485900_btw470-B8","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1089\/cmb.2014.0160","article-title":"On the representation of De Bruijn graphs","volume":"22","author":"Chikhi","year":"2015","journal-title":"J. Comput. Biol"},{"key":"2023020113515485900_btw470-B9","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1093\/bioinformatics\/btt310","article-title":"Informed and automated k-mer size selection for genome assembly","volume":"30","author":"Chikhi","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B10","doi-asserted-by":"crossref","first-page":"22.","DOI":"10.1186\/1748-7188-8-22","article-title":"Space-efficient and exact De Bruijn graph representation based on a Bloom filter","volume":"8","author":"Chikhi","year":"2013","journal-title":"Algorithms Mol. Biol"},{"key":"2023020113515485900_btw470-B11","doi-asserted-by":"crossref","first-page":"1937","DOI":"10.1093\/bioinformatics\/bts297","article-title":"Gossamer\u2013a resource-efficient de novo assembler","volume":"28","author":"Conway","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B12","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1093\/bioinformatics\/btq697","article-title":"Succinct data structures for assembling large genomes","volume":"27","author":"Conway","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B13","doi-asserted-by":"crossref","first-page":"2224","DOI":"10.1101\/gr.126599.111","article-title":"Assemblathon 1: a competitive assessment of de novo short read assembly methods","volume":"21","author":"Earl","year":"2011","journal-title":"Genome Res"},{"key":"2023020113515485900_btw470-B14","doi-asserted-by":"crossref","first-page":"e1003345","DOI":"10.1371\/journal.pcbi.1003345","article-title":"Next-generation sequence assembly: four stages of data processing and computational challenges","volume":"9","author":"El-Metwally","year":"2013","journal-title":"PLoS Comput. Biol"},{"key":"2023020113515485900_btw470-B15","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4939-0715-1","volume-title":"Next Generation Sequencing Technologies and Challenges in Sequence Assembly. SpringerBriefs in Systems Biology","author":"El-Metwally","year":"2014"},{"key":"2023020113515485900_btw470-B16","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1073\/pnas.1017351108","article-title":"High-quality draft assemblies of mammalian genomes from massively parallel sequence data","volume":"108","author":"Gnerre","year":"2011","journal-title":"Proc. Natl. Acad. Sci. U. S. A"},{"key":"2023020113515485900_btw470-B17","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"Gurevich","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B18","doi-asserted-by":"crossref","first-page":"61","DOI":"10.2144\/000114133","article-title":"Library construction for next-generation sequencing: overviews and challenges","volume":"56","author":"Head","year":"2014","journal-title":"Biotechniques"},{"key":"2023020113515485900_btw470-B19","doi-asserted-by":"crossref","first-page":"R42","DOI":"10.1186\/gb-2014-15-3-r42","article-title":"A comprehensive evaluation of assembly scaffolding tools","volume":"15","author":"Hunt","year":"2014","journal-title":"Genome Biol"},{"key":"2023020113515485900_btw470-B20","doi-asserted-by":"crossref","first-page":"e75505.","DOI":"10.1371\/journal.pone.0075505","article-title":"Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures","volume":"8","author":"Kleftogiannis","year":"2013","journal-title":"PLoS One"},{"key":"2023020113515485900_btw470-B21","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1146\/annurev-animal-090414-014900","article-title":"The Genome 10K Project: a way forward","volume":"3","author":"Koepfli","year":"2015","journal-title":"Annu. Rev. Anim. Biosci"},{"key":"2023020113515485900_btw470-B22","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1016\/0888-7543(88)90007-9","article-title":"Genomic mapping by fingerprinting random clones: a mathematical analysis","volume":"2","author":"Lander","year":"1988","journal-title":"Genomics"},{"key":"2023020113515485900_btw470-B23","doi-asserted-by":"crossref","first-page":"3541","DOI":"10.1093\/bioinformatics\/btu713","article-title":"KmerStream: streaming algorithms for k-mer abundance estimation","volume":"30","author":"Melsted","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B24","doi-asserted-by":"crossref","first-page":"ii79","DOI":"10.1093\/bioinformatics\/bti1114","article-title":"The fragment assembly string graph","volume":"21","author":"Myers","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B25","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1038\/nrg3367","article-title":"Sequence assembly demystified","volume":"14","author":"Nagarajan","year":"2013","journal-title":"Nat. Rev. Genet"},{"key":"2023020113515485900_btw470-B26","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl. Acad. Sci. U. S. A"},{"key":"2023020113515485900_btw470-B27","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1007\/978-3-540-72845-0_9","article-title":"Cache-, hash- and space-efficient bloom filters","volume":"4525","author":"Putze","year":"2007","journal-title":"Lect. Notes Comput. Sci"},{"key":"2023020113515485900_btw470-B28","doi-asserted-by":"crossref","first-page":"2.","DOI":"10.1186\/1748-7188-9-2","article-title":"Using cascading Bloom filters to improve the memory usage for de Brujin graphs","volume":"9","author":"Salikhov","year":"2014","journal-title":"Algorithms Mol. Biol"},{"key":"2023020113515485900_btw470-B29","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1101\/gr.131383.111","article-title":"GAGE: a critical evaluation of genome assemblies and assembly algorithms","volume":"22","author":"Salzberg","year":"2012","journal-title":"Genome Res"},{"key":"2023020113515485900_btw470-B30","doi-asserted-by":"crossref","first-page":"1228","DOI":"10.1093\/bioinformatics\/btu023","article-title":"Exploring genome characteristics and sequence quality without a reference","volume":"30","author":"Simpson","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B31","doi-asserted-by":"crossref","first-page":"i367","DOI":"10.1093\/bioinformatics\/btq217","article-title":"Efficient construction of an assembly string graph using the FM-index","volume":"26","author":"Simpson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020113515485900_btw470-B32","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1101\/gr.126953.111","article-title":"Efficient de novo assembly of large genomes using compressed data structures","volume":"22","author":"Simpson","year":"2012","journal-title":"Genome Res"},{"key":"2023020113515485900_btw470-B33","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1101\/gr.089532.108","article-title":"ABySS: a parallel assembler for short read sequence data","volume":"19","author":"Simpson","year":"2009","journal-title":"Genome Res"},{"key":"2023020113515485900_btw470-B34","doi-asserted-by":"crossref","first-page":"509.","DOI":"10.1186\/s13059-014-0509-9","article-title":"Lighter: fast and memory-efficient sequencing error correction without counting","volume":"15","author":"Song","year":"2014","journal-title":"Genome Biol"},{"key":"2023020113515485900_btw470-B35","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1093\/bib\/bbs015","article-title":"A survey of error-correction methods for next-generation sequencing","volume":"14","author":"Yang","year":"2013","journal-title":"Brief Bioinform"},{"key":"2023020113515485900_btw470-B36","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-13-S6-S1","article-title":"Exploiting sparseness in de novo genome assembly","volume":"13","author":"Ye","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020113515485900_btw470-B37","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: Algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/21\/3215\/49022099\/bioinformatics_32_21_3215.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/21\/3215\/49022099\/bioinformatics_32_21_3215.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T23:54:52Z","timestamp":1675295692000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/21\/3215\/2415317"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,7,13]]},"references-count":37,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2016,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw470","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,11,1]]},"published":{"date-parts":[[2016,7,13]]}}}