{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T11:51:24Z","timestamp":1773229884613,"version":"3.50.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2017,7,24]],"date-time":"2017-07-24T00:00:00Z","timestamp":1500854400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"publisher","award":["1425\/13"],"award-info":[{"award-number":["1425\/13"]}],"id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Faucet pairs the de Bruijn graph obtained from the reads with additional meta-data derived from them. 
We show these metadata\u2014coverage counts collected at junction k-mers and connections bridging between junction pairs\u2014contain most salient information needed for assembly, and demonstrate they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet\u2019s resource use and assembly quality to state of the art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched performance of other assemblers optimizing resource efficiency\u2014namely, Minia and LightAssembler. However, on metagenomes tested, Faucet\u2019s outputs had 14\u2013110% higher mean NGA50 lengths compared with Minia, and 2- to 11-fold higher mean NGA50 lengths compared with LightAssembler, the only other streaming assembler available.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Faucet is available at https:\/\/github.com\/Shamir-Lab\/Faucet<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx471","type":"journal-article","created":{"date-parts":[[2017,7,21]],"date-time":"2017-07-21T07:08:42Z","timestamp":1500620922000},"page":"147-154","source":"Crossref","is-referenced-by-count":9,"title":["Faucet: streaming\n                    <i>de novo<\/i>\n                    assembly graph construction"],"prefix":"10.1093","volume":"34","author":[{"given":"Roye","family":"Rozov","sequence":"first","affiliation":[{"name":"Blavatnik School of Computer 
Science, Tel-Aviv University, Tel Aviv, Israel"}]},{"given":"Gil","family":"Goldshlager","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA"}]},{"given":"Eran","family":"Halperin","sequence":"additional","affiliation":[{"name":"Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA, USA"}]},{"given":"Ron","family":"Shamir","sequence":"additional","affiliation":[{"name":"Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel"}]}],"member":"286","published-online":{"date-parts":[[2017,7,24]]},"reference":[{"key":"2023020208410078000_btx471-B1","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol"},{"key":"2023020208410078000_btx471-B2","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/362686.362692","article-title":"Space\/time trade-offs in hash coding with allowable errors","volume":"13","author":"Bloom","year":"1970","journal-title":"Commun. 
ACM"},{"key":"2023020208410078000_btx471-B3","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1007\/978-3-642-33122-0_19","article-title":"Space-efficient and exact de Bruijn graph representation based on a Bloom filter","volume":"8","author":"Chikhi","year":"2012","journal-title":"Algorithms Bioinformatics"},{"key":"2023020208410078000_btx471-B4","first-page":"35","volume-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)","author":"Chikhi","year":"2014"},{"key":"2023020208410078000_btx471-B5","doi-asserted-by":"crossref","first-page":"i201","DOI":"10.1093\/bioinformatics\/btw279","article-title":"Compacting de Bruijn graphs from sequencing data quickly and in low memory","volume":"32","author":"Chikhi","year":"2016","journal-title":"Bioinformatics"},{"key":"2023020208410078000_btx471-B6","doi-asserted-by":"crossref","first-page":"3215","DOI":"10.1093\/bioinformatics\/btw470","article-title":"LightAssembler: Fast and memory-efficient assembly algorithm for high-throughput sequencing reads","volume":"32","author":"El-Metwally","year":"2016","journal-title":"Bioinformatics"},{"key":"2023020208410078000_btx471-B7","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"Gurevich","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020208410078000_btx471-B8","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De novo assembly and genotyping of variants using colored de Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat. 
Genet"},{"key":"2023020208410078000_btx471-B9","doi-asserted-by":"crossref","first-page":"1674","DOI":"10.1093\/bioinformatics\/btv033","article-title":"MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph","volume":"31","author":"Li","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020208410078000_btx471-B10","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1007\/978-3-540-74126-8_27","volume-title":"Algorithms in Bioinformatics","author":"Medvedev","year":"2007"},{"key":"2023020208410078000_btx471-B11","doi-asserted-by":"crossref","first-page":"3541","DOI":"10.1093\/bioinformatics\/btu713","article-title":"KmerStream: streaming algorithms for k-mer abundance estimation","volume":"30","author":"Melsted","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020208410078000_btx471-B12","doi-asserted-by":"crossref","first-page":"4024","DOI":"10.1093\/bioinformatics\/btw609","article-title":"TwoPaCo: An efficient algorithm to build the compacted de Bruijn graph from many complete genomes","volume":"33","author":"Minkin","year":"2017","journal-title":"Bioinformatics"},{"key":"2023020208410078000_btx471-B13","doi-asserted-by":"crossref","first-page":"1324","DOI":"10.1093\/bioinformatics\/btw832","article-title":"ntCard: a streaming algorithm for cardinality estimation in genomics data","volume":"33","author":"Mohamadi","year":"2017","journal-title":"Bioinformatics"},{"key":"2023020208410078000_btx471-B14","author":"Nihalani","year":"2016"},{"key":"2023020208410078000_btx471-B15","article-title":"Genome graphs","author":"Novak","year":"2017","journal-title":"bioRxiv"},{"key":"2023020208410078000_btx471-B16","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1101\/gr.213959.116","article-title":"metaSPAdes: a new versatile de novo metagenomics assembler","volume":"27","author":"Nurk","year":"2017","journal-title":"Genome 
Res."},{"key":"2023020208410078000_btx471-B17","doi-asserted-by":"crossref","first-page":"13272","DOI":"10.1073\/pnas.1121464109","article-title":"Scaling metagenome sequence assembly with probabilistic de Bruijn graphs","volume":"109","author":"Pell","year":"2012","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020208410078000_btx471-B18","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1038\/nbt.3122","article-title":"StringTie enables improved reconstruction of a transcriptome from RNA-seq reads","volume":"33","author":"Pertea","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023020208410078000_btx471-B19","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020208410078000_btx471-B20","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btu266","article-title":"ExSPAnder: a universal repeat resolver for DNA fragment assembly","volume":"30","author":"Prjibelski","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020208410078000_btx471-B21","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1038\/nmeth.2251","article-title":"Streaming fragment assignment for real-time analysis of sequencing experiments","volume":"10","author":"Roberts","year":"2012","journal-title":"Nat. 
Methods"},{"key":"2023020208410078000_btx471-B22","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1093\/bioinformatics\/btw651","article-title":"Recycler: an algorithm for detecting plasmids from de novo assembly graphs","volume":"33","author":"Rozov","year":"2017","journal-title":"Bioinformatics"},{"key":"2023020208410078000_btx471-B23","doi-asserted-by":"crossref","first-page":"e43","DOI":"10.1093\/nar\/gkw1191","article-title":"The combination of direct and paired link graphs can boost repetitive genome assembly","volume":"45","author":"Shi","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023020208410078000_btx471-B24","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btq217","article-title":"Efficient construction of an assembly string graph using the FM-index","volume":"26","author":"Simpson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020208410078000_btx471-B25","doi-asserted-by":"crossref","first-page":"509.","DOI":"10.1186\/s13059-014-0509-9","article-title":"Lighter: fast and memory-efficient sequencing error correction without counting","volume":"15","author":"Song","year":"2014","journal-title":"Genome Biol"},{"key":"2023020208410078000_btx471-B26","doi-asserted-by":"crossref","first-page":"S1.","DOI":"10.1186\/1471-2105-13-S6-S1","article-title":"Exploiting sparseness in de novo genome assembly","volume":"13(Suppl. 
6)","author":"Ye","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020208410078000_btx471-B27","doi-asserted-by":"crossref","first-page":"e101271.","DOI":"10.1371\/journal.pone.0101271","article-title":"These are not the K-mers you are looking for: Efficient online K-mer counting using a probabilistic data structure","volume":"9","author":"Zhang","year":"2014","journal-title":"PLoS One"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/1\/147\/49043491\/bioinformatics_34_1_147.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/1\/147\/49043491\/bioinformatics_34_1_147.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T03:53:14Z","timestamp":1675309994000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/1\/147\/4004871"}},"subtitle":[],"editor":[{"given":"Cenk","family":"Sahinalp","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,7,24]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx471","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/125658","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,1,1]]},"published":{"date-parts":[[2017,7,24]]}}}