{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,23]],"date-time":"2026-06-23T02:29:32Z","timestamp":1782181772072,"version":"3.54.5"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"21","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Second-generation sequencing technologies produce high coverage of the genome by short reads at a low cost, which has prompted development of new assembly methods. In particular, multiple algorithms based on de Bruijn graphs have been shown to be effective for the assembly problem. In this article, we describe a new hybrid approach that has the computational efficiency of de Bruijn graph methods and the flexibility of overlap-based assembly strategies, and which allows variable read lengths while tolerating a significant level of sequencing error. Our method transforms large numbers of paired-end reads into a much smaller number of longer \u2018super-reads\u2019. The use of super-reads allows us to assemble combinations of Illumina reads of differing lengths together with longer reads from 454 and Sanger sequencing technologies, making it one of the few assemblers capable of handling such mixtures. We call our system the Maryland Super-Read Celera Assembler (abbreviated MaSuRCA and pronounced \u2018mazurka\u2019).<\/jats:p>\n               <jats:p>Results: We evaluate the performance of MaSuRCA against two of the most widely used assemblers for Illumina data, Allpaths-LG and SOAPdenovo2, on two datasets from organisms for which high-quality assemblies are available: the bacterium Rhodobacter sphaeroides and chromosome 16 of the mouse genome. We show that MaSuRCA performs on par or better than Allpaths-LG and significantly better than SOAPdenovo on these data, when evaluated against the finished sequence. We then show that MaSuRCA can significantly improve its assemblies when the original data are augmented with long reads.<\/jats:p>\n               <jats:p>Availability: MaSuRCA is available as open-source code at ftp:\/\/ftp.genome.umd.edu\/pub\/MaSuRCA\/. Previous (pre-publication) releases have been publicly available for over a year.<\/jats:p>\n               <jats:p>Contact: \u00a0alekseyz@ipst.umd.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt476","type":"journal-article","created":{"date-parts":[[2013,8,30]],"date-time":"2013-08-30T00:39:04Z","timestamp":1377823144000},"page":"2669-2677","source":"Crossref","is-referenced-by-count":1386,"title":["The MaSuRCA genome assembler"],"prefix":"10.1093","volume":"29","author":[{"given":"Aleksey V.","family":"Zimin","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guillaume","family":"Mar\u00e7ais","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Daniela","family":"Puiu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michael","family":"Roberts","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Steven L.","family":"Salzberg","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"James A.","family":"Yorke","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2013,8,29]]},"reference":[{"key":"2023063010272574000_btt476-B1","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol."},{"key":"2023063010272574000_btt476-B2","first-page":"177","article-title":"ARACHNE: a whole-genome shotgun assembler","volume":"12","author":"Batzoglou","year":"2002","journal-title":"Genome Res."},{"key":"2023063010272574000_btt476-B3","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1101\/gr.7088808","article-title":"Short read fragment assembly of bacterial genomes","volume":"18","author":"Chaisson","year":"2008","journal-title":"Genome Res."},{"key":"2023063010272574000_btt476-B4","doi-asserted-by":"crossref","first-page":"1914","DOI":"10.1128\/JB.01498-06","article-title":"Genome analyses of three strains of Rhodobacter sphaeroides: evidence of rapid evolution of chromosome II","volume":"189","author":"Choudhary","year":"2007","journal-title":"J. Bacteriol."},{"key":"2023063010272574000_btt476-B5","doi-asserted-by":"crossref","first-page":"1147","DOI":"10.1101\/gr.1917404","article-title":"Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs","volume":"14","author":"Chevreux","year":"2004","journal-title":"Genome Res."},{"key":"2023063010272574000_btt476-B6","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1073\/pnas.1017351108","article-title":"High-quality draft assemblies of mammalian genomes from massively parallel sequence data","volume":"108","author":"Gnerre","year":"2011","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023063010272574000_btt476-B7","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"Gurevich","year":"2013","journal-title":"Bioinformatics"},{"key":"2023063010272574000_btt476-B8","doi-asserted-by":"crossref","first-page":"2164","DOI":"10.1101\/gr.1390403","article-title":"PCAP: a whole-genome assembly program","volume":"13","author":"Huang","year":"2003","journal-title":"Genome Res."},{"key":"2023063010272574000_btt476-B9","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1089\/cmb.1995.2.291","article-title":"A new algorithm for DNA sequence assembly","volume":"2","author":"Idury","year":"1995","journal-title":"J. Comput. Biol."},{"key":"2023063010272574000_btt476-B10","doi-asserted-by":"crossref","first-page":"R116","DOI":"10.1186\/gb-2010-11-11-r116","article-title":"Quake: quality-aware detection and correction of sequencing errors","volume":"11","author":"Kelley","year":"2010","journal-title":"Genome Biol."},{"key":"2023063010272574000_btt476-B11","doi-asserted-by":"crossref","first-page":"2964","DOI":"10.1093\/bioinformatics\/btr520","article-title":"Bambus 2: scaffolding metagenomes","volume":"27","author":"Koren","year":"2011","journal-title":"Bioinformatics"},{"key":"2023063010272574000_btt476-B12","doi-asserted-by":"crossref","first-page":"R12","DOI":"10.1186\/gb-2004-5-2-r12","article-title":"Versatile and open software for comparing large genomes","volume":"5","author":"Kurtz","year":"2004","journal-title":"Genome Biol."},{"key":"2023063010272574000_btt476-B13","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/35057062","article-title":"Initial sequencing and analysis of the human genome","volume":"409","author":"Lander","year":"2001","journal-title":"Nature"},{"key":"2023063010272574000_btt476-B14","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Methods"},{"key":"2023063010272574000_btt476-B15","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1093\/bioinformatics\/btn025","article-title":"SOAP: short oligonucleotide alignment program","volume":"24","author":"Li","year":"2008","journal-title":"Bioinformatics"},{"key":"2023063010272574000_btt476-B16","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1101\/gr.097261.109","article-title":"De novo assembly of human genomes with massively parallel short read sequencing","volume":"20","author":"Li","year":"2010","journal-title":"Genome Res."},{"key":"2023063010272574000_btt476-B17","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/2047-217X-1-18","article-title":"SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler","volume":"1","author":"Luo","year":"2012","journal-title":"Gigascience"},{"key":"2023063010272574000_btt476-B18","doi-asserted-by":"crossref","first-page":"1718","DOI":"10.1093\/bioinformatics\/btt273","article-title":"GAGE-B: an evaluation of genome assemblers for bacterial organisms","volume":"29","author":"Magoc","year":"2013","journal-title":"Bioinformatics"},{"key":"2023063010272574000_btt476-B19","article-title":"QuoUM: an error corrector for Illumina reads","author":"Mar\u00e7ais","year":"2013"},{"key":"2023063010272574000_btt476-B20","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2023063010272574000_btt476-B21","doi-asserted-by":"crossref","first-page":"i137","DOI":"10.1093\/bioinformatics\/btr208","article-title":"Error correction of high-throughput sequencing datasets with non-uniform coverage","volume":"27","author":"Medvedev","year":"2011","journal-title":"Bioinformatics"},{"key":"2023063010272574000_btt476-B22","doi-asserted-by":"crossref","first-page":"2818","DOI":"10.1093\/bioinformatics\/btn548","article-title":"Aggressive assembly of pyrosequencing reads with mates","volume":"24","author":"Miller","year":"2008","journal-title":"Bioinformatics"},{"key":"2023063010272574000_btt476-B23","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.ygeno.2010.03.001","article-title":"Assembly algorithms for next-generation sequencing data","volume":"95","author":"Miller","year":"2010","journal-title":"Genomics"},{"key":"2023063010272574000_btt476-B24","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1038\/nature01262","article-title":"Initial sequencing and comparative analysis of the mouse genome","volume":"420","author":"Mouse Genome Sequencing Consortium et al.","year":"2002","journal-title":"Nature"},{"key":"2023063010272574000_btt476-B25","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1101\/gr.731003","article-title":"The Phusion assembler","volume":"13","author":"Mullikin","year":"2003","journal-title":"Genome Res."},{"key":"2023063010272574000_btt476-B26","doi-asserted-by":"crossref","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","article-title":"A whole genome assembly of Drosophila","volume":"287","author":"Myers","year":"2000","journal-title":"Science"},{"key":"2023063010272574000_btt476-B27","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1080\/07391102.1989.10507752","article-title":"1-Tuple DNA sequencing: computer analysis","volume":"7","author":"Pevzner","year":"1989","journal-title":"J. Biomol. Struct. Dyn."},{"key":"2023063010272574000_btt476-B28","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023063010272574000_btt476-B29","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1101\/gr.131383.111","article-title":"GAGE: a critical evaluation of genome assemblies and assembly algorithms","volume":"22","author":"Salzberg","year":"2012","journal-title":"Genome Res."},{"key":"2023063010272574000_btt476-B30","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1101\/gr.089532.108","article-title":"ABySS: a parallel assembler for short read sequence data","volume":"19","author":"Simpson","year":"2009","journal-title":"Genome Res."},{"key":"2023063010272574000_btt476-B31","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1101\/gr.126953.111","article-title":"Efficient de novo assembly of large genomes using compressed data structures","volume":"22","author":"Simpson","year":"2012","journal-title":"Genome Res."},{"key":"2023063010272574000_btt476-B32","doi-asserted-by":"crossref","first-page":"1304","DOI":"10.1126\/science.1058040","article-title":"The sequence of the human genome","volume":"291","author":"Venter","year":"2001","journal-title":"Science"},{"key":"2023063010272574000_btt476-B33","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/21\/2669\/50744781\/bioinformatics_29_21_2669.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/21\/2669\/50744781\/bioinformatics_29_21_2669.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T10:28:06Z","timestamp":1688120886000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/21\/2669\/195975"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8,29]]},"references-count":33,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2013,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt476","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2013,11]]},"published":{"date-parts":[[2013,8,29]]}}}