{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T07:56:46Z","timestamp":1773388606040,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1009860","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,2,16]],"date-time":"2022-02-16T00:00:00Z","timestamp":1644969600000}}],"reference-count":17,"publisher":"Public Library of Science (PLoS)","issue":"2","license":[{"start":{"date-parts":[[2022,2,4]],"date-time":"2022-02-04T00:00:00Z","timestamp":1643932800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100005825","name":"National Institute of Food and Agriculture","doi-asserted-by":"publisher","award":["2018-67015-28199"],"award-info":[{"award-number":["2018-67015-28199"]}],"id":[{"id":"10.13039\/100005825","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IOS-1744309"],"award-info":[{"award-number":["IOS-1744309"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01-HG006677"],"award-info":[{"award-number":["R01-HG006677"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R35-GM130151"],"award-info":[{"award-number":["R35-GM130151"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>\n                    Third-generation sequencing technologies can generate very long reads with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using shorter reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective. One strategy to upgrade existing assemblies is to generate additional coverage using long-read data, and add that to the previously assembled contigs. SAMBA is a tool that is designed to scaffold and gap-fill existing genome assemblies with additional long-read data, resulting in substantially greater contiguity. SAMBA is the only tool of its kind that also computes and fills in the sequence for all spanned gaps in the scaffolds, yielding much longer contigs. Here we compare SAMBA to several similar tools capable of re-scaffolding assemblies using long-read data, and we show that SAMBA yields better contiguity and introduces fewer errors than competing methods. SAMBA is open-source software that is distributed at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/alekseyzimin\/masurca\" xlink:type=\"simple\">https:\/\/github.com\/alekseyzimin\/masurca<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1371\/journal.pcbi.1009860","type":"journal-article","created":{"date-parts":[[2022,2,4]],"date-time":"2022-02-04T13:36:32Z","timestamp":1643981792000},"page":"e1009860","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":48,"title":["The SAMBA tool uses long reads to improve the contiguity of genome assemblies"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5091-3092","authenticated-orcid":true,"given":"Aleksey V.","family":"Zimin","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8859-7432","authenticated-orcid":true,"given":"Steven L.","family":"Salzberg","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,2,4]]},"reference":[{"key":"pcbi.1009860.ref001","article-title":"The complete sequence of a human genome","author":"S Nurk","year":"2021","journal-title":"bioRxiv"},{"issue":"7","key":"pcbi.1009860.ref002","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1038\/nbt.2288","article-title":"A hybrid approach for the automated finishing of bacterial genomes","volume":"30","author":"A Bashir","year":"2012","journal-title":"Nature Biotechnology"},{"issue":"1","key":"pcbi.1009860.ref003","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-15-211","article-title":"SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information","volume":"15","author":"M Boetzer","year":"2014","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"pcbi.1009860.ref004","doi-asserted-by":"crossref","first-page":"s13742","DOI":"10.1186\/s13742-015-0076-3","article-title":"LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads","volume":"4","author":"RL Warren","year":"2015","journal-title":"GigaScience"},{"issue":"1","key":"pcbi.1009860.ref005","first-page":"1","article-title":"LRScaf: improving draft genomes using long noisy reads","volume":"20","author":"M Qin","year":"2019","journal-title":"BMC Genomics"},{"issue":"21","key":"pcbi.1009860.ref006","doi-asserted-by":"crossref","first-page":"2669","DOI":"10.1093\/bioinformatics\/btt476","article-title":"The MaSuRCA genome assembler","volume":"29","author":"AV Zimin","year":"2013","journal-title":"Bioinformatics"},{"issue":"5","key":"pcbi.1009860.ref007","doi-asserted-by":"crossref","first-page":"787","DOI":"10.1101\/gr.213405.116","article-title":"Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm","volume":"27","author":"AV Zimin","year":"2017","journal-title":"Genome Research"},{"issue":"4","key":"pcbi.1009860.ref008","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1038\/s41587-020-00747-w","article-title":"Efficient hybrid de novo assembly of human genomes with WENGAN","volume":"39","author":"A Di Genova","year":"2021","journal-title":"Nature Biotechnology"},{"key":"pcbi.1009860.ref009","first-page":"006395","article-title":"Error correction and assembly complexity of single molecule sequencing reads","author":"H Lee","year":"2014","journal-title":"BioRxiv"},{"issue":"6","key":"pcbi.1009860.ref010","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1038\/nbt.3238","article-title":"Assembling large genomes with single-molecule sequencing and locality-sensitive hashing","volume":"33","author":"K Berlin","year":"2015","journal-title":"Nature Biotechnology"},{"issue":"8","key":"pcbi.1009860.ref011","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"A Gurevich","year":"2013","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1009860.ref012","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-020-02134-9","article-title":"Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies","volume":"21","author":"A Rhie","year":"2020","journal-title":"Genome Biology"},{"issue":"6","key":"pcbi.1009860.ref013","doi-asserted-by":"crossref","first-page":"e1007981","DOI":"10.1371\/journal.pcbi.1007981","article-title":"The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies","volume":"16","author":"AV Zimin","year":"2020","journal-title":"PLoS computational biology"},{"issue":"7823","key":"pcbi.1009860.ref014","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1038\/s41586-020-2547-7","article-title":"Telomere-to-telomere assembly of a complete human X chromosome","volume":"585","author":"KH Miga","year":"2020","journal-title":"Nature"},{"key":"pcbi.1009860.ref015","first-page":"iyab227","article-title":"A reference-quality, fully annotated genome from a Puerto Rican individual","author":"AV Zimin","year":"2021","journal-title":"Genetics"},{"issue":"18","key":"pcbi.1009860.ref016","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"H. Li","year":"2018","journal-title":"Bioinformatics"},{"issue":"5","key":"pcbi.1009860.ref017","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1038\/s41587-019-0072-8","article-title":"Assembly of long, error-prone reads using repeat graphs","volume":"37","author":"M Kolmogorov","year":"2019","journal-title":"Nature Biotechnology"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1009860","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,2,16]],"date-time":"2022-02-16T00:00:00Z","timestamp":1644969600000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009860","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,16]],"date-time":"2022-02-16T13:49:32Z","timestamp":1645019372000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009860"}},"subtitle":[],"editor":[{"given":"Mingfu","family":"Shao","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,2,4]]},"references-count":17,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,2,4]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1009860","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.10.21.465348","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,4]]}}}