{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T04:18:10Z","timestamp":1773289090155,"version":"3.50.1"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2016,10,3]],"date-time":"2016-10-03T00:00:00Z","timestamp":1475452800000},"content-version":"vor","delay-in-days":845,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors. Results: In this article, we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population that allows for the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allow VGA to assemble rare variants. VGA uses an expectation\u2013maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. 
VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads. Availability: Our tool VGA is freely available at http:\/\/genetics.cs.ucla.edu\/vga\/<\/jats:p><jats:p>Contact: \u00a0serghei@cs.ucla.edu; eeskin@cs.ucla.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu295","type":"journal-article","created":{"date-parts":[[2014,6,16]],"date-time":"2014-06-16T21:55:09Z","timestamp":1402955709000},"page":"i329-i337","source":"Crossref","is-referenced-by-count":49,"title":["Accurate viral population assembly from ultra-deep sequencing data"],"prefix":"10.1093","volume":"30","author":[{"given":"Serghei","family":"Mangul","sequence":"first","affiliation":[{"name":"1 Computer Science Department, 2Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA, 3Department of Computer Science, Georgia State University, Atlanta, GA, 30303 and 4Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Nicholas C.","family":"Wu","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA, 3Department of Computer Science, Georgia State University, Atlanta, GA, 30303 and 4Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Nicholas","family":"Mancuso","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA, 3Department of Computer Science, Georgia State University, Atlanta, GA, 30303 and 4Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Alex","family":"Zelikovsky","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department 
of Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA, 3Department of Computer Science, Georgia State University, Atlanta, GA, 30303 and 4Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Ren","family":"Sun","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA, 3Department of Computer Science, Georgia State University, Atlanta, GA, 30303 and 4Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Eleazar","family":"Eskin","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA, 3Department of Computer Science, Georgia State University, Atlanta, GA, 30303 and 4Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"},{"name":"1 Computer Science Department, 2Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA, 3Department of Computer Science, Georgia State University, Atlanta, GA, 30303 and 4Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]}],"member":"286","published-online":{"date-parts":[[2014,6,11]]},"reference":[{"key":"2023012711103090800_btu295-B1","doi-asserted-by":"crossref","first-page":"e94","DOI":"10.1093\/nar\/gks251","article-title":"Grinder: a versatile amplicon and shotgun sequence simulator","volume":"40","author":"Angly","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012711103090800_btu295-B2","author":"Armin,","year":"2013"},{"key":"2023012711103090800_btu295-B3","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-12-S6-S1","article-title":"Inferring viral quasispecies spectra from 454 pyrosequencing 
reads","volume":"12","author":"Astrovskaya","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012711103090800_btu295-B4","doi-asserted-by":"crossref","first-page":"i153","DOI":"10.1093\/bioinformatics\/btn298","article-title":"HapCUT: an efficient and accurate algorithm for the haplotype assembly problem","volume":"24","author":"Bansal","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012711103090800_btu295-B5","doi-asserted-by":"crossref","first-page":"2041","DOI":"10.1093\/nar\/gkr1042","article-title":"Fosmid-based whole genome haplotyping of a HapMap trio child: evaluation of single individual haplotyping techniques","volume":"40","author":"Duitama","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012711103090800_btu295-B6","doi-asserted-by":"crossref","first-page":"e1000074","DOI":"10.1371\/journal.pcbi.1000074","article-title":"Viral population estimation using pyrosequencing","volume":"4","author":"Eriksson","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023012711103090800_btu295-B7","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1073\/pnas.1017351108","article-title":"High-quality draft assemblies of mammalian genomes from massively parallel sequence data","volume":"108","author":"Gnerre","year":"2011","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012711103090800_btu295-B8","doi-asserted-by":"crossref","first-page":"e1002529","DOI":"10.1371\/journal.ppat.1002529","article-title":"Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection","volume":"8","author":"Henn","year":"2012","journal-title":"PLoS Pathog."},{"key":"2023012711103090800_btu295-B9","doi-asserted-by":"crossref","first-page":"1270","DOI":"10.1101\/gr.088633.108","article-title":"Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes","volume":"19","author":"Hormozdiari","year":"2009","journal-title":"Genome Res."},{"key":"2023012711103090800_btu295-B10","doi-asserted-by":"crossref","first-page":"193","DOI":"10.3233\/ISB-2012-0454","article-title":"QColors: an algorithm for conservative viral quasispecies reconstruction from short and non-contiguous next generation sequencing reads","volume":"11","author":"Huang","year":"2011","journal-title":"In Silico Biol."},{"key":"2023012711103090800_btu295-B11","doi-asserted-by":"crossref","DOI":"10.1090\/dimacs\/026","volume-title":"Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, October 11-13, 1993","author":"Johnson","year":"1996"},{"key":"2023012711103090800_btu295-B12","doi-asserted-by":"crossref","first-page":"9530","DOI":"10.1073\/pnas.1105422108","article-title":"Detection and quantification of rare mutations with massively parallel sequencing","volume":"108","author":"Kinde","year":"2011","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012711103090800_btu295-B13","doi-asserted-by":"crossref","DOI":"10.1090\/conm\/352","volume-title":"Graph Colorings","author":"Kubale","year":"2004"},{"key":"2023012711103090800_btu295-B14","doi-asserted-by":"crossref","first-page":"e1001005","DOI":"10.1371\/journal.ppat.1001005","article-title":"Quasispecies theory and the behavior of RNA viruses","volume":"6","author":"Lauring","year":"2010","journal-title":"PLoS Pathog."},{"key":"2023012711103090800_btu295-B15","doi-asserted-by":"crossref","first-page":"1114","DOI":"10.1128\/AAC.01492-10","article-title":"Analysis of low-frequency mutations associated with drug resistance to raltegravir before antiretroviral treatment","volume":"55","author":"Liu","year":"2011","journal-title":"Antimicrob. Agents Chemother."},{"key":"2023012711103090800_btu295-B16","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/2047-217X-1-18","article-title":"SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler","volume":"1","author":"Luo","year":"2012","journal-title":"Gigascience"},{"key":"2023012711103090800_btu295-B17","doi-asserted-by":"crossref","first-page":"237","DOI":"10.3233\/ISB-2012-0458","article-title":"Reconstructing viral quasispecies from NGS amplicon reads","volume":"11","author":"Mancuso","year":"2011","journal-title":"In Silico Biol."},{"key":"2023012711103090800_btu295-B18","doi-asserted-by":"crossref","first-page":"837","DOI":"10.1016\/0042-6822(92)90259-R","article-title":"Complex intrapatient sequence variation in the V1 and V2 hypervariable regions of the HIV-1 gp120 envelope sequence","volume":"191","author":"Martins","year":"1992","journal-title":"Virology"},{"key":"2023012711103090800_btu295-B19","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nrg2626","article-title":"Sequencing technologies - the next generation","volume":"11","author":"Metzker","year":"2009","journal-title":"Nat. Rev. 
Genet."},{"key":"2023012711103090800_btu295-B20","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511813603","volume-title":"Probability and Computing: Randomized Algorithms and Probabilistic Analysis","author":"Mitzenmacher","year":"2005"},{"key":"2023012711103090800_btu295-B21","doi-asserted-by":"crossref","first-page":"1255","DOI":"10.1097\/QAD.0b013e32835461b5","article-title":"On HIV diversity","volume":"26","author":"Ndungu","year":"2012","journal-title":"AIDS"},{"key":"2023012711103090800_btu295-B22","doi-asserted-by":"crossref","first-page":"e1000660","DOI":"10.1371\/journal.pcbi.1000660","article-title":"Recombination rate and selection strength in HIV intra-patient evolution","volume":"6","author":"Neher","year":"2010","journal-title":"PLoS Comput. Biol."},{"key":"2023012711103090800_btu295-B23","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1097\/01.aids.0000216370.69066.7f","article-title":"Selection and persistence of non-nucleoside reverse transcriptase inhibitor-resistant HIV-1 in patients starting and stopping non-nucleoside therapy","volume":"20","author":"Palmer","year":"2006","journal-title":"AIDS"},{"key":"2023012711103090800_btu295-B24","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1093\/bioinformatics\/btr627","article-title":"QuRe: software for viral quasispecies reconstruction from next-generation sequencing data","volume":"28","author":"Prosperi","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012711103090800_btu295-B25","doi-asserted-by":"crossref","first-page":"e5683","DOI":"10.1371\/journal.pone.0005683","article-title":"Quantitative deep sequencing reveals dynamic HIV-1 escape and large population shifts during CCR5 antagonist therapy in vivo","volume":"4","author":"Tsibris","year":"2009","journal-title":"PLoS One"},{"key":"2023012711103090800_btu295-B26","doi-asserted-by":"crossref","first-page":"1195","DOI":"10.1101\/gr.6468307","article-title":"Characterization of mutation spectra with ultra-deep 
pyrosequencing: application to HIV-1 drug resistance","volume":"17","author":"Wang","year":"2007","journal-title":"Genome Res."},{"key":"2023012711103090800_btu295-B27","doi-asserted-by":"crossref","first-page":"2245","DOI":"10.1093\/bioinformatics\/btt386","article-title":"Leveraging multi-SNP reads from sequencing data for haplotype inference","volume":"29","author":"Yang","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012711103090800_btu295-B28","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1186\/1471-2164-13-475","article-title":"De novo assembly of highly diverse viral populations","volume":"13","author":"Yang","year":"2012","journal-title":"BMC Genomics"},{"key":"2023012711103090800_btu295-B29","doi-asserted-by":"crossref","first-page":"8879","DOI":"10.1128\/jvi.70.12.8879-8887.1996","article-title":"Intrapatient sequence variation of the gag gene of human immunodeficiency virus type 1 plasma virions","volume":"70","author":"Yoshimura","year":"1996","journal-title":"J. Virol."},{"key":"2023012711103090800_btu295-B30","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1186\/1471-2105-12-119","article-title":"ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data","volume":"12","author":"Zagordi","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012711103090800_btu295-B31","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1089\/cmb.2009.0164","article-title":"Deep sequencing of a genetically heterogeneous sample: local haplotype reconstruction and read error correction","volume":"17","author":"Zagordi","year":"2010","journal-title":"J. Comput. 
Biol."},{"key":"2023012711103090800_btu295-B32","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1007\/978-3-642-29627-7_36","article-title":"Probabilistic inference of viral quasispecies subject to recombination","volume-title":"Research in Computational Molecular Biology","author":"Zagordi","year":"2012"},{"key":"2023012711103090800_btu295-B33","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/12\/i329\/48925366\/bioinformatics_30_12_i329.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/12\/i329\/48925366\/bioinformatics_30_12_i329.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,3]],"date-time":"2025-05-03T10:33:35Z","timestamp":1746268415000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/12\/i329\/392298"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,6,11]]},"references-count":33,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2014,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu295","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,6,15]]},"published":{"date-parts":[[2014,6,11]]}}}