{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T06:35:52Z","timestamp":1775111752708,"version":"3.50.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2018,6,27]],"date-time":"2018-06-27T00:00:00Z","timestamp":1530057600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF 1320273"],"award-info":[{"award-number":["CCF 1320273"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF 1618427"],"award-info":[{"award-number":["CCF 1618427"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>As RNA viruses mutate and adapt to environmental changes, often developing resistance to anti-viral vaccines and drugs, they form an ensemble of viral strains\u2013\u2013a viral quasispecies. While high-throughput sequencing (HTS) has enabled in-depth studies of viral quasispecies, sequencing errors and limited read lengths render the problem of reconstructing the strains and estimating their spectrum challenging. Inference of viral quasispecies is difficult due to generally non-uniform frequencies of the strains, and is further exacerbated when the genetic distances between the strains are small.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>This paper presents TenSQR, an algorithm that utilizes tensor factorization framework to analyze HTS data and reconstruct viral quasispecies characterized by highly uneven frequencies of its components. Fundamentally, TenSQR performs clustering with successive data removal to infer strains in a quasispecies in order from the most to the least abundant one; every time a strain is inferred, sequencing reads generated from that strain are removed from the dataset. The proposed successive strain reconstruction and data removal enables discovery of rare strains in a population and facilitates detection of deletions in such strains. Results on simulated datasets demonstrate that TenSQR can reconstruct full-length strains having widely different abundances, generally outperforming state-of-the-art methods at diversities 1\u201310% and detecting long deletions even in rare strains. A study on a real HIV-1 dataset demonstrates that TenSQR outperforms competing methods in experimental settings as well. Finally, we apply TenSQR to analyze a Zika virus sample and reconstruct the full-length strains it contains.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>TenSQR is available at https:\/\/github.com\/SoYeonA\/TenSQR.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty291","type":"journal-article","created":{"date-parts":[[2018,4,12]],"date-time":"2018-04-12T19:32:51Z","timestamp":1523561571000},"page":"i23-i31","source":"Crossref","is-referenced-by-count":30,"title":["Viral quasispecies reconstruction via tensor factorization with successive read removal"],"prefix":"10.1093","volume":"34","author":[{"given":"Soyeon","family":"Ahn","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA"}]},{"given":"Ziqi","family":"Ke","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA"}]},{"given":"Haris","family":"Vikalo","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,6,27]]},"reference":[{"key":"2023051604231706100_bty291-B1","first-page":"353","volume-title":"International Conference on Research in Computational Molecular Biology","author":"Ahn","year":"2017"},{"key":"2023051604231706100_bty291-B2","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-12-S6-S1","article-title":"Inferring viral quasispecies spectra from 454 pyrosequencing reads","volume":"12","author":"Astrovskaya","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023051604231706100_bty291-B3","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1101\/gr.215038.116","article-title":"De novo assembly of viral quasispecies using overlap graphs","volume":"27","author":"Baaijens","year":"2017","journal-title":"Genome Res"},{"key":"2023051604231706100_bty291-B4","doi-asserted-by":"crossref","first-page":"329","DOI":"10.3389\/fmicb.2012.00329","article-title":"Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data","volume":"3","author":"Beerenwinkel","year":"2012","journal-title":"Front. Microbiol"},{"key":"2023051604231706100_bty291-B5","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1109\/JSTSP.2016.2547860","article-title":"Structured low-rank matrix factorization for haplotype assembly","volume":"10","author":"Cai","year":"2016","journal-title":"IEEE J. Selected Topics Signal Process"},{"key":"2023051604231706100_bty291-B6","doi-asserted-by":"crossref","first-page":"2608","DOI":"10.1128\/JVI.03118-12","article-title":"Molecular evolution of viruses of the family filoviridae based on 97 whole-genome sequences","volume":"87","author":"Carroll","year":"2013","journal-title":"J. Virol"},{"key":"2023051604231706100_bty291-B7","first-page":"117","volume-title":"International Conference on Research in Computational Molecular Biology","author":"Chaisson","year":"2017"},{"key":"2023051604231706100_bty291-B8","doi-asserted-by":"crossref","first-page":"e115","DOI":"10.1093\/nar\/gku537","article-title":"Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations","volume":"42","author":"Di Giallonardo","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023051604231706100_bty291-B9","doi-asserted-by":"crossref","first-page":"12204","DOI":"10.1038\/ncomms12204","article-title":"A rhesus macaque model of asian-lineage zika virus infection","volume":"7","author":"Dudley","year":"2016","journal-title":"Nat. Commun"},{"key":"2023051604231706100_bty291-B10","doi-asserted-by":"crossref","first-page":"e1000074.","DOI":"10.1371\/journal.pcbi.1000074","article-title":"Viral population estimation using pyrosequencing","volume":"4","author":"Eriksson","year":"2008","journal-title":"PLoS Comput. Biol"},{"key":"2023051604231706100_bty291-B11","author":"Hashemi","year":"2017"},{"key":"2023051604231706100_bty291-B12","first-page":"665","author":"Jain","year":"2013"},{"key":"2023051604231706100_bty291-B13","first-page":"886","author":"Jayasundara","year":"2015"},{"key":"2023051604231706100_bty291-B14","doi-asserted-by":"crossref","first-page":"e1001005.","DOI":"10.1371\/journal.ppat.1001005","article-title":"Quasispecies theory and the behavior of rna viruses","volume":"6","author":"Lauring","year":"2010","journal-title":"PLoS Pathogens"},{"key":"2023051604231706100_bty291-B15","doi-asserted-by":"crossref","first-page":"e6079.","DOI":"10.1371\/journal.pone.0006079","article-title":"Low-abundance hiv drug-resistant viral variants in treatment-experienced persons correlate with historical antiretroviral use","volume":"4","author":"Le","year":"2009","journal-title":"PloS One"},{"key":"2023051604231706100_bty291-B16","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with burrows\u2013wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051604231706100_bty291-B17","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1093\/bib\/3.1.23","article-title":"Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem","volume":"3","author":"Lippert","year":"2002","journal-title":"Brief. Bioinformatics"},{"key":"2023051604231706100_bty291-B18","author":"Malhotra","year":"2015"},{"key":"2023051604231706100_bty291-B19","doi-asserted-by":"crossref","first-page":"i329","DOI":"10.1093\/bioinformatics\/btu295","article-title":"Accurate viral population assembly from ultra-deep sequencing data","volume":"30","author":"Mangul","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051604231706100_bty291-B20","first-page":"17","author":"Posada-Cespedes","year":"2016"},{"key":"2023051604231706100_bty291-B21","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1109\/TCBB.2013.145","article-title":"Hiv haplotype inference using a propagating dirichlet process mixture model","volume":"11","author":"Prabhakaran","year":"2014","journal-title":"IEEE\/ACM Trans. on Comput. Biol. Bioinform. (TCBB)"},{"key":"2023051604231706100_bty291-B22","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1093\/bioinformatics\/btr627","article-title":"Qure: software for viral quasispecies reconstruction from next-generation sequencing data","volume":"28","author":"Prosperi","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051604231706100_bty291-B23","first-page":"431","author":"Schirmer","year":"2014"},{"key":"2023051604231706100_bty291-B24","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1186\/s12859-016-0976-y","article-title":"Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data","volume":"17","author":"Schirmer","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023051604231706100_bty291-B25","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1086\/596736","article-title":"Low-abundance drug-resistant viral variants in chronically hiv-infected, antiretroviral treatment\u2013naive patients significantly impact treatment outcomes","volume":"199","author":"Simen","year":"2009","journal-title":"J. Infectious Dis"},{"key":"2023051604231706100_bty291-B26","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1089\/cmb.2012.0232","article-title":"Probabilistic inference of viral quasispecies subject to recombination","volume":"20","author":"T\u00f6pfer","year":"2013","journal-title":"J. Comput. Biol"},{"key":"2023051604231706100_bty291-B27","doi-asserted-by":"crossref","first-page":"e1003515.","DOI":"10.1371\/journal.pcbi.1003515","article-title":"Viral quasispecies assembly via maximal clique enumeration","volume":"10","author":"T\u00f6pfer","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023051604231706100_bty291-B28","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1089\/cmb.2009.0164","article-title":"Deep sequencing of a genetically heterogeneous sample: local haplotype reconstruction and read error correction","volume":"17","author":"Zagordi","year":"2010","journal-title":"J. Comput. Biol"},{"key":"2023051604231706100_bty291-B29","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1186\/1471-2105-12-119","article-title":"Shorah: estimating the genetic diversity of a mixed sample from next-generation sequencing data","volume":"12","author":"Zagordi","year":"2011","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i23\/50315769\/bioinformatics_34_13_i23.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i23\/50315769\/bioinformatics_34_13_i23.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T04:24:32Z","timestamp":1684211072000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/13\/i23\/5045739"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,27]]},"references-count":29,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2018,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty291","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,7,1]]},"published":{"date-parts":[[2018,6,27]]}}}