{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:43:04Z","timestamp":1753875784095,"version":"3.41.2"},"reference-count":14,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2024,3,1]],"date-time":"2024-03-01T00:00:00Z","timestamp":1709251200000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["82161148009"],"award-info":[{"award-number":["82161148009"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Intra-host variants refer to genetic variations or mutations that occur within an individual host organism. These variants are typically studied in the context of viruses, bacteria, or other pathogens to understand the evolution of pathogens. Moreover, intra-host variants are also explored in the field of tumor biology and mitochondrial biology to characterize somatic mutations and inherited heteroplasmic mutations. Intra-host variants can involve long insertions, deletions, and combinations of different mutation types, which poses challenges in their identification. The performance of current methods in detecting complex intra-host variants is unknown.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>First, we simulated a dataset comprising 10 samples with 1869 intra-host variants involving various mutation patterns and benchmarked current variant detection software. 
The results indicated that though current software can detect most variants with F1-scores between 0.76 and 0.97, their performance in detecting long indels and low frequency variants was limited. Thus, we developed a new software, PySNV, for the detection of complex intra-host variations. On the simulated dataset, PySNV successfully detected 1863 variant cases (F1-score: 0.99) and exhibited the highest Pearson correlation coefficient (PCC: 0.99) to the ground truth in predicting variant frequencies. The results demonstrated that PySNV delivered promising performance even for long indels and low frequency variants, while maintaining computational speed comparable to other methods. Finally, we tested its performance on SARS-CoV-2 replicate sequencing data and found that it reported 21% more variants compared to LoFreq, the best-performing benchmarked software, while showing higher consistency (62% over 54%) within replicates. The discrepancies mostly exist in low-depth regions and low frequency variants.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>https:\/\/github.com\/bnuLyndon\/PySNV\/.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae116","type":"journal-article","created":{"date-parts":[[2024,2,27]],"date-time":"2024-02-27T14:33:47Z","timestamp":1709044427000},"source":"Crossref","is-referenced-by-count":0,"title":["PySNV for complex intra-host variation detection"],"prefix":"10.1093","volume":"40","author":[{"given":"Liandong","family":"Li","sequence":"first","affiliation":[{"name":"Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation , Beijing 100101, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9696-5445","authenticated-orcid":false,"given":"Haoyi","family":"Fu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Genomic and 
Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation , Beijing 100101, China"},{"name":"University of Chinese Academy of Sciences , Beijing 100101, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wentai","family":"Ma","sequence":"additional","affiliation":[{"name":"Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation , Beijing 100101, China"},{"name":"University of Chinese Academy of Sciences , Beijing 100101, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1041-1172","authenticated-orcid":false,"given":"Mingkun","family":"Li","sequence":"additional","affiliation":[{"name":"Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation , Beijing 100101, China"},{"name":"University of Chinese Academy of Sciences , Beijing 100101, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,2,29]]},"reference":[{"year":"2012","author":"Garrison","key":"2024031401335856500_btae116-B1"},{"key":"2024031401335856500_btae116-B2","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1038\/nature05610","article-title":"Patterns of somatic mutation in human cancer genomes","volume":"446","author":"Greenman","year":"2007","journal-title":"Nature"},{"key":"2024031401335856500_btae116-B3","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1186\/s13059-018-1618-7","article-title":"An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar","volume":"20","author":"Grubaugh","year":"2019","journal-title":"Genome 
Biol"},{"key":"2024031401335856500_btae116-B4","doi-asserted-by":"crossref","first-page":"1560.e1","DOI":"10.1016\/j.cmi.2020.07.032","article-title":"Evolution of viral quasispecies during SARS-CoV-2 infection","volume":"26","author":"Jary","year":"2020","journal-title":"Clin Microbiol Infect"},{"key":"2024031401335856500_btae116-B5","doi-asserted-by":"crossref","first-page":"568","DOI":"10.1101\/gr.129684.111","article-title":"VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing","volume":"22","author":"Koboldt","year":"2012","journal-title":"Genome Res"},{"year":"2013","author":"Li","key":"2024031401335856500_btae116-B6"},{"key":"2024031401335856500_btae116-B7","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2024031401335856500_btae116-B8","doi-asserted-by":"crossref","first-page":"1066","DOI":"10.1016\/j.gpb.2023.10.004","article-title":"RCoV19: a one-stop hub for SARS-CoV-2 genome data integration, variant monitoring, and risk pre-warning","author":"Li","year":"2023","journal-title":"Genom Proteom Bioinform"},{"key":"2024031401335856500_btae116-B9","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1016\/j.ajhg.2010.07.014","article-title":"Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes","volume":"87","author":"Li","year":"2010","journal-title":"Am J Hum Genet"},{"key":"2024031401335856500_btae116-B10","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome 
Res"},{"key":"2024031401335856500_btae116-B11","doi-asserted-by":"crossref","first-page":"e66857","DOI":"10.7554\/eLife.66857","article-title":"Patterns of within-host genetic diversity in SARS-CoV-2","volume":"10","author":"Tonkin-Hill","year":"2021","journal-title":"Elife"},{"key":"2024031401335856500_btae116-B12","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1186\/s12859-021-04294-2","article-title":"HAVoC, a bioinformatic pipeline for reference-based consensus assembly and lineage assignment for SARS-CoV-2 sequences","volume":"22","author":"Truong Nguyen","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2024031401335856500_btae116-B13","doi-asserted-by":"crossref","first-page":"11189","DOI":"10.1093\/nar\/gks918","article-title":"LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets","volume":"40","author":"Wilm","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2024031401335856500_btae116-B14","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1186\/s13059-017-1319-7","article-title":"Alignment-free sequence comparison: benefits, applications, and tools","volume":"18","author":"Zielezinski","year":"2017","journal-title":"Genome 
Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae116\/56810559\/btae116.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/3\/btae116\/56965795\/btae116.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/3\/btae116\/56965795\/btae116.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,13]],"date-time":"2024-11-13T00:29:28Z","timestamp":1731457768000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae116\/7616992"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,29]]},"references-count":14,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae116","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2024,3,1]]},"published":{"date-parts":[[2024,2,29]]},"article-number":"btae116"}}