{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,3]],"date-time":"2024-08-03T10:43:07Z","timestamp":1722681787817},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"22","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,11,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: The identification of short insertions and deletions (indels) and single nucleotide polymorphisms (SNPs) from Ion Torrent and 454 reads is a challenging problem, essentially because these techniques are prone to sequence erroneously at homopolymers and can, therefore, raise indels in reads. Most of the existing mapping programs do not model homopolymer errors when aligning reads against the reference. The resulting alignments will then contain various kinds of mismatches and indels that confound the accurate determination of variant loci and alleles.<\/jats:p><jats:p>Results: To address these challenges, we realign reads against the reference using our previously proposed hidden Markov model that models homopolymer errors and then merges these pairwise alignments into a weighted alignment graph. Based on our weighted alignment graph and hidden Markov model, we develop a method called PyroHMMvar, which can simultaneously detect short indels and SNPs, as demonstrated in human resequencing data. Specifically, by applying our methods to simulated diploid datasets, we demonstrate that PyroHMMvar produces more accurate results than state-of-the-art methods, such as Samtools and GATK, and is less sensitive to mapping parameter settings than the other methods. 
We also apply PyroHMMvar to analyze one human whole genome resequencing dataset, and the results confirm that PyroHMMvar predicts SNPs and indels accurately.<\/jats:p><jats:p>Availability and implementation: Source code freely available at the following URL: https:\/\/code.google.com\/p\/pyrohmmvar\/, implemented in C++ and supported on Linux.<\/jats:p><jats:p>Contact: \u00a0ruijiang@tsinghua.edu.cn or cengf08@mails.thu.edu.cn<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt512","type":"journal-article","created":{"date-parts":[[2013,9,1]],"date-time":"2013-09-01T00:34:29Z","timestamp":1377995669000},"page":"2859-2868","source":"Crossref","is-referenced-by-count":9,"title":["PyroHMMvar: a sensitive and accurate method to call short indels and SNPs for Ion Torrent and 454 data"],"prefix":"10.1093","volume":"29","author":[{"given":"Feng","family":"Zeng","sequence":"first","affiliation":[{"name":"1 Bioinformatics Division, TNLIST\/Department of Automation, Tsinghua University, Beijing 100084, China and 2Computational Biology and Bioinformatics Program, University of Southern California, Los Angeles, CA 90089, USA"}]},{"given":"Rui","family":"Jiang","sequence":"additional","affiliation":[{"name":"1 Bioinformatics Division, TNLIST\/Department of Automation, Tsinghua University, Beijing 100084, China and 2Computational Biology and Bioinformatics Program, University of Southern California, Los Angeles, CA 90089, USA"}]},{"given":"Ting","family":"Chen","sequence":"additional","affiliation":[{"name":"1 Bioinformatics Division, TNLIST\/Department of Automation, Tsinghua University, Beijing 100084, China and 2Computational Biology and Bioinformatics Program, University of Southern California, Los Angeles, CA 90089, USA"},{"name":"1 Bioinformatics Division, TNLIST\/Department of Automation, Tsinghua University, Beijing 100084, China and 2Computational Biology and Bioinformatics Program, University of Southern California, Los Angeles, CA 90089, 
USA"}]}],"member":"286","published-online":{"date-parts":[[2013,8,31]]},"reference":[{"key":"2023012810474568900_btt512-B1","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1,092 human genomes","volume":"491","author":"Abecasis","year":"2012","journal-title":"Nature"},{"key":"2023012810474568900_btt512-B2","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1101\/gr.112326.110","article-title":"Dindel: accurate indel calls from short-read data","volume":"21","author":"Albers","year":"2011","journal-title":"Genome Res."},{"key":"2023012810474568900_btt512-B3","doi-asserted-by":"crossref","first-page":"i420","DOI":"10.1093\/bioinformatics\/btq365","article-title":"Characteristics of 454 pyrosequencing data\u2013enabling realistic simulation with flowsim","volume":"26","author":"Balzer","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012810474568900_btt512-B4","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1093\/hmg\/ddi006","article-title":"Comprehensive identification and characterization of diallelic insertion-deletion polymorphisms in 330 human candidate genes","volume":"14","author":"Bhangale","year":"2005","journal-title":"Hum. Mol. Genet."},{"key":"2023012810474568900_btt512-B5","doi-asserted-by":"crossref","first-page":"2514","DOI":"10.1093\/bioinformatics\/btp486","article-title":"PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds","volume":"25","author":"Chen","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012810474568900_btt512-B6","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1038\/nmeth.2474","article-title":"Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data","volume":"10","author":"Chin","year":"2013","journal-title":"Nat. 
Methods"},{"key":"2023012810474568900_btt512-B7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Series B Methodol."},{"key":"2023012810474568900_btt512-B8","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet."},{"key":"2023012810474568900_btt512-B9","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2023012810474568900_btt512-B10","doi-asserted-by":"crossref","first-page":"619","DOI":"10.1093\/bioinformatics\/bts019","article-title":"Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS","volume":"28","author":"Emde","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012810474568900_btt512-B11","doi-asserted-by":"crossref","first-page":"e7767","DOI":"10.1371\/journal.pone.0007767","article-title":"BFAST: an alignment tool for large scale genome resequencing","volume":"4","author":"Homer","year":"2009","journal-title":"PLoS One"},{"key":"2023012810474568900_btt512-B12","doi-asserted-by":"crossref","first-page":"R143","DOI":"10.1186\/gb-2007-8-7-r143","article-title":"Accuracy and quality of massively parallel DNA pyrosequencing","volume":"8","author":"Huse","year":"2007","journal-title":"Genome Biol."},{"key":"2023012810474568900_btt512-B13","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De novo assembly and genotyping of variants using colored de Bruijn 
graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat. Genet."},{"key":"2023012810474568900_btt512-B14","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1186\/1471-2105-10-143","article-title":"PanGEA: identification of allele specific gene expression using the 454 technology","volume":"10","author":"Kofler","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012810474568900_btt512-B15","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Methods"},{"key":"2023012810474568900_btt512-B16","doi-asserted-by":"crossref","first-page":"e254","DOI":"10.1371\/journal.pbio.0050254","article-title":"The diploid genome sequence of an individual human","volume":"5","author":"Levy","year":"2007","journal-title":"PLoS Biol."},{"key":"2023012810474568900_btt512-B17","doi-asserted-by":"crossref","first-page":"1157","DOI":"10.1093\/bioinformatics\/btr076","article-title":"Improving SNP discovery by base alignment quality","volume":"27","author":"Li","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012810474568900_btt512-B18","doi-asserted-by":"crossref","first-page":"1838","DOI":"10.1093\/bioinformatics\/bts280","article-title":"Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly","volume":"28","author":"Li","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012810474568900_btt512-B19","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","article-title":"Fast and accurate long-read alignment with Burrows-Wheeler transform","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012810474568900_btt512-B20","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality 
scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res."},{"key":"2023012810474568900_btt512-B21","doi-asserted-by":"crossref","first-page":"1966","DOI":"10.1093\/bioinformatics\/btp336","article-title":"SOAP2: an improved ultrafast tool for short read alignment","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012810474568900_btt512-B22","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1101\/gr.132480.111","article-title":"SOAPindel: efficient identification of indels from short paired reads","volume":"23","author":"Li","year":"2013","journal-title":"Genome Res."},{"key":"2023012810474568900_btt512-B23","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1038\/nature03959","article-title":"Genome sequencing in microfabricated high-density picolitre reactors","volume":"437","author":"Margulies","year":"2005","journal-title":"Nature"},{"key":"2023012810474568900_btt512-B24","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1038\/70570","article-title":"A general approach to single-nucleotide polymorphism discovery","volume":"23","author":"Marth","year":"1999","journal-title":"Nat. Genet."},{"key":"2023012810474568900_btt512-B25","doi-asserted-by":"crossref","first-page":"1182","DOI":"10.1101\/gr.4565806","article-title":"An initial map of insertion and deletion (INDEL) variation in the human genome","volume":"16","author":"Mills","year":"2006","journal-title":"Genome Res."},{"key":"2023012810474568900_btt512-B26","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1038\/nrg2986","article-title":"Genotype and SNP calling from next-generation sequencing data","volume":"12","author":"Nielsen","year":"2011","journal-title":"Nat. Rev. 
Genet."},{"key":"2023012810474568900_btt512-B27","doi-asserted-by":"crossref","first-page":"1725","DOI":"10.1101\/gr.194201","article-title":"SSAHA: a fast search method for large DNA databases","volume":"11","author":"Ning","year":"2001","journal-title":"Genome Res."},{"key":"2023012810474568900_btt512-B28","doi-asserted-by":"crossref","first-page":"348","DOI":"10.1038\/nature10242","article-title":"An integrated semiconductor device enabling non-optical genome sequencing","volume":"475","author":"Rothberg","year":"2011","journal-title":"Nature"},{"key":"2023012810474568900_btt512-B29","doi-asserted-by":"crossref","first-page":"e1000386","DOI":"10.1371\/journal.pcbi.1000386","article-title":"SHRiMP: accurate mapping of short color-space reads","volume":"5","author":"Rumble","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023012810474568900_btt512-B30","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1101\/gr.096388.109","article-title":"A SNP discovery method to assess variant allele probability from next-generation resequencing data","volume":"20","author":"Shen","year":"2010","journal-title":"Genome Res."},{"key":"2023012810474568900_btt512-B31","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol."},{"key":"2023012810474568900_btt512-B32","first-page":"75","article-title":"A probabilistic method for small RNA flowgram matching","author":"Vacic","year":"2008","journal-title":"Pac. Symp. 
Biocomput."},{"key":"2023012810474568900_btt512-B33","doi-asserted-by":"crossref","first-page":"872","DOI":"10.1038\/nature06884","article-title":"The complete genome of an individual by massively parallel DNA sequencing","volume":"452","author":"Wheeler","year":"2008","journal-title":"Nature"},{"key":"2023012810474568900_btt512-B34","doi-asserted-by":"crossref","first-page":"e136","DOI":"10.1093\/nar\/gkt372","article-title":"PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data","volume":"41","author":"Zeng","year":"2013","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/22\/2859\/48900294\/bioinformatics_29_22_2859.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/22\/2859\/48900294\/bioinformatics_29_22_2859.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,17]],"date-time":"2024-05-17T14:44:58Z","timestamp":1715957098000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/22\/2859\/316554"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8,31]]},"references-count":34,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2013,11,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt512","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2013,11,15]]},"published":{"date-parts":[[2013,8,31]]}}}