{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,15]],"date-time":"2025-11-15T10:22:07Z","timestamp":1763202127359,"version":"3.37.3"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2018,2,19]],"date-time":"2018-02-19T00:00:00Z","timestamp":1518998400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"name":"Research Grants Council of the Hong Kong Special Administrative Region, China","award":["CityU 11256116"],"award-info":[{"award-number":["CityU 11256116"]}]},{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"publisher","award":["61772362","61373048"],"award-info":[{"award-number":["61772362","61373048"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Tianjin Research Program of Application Foundation and Advanced Technology","award":["16JCQNJC00200"],"award-info":[{"award-number":["16JCQNJC00200"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. The new technologies can provide Single Molecular Sequencing (SMS) data that cover about 90% of positions over chromosomes. However, the SMS data has a higher error rate comparing to 1% error rate for short reads. Thus, it becomes very difficult for SNP calling and haplotype assembly using SMS reads. 
Most existing technologies do not work properly for the SMS data.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In this paper, we develop a progressive approach for SNP calling and haplotype assembly that works very well for the SMS data. Our method can handle more than 200 million non-N bases on Chromosome 1 with millions of reads, more than 100 blocks, each of which contains more than 2 million bases and more than 3K SNP sites on average. Experiment results show that the false discovery rate and false negative rate for our method are 15.7 and 11.0% on NA12878, and 16.5 and 11.0% on NA24385. Moreover, the overall switch errors for our method are 7.26 and 5.21 with average 3378 and 5736 SNP sites per block on NA12878 and NA24385, respectively. Here, we demonstrate that SMS reads alone can generate a high quality solution for both SNP calling and haplotype assembly.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Source codes and results are available at https:\/\/github.com\/guofeieileen\/SMRT\/wiki\/Software.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty059","type":"journal-article","created":{"date-parts":[[2018,2,17]],"date-time":"2018-02-17T12:08:19Z","timestamp":1518869299000},"page":"2012-2018","source":"Crossref","is-referenced-by-count":24,"title":["Progressive approach for SNP calling and haplotype assembly using single molecular sequencing data"],"prefix":"10.1093","volume":"34","author":[{"given":"Fei","family":"Guo","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Tianjin University, Tianjin Haihe Education Park, Tianjin, China"}]},{"given":"Dan","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong"}]},{"given":"Lusheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of 
Hong Kong, Kowloon Tong, Hong Kong"},{"name":"University of Hong Kong Shenzhen Research Institute, Shenzhen Hi-Tech Industrial Park, Shenzhen, Guangdong, China"}]}],"member":"286","published-online":{"date-parts":[[2018,2,19]]},"reference":[{"key":"2023012713380752400_bty059-B1","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1089\/cmb.2012.0084","article-title":"Hapcompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data","volume":"19","author":"Aguiar","year":"2012","journal-title":"J. Comput. Biol"},{"key":"2023012713380752400_bty059-B2","doi-asserted-by":"crossref","first-page":"513.","DOI":"10.1038\/35035083","article-title":"An SNP map of the human genome generated by reduced representation shotgun sequencing","volume":"407","author":"Altshuler","year":"2000","journal-title":"Nature"},{"key":"2023012713380752400_bty059-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.12688\/f1000research.6037.2","article-title":"Long read nanopore sequencing for detection of hla and cyp2d6 variants and haplotypes","volume":"4","author":"Ammar","year":"2015","journal-title":"F1000Research"},{"key":"2023012713380752400_bty059-B4","doi-asserted-by":"crossref","first-page":"e1003502.","DOI":"10.1371\/journal.pcbi.1003502","article-title":"Haptree: a novel bayesian framework for single individual polyplotyping using ngs data","volume":"10","author":"Berger","year":"2014","journal-title":"PLoS Comput. 
Biol"},{"key":"2023012713380752400_bty059-B5","doi-asserted-by":"crossref","first-page":"375.","DOI":"10.1186\/1471-2164-13-375","article-title":"Pacific biosciences sequencing technology for genotyping and variation discovery in human data","volume":"13","author":"Carneiro","year":"2012","journal-title":"BMC Genomics"},{"key":"2023012713380752400_bty059-B6","doi-asserted-by":"crossref","first-page":"238.","DOI":"10.1186\/1471-2105-13-238","article-title":"Mapping single molecule sequencing reads using basic local alignment with successive refinement (blasr): application and theory","volume":"13","author":"Chaisson","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023012713380752400_bty059-B7","doi-asserted-by":"crossref","first-page":"627.","DOI":"10.1038\/nrg3933","article-title":"Genetic variation and the de novo assembly of human genomes","volume":"16","author":"Chaisson","year":"2015","journal-title":"Nat. Rev. Genet"},{"key":"2023012713380752400_bty059-B8","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1101\/gr.6151507","article-title":"Polyscan: an automatic indel and SNP detection approach to the analysis of human resequencing data","volume":"17","author":"Chen","year":"2007","journal-title":"Genome Res"},{"key":"2023012713380752400_bty059-B9","doi-asserted-by":"crossref","first-page":"1938","DOI":"10.1093\/bioinformatics\/btt349","article-title":"Exact algorithms for haplotype assembly from whole-genome sequence data","volume":"29","author":"Chen","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012713380752400_bty059-B10","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1089\/cmb.2015.0035","article-title":"Better ilp-based approaches to haplotype assembly","volume":"23","author":"Chen","year":"2016","journal-title":"J. Comput. 
Biol"},{"key":"2023012713380752400_bty059-B11","doi-asserted-by":"crossref","first-page":"e1001091.","DOI":"10.1371\/journal.pbio.1001091","article-title":"Modernizing reference genome assemblies","volume":"9","author":"Church","year":"2011","journal-title":"PLoS Biol"},{"key":"2023012713380752400_bty059-B12","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet"},{"year":"2010","author":"Duitama","key":"2023012713380752400_bty059-B13"},{"key":"2023012713380752400_bty059-B14","doi-asserted-by":"crossref","first-page":"2041","DOI":"10.1093\/nar\/gkr1042","article-title":"Fosmid-based whole genome haplotyping of a hapmap trio child: evaluation of single individual haplotyping techniques","volume":"40","author":"Duitama","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023012713380752400_bty059-B15","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1101\/gr.213462.116","article-title":"Hapcut2: robust and accurate haplotype assembly for diverse sequencing technologies","volume":"27","author":"Edge","year":"2017","journal-title":"Genome Res"},{"key":"2023012713380752400_bty059-B16","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1126\/science.1162986","article-title":"Real-time DNA sequencing from single polymerase molecules","volume":"323","author":"Eid","year":"2009","journal-title":"Science"},{"key":"2023012713380752400_bty059-B17","doi-asserted-by":"crossref","first-page":"2801","DOI":"10.1534\/g3.115.023317","article-title":"SMRT sequencing for parallel analysis of multiple targets and accurate SNP phasing","volume":"5","author":"Guo","year":"2015","journal-title":"G3 Genes Genomes 
Genet"},{"key":"2023012713380752400_bty059-B18","doi-asserted-by":"crossref","first-page":"i183","DOI":"10.1093\/bioinformatics\/btq215","article-title":"Optimal algorithms for haplotype assembly from whole-genome sequence data","volume":"26","author":"He","year":"2010","journal-title":"Bioinformatics"},{"year":"2017","author":"Jain","key":"2023012713380752400_bty059-B19"},{"key":"2023012713380752400_bty059-B20","doi-asserted-by":"crossref","first-page":"2283","DOI":"10.1093\/bioinformatics\/btp373","article-title":"Varscan: variant detection in massively parallel sequencing of individual and pooled samples","volume":"25","author":"Koboldt","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012713380752400_bty059-B21","first-page":"182","volume-title":"ESA","author":"Lancia","year":"2001"},{"key":"2023012713380752400_bty059-B22","doi-asserted-by":"crossref","first-page":"952","DOI":"10.1101\/gr.113084.110","article-title":"SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples","volume":"21","author":"Le","year":"2011","journal-title":"Genome Res"},{"year":"2013","author":"Li","key":"2023012713380752400_bty059-B23"},{"key":"2023012713380752400_bty059-B24","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","article-title":"Fast and accurate long-read alignment with burrows\u2013wheeler transform","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012713380752400_bty059-B25","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res"},{"key":"2023012713380752400_bty059-B26","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and 
samtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012713380752400_bty059-B27","doi-asserted-by":"crossref","first-page":"1124","DOI":"10.1101\/gr.088013.108","article-title":"SNP detection for massively parallel whole-genome resequencing","volume":"19","author":"Li","year":"2009","journal-title":"Genome Res"},{"key":"2023012713380752400_bty059-B28","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1093\/bib\/3.1.23","article-title":"Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem","volume":"3","author":"Lippert","year":"2002","journal-title":"Brief. Bioinf"},{"key":"2023012713380752400_bty059-B29","doi-asserted-by":"crossref","first-page":"2803","DOI":"10.1093\/bioinformatics\/btq526","article-title":"Seqem: an adaptive genotype-calling approach for next-generation sequencing studies","volume":"26","author":"Martin","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012713380752400_bty059-B30","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023012713380752400_bty059-B31","doi-asserted-by":"crossref","first-page":"1097","DOI":"10.1111\/1755-0998.12324","article-title":"A first look at the oxford nanopore minion sequencer","volume":"14","author":"Mikheyev","year":"2014","journal-title":"Mol. Ecol. 
Resources"},{"key":"2023012713380752400_bty059-B32","doi-asserted-by":"crossref","first-page":"S8.","DOI":"10.1186\/1471-2164-14-S1-S8","article-title":"Genome reassembly with high-throughput sequencing data","volume":"14","author":"Parrish","year":"2013","journal-title":"BMC Genomics"},{"key":"2023012713380752400_bty059-B33","doi-asserted-by":"crossref","first-page":"1725","DOI":"10.1101\/gr.194201","article-title":"Ssaha: a fast search method for large DNA databases","volume":"11","author":"Ning","year":"2001","journal-title":"Genome Res"},{"key":"2023012713380752400_bty059-B34","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1007\/978-3-540-30219-3_23","volume-title":"International Workshop on Algorithms in Bioinformatics","author":"Panconesi","year":"2004"},{"key":"2023012713380752400_bty059-B35","first-page":"237","volume-title":"RECOMB","author":"Patterson","year":"2014"},{"key":"2023012713380752400_bty059-B36","doi-asserted-by":"crossref","first-page":"1610","DOI":"10.1093\/bioinformatics\/btv495","article-title":"Hapcol: accurate and memory-efficient haplotype assembly from long reads","volume":"32","author":"Pirola","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713380752400_bty059-B37","doi-asserted-by":"crossref","first-page":"23","DOI":"10.4310\/CIS.2010.v10.n1.a2","article-title":"Theory and algorithms for the haplotype assembly problem","volume":"10","author":"Schwartz","year":"2010","journal-title":"Commun. Inf. 
Syst"},{"key":"2023012713380752400_bty059-B38","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1038\/nature20098","article-title":"De novo assembly and phasing of a Korean human genome","volume":"538","author":"Seo","year":"2016","journal-title":"Nature"},{"key":"2023012713380752400_bty059-B39","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1101\/gr.096388.109","article-title":"A SNP discovery method to assess variant allele probability from next-generation resequencing data","volume":"20","author":"Shen","year":"2010","journal-title":"Genome Res"},{"key":"2023012713380752400_bty059-B40","doi-asserted-by":"crossref","first-page":"352","DOI":"10.1093\/nar\/28.1.352","article-title":"dbSNP: a database of single nucleotide polymorphisms","volume":"28","author":"Smigielski","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023012713380752400_bty059-B41","doi-asserted-by":"crossref","first-page":"11307.","DOI":"10.1038\/ncomms11307","article-title":"Fast and sensitive mapping of nanopore sequencing reads with graphmap","volume":"7","author":"Sovi\u0107","year":"2016","journal-title":"Nat. Commun"},{"key":"2023012713380752400_bty059-B42","doi-asserted-by":"crossref","first-page":"375.","DOI":"10.1038\/ng1746","article-title":"Automating sequence-based detection and genotyping of SNPs from diploid samples","volume":"38","author":"Stephens","year":"2006","journal-title":"Nat. 
Genet"},{"key":"2023012713380752400_bty059-B43","first-page":"75.","article-title":"An integrated map of structural variation in 2,504 human genomes","volume":"526","author":"Sudmant","year":"2015","journal-title":"Nature"},{"key":"2023012713380752400_bty059-B44","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1101\/gr.2754005","article-title":"novoSNP, a novel computational tool for sequence variation discovery","volume":"15","author":"Weckx","year":"2005","journal-title":"Genome Res"},{"year":"2008","author":"Wu","key":"2023012713380752400_bty059-B45"},{"key":"2023012713380752400_bty059-B46","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1142\/S0219720007002710","article-title":"Research on parameterized algorithms of the individual haplotyping problem","volume":"05","author":"Xie","year":"2007","journal-title":"J. Bioinf. Comput. Biol"},{"key":"2023012713380752400_bty059-B47","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1093\/bioinformatics\/bts001","article-title":"SNP calling using genotype model selection on high-throughput sequencing data","volume":"28","author":"You","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012713380752400_bty059-B48","doi-asserted-by":"crossref","first-page":"e53.","DOI":"10.1371\/journal.pcbi.0010053","article-title":"SNPdetector: a software tool for sensitive and accurate SNP detection","volume":"1","author":"Zhang","year":"2005","journal-title":"PLoS Comput. Biol"},{"key":"2023012713380752400_bty059-B49","doi-asserted-by":"crossref","first-page":"160025","DOI":"10.1038\/sdata.2016.25","article-title":"Extensive sequencing of seven human genomes to characterize benchmark reference materials","volume":"3","author":"Zook","year":"2016","journal-title":"Sci. 
Data"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/12\/2012\/48935751\/bioinformatics_34_12_2012.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/12\/2012\/48935751\/bioinformatics_34_12_2012.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,1]],"date-time":"2023-09-01T02:25:14Z","timestamp":1693535114000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/12\/2012\/4883351"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,2,19]]},"references-count":49,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2018,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty059","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2018,6,15]]},"published":{"date-parts":[[2018,2,19]]}}}