{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,11,2]],"date-time":"2022-11-02T05:27:35Z","timestamp":1667366855230},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"S10","license":[{"start":{"date-parts":[[2020,11,1]],"date-time":"2020-11-01T00:00:00Z","timestamp":1604188800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2020,11,18]],"date-time":"2020-11-18T00:00:00Z","timestamp":1605657600000},"content-version":"vor","delay-in-days":17,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Genomics"],"published-print":{"date-parts":[[2020,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>The emergence of third generation sequencing technology, featuring longer read lengths, represents a great advance over next generation sequencing technology and has greatly promoted biological research. However, third generation sequencing data have a high sequencing error rate, which inevitably affects downstream analysis. Although sequencing error rates have been improving in recent years, large amounts of data were produced at high error rates, and discarding them would cause huge waste. Thus, error correction for third generation sequencing data is especially important. The existing error correction methods perform poorly at heterozygous sites, which are ubiquitous in diploid and polyploid organisms. 
Therefore, error correction algorithms for heterozygous loci are lacking, especially at low coverages.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>In this article, we propose an error correction method named <jats:italic>QIHC<\/jats:italic>. <jats:italic>QIHC<\/jats:italic> is a hybrid correction method, which needs both next generation and third generation sequencing data. <jats:italic>QIHC<\/jats:italic> greatly enhances the sensitivity of distinguishing heterozygous sites from sequencing errors, which leads to high accuracy in error correction. To achieve this, <jats:italic>QIHC<\/jats:italic> establishes a set of probabilistic models based on a Bayesian classifier to estimate the heterozygosity of a site, and makes a judgment by calculating the posterior probabilities. The proposed method consists of three modules, which respectively generate a pseudo reference sequence, obtain the read alignments, and estimate the heterozygosity of the sites and correct the reads harboring them. The last module is the core module of <jats:italic>QIHC<\/jats:italic>, which is designed to handle the calculations for multiple cases at a heterozygous site. The other two modules enable the reads to be mapped to the pseudo reference sequence, which to some extent overcomes the inefficiency of the multiple mappings adopted by the existing error correction methods.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>To verify the performance of our method, we selected <jats:italic>Canu<\/jats:italic> and <jats:italic>Jabba<\/jats:italic> to compare with <jats:italic>QIHC<\/jats:italic> in several aspects. Because <jats:italic>QIHC<\/jats:italic> is a hybrid correction method, we first conducted a group of experiments under different coverages of the next-generation sequencing data. 
<jats:italic>QIHC<\/jats:italic> is far ahead of <jats:italic>Jabba<\/jats:italic> on accuracy. Meanwhile, we varied the coverages of the third generation sequencing data and compared performances again among Canu, Jabba and QIHC. <jats:italic>QIHC<\/jats:italic> outperforms the other two methods on accuracy of both correcting the sequencing errors and identifying the heterozygous sites, especially at low coverage. We carried out a comparison analysis between <jats:italic>Canu<\/jats:italic> and <jats:italic>QIHC<\/jats:italic> on the different error rates of the third generation sequencing data. <jats:italic>QIHC<\/jats:italic> still performs better. Therefore, <jats:italic>QIHC<\/jats:italic> is superior to the existing error correction methods when heterozygous sites exist.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12864-020-07008-9","type":"journal-article","created":{"date-parts":[[2020,11,18]],"date-time":"2020-11-18T18:03:41Z","timestamp":1605722621000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A hybrid correcting method considering heterozygous variations by a comprehensive probabilistic 
model"],"prefix":"10.1186","volume":"21","author":[{"given":"Jiaqi","family":"Liu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiayin","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiao","family":"Xiao","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xin","family":"Lai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daocheng","family":"Dai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuanping","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaoyan","family":"Zhu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhongmeng","family":"Zhao","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Juan","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhimin","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,11,18]]},"reference":[{"issue":"5","key":"7008_CR1","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1016\/j.gpb.2015.08.002","volume":"13","author":"A Rhoads","year":"2015","unstructured":"Rhoads A, Au KF. Pacbio sequencing and its applications. Genomics Proteomics Bioinforma. 
2015; 13(5):278\u201389.","journal-title":"Genomics Proteomics Bioinforma"},{"issue":"1","key":"7008_CR2","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1186\/s12864-017-3757-8","volume":"18","author":"NV Hoang","year":"2017","unstructured":"Hoang NV, Furtado A, Mason PJ, Marquardt A, Kasirajan L, Thirugnanasambandam PP, Botha FC, Henry RJ. A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing. BMC Genomics. 2017; 18(1):395.","journal-title":"BMC Genomics"},{"issue":"4","key":"7008_CR3","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1093\/dnares\/dsw022","volume":"23","author":"SS Vembar","year":"2016","unstructured":"Vembar SS, Seetin M, Lambert C, Nattestad M, Schatz MC, Baybayan P, Scherf A, Smith ML. Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11kb), single molecule, real-time sequencing. DNA Research. 2016; 23(4):339\u201351.","journal-title":"DNA Research"},{"issue":"6","key":"7008_CR4","first-page":"940","volume":"18","author":"A Magi","year":"2017","unstructured":"Magi A, Giusti B, Tattini L. Characterization of minion nanopore data for resequencing analyses. Brief Bioinform. 2017; 18(6):940\u201353.","journal-title":"Brief Bioinform"},{"issue":"1","key":"7008_CR5","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1186\/s13059-016-1103-0","volume":"17","author":"M Jain","year":"2016","unstructured":"Jain M, Olsen HE, Paten B, Akeson M. The oxford nanopore minion: delivery of nanopore sequencing to the genomics community. Genome Biol. 2016; 17(1):239.","journal-title":"Genome Biol"},{"issue":"12","key":"7008_CR6","doi-asserted-by":"publisher","first-page":"2072","DOI":"10.1101\/gr.228148.117","volume":"27","author":"RJ Mcginty","year":"2017","unstructured":"Mcginty RJ, Rubinstein RG, Neil AJ, Dominska M, Kiktev D, Petes TD, Mirkin SM. 
Nanopore sequencing of complex genomic rearrangements in yeast reveals mechanisms of repeat-mediated double-strand break repair. Genome Res. 2017; 27(12):2072\u201382.","journal-title":"Genome Res"},{"issue":"4","key":"7008_CR7","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1038\/nbt.4060","volume":"36","author":"M Jain","year":"2018","unstructured":"Jain M, Koren S, Quick J, Rand AC, Sasani TA, Tyson JR, Beggs AD, Dilthey AT, Fiddes IT, Malla S. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol. 2018; 36(4):338\u201345.","journal-title":"Nat Biotechnol"},{"key":"7008_CR8","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1016\/j.jbiotec.2017.04.016","volume":"258","author":"A Kranz","year":"2017","unstructured":"Kranz A, Vogel A, Degner U, Kiefler I, Bott M, Usadel B, Polen T. High precision genome sequencing of engineered g. oxydans 621h by combining long nanopore and short accurate illumina reads. J Biotechnol. 2017; 258:197\u2013205.","journal-title":"J Biotechnol"},{"key":"7008_CR9","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1038\/s41592-018-0001-7","volume":"15","author":"FJ Sedlazeck","year":"2018","unstructured":"Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, Haeseler AV, Schatz MC. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018; 15:461\u2013468.","journal-title":"Nat Methods"},{"issue":"6","key":"7008_CR10","doi-asserted-by":"publisher","first-page":"1485","DOI":"10.1002\/bit.26561","volume":"115","author":"JF Cartwright","year":"2018","unstructured":"Cartwright JF, Anderson K, Longworth J, Lobb P, James DC. Highly sensitive detection of mutations in cho cell recombinant dna using multi-parallel single molecule real-time dna sequencing. Biotech Bioeng. 
2018; 115(6):1485\u201398.","journal-title":"Biotech Bioeng"},{"key":"7008_CR11","doi-asserted-by":"publisher","first-page":"7438","DOI":"10.1038\/ncomms8438","volume":"6","author":"J Beaulaurier","year":"2015","unstructured":"Beaulaurier J, Zhang XS, Zhu SJ, Sebra R, Rosenbluh C, Deikus G, Shen N, Munera D, Waldor MK, Chess A. Single molecule-level detection and long read-based phasing of epigenetic variations in bacterial methylomes. Nat Commun. 2015; 6:7438.","journal-title":"Nat Commun"},{"issue":"4","key":"7008_CR12","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1038\/nmeth.4184","volume":"14","author":"JT Simpson","year":"2017","unstructured":"Simpson JT, Workman RE, Zuzarte PC, David M, Dursi LJ, Timp W. Detecting dna cytosine methylation using nanopore sequencing. Nat Methods. 2017; 14(4):407\u201310.","journal-title":"Nat Methods"},{"issue":"4","key":"7008_CR13","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1038\/nmeth.4189","volume":"14","author":"AC Rand","year":"2017","unstructured":"Rand AC, Jain M, Eizenga JM, Musselman-Brown A, Olsen HE, Akeson M, Paten B. Mapping dna methylation with high-throughput nanopore sequencing. Nat Methods. 2017; 14(4):411\u20133.","journal-title":"Nat Methods"},{"issue":"1","key":"7008_CR14","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1038\/gim.2017.86","volume":"20","author":"JD Merker","year":"2018","unstructured":"Merker JD, Wenger AM, Sneddon T, Grove M, Zappala Z, Fresard L, Waggott D, Utiramerur S, Hou YL, Smith KS. Long-read genome sequencing identifies causal structural variation in a mendelian disease. Genet Med. 2018; 20(1):159\u201363.","journal-title":"Genet Med"},{"key":"7008_CR15","unstructured":"J K. Understanding accuracy in smrt sequencing. Pac Biosci. 
2013."},{"key":"7008_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.bdq.2015.02.001","volume":"3","author":"T Laver","year":"2015","unstructured":"Laver T, Harrison J, O\u2019Neill PA, Moore K, Farbos A, Paszkiewicz K, Studholme DJ. Assessing the performance of the oxford nanopore technologies minion. Biomol Detect Quantif. 2015; 3:1\u20138.","journal-title":"Biomol Detect Quantif"},{"issue":"1","key":"7008_CR17","doi-asserted-by":"publisher","first-page":"154","DOI":"10.1093\/bib\/bbv029","volume":"17","author":"D Laehnemann","year":"2016","unstructured":"Laehnemann D, Borkhardt A, McHardy AC. Denoising dna deep sequencing data\u2013high-throughput sequencing errors and their correction. Brief Bioinform. 2016; 17(1):154\u201379.","journal-title":"Brief Bioinform"},{"key":"7008_CR18","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1186\/s13059-018-1605-z","volume":"20","author":"SH Fu","year":"2019","unstructured":"Fu SH, Wang AQ, Au KF. A comparative evaluation of hybrid error correction methods for error-prone long reads. Genome Biol. 2019; 20:26.","journal-title":"Genome Biol"},{"issue":"7","key":"7008_CR19","first-page":"1","volume":"15","author":"MS Fujimoto","year":"2014","unstructured":"Fujimoto MS, Bodily PM, Okuda N, Clement MJ, Snell Q. Effects of error-correction of heterozygous next-generation sequencing data. BMC Bioinformatics. 2014; 15(7):1\u20138.","journal-title":"BMC Bioinformatics"},{"issue":"24","key":"7008_CR20","doi-asserted-by":"publisher","first-page":"3506","DOI":"10.1093\/bioinformatics\/btu538","volume":"30","author":"L Salmela","year":"2014","unstructured":"Salmela L, Rivals E. Lordec: accurate and efficient long read error correction. Bioinformatics. 
2014; 30(24):3506\u201314.","journal-title":"Bioinformatics"},{"issue":"7","key":"7008_CR21","doi-asserted-by":"publisher","first-page":"693","DOI":"10.1038\/nbt.2280","volume":"30","author":"S Koren","year":"2012","unstructured":"Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G, Wang Z, Rasko DA, McCombie WR, Jarvis ED. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol. 2012; 30(7):693\u2013700.","journal-title":"Nat Biotechnol"},{"issue":"21","key":"7008_CR22","doi-asserted-by":"publisher","first-page":"3004","DOI":"10.1093\/bioinformatics\/btu392","volume":"30","author":"T Hackl","year":"2014","unstructured":"Hackl T, Hedrich R, Schultz J, F F. proovread: large-scale high-accuracy pacbio correction through iterative short read consensus. Bioinformatics. 2014; 30(21):3004\u201311.","journal-title":"Bioinformatics"},{"key":"7008_CR23","doi-asserted-by":"crossref","unstructured":"Lee H, Gurtowski J, Yoo S, Marcus S, McCombie WR, Schatz M. Error correction and assembly complexity of single molecule sequencing reads. bioRxiv. 2014. https:\/\/doi.org\/10.1101\/006395.","DOI":"10.1101\/006395"},{"issue":"10","key":"7008_CR24","doi-asserted-by":"publisher","first-page":"46679","DOI":"10.1371\/journal.pone.0046679","volume":"7","author":"KF Au","year":"2012","unstructured":"Au KF, Underwood JG, Lee L, Wong WH. Improving pacbio long read accuracy by short read alignment. Plos ONE. 2012; 7(10):46679.","journal-title":"Plos ONE"},{"issue":"5","key":"7008_CR25","doi-asserted-by":"publisher","first-page":"722","DOI":"10.1101\/gr.215087.116","volume":"27","author":"S Koren","year":"2017","unstructured":"Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 
2017; 27(5):722\u201336.","journal-title":"Genome Res"},{"key":"7008_CR26","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1186\/s13015-016-0075-7","volume":"11","author":"G Miclotte","year":"2016","unstructured":"Miclotte G, Heydari M D, Demeester P, Rombauts S, Yves VDP, Audenaert P, Fostier J. Jabba: hybrid error correction for long sequencing reads. Algorithm Mol Biol. 2016; 11:10.","journal-title":"Algorithm Mol Biol"},{"issue":"1","key":"7008_CR27","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1093\/bioinformatics\/bts649","volume":"29","author":"Y Ono","year":"2013","unstructured":"Ono Y, Asai K, Hamada M. Pbsim: Pacbio reads simulator\u2013toward accurate genome assembly. Bioinformatics. 2013; 29(1):119\u201321.","journal-title":"Bioinformatics"},{"issue":"1","key":"7008_CR28","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1186\/1471-2105-13-238","volume":"13","author":"MJ Chaisson","year":"2012","unstructured":"Chaisson MJ, Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (blasr): application and theory. BMC Bioinformatics. 2012; 13(1):238.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"7008_CR29","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1186\/s12859-018-2051-3","volume":"19","author":"JR Wang","year":"2018","unstructured":"Wang JR, Holt J, McMillan L, Jones CD. Fmlrc: Hybrid long read error correction using an fm-index. BMC Bioinformatics. 2018; 19(1):50.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"7008_CR30","doi-asserted-by":"publisher","first-page":"204","DOI":"10.1186\/s12859-017-1610-3","volume":"18","author":"E Bao","year":"2017","unstructured":"Bao E, Lan LX. Halc: High throughput algorithm for long read error correction. BMC Bioinformatics. 
2017; 18(1):204.","journal-title":"BMC Bioinformatics"},{"issue":"6","key":"7008_CR31","doi-asserted-by":"publisher","first-page":"961","DOI":"10.1101\/gr.112326.110","volume":"21","author":"CA Albers","year":"2011","unstructured":"Albers CA, Lunter G, MacArthur DG, McVean G, Ouwehand WH, Durbin R. Dindel: Accurate indel calls from short-read data. Genome Res. 2011; 21(6):961\u201373.","journal-title":"Genome Res"}],"container-title":["BMC Genomics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12864-020-07008-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12864-020-07008-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12864-020-07008-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,19]],"date-time":"2020-11-19T10:05:04Z","timestamp":1605780304000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-020-07008-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11]]},"references-count":31,"journal-issue":{"issue":"S10","published-print":{"date-parts":[[2020,11]]}},"alternative-id":["7008"],"URL":"https:\/\/doi.org\/10.1186\/s12864-020-07008-9","relation":{},"ISSN":["1471-2164"],"issn-type":[{"value":"1471-2164","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11]]},"assertion":[{"value":"18 November 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Juan Wang and Zhimin Li hold the positions at Annoroad Gene Technology.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"753"}}