{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T22:02:12Z","timestamp":1766181732189,"version":"3.41.2"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2021,10,8]],"date-time":"2021-10-08T00:00:00Z","timestamp":1633651200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1909208","61772557"],"award-info":[{"award-number":["U1909208","61772557"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100013314","name":"111 Project","doi-asserted-by":"publisher","award":["B18059"],"award-info":[{"award-number":["B18059"]}],"id":[{"id":"10.13039\/501100013314","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Hunan Provincial Science and Technology Program","award":["2018wk4001"],"award-info":[{"award-number":["2018wk4001"]}]},{"DOI":"10.13039\/100005825","name":"US National Institute of Food and Agriculture","doi-asserted-by":"crossref","award":["2017-70016-26051"],"award-info":[{"award-number":["2017-70016-26051"]}],"id":[{"id":"10.13039\/100005825","id-type":"DOI","asserted-by":"crossref"}]},{"name":"US National Science Foundation","award":["ABI-1759856"],"award-info":[{"award-number":["ABI-1759856"]}]},{"DOI":"10.13039\/501100004052","name":"King Abdullah University of Science and Technology","doi-asserted-by":"publisher","award":["FCC\/1\/1976-26-01","URF\/1\/3412-01-01","URF\/1\/4098-01-01","REI\/1\/4742-01-01","REI\/1\/4473-01-01"],"award-info":[{"award-number":["FCC\/1\/1976-26-01","URF\/1\/3412-01-01","URF\/1\/4098-01-01","REI\/1\/4742-01-01","REI\/1\/4473-01-01"]}],"id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,17]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Long-read sequencing technology enables significant progress in de novo genome assembly. However, the high error rate and the wide error distribution of raw reads result in a large number of errors in the assembly. Polishing is a procedure to fix errors in the draft assembly and improve the reliability of genomic analysis. However, existing methods treat all the regions of the assembly equally while there are fundamental differences between the error distributions of these regions. How to achieve very high accuracy in genome assembly is still a challenging problem. Motivated by the uneven errors in different regions of the assembly, we propose a novel polishing workflow named BlockPolish. In this method, we divide contigs into blocks with low complexity and high complexity according to statistics of aligned nucleotide bases. Multiple sequence alignment is applied to realign raw reads in complex blocks and optimize the alignment result. Due to the different distributions of error rates in trivial and complex blocks, two multitask bidirectional Long short-term memory (LSTM) networks are proposed to predict the consensus sequences. In the whole-genome assemblies of NA12878 assembled by Wtdbg2 and Flye using Nanopore data, BlockPolish has a higher polishing accuracy than other state-of-the-arts including Racon, Medaka and MarginPolish &amp; HELEN. In all assemblies, errors are predominantly indels and BlockPolish has a good performance in correcting them. In addition to the Nanopore assemblies, we further demonstrate that BlockPolish can also reduce the errors in the PacBio assemblies. The source code of BlockPolish is freely available on Github (https:\/\/github.com\/huangnengCSU\/BlockPolish).<\/jats:p>","DOI":"10.1093\/bib\/bbab405","type":"journal-article","created":{"date-parts":[[2021,9,8]],"date-time":"2021-09-08T11:20:51Z","timestamp":1631100051000},"source":"Crossref","is-referenced-by-count":6,"title":["BlockPolish: accurate polishing of long-read assembly via block divide-and-conquer"],"prefix":"10.1093","volume":"23","author":[{"given":"Neng","family":"Huang","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Central South University, China"}]},{"given":"Fan","family":"Nie","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University, China"}]},{"given":"Peng","family":"Ni","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University, China"}]},{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Computer Science, King Abdullah University of Science and Technology, Saudi Arabia"}]},{"given":"Feng","family":"Luo","sequence":"additional","affiliation":[{"name":"School of Computing, Clemson University, USA"}]},{"given":"Jianxin","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University, China"}]}],"member":"286","published-online":{"date-parts":[[2021,10,8]]},"reference":[{"issue":"8","key":"2022011921130245600_ref1","doi-asserted-by":"crossref","first-page":"780","DOI":"10.1038\/nmeth.3454","article-title":"Assembly and diploid architecture of an individual human genome via single-molecule technologies","volume":"12","author":"Pendleton","year":"2015","journal-title":"Nat Methods"},{"issue":"1","key":"2022011921130245600_ref2","first-page":"1","article-title":"Efficient assembly of nanopore reads via highly accurate and intact error correction","volume":"12","author":"Chen","year":"2021","journal-title":"Nat Commun"},{"issue":"4","key":"2022011921130245600_ref3","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1038\/nbt.4060","article-title":"Nanopore sequencing and assembly of a human genome with ultra-long reads","volume":"36","author":"Jain","year":"2018","journal-title":"Nat Biotechnol"},{"issue":"22","key":"2022011921130245600_ref4","doi-asserted-by":"crossref","first-page":"4586","DOI":"10.1093\/bioinformatics\/btz276","article-title":"Deepsignal: detecting dna methylation state from nanopore sequencing reads using deep-learning","volume":"35","author":"Ni","year":"2019","journal-title":"Bioinformatics"},{"issue":"5","key":"2022011921130245600_ref5","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giy037","article-title":"Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning","volume":"7","author":"Teng","year":"2018","journal-title":"GigaScience"},{"issue":"14","key":"2022011921130245600_ref6","doi-asserted-by":"crossref","first-page":"4191","DOI":"10.1093\/bioinformatics\/btaa297","article-title":"Deepnano-blitz: a fast base caller for minion nanopore sequencers","volume":"36","author":"Bo\u017ea","year":"2020","journal-title":"Bioinformatics"},{"key":"2022011921130245600_ref7","doi-asserted-by":"crossref","DOI":"10.1109\/TCBB.2020.3039244","article-title":"Sacall: a neural network basecaller for oxford nanopore sequencing data based on self-attention mechanism","author":"Huang","year":"2020","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"issue":"2","key":"2022011921130245600_ref8","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1038\/s41592-020-01056-5","article-title":"Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm","volume":"18","author":"Cheng","year":"2021","journal-title":"Nat Methods"},{"key":"2022011921130245600_ref9","first-page":"1","article-title":"Nanopore sequencing and the shasta toolkit enable efficient de novo assembly of eleven human genomes","author":"Shafin","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2022011921130245600_ref10","article-title":"A sensitive repeat identification framework based on short and long reads","author":"Liao","year":"2021","journal-title":"Nucleic Acids Res"},{"issue":"21","key":"2022011921130245600_ref11","first-page":"e125","article-title":"Hercules: a profile hmm-based hybrid error correction algorithm for long reads","volume":"46","author":"Firtina","year":"2018","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2022011921130245600_ref12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-017-1610-3","article-title":"Halc: High throughput algorithm for long read error correction","volume":"18","author":"Bao","year":"2017","journal-title":"BMC bioinformatics"},{"issue":"11","key":"2022011921130245600_ref13","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1101\/gr.191395.115","article-title":"Oxford nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome","volume":"25","author":"Goodwin","year":"2015","journal-title":"Genome Res"},{"issue":"1","key":"2022011921130245600_ref14","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1186\/s12864-015-1519-z","article-title":"Genome assembly using nanopore-guided long and error-free dna reads","volume":"16","author":"Madoui","year":"2015","journal-title":"BMC Genomics"},{"issue":"21","key":"2022011921130245600_ref15","doi-asserted-by":"crossref","first-page":"3004","DOI":"10.1093\/bioinformatics\/btu392","article-title":"proovread: large-scale high-accuracy pacbio correction through iterative short read consensus","volume":"30","author":"Hackl","year":"2014","journal-title":"Bioinformatics"},{"issue":"21","key":"2022011921130245600_ref16","doi-asserted-by":"crossref","first-page":"2669","DOI":"10.1093\/bioinformatics\/btt476","article-title":"The masurca genome assembler","volume":"29","author":"Zimin","year":"2013","journal-title":"Bioinformatics"},{"issue":"7","key":"2022011921130245600_ref17","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1038\/nbt.2280","article-title":"Hybrid error correction and de novo assembly of single-molecule sequencing reads","volume":"30","author":"Koren","year":"2012","journal-title":"Nat Biotechnol"},{"issue":"20","key":"2022011921130245600_ref18","doi-asserted-by":"crossref","first-page":"3953","DOI":"10.1093\/bioinformatics\/btz206","article-title":"Flas: fast and high-throughput algorithm for pacbio long-read self-correction","volume":"35","author":"Bao","year":"2019","journal-title":"Bioinformatics"},{"issue":"6","key":"2022011921130245600_ref19","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1093\/bioinformatics\/btw321","article-title":"Accurate self-correction of errors in long reads using de bruijn graphs","volume":"33","author":"Salmela","year":"2017","journal-title":"Bioinformatics"},{"issue":"5","key":"2022011921130245600_ref20","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1101\/gr.215087.116","article-title":"Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation","volume":"27","author":"Koren","year":"2017","journal-title":"Genome Res"},{"issue":"5","key":"2022011921130245600_ref21","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1038\/s41587-019-0072-8","article-title":"Assembly of long, error-prone reads using repeat graphs","volume":"37","author":"Kolmogorov","year":"2019","journal-title":"Nat Biotechnol"},{"issue":"2","key":"2022011921130245600_ref22","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1038\/s41592-019-0669-3","article-title":"Fast and accurate long-read assembly with wtdbg2","volume":"17","author":"Ruan","year":"2020","journal-title":"Nat Methods"},{"issue":"11","key":"2022011921130245600_ref23","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0112963","article-title":"Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement","volume":"9","author":"Walker","year":"2014","journal-title":"PloS one"},{"key":"2022011921130245600_ref24","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btz891","article-title":"Nextpolish: a fast and efficient genome polishing tool for long read assembly","author":"Hu","year":"2020","journal-title":"Bioinformatics"},{"issue":"21","key":"2022011921130245600_ref25","doi-asserted-by":"crossref","first-page":"4430","DOI":"10.1093\/bioinformatics\/btz400","article-title":"ntedit: scalable genome sequence polishing","volume":"35","author":"Warren","year":"2019","journal-title":"Bioinformatics"},{"issue":"6","key":"2022011921130245600_ref26","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1007981","article-title":"The genome polishing tool polca makes fast and accurate corrections in genome assemblies","volume":"16","author":"Zimin","year":"2020","journal-title":"PLoS Comput Biol"},{"issue":"12","key":"2022011921130245600_ref27","doi-asserted-by":"crossref","first-page":"3669","DOI":"10.1093\/bioinformatics\/btaa179","article-title":"Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm","volume":"36","author":"Firtina","year":"2020","journal-title":"Bioinformatics"},{"issue":"5","key":"2022011921130245600_ref28","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1101\/gr.214270.116","article-title":"Fast and accurate de novo genome assembly from long uncorrected reads","volume":"27","author":"Vaser","year":"2017","journal-title":"Genome Res"},{"issue":"8","key":"2022011921130245600_ref29","doi-asserted-by":"crossref","first-page":"999","DOI":"10.1093\/bioinformatics\/btg109","article-title":"Generating consensus sequences from partial order multiple sequence alignment graphs","volume":"19","author":"Lee","year":"2003","journal-title":"Bioinformatics"},{"key":"2022011921130245600_ref30","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btab354","article-title":"Neuralpolish: a novel nanopore polishing method based on alignment matrix construction and orthogonal bi-gru networks","author":"Huang","year":"2021","journal-title":"Bioinformatics"},{"volume-title":"Kalign 3: multiple sequence alignment of large datasets","year":"2020","author":"Lassmann","key":"2022011921130245600_ref31"},{"issue":"5","key":"2022011921130245600_ref32","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1038\/s41587-019-0074-6","article-title":"An open resource for accurately benchmarking small variant and reference calls","volume":"37","author":"Zook","year":"2019","journal-title":"Nat Biotechnol"},{"issue":"12","key":"2022011921130245600_ref33","doi-asserted-by":"crossref","first-page":"1050","DOI":"10.1038\/nmeth.4035","article-title":"Phased diploid genome assembly with single-molecule real-time sequencing","volume":"13","author":"Chin","year":"2016","journal-title":"Nat Methods"},{"issue":"6","key":"2022011921130245600_ref34","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1038\/nmeth.2474","article-title":"Nonhybrid, finished microbial genome assemblies from long-read smrt sequencing data","volume":"10","author":"Chin","year":"2013","journal-title":"Nat Methods"},{"issue":"18","key":"2022011921130245600_ref35","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"issue":"11","key":"2022011921130245600_ref36","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1038\/nmeth.4432","article-title":"Mecat: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads","volume":"14","author":"Xiao","year":"2017","journal-title":"Nat Methods"},{"key":"2022011921130245600_ref37","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1145\/1143844.1143891","article-title":"Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks","volume-title":"Proceedings of the 23rd international conference on Machine learning","author":"Graves","year":"2006"},{"article-title":"Adam: A method for stochastic optimization","year":"2014","author":"Kingma","key":"2022011921130245600_ref38"},{"issue":"6","key":"2022011921130245600_ref39","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Communications of the ACM"},{"issue":"16","key":"2022011921130245600_ref40","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and samtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"article-title":"Haplotype-resolved de novo assembly with phased assembly graphs","year":"2020","author":"Cheng","key":"2022011921130245600_ref41"},{"issue":"8","key":"2022011921130245600_ref42","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"Quast: quality assessment tool for genome assemblies","volume":"29","author":"Gurevich","year":"2013","journal-title":"Bioinformatics"},{"issue":"2","key":"2022011921130245600_ref43","doi-asserted-by":"crossref","DOI":"10.1093\/nargab\/lqaa037","article-title":"Benchmarking of long-read correction methods","volume":"2","author":"Dohm","year":"2020","journal-title":"NAR Genomics and Bioinformatics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/1\/bbab405\/42230652\/bbab405.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/1\/bbab405\/42230652\/bbab405.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,8]],"date-time":"2023-11-08T14:23:59Z","timestamp":1699453439000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab405\/6383560"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,8]]},"references-count":43,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1,17]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab405","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2022,1]]},"published":{"date-parts":[[2021,10,8]]},"article-number":"bbab405"}}