{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:45:32Z","timestamp":1740185132376,"version":"3.37.3"},"reference-count":14,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2022,8,17]],"date-time":"2022-08-17T00:00:00Z","timestamp":1660694400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2021YFF1201200"],"award-info":[{"award-number":["2021YFF1201200"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62150048","U1909208"],"award-info":[{"award-number":["62150048","U1909208"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>\u2003<\/jats:title><jats:p>The third-generation sequencing technology has advanced genome analysis with long-read length, but the reads need error correction due to the high error rate. Error correction is a time-consuming process especially when the sequencing coverage is high. Generally, for a pair of overlapping reads A and B, the existing error correction methods perform a base-level alignment from B to A when correcting the read A. And another base-level alignment from A to B is performed when correcting the read B. However, based on our observation, the base-level alignment information can be reused. In this article, we present a fast error correction tool Fec, using two-rounds overlapping and caching. Fec can be used independently or as an error correction step in an assembly pipeline. In the first round, Fec uses a large window size (20) to quickly find enough overlaps to correct most of the reads. In the second round, a small window size (5) is used to find more overlaps for the reads with insufficient overlaps in the first round. When performing base-level alignment, Fec searches the cache first. If the alignment exists in the cache, Fec takes this alignment out and deduces the second alignment from it. Otherwise, Fec performs base-level alignment and stores the alignment in the cache. We test Fec on nine datasets, and the results show that Fec has 1.24\u201338.56 times speed-up compared to MECAT, CANU and MINICNS on five PacBio datasets and 1.16\u201327.8 times speed-up compared to NECAT and CANU on four nanopore datasets.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Fec is available at https:\/\/github.com\/zhangjuncsu\/Fec.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac565","type":"journal-article","created":{"date-parts":[[2022,8,17]],"date-time":"2022-08-17T20:17:51Z","timestamp":1660767471000},"page":"4629-4632","source":"Crossref","is-referenced-by-count":0,"title":["Fec: a fast error correction method based on two-rounds overlapping and caching"],"prefix":"10.1093","volume":"38","author":[{"given":"Jun","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083, China"},{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University , Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fan","family":"Nie","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083, China"},{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University , Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Neng","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083, China"},{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University , Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peng","family":"Ni","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083, China"},{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University , Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Feng","family":"Luo","sequence":"additional","affiliation":[{"name":"School of Computing, Clemson University , Clemson, SC 29634, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1516-0480","authenticated-orcid":false,"given":"Jianxin","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083, China"},{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University , Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,8,17]]},"reference":[{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/s12859-017-1610-3","article-title":"HALC: high throughput algorithm for long read error correction","volume":"18","author":"Bao","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1038\/nbt.3238","article-title":"Assembling large genomes with single-molecule sequencing and locality-sensitive hashing","volume":"33","author":"Berlin","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1038\/s41467-020-20236-7","article-title":"Efficient assembly of nanopore reads via highly accurate and intact error correction","volume":"12","author":"Chen","year":"2021","journal-title":"Nat. Commun"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1038\/nmeth.2474","article-title":"Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data","volume":"10","author":"Chin","year":"2013","journal-title":"Nat. Methods"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1038\/s41587-019-0072-8","article-title":"Assembly of long, error-prone reads using repeat graphs","volume":"37","author":"Kolmogorov","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1038\/nbt.2280","article-title":"Hybrid error correction and de novo assembly of single-molecule sequencing reads","volume":"30","author":"Koren","year":"2012","journal-title":"Nat. Biotechnol"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1101\/gr.215087.116","article-title":"Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation","volume":"27","author":"Koren","year":"2017","journal-title":"Genome Res"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1186\/s13015-016-0075-7","article-title":"Jabba: hybrid error correction for long sequencing reads","volume":"11","author":"Miclotte","year":"2016","journal-title":"Algorithms Mol. Biol"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"i142","DOI":"10.1093\/bioinformatics\/bty266","article-title":"Versatile genome assembly evaluation with QUAST-LG","volume":"34","author":"Mikheenko","year":"2018","journal-title":"Bioinformatics"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1093\/bioinformatics\/bts649","article-title":"PBSIM: pacBio reads simulator\u2014toward accurate genome assembly","volume":"29","author":"Ono","year":"2013","journal-title":"Bioinformatics"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"3506","DOI":"10.1093\/bioinformatics\/btu538","article-title":"LoRDEC: accurate and efficient long read error correction","volume":"30","author":"Salmela","year":"2014","journal-title":"Bioinformatics"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1093\/bioinformatics\/btw321","article-title":"Accurate self-correction of errors in long reads using de Bruijn graphs","volume":"33","author":"Salmela","year":"2017","journal-title":"Bioinformatics"},{"key":"2023041408235840000_","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1038\/nmeth.4432","article-title":"MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads","volume":"14","author":"Xiao","year":"2017","journal-title":"Nat. Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac565\/45572661\/btac565.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/19\/4629\/49885057\/btac565.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/19\/4629\/49885057\/btac565.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,25]],"date-time":"2023-11-25T22:03:00Z","timestamp":1700949780000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/19\/4629\/6670778"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2022,8,17]]},"references-count":14,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2022,9,30]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac565","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2022,10,1]]},"published":{"date-parts":[[2022,8,17]]}}}