{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T22:31:22Z","timestamp":1773268282541,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"21","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on the high-coverage information, typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low.<\/jats:p>\n               <jats:p>Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-bases errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.<\/jats:p>\n               <jats:p>Availability and implementation: Karect is available at: http:\/\/aminallam.github.io\/karect.<\/jats:p>\n               <jats:p>Contact: \u00a0amin.allam@kaust.edu.sa<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv415","type":"journal-article","created":{"date-parts":[[2015,7,16]],"date-time":"2015-07-16T00:40:31Z","timestamp":1437007231000},"page":"3421-3428","source":"Crossref","is-referenced-by-count":94,"title":["Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data"],"prefix":"10.1093","volume":"31","author":[{"given":"Amin","family":"Allam","sequence":"first","affiliation":[{"name":"Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Panos","family":"Kalnis","sequence":"additional","affiliation":[{"name":"Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Victor","family":"Solovyev","sequence":"additional","affiliation":[{"name":"Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2015,7,14]]},"reference":[{"key":"2023020202324087700_btv415-B1","first-page":"1040","article-title":"Robust error correction for de novo assembly via spectral partitioning and sequence alignment","volume-title":"Proceedings of the International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO)","author":"Alic","year":"2014"},{"key":"2023020202324087700_btv415-B2","doi-asserted-by":"crossref","first-page":"e46679","DOI":"10.1371\/journal.pone.0046679","article-title":"Improving PacBio long read accuracy by short read alignment","volume":"7","author":"Au","year":"2012","journal-title":"PLoS One"},{"key":"2023020202324087700_btv415-B3","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol."},{"key":"2023020202324087700_btv415-B4","article-title":"A reference-free algorithm for computational normalization of shotgun sequencing data","author":"Brown","year":"2012","journal-title":"arXiv"},{"key":"2023020202324087700_btv415-B5","doi-asserted-by":"crossref","first-page":"2067","DOI":"10.1093\/bioinformatics\/bth205","article-title":"Fragment assembly with short reads","volume":"20","author":"Chaisson","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B6","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1038\/nmeth.2474","article-title":"Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data","volume":"10","author":"Chin","year":"2013","journal-title":"Nat. Methods"},{"key":"2023020202324087700_btv415-B7","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1093\/bib\/bbr063","article-title":"Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data","volume":"13","author":"Finotello","year":"2012","journal-title":"Brief. Bioinformatics"},{"key":"2023020202324087700_btv415-B8","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1073\/pnas.1017351108","article-title":"High-quality draft assemblies of mammalian genomes from massively parallel sequence data","volume":"108","author":"Gnerre","year":"2011","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020202324087700_btv415-B9","doi-asserted-by":"crossref","first-page":"2723","DOI":"10.1093\/bioinformatics\/btu368","article-title":"Blue: correcting sequencing errors using consensus and context","volume":"30","author":"Greenfield","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B10","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"Gurevich","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B11","doi-asserted-by":"crossref","first-page":"3004","DOI":"10.1093\/bioinformatics\/btu392","article-title":"proovread: large-scale high-accuracy PacBio correction through iterative short read consensus","volume":"30","author":"Hackl","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B12","doi-asserted-by":"crossref","first-page":"1354","DOI":"10.1093\/bioinformatics\/btu030","article-title":"BLESS: bloom filter-based error correction solution for high-throughput sequencing reads","volume":"30","author":"Heo","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B13","doi-asserted-by":"crossref","first-page":"2490","DOI":"10.1093\/bioinformatics\/btt407","article-title":"RACER: rapid and accurate correction of errors in reads","volume":"29","author":"Ilie","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B14","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1093\/bioinformatics\/btq653","article-title":"HiTEC: accurate error correction in high-throughput sequencing data","volume":"27","author":"Ilie","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B15","doi-asserted-by":"crossref","first-page":"1181","DOI":"10.1101\/gr.111351.110","article-title":"ECHO: a reference-free short-read error correction algorithm","volume":"21","author":"Kao","year":"2011","journal-title":"Genome Res."},{"key":"2023020202324087700_btv415-B16","doi-asserted-by":"crossref","first-page":"R116","DOI":"10.1186\/gb-2010-11-11-r116","article-title":"Quake: quality-aware detection and correction of sequencing errors","volume":"11","author":"Kelley","year":"2010","journal-title":"Genome Biol."},{"key":"2023020202324087700_btv415-B17","doi-asserted-by":"crossref","first-page":"e75505","DOI":"10.1371\/journal.pone.0075505","article-title":"Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures","volume":"8","author":"Kleftogiannis","year":"2013","journal-title":"PLoS One"},{"key":"2023020202324087700_btv415-B18","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1038\/nbt.2280","article-title":"Hybrid error correction and de novo assembly of single-molecule sequencing reads","volume":"30","author":"Koren","year":"2012","journal-title":"Nat. Biotechnol."},{"key":"2023020202324087700_btv415-B19","doi-asserted-by":"crossref","first-page":"e109","DOI":"10.1093\/nar\/gkt215","article-title":"Probabilistic error correction for RNA sequencing","volume":"41","author":"Le","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023020202324087700_btv415-B20","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1093\/bioinformatics\/18.3.452","article-title":"Multiple sequence alignment using partial order graphs","volume":"18","author":"Lee","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B21","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1101\/gr.097261.109","article-title":"De\u00a0novo assembly of human genomes with massively parallel short read sequencing","volume":"20","author":"Li","year":"2010","journal-title":"Genome Res."},{"key":"2023020202324087700_btv415-B22","doi-asserted-by":"crossref","first-page":"3264","DOI":"10.1093\/bioinformatics\/btu513","article-title":"Trowel: a fast and accurate error correction module for Illumina sequencing reads","volume":"30","author":"Lim","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B23","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/bioinformatics\/bts690","article-title":"Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data","volume":"29","author":"Liu","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B24","doi-asserted-by":"crossref","first-page":"i137","DOI":"10.1093\/bioinformatics\/btr208","article-title":"Error correction of high-throughput sequencing datasets with non-uniform coverage","volume":"27","author":"Medvedev","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B25","doi-asserted-by":"crossref","first-page":"2818","DOI":"10.1093\/bioinformatics\/btn548","article-title":"Aggressive assembly of pyrosequencing reads with mates","volume":"24","author":"Miller","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B26","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol."},{"key":"2023020202324087700_btv415-B27","doi-asserted-by":"crossref","first-page":"S7","DOI":"10.1186\/1471-2164-14-S1-S7","article-title":"BayesHammer: Bayesian clustering for error correction in single-cell sequencing","volume":"14","author":"Nikolenko","year":"2013","journal-title":"BMC Genomics"},{"key":"2023020202324087700_btv415-B28","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1109\/TR.1979.5220514","article-title":"Two algorithms for determining the most reliable path of a network","volume":"R-28","author":"Petrovic","year":"1979","journal-title":"IEEE Trans. Reliab."},{"key":"2023020202324087700_btv415-B29","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020202324087700_btv415-B30","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1101\/gr.089151.108","article-title":"Efficient frequency-based de novo short-read clustering for error trimming in next-generation sequencing","volume":"19","author":"Qu","year":"2009","journal-title":"Genome Res."},{"key":"2023020202324087700_btv415-B31","doi-asserted-by":"crossref","first-page":"1284","DOI":"10.1093\/bioinformatics\/btq151","article-title":"Correction of sequencing errors in a mixed set of reads","volume":"26","author":"Salmela","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B32","doi-asserted-by":"crossref","first-page":"3506","DOI":"10.1093\/bioinformatics\/btu538","article-title":"LoRDEC: accurate and efficient long read error correction","volume":"30","author":"Salmela","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B33","doi-asserted-by":"crossref","first-page":"1455","DOI":"10.1093\/bioinformatics\/btr170","article-title":"Correcting errors in short reads by multiple alignments","volume":"27","author":"Salmela","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B34","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1101\/gr.131383.111","article-title":"GAGE: a critical evaluation of genome assemblies and assembly algorithms","volume":"22","author":"Salzberg","year":"2012","journal-title":"Genome Res."},{"key":"2023020202324087700_btv415-B35","doi-asserted-by":"crossref","first-page":"2157","DOI":"10.1093\/bioinformatics\/btp379","article-title":"SHREC: a short-read error correction method","volume":"25","author":"Schroder","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B36","doi-asserted-by":"crossref","first-page":"i356","DOI":"10.1093\/bioinformatics\/btu440","article-title":"Fiona: a parallel and automatic strategy for read error correction","volume":"30","author":"Schulz","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B37","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1101\/gr.126953.111","article-title":"Efficient de novo assembly of large genomes using compressed data structures","volume":"22","author":"Simpson","year":"2012","journal-title":"Genome Res."},{"key":"2023020202324087700_btv415-B38","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1186\/s13059-014-0509-9","article-title":"Lighter: fast and memory-efficient sequencing error correction without counting","volume":"15","author":"Song","year":"2014","journal-title":"Genome Biol."},{"key":"2023020202324087700_btv415-B39","first-page":"189","article-title":"Recount: expectation maximization based error correction tool for next generation sequencing data","volume":"23","author":"Wijaya","year":"2009","journal-title":"Genome Inform."},{"key":"2023020202324087700_btv415-B40","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1186\/1471-2105-15-131","article-title":"HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data","volume":"15","author":"Wirawan","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023020202324087700_btv415-B41","doi-asserted-by":"crossref","first-page":"2526","DOI":"10.1093\/bioinformatics\/btq468","article-title":"Reptile: representative tiling for short read error correction","volume":"26","author":"Yang","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020202324087700_btv415-B42","doi-asserted-by":"crossref","first-page":"S52","DOI":"10.1186\/1471-2105-12-S1-S52","article-title":"Repeat-aware modeling and correction of short read errors","volume":"12","author":"Yang","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023020202324087700_btv415-B43","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1093\/bib\/bbs015","article-title":"A survey of error-correction methods for next-generation sequencing","volume":"14","author":"Yang","year":"2013","journal-title":"Brief. Bioinformatics"},{"key":"2023020202324087700_btv415-B44","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/21\/3421\/49035554\/bioinformatics_31_21_3421.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/21\/3421\/49035554\/bioinformatics_31_21_3421.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T03:52:02Z","timestamp":1675309922000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/21\/3421\/195621"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,7,14]]},"references-count":44,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2015,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv415","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,11,1]]},"published":{"date-parts":[[2015,7,14]]}}}