{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:42Z","timestamp":1772138082096,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2017,1,21]],"date-time":"2017-01-21T00:00:00Z","timestamp":1484956800000},"content-version":"vor","delay-in-days":31,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01HG007182"],"award-info":[{"award-number":["R01HG007182"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,4,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Identifying overlaps between error-prone long reads, specifically those from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PB), is essential for certain downstream applications, including error correction and de novo assembly. Though akin to the read-to-reference alignment problem, read-to-read overlap detection is a distinct problem that can benefit from specialized algorithms that perform efficiently and robustly on high error rate long reads. Here, we review the current state-of-the-art read-to-read overlap tools for error-prone long reads, including BLASR, DALIGNER, MHAP, GraphMap and Minimap. These specialized bioinformatics tools differ not just in their algorithmic designs and methodology, but also in their robustness of performance on a variety of datasets, time and memory efficiency and scalability. We highlight the algorithmic features of these tools, as well as their potential issues and biases when utilizing any particular method. 
To supplement our review of the algorithms, we benchmarked these tools, tracking their resource needs and computational performance, and assessed the specificity and sensitivity of each. In the versions of the tools tested, we observed that Minimap is the most computationally efficient, specific and sensitive method on the ONT datasets tested; whereas GraphMap and DALIGNER are the most specific and sensitive methods on the tested PB datasets. The concepts surveyed may apply to future sequencing technologies, as scalability is becoming more relevant with increased sequencing throughput.<\/jats:p>\n                  <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw811","type":"journal-article","created":{"date-parts":[[2016,12,16]],"date-time":"2016-12-16T18:03:06Z","timestamp":1481911386000},"page":"1261-1270","source":"Crossref","is-referenced-by-count":30,"title":["Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art"],"prefix":"10.1093","volume":"33","author":[{"given":"Justin","family":"Chu","sequence":"first","affiliation":[{"name":"University of British Columbia, Vancouver, BC, Canada"},{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"}]},{"given":"Hamid","family":"Mohamadi","sequence":"additional","affiliation":[{"name":"University of British Columbia, Vancouver, BC, Canada"},{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"}]},{"given":"Ren\u00e9 L","family":"Warren","sequence":"additional","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"}]},{"given":"Chen","family":"Yang","sequence":"additional","affiliation":[{"name":"University of British Columbia, Vancouver, BC, Canada"},{"name":"Canada\u2019s Michael 
Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"}]},{"given":"Inan\u00e7","family":"Birol","sequence":"additional","affiliation":[{"name":"University of British Columbia, Vancouver, BC, Canada"},{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"},{"name":"Simon Fraser University, Burnaby, BC, Canada"}]}],"member":"286","published-online":{"date-parts":[[2016,12,21]]},"reference":[{"key":"2023020205030738400_btw811-B1","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1145\/48529.48535","article-title":"The input\/output complexity of sorting and related problems","volume":"31","author":"Aggarwal","year":"1988","journal-title":"Commun. ACM"},{"key":"2023020205030738400_btw811-B2","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nmeth.1527","article-title":"Limitations of next-generation genome sequence assembly","volume":"8","author":"Alkan","year":"2010","journal-title":"Nat. Methods"},{"key":"2023020205030738400_btw811-B3","author":"Benson","year":"2013"},{"key":"2023020205030738400_btw811-B4","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1038\/nbt.3238","article-title":"Assembling large genomes with single-molecule sequencing and locality-sensitive hashing","volume":"33","author":"Berlin","year":"2015","journal-title":"Nat. 
Biotechnol"},{"key":"2023020205030738400_btw811-B5","author":"Bo\u017ea","year":"2016"},{"key":"2023020205030738400_btw811-B6","first-page":"21","author":"Broder","year":"1997"},{"key":"2023020205030738400_btw811-B7","author":"Burkhardt","year":"2002"},{"key":"2023020205030738400_btw811-B8","doi-asserted-by":"crossref","first-page":"375.","DOI":"10.1186\/1471-2164-13-375","article-title":"Pacific biosciences sequencing technology for genotyping and variation discovery in human data","volume":"13","author":"Carneiro","year":"2012","journal-title":"BMC Genomics"},{"key":"2023020205030738400_btw811-B9","doi-asserted-by":"crossref","first-page":"238.","DOI":"10.1186\/1471-2105-13-238","article-title":"Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory","volume":"13","author":"Chaisson","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020205030738400_btw811-B10","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1038\/nmeth.2474","article-title":"Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data","volume":"10","author":"Chin","year":"2013","journal-title":"Nat. Methods"},{"key":"2023020205030738400_btw811-B11","author":"David","year":"2016"},{"key":"2023020205030738400_btw811-B12","first-page":"1","article-title":"Cache-oblivious algorithms and data structures","volume":"8","author":"Demaine","year":"2002","journal-title":"Lect. 
Notes EEF Summer School Massive Data Sets"},{"key":"2023020205030738400_btw811-B13","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1126\/science.1162986","article-title":"Real-time DNA sequencing from single polymerase molecules","volume":"323","author":"Eid","year":"2009","journal-title":"Science"},{"key":"2023020205030738400_btw811-B14","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1038\/nbt0515-433","article-title":"Startups use short-read data to expand long-read sequencing market","volume":"33","author":"Eisenstein","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023020205030738400_btw811-B15","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1101\/gr.8.3.175","article-title":"Base-calling of automated sequencer traces using Phred. I. Accuracy assessment","volume":"8","author":"Ewing","year":"1998","journal-title":"Genome Res"},{"key":"2023020205030738400_btw811-B16","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1145\/1082036.1082039","article-title":"Indexing compressed text","volume":"52","author":"Ferragina","year":"2005","journal-title":"J. ACM"},{"key":"2023020205030738400_btw811-B17","author":"Frigo","year":"1999"},{"key":"2023020205030738400_btw811-B18","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1101\/gr.191395.115","article-title":"Oxford nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome","volume":"25","author":"Goodwin","year":"2015","journal-title":"Genome Res"},{"key":"2023020205030738400_btw811-B19","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1038\/nmeth.3290","article-title":"Improved data analysis for the MinION nanopore sequencer","volume":"12","author":"Jain","year":"2015","journal-title":"Nat. 
Methods"},{"key":"2023020205030738400_btw811-B20","first-page":"1","article-title":"A benchmark study on error assessment and quality control of CCS reads derived from the PacBio RS","volume":"4","author":"Jiao","year":"2013","journal-title":"J. Data Min. Genomics Proteomics"},{"key":"2023020205030738400_btw811-B21","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1016\/S0166-218X(03)00382-2","article-title":"On spaced seeds for similarity search","volume":"138","author":"Keich","year":"2004","journal-title":"Discrete Appl. Math"},{"key":"2023020205030738400_btw811-B22","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1093\/bib\/bbv029","article-title":"Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction","volume":"17","author":"Laehnemann","year":"2016","journal-title":"Brief. Bioinform"},{"key":"2023020205030738400_btw811-B23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.bdq.2015.02.001","article-title":"Assessing the performance of the Oxford Nanopore Technologies MinION","volume":"3","author":"Laver","year":"2015","journal-title":"Biomol. Detect. 
Quantif"},{"key":"2023020205030738400_btw811-B24","doi-asserted-by":"crossref","first-page":"682","DOI":"10.1126\/science.1079700","article-title":"Zero-mode waveguides for single-molecule analysis at high concentrations","volume":"299","author":"Levene","year":"2003","journal-title":"Science"},{"key":"2023020205030738400_btw811-B25","author":"Li","year":"2016"},{"key":"2023020205030738400_btw811-B26","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020205030738400_btw811-B27","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nmeth.3444","article-title":"A complete bacterial genome assembled de novo using only nanopore sequencing data","volume":"12","author":"Loman","year":"2015","journal-title":"Nat. Methods"},{"key":"2023020205030738400_btw811-B28","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020205030738400_btw811-B29","doi-asserted-by":"crossref","first-page":"e106689.","DOI":"10.1371\/journal.pone.0106689","article-title":"Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements","volume":"9","author":"McCoy","year":"2014","journal-title":"PLoS One"},{"key":"2023020205030738400_btw811-B30","author":"Morgulis","year":"2006"},{"key":"2023020205030738400_btw811-B31","doi-asserted-by":"crossref","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","article-title":"A whole-genome assembly of 
Drosophila","volume":"287","author":"Myers","year":"2000","journal-title":"Science"},{"key":"2023020205030738400_btw811-B32","author":"Myers","year":"2014"},{"key":"2023020205030738400_btw811-B33","doi-asserted-by":"crossref","first-page":"2137","DOI":"10.1002\/elps.201300174","article-title":"Error analysis of idealized nanopore sequencing","volume":"34","author":"O'Donnell","year":"2013","journal-title":"Electrophoresis"},{"key":"2023020205030738400_btw811-B34","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1093\/bioinformatics\/bts649","article-title":"PBSIM: PacBio reads simulator\u2013toward accurate genome assembly","volume":"29","author":"Ono","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020205030738400_btw811-B35","doi-asserted-by":"crossref","first-page":"22.","DOI":"10.1186\/2047-217X-3-22","article-title":"A reference bacterial genome dataset generated on the MinION\u2122 portable single-molecule nanopore sequencer","volume":"3","author":"Quick","year":"2014","journal-title":"Gigascience"},{"key":"2023020205030738400_btw811-B36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.cois.2015.02.013","article-title":"Best practices in insect genome sequencing: what works and what doesn\u2019t","volume":"7","author":"Richards","year":"2015","journal-title":"Curr. Opin. Insect Sci"},{"key":"2023020205030738400_btw811-B37","doi-asserted-by":"crossref","first-page":"R51.","DOI":"10.1186\/gb-2013-14-5-r51","article-title":"Characterizing and measuring bias in sequence data","volume":"14","author":"Ross","year":"2013","journal-title":"Genome Biol"},{"key":"2023020205030738400_btw811-B38","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1146\/annurev-genom-090314-050032","article-title":"The theory and practice of genome sequence assembly","volume":"16","author":"Simpson","year":"2015","journal-title":"Annu. Rev. Genomics Hum. 
Genet"},{"key":"2023020205030738400_btw811-B39","doi-asserted-by":"crossref","first-page":"1638","DOI":"10.1101\/gr.077776.108","article-title":"Rapid whole-genome mutational profiling using next-generation sequencing technologies","volume":"18","author":"Smith","year":"2008","journal-title":"Genome Res"},{"key":"2023020205030738400_btw811-B201","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023020205030738400_btw811-B40","doi-asserted-by":"crossref","first-page":"11307.","DOI":"10.1038\/ncomms11307","article-title":"Fast and sensitive mapping of nanopore sequencing reads with GraphMap","volume":"7","author":"Sovi\u0107","year":"2016","journal-title":"Nat. Commun"},{"key":"2023020205030738400_btw811-B41","doi-asserted-by":"crossref","first-page":"2582","DOI":"10.1093\/bioinformatics\/btw237","article-title":"Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads","volume":"32","author":"Sovi\u0107","year":"2016","journal-title":"Bioinformatics"},{"key":"2023020205030738400_btw811-B42","doi-asserted-by":"crossref","first-page":"7702","DOI":"10.1073\/pnas.0901054106","article-title":"Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore","volume":"106","author":"Stoddart","year":"2009","journal-title":"Proc. Natl. Acad. Sci. U. S. 
A"},{"key":"2023020205030738400_btw811-B43","doi-asserted-by":"crossref","first-page":"e159.","DOI":"10.1093\/nar\/gkq543","article-title":"A flexible and efficient template format for circular consensus sequencing and SNP detection","volume":"38","author":"Travers","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023020205030738400_btw811-B44","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1038\/nrg3117","article-title":"Repetitive DNA and next-generation sequencing: computational challenges and solutions","volume":"13","author":"Treangen","year":"2012","journal-title":"Nat. Rev. Genet"},{"key":"2023020205030738400_btw811-B45","doi-asserted-by":"crossref","first-page":"3491","DOI":"10.1093\/bioinformatics\/btu437","article-title":"Resolving complex tandem repeats with long reads","volume":"30","author":"Ummat","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020205030738400_btw811-B46","author":"Wang","year":"2015"},{"key":"2023020205030738400_btw811-B48","author":"Yang","year":"2016"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/8\/1261\/49038901\/bioinformatics_33_8_1261.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/8\/1261\/49038901\/bioinformatics_33_8_1261.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:07:50Z","timestamp":1675296470000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/8\/1261\/2730233"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2016,12,21]]},"references-count":48,"journal-issue":{"issue":"8","published-print":{"date
-parts":[[2017,4,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw811","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/081596","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,4,15]]},"published":{"date-parts":[[2016,12,21]]}}}