{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:46:15Z","timestamp":1772725575380,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2024,1,24]],"date-time":"2024-01-24T00:00:00Z","timestamp":1706054400000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"ETH Research","award":["ETH-17 21-1"],"award-info":[{"award-number":["ETH-17 21-1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Sequence alignment has been at the core of computational biology for half a century. Still, it is an open problem to design a practical algorithm for exact alignment of a pair of related sequences in linear-like time.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We solve exact global pairwise alignment with respect to edit distance by using the A* shortest path algorithm. In order to efficiently align long sequences with high divergence, we extend the recently proposed seed heuristic with match chaining, gap costs, and inexact matches. We additionally integrate the novel match pruning technique and diagonal transition to improve the A* search. We prove the correctness of our algorithm, implement it in the A*PA aligner, and justify our extensions intuitively and empirically.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>\u2002<\/jats:title>\n                    <jats:p>On random sequences of divergence d=4% and length n, the empirical runtime of A*PA scales near-linearly with length (best fit n1.06,\u2009n\u2264107\u2009bp). A similar scaling remains up to d=12% (best fit n1.24, n\u2264107\u2009bp). For n=107\u2009bp and d=4%, A*PA reaches &amp;gt;500\u00d7 speedup compared to the leading exact aligners Edlib and BiWFA. The performance of A*PA is highly influenced by long gaps. On long (n&amp;gt;500kb) ONT reads of a human sample it efficiently aligns sequences with d&amp;lt;10%, leading to 3\u00d7 median speedup compared to Edlib and BiWFA. When the sequences come from different human samples, A*PA performs 1.7\u00d7 faster than Edlib and BiWFA.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>github.com\/RagnarGrootKoerkamp\/astar-pairwise-aligner.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae032","type":"journal-article","created":{"date-parts":[[2024,1,20]],"date-time":"2024-01-20T07:19:13Z","timestamp":1705735153000},"source":"Crossref","is-referenced-by-count":10,"title":["Exact global alignment using A* with chaining seed heuristic and match pruning"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2091-1237","authenticated-orcid":false,"given":"Ragnar","family":"Groot Koerkamp","sequence":"first","affiliation":[{"name":"Department of Computer Science, ETH Zurich , R\u00e4mistrasse 101 , Zurich 8092, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8119-3849","authenticated-orcid":false,"given":"Pesho","family":"Ivanov","sequence":"additional","affiliation":[{"name":"Department of Computer Science, ETH Zurich , R\u00e4mistrasse 101 , Zurich 8092, Switzerland"}]}],"member":"286","published-online":{"date-parts":[[2024,1,23]]},"reference":[{"key":"2024041103282239200_btae032-B1","first-page":"51","author":"Backurs","year":"2015"},{"key":"2024041103282239200_btae032-B2","first-page":"257","author":"Benson","year":"2013"},{"key":"2024041103282239200_btae032-B3","doi-asserted-by":"crossref","first-page":"1869","DOI":"10.1038\/s41467-019-09637-5","article-title":"Sequencing of human genomes with nanopore technology","volume":"10","author":"Bowden","year":"2019","journal-title":"Nat Commun"},{"key":"2024041103282239200_btae032-B4","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1186\/s12859-016-0930-z","article-title":"Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments","volume":"17","author":"Daily","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2024041103282239200_btae032-B5","doi-asserted-by":"crossref","first-page":"634","DOI":"10.1016\/j.ipl.2014.05.009","article-title":"Efficient algorithms for the longest common subsequence in k-length substrings","volume":"114","author":"Deorowicz","year":"2014","journal-title":"Inf Process Lett"},{"key":"2024041103282239200_btae032-B6","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1007\/BF01386390","article-title":"A note on two problems in connexion with graphs","volume":"1","author":"Dijkstra","year":"1959","journal-title":"Numer Math"},{"key":"2024041103282239200_btae032-B7","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/0022-2836(82)90398-9","article-title":"An improved algorithm for matching biological sequences","volume":"162","author":"Gotoh","year":"1982","journal-title":"J Mol Biol"},{"key":"2024041103282239200_btae032-B8","first-page":"263","article-title":"Minimum detour methods for string or sequence comparison","volume":"61","author":"Hadlock","year":"1988","journal-title":"Congr Numer"},{"key":"2024041103282239200_btae032-B9","author":"Hadlock","year":"1988"},{"key":"2024041103282239200_btae032-B10","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1109\/TSSC.1968.300136","article-title":"A formal basis for the heuristic determination of minimum cost paths","volume":"4","author":"Hart","year":"1968","journal-title":"IEEE Trans Syst Sci Cyber"},{"key":"2024041103282239200_btae032-B11","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1145\/1056777.1056779","article-title":"Correction to a formal basis for the heuristic determination of minimum cost paths","volume":"37","author":"Hart","year":"1972","journal-title":"SIGART Bull"},{"key":"2024041103282239200_btae032-B12","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1145\/360825.360861","article-title":"A linear space algorithm for computing maximal common subsequences","volume":"18","author":"Hirschberg","year":"1975","journal-title":"Commun ACM"},{"key":"2024041103282239200_btae032-B13","doi-asserted-by":"crossref","first-page":"664","DOI":"10.1145\/322033.322044","article-title":"Algorithms for the longest common subsequence problem","volume":"24","author":"Hirschberg","year":"1977","journal-title":"J ACM"},{"key":"2024041103282239200_btae032-B14","author":"Holte","year":"2010"},{"key":"2024041103282239200_btae032-B15","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1145\/359581.359603","article-title":"A fast algorithm for computing longest common subsequences","volume":"20","author":"Hunt","year":"1977","journal-title":"Commun ACM"},{"key":"2024041103282239200_btae032-B16","author":"Ivanov","year":"2022"},{"key":"2024041103282239200_btae032-B17","first-page":"281","author":"Koenig","year":"2006"},{"key":"2024041103282239200_btae032-B18","doi-asserted-by":"crossref","first-page":"3547","DOI":"10.1093\/bioinformatics\/btz272","article-title":"Evolution of biosequence search algorithms: a brief survey","volume":"35","author":"Kucherov","year":"2019","journal-title":"Bioinformatics"},{"key":"2024041103282239200_btae032-B19","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein","year":"1966","journal-title":"Sov Phys Dokl"},{"key":"2024041103282239200_btae032-B20","doi-asserted-by":"crossref","first-page":"btad487","DOI":"10.1093\/bioinformatics\/btad487","article-title":"Block aligner: an adaptive SIMD-accelerated aligner for sequences and position-specific scoring matrices","volume":"39","author":"Liu","year":"2023","journal-title":"Bioinformatics"},{"key":"2024041103282239200_btae032-B21","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1038\/nmeth.2221","article-title":"The gem mapper: fast, accurate and versatile alignment by filtration","volume":"9","author":"Marco-Sola","year":"2012","journal-title":"Nat Methods"},{"key":"2024041103282239200_btae032-B22","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1093\/bioinformatics\/btaa777","article-title":"Fast gap-affine pairwise alignment using the wavefront algorithm","volume":"37","author":"Marco-Sola","year":"2021","journal-title":"Bioinformatics"},{"key":"2024041103282239200_btae032-B23","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btad074","article-title":"Optimal gap-affine alignment in o(s) space","volume":"39","author":"Marco-Sola","year":"2023","journal-title":"Bioinformatics"},{"key":"2024041103282239200_btae032-B24","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1145\/3582490","article-title":"Theoretical analysis of edit distance algorithms","volume":"66","author":"Medvedev","year":"2023","journal-title":"Commun ACM"},{"key":"2024041103282239200_btae032-B25","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1145\/3571723","article-title":"Theoretical analysis of sequencing bioinformatics algorithms and beyond","volume":"66","author":"Medvedev","year":"2023","journal-title":"Commun ACM"},{"key":"2024041103282239200_btae032-B26","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1007\/BF01840446","article-title":"An O(ND) difference algorithm and its variations","volume":"1","author":"Myers","year":"1986","journal-title":"Algorithmica"},{"key":"2024041103282239200_btae032-B27","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1145\/316542.316550","article-title":"A fast bit-vector algorithm for approximate string matching based on dynamic programming","volume":"46","author":"Myers","year":"1999","journal-title":"J ACM"},{"key":"2024041103282239200_btae032-B28","first-page":"38","author":"Myers","year":"1995"},{"key":"2024041103282239200_btae032-B29","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1145\/375360.375365","article-title":"A guided tour to approximate string matching","volume":"33","author":"Navarro","year":"2001","journal-title":"ACM Comput Surv"},{"key":"2024041103282239200_btae032-B30","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J Mol Biol"},{"key":"2024041103282239200_btae032-B31","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1126\/science.abj6987","article-title":"The complete sequence of a human genome","volume":"376","author":"Nurk","year":"2022","journal-title":"Science"},{"issue":"Suppl. 1","key":"2024041103282239200_btae032-B32","doi-asserted-by":"crossref","first-page":"S10","DOI":"10.1186\/1471-2105-10-S1-S10","article-title":"Improved algorithms for approximate string matching (extended abstract)","volume":"10","author":"Papamichail","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2024041103282239200_btae032-B33","author":"Paveti\u0107","year":"2017"},{"key":"2024041103282239200_btae032-B34","volume-title":"Heuristics: Intelligent Search Strategies for Computer Problem Solving","author":"Pearl","year":"1984"},{"key":"2024041103282239200_btae032-B35","doi-asserted-by":"crossref","DOI":"10.1017\/9781108164085","volume-title":"Artificial Intelligence: Foundations of Computational Agents","author":"Poole","year":"2017","edition":"2nd edn"},{"key":"2024041103282239200_btae032-B36","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1016\/B978-0-12-809633-8.20106-4","volume-title":"Encyclopedia of Bioinformatics and Computational Biology","author":"Prjibelski","year":"2019"},{"key":"2024041103282239200_btae032-B37","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1016\/j.jbiotec.2017.07.017","article-title":"The SeqAn C++ template library for efficient sequence analysis: a resource for programmers","volume":"261","author":"Reinert","year":"2017","journal-title":"J Biotechnol"},{"key":"2024041103282239200_btae032-B38","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1073\/pnas.69.1.4","article-title":"Matching sequences under deletion\/insertion constraints","volume":"69","author":"Sankoff","year":"1972","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024041103282239200_btae032-B39","doi-asserted-by":"crossref","first-page":"787","DOI":"10.1137\/0126070","article-title":"On the theory and computation of evolutionary distances","volume":"26","author":"Sellers","year":"1974","journal-title":"SIAM J Appl Math"},{"key":"2024041103282239200_btae032-B40","doi-asserted-by":"crossref","first-page":"1394","DOI":"10.1093\/bioinformatics\/btw753","article-title":"Edlib: a C\/C++ library for fast, exact sequence alignment using edit distance","volume":"33","author":"\u0160o\u0161i\u0107","year":"2017","journal-title":"Bioinformatics"},{"key":"2024041103282239200_btae032-B41","doi-asserted-by":"crossref","first-page":"1552","DOI":"10.1137\/0149094","article-title":"Speeding up dynamic programming algorithms for finding optimal lattice paths","volume":"49","author":"Spouge","year":"1989","journal-title":"SIAM J Appl Math"},{"key":"2024041103282239200_btae032-B42","first-page":"1","article-title":"Fast optimal alignment","volume":"7","author":"Spouge","year":"1991","journal-title":"Comput Appl Biosci"},{"key":"2024041103282239200_btae032-B43","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1016\/S0019-9958(85)80046-2","article-title":"Algorithms for approximate string matching","volume":"64","author":"Ukkonen","year":"1985","journal-title":"Inf Control"},{"key":"2024041103282239200_btae032-B44","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1007\/BF01074755","article-title":"Speech discrimination by dynamic programming","volume":"4","author":"Vintsyuk","year":"1968","journal-title":"Cybern Syst Anal"},{"key":"2024041103282239200_btae032-B45","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1145\/321796.321811","article-title":"The string-to-string correction problem","volume":"21","author":"Wagner","year":"1974","journal-title":"J ACM"},{"key":"2024041103282239200_btae032-B46","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1145\/135239.135244","article-title":"Fast text searching","volume":"35","author":"Wu","year":"1992","journal-title":"Commun ACM"},{"key":"2024041103282239200_btae032-B47","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1016\/0020-0190(90)90035-V","article-title":"An O(NP) sequence comparison algorithm","volume":"35","author":"Wu","year":"1990","journal-title":"Inf Process Lett"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae032\/56370377\/btae032.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/3\/btae032\/57211939\/btae032.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/3\/btae032\/57211939\/btae032.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T23:28:54Z","timestamp":1712791734000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae032\/7587511"}},"subtitle":[],"editor":[{"given":"Tobias","family":"Marschall","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,1,23]]},"references-count":47,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae032","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.09.19.508631","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,3,1]]},"published":{"date-parts":[[2024,1,23]]},"article-number":"btae032"}}