{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:55Z","timestamp":1772138095100,"version":"3.50.1"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2019,3,9]],"date-time":"2019-03-09T00:00:00Z","timestamp":1552089600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction and variant calling with respect to a variation graph.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers\u2019 bitvector algorithm for semi-global alignment. These linear algorithms are both based on processing w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and achieve a speedup of up to w over naive algorithms. For a graph with |V| nodes and |E| edges and a sequence of length m, our bitvector-based graph alignment algorithm reaches a worst case runtime of O(|V|+\u2308mw\u2309|E|\u2009log\u2009w) for acyclic graphs and O(|V|+m|E|\u2009log\u2009w) for arbitrary cyclic graphs. We apply it to five different types of graphs and observe a speedup between 3-fold and 20-fold compared with a previous (asymptotically optimal) alignment algorithm.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>https:\/\/github.com\/maickrau\/GraphAligner<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz162","type":"journal-article","created":{"date-parts":[[2019,3,7]],"date-time":"2019-03-07T15:40:01Z","timestamp":1551973201000},"page":"3599-3607","source":"Crossref","is-referenced-by-count":60,"title":["Bit-parallel sequence-to-graph alignment"],"prefix":"10.1093","volume":"35","author":[{"given":"Mikko","family":"Rautiainen","sequence":"first","affiliation":[{"name":"Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, 66123 Saarbr\u00fccken, Germany"},{"name":"Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbr\u00fccken, Germany"},{"name":"Saarbr\u00fccken Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, 66123 Saarbr\u00fccken, Germany"}]},{"given":"Veli","family":"M\u00e4kinen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Helsinki , Helsinki, Finland"}]},{"given":"Tobias","family":"Marschall","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, 66123 Saarbr\u00fccken, Germany"},{"name":"Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbr\u00fccken, Germany"}]}],"member":"286","published-online":{"date-parts":[[2019,3,9]]},"reference":[{"key":"2023013108135586900_btz162-B1","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1093\/bioinformatics\/btv688","article-title":"hybridSPAdes: an algorithm for hybrid assembly of short and long reads","volume":"32","author":"Antipov","year":"2016","journal-title":"Bioinformatics"},{"key":"2023013108135586900_btz162-B2","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1145\/135239.135243","article-title":"A new approach to text searching","volume":"35","author":"Baeza-Yates","year":"1992","journal-title":"Commun. ACM"},{"key":"2023013108135586900_btz162-B3","first-page":"1","volume-title":"Combinatorial Pattern Matching","author":"Baeza-Yates","year":"1996"},{"key":"2023013108135586900_btz162-B4","author":"Chaisson","year":"2018"},{"key":"2023013108135586900_btz162-B5","doi-asserted-by":"crossref","first-page":"i201","DOI":"10.1093\/bioinformatics\/btw279","article-title":"Compacting de Bruijn graphs from sequencing data quickly and in low memory","volume":"32","author":"Chikhi","year":"2016","journal-title":"Bioinformatics"},{"key":"2023013108135586900_btz162-B6","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1038\/nbt.2023","article-title":"How to apply de Bruijn graphs to genome assembly","volume":"29","author":"Compeau","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023013108135586900_btz162-B7","first-page":"118","article-title":"Computational pan-genomics: status, promises and challenges","volume":"19","year":"2018","journal-title":"Brief. Bioinform"},{"key":"2023013108135586900_btz162-B8","doi-asserted-by":"crossref","first-page":"e109384.","DOI":"10.1371\/journal.pone.0109384","article-title":"Indexes of large genome collections on a PC","volume":"9","author":"Danek","year":"2014","journal-title":"PLoS One"},{"key":"2023013108135586900_btz162-B9","doi-asserted-by":"crossref","first-page":"682.","DOI":"10.1038\/ng.3257","article-title":"Improved genome inference in the MHC using a population reference graph","volume":"47","author":"Dilthey","year":"2015","journal-title":"Nat. Genet"},{"key":"2023013108135586900_btz162-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pcbi.1005151","article-title":"High-accuracy HLA type inference from whole-genome sequencing data using population reference graphs","volume":"12","author":"Dilthey","year":"2016","journal-title":"PLoS Comput. Biol"},{"key":"2023013108135586900_btz162-B11","first-page":"151.","article-title":"An algorithm for syntactical analysis","volume":"3","author":"D\u00f6m\u00f6lki","year":"1964","journal-title":"Comput. Linguist"},{"key":"2023013108135586900_btz162-B12","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1007\/BF01933436","article-title":"A universal compiler system based on production rules","volume":"8","author":"D\u00f6m\u00f6lki","year":"1968","journal-title":"BIT Numer. Math"},{"key":"2023013108135586900_btz162-B13","doi-asserted-by":"crossref","first-page":"11.","DOI":"10.1186\/1471-2105-9-11","article-title":"Seqan an efficient, generic c++ library for sequence analysis","volume":"9","author":"D\u00f6ring","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023013108135586900_btz162-B14","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1038\/nbt.4227","article-title":"Variation graph toolkit improves read mapping by representing genetic variation in the reference","volume":"36","author":"Garrison","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023013108135586900_btz162-B15","doi-asserted-by":"crossref","first-page":"68.","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","year":"2015","journal-title":"Nature"},{"key":"2023013108135586900_btz162-B16","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/0022-2836(82)90398-9","article-title":"An improved algorithm for matching biological sequences","volume":"162","author":"Gotoh","year":"1982","journal-title":"J. Mol. Biol"},{"key":"2023013108135586900_btz162-B17","doi-asserted-by":"crossref","first-page":"99.","DOI":"10.1186\/1471-2105-15-99","article-title":"Genome alignment with graph data structures: a comparison","volume":"15","author":"Kehr","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023013108135586900_btz162-B18","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1093\/bioinformatics\/18.3.452","article-title":"Multiple sequence alignment using partial order graphs","volume":"18","author":"Lee","year":"2002","journal-title":"Bioinformatics"},{"key":"2023013108135586900_btz162-B19","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","author":"Levenshtein","year":"1966","journal-title":"Soviet Physics Doklady"},{"key":"2023013108135586900_btz162-B20","first-page":"7.","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"1","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2023013108135586900_btz162-B21","doi-asserted-by":"crossref","first-page":"237.","DOI":"10.1186\/s12859-016-1103-9","article-title":"Read mapping on de bruijn graphs","volume":"17","author":"Limasset","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023013108135586900_btz162-B22","doi-asserted-by":"crossref","first-page":"3166","DOI":"10.1093\/bioinformatics\/btu507","article-title":"Bitpal: a bit-parallel, general integer-scoring sequence alignment algorithm","volume":"30","author":"Loving","year":"2014","journal-title":"Bioinformatics"},{"key":"2023013108135586900_btz162-B23","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139940023","volume-title":"Genome-Scale Algorithm Design","author":"M\u00e4kinen","year":"2015"},{"key":"2023013108135586900_btz162-B24","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.ygeno.2010.03.001","article-title":"Assembly algorithms for next-generation sequencing data","volume":"95","author":"Miller","year":"2010","journal-title":"Genomics"},{"key":"2023013108135586900_btz162-B25","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1145\/316542.316550","article-title":"A fast bit-vector algorithm for approximate string matching based on dynamic programming","volume":"46","author":"Myers","year":"1999","journal-title":"J. ACM"},{"key":"2023013108135586900_btz162-B26","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/S0092-8240(89)80046-1","article-title":"Approximate matching of regular expressions","volume":"51","author":"Myers","year":"1989","journal-title":"Bull. Math. Biol"},{"key":"2023013108135586900_btz162-B27","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1016\/S0304-3975(99)00333-3","article-title":"Improved approximate pattern matching on hypertext","volume":"237","author":"Navarro","year":"2000","journal-title":"Theor. Comput. Sci"},{"key":"2023013108135586900_btz162-B28","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol"},{"key":"2023013108135586900_btz162-B29","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1093\/bioinformatics\/bts649","article-title":"PBSIM: PacBio reads simulator\u2014toward accurate genome assembly","volume":"29","author":"Ono","year":"2013","journal-title":"Bioinformatics"},{"key":"2023013108135586900_btz162-B30","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1101\/gr.214155.116","article-title":"Genome graphs and the evolution of genome inference","volume":"27","author":"Paten","year":"2017","journal-title":"Genome Res"},{"key":"2023013108135586900_btz162-B31","doi-asserted-by":"crossref","first-page":"3499","DOI":"10.1093\/bioinformatics\/btu438","article-title":"Journaled string tree a scalable data structure for analyzing thousands of similar genomes on your laptop","volume":"30","author":"Rahn","year":"2014","journal-title":"Bioinformatics"},{"key":"2023013108135586900_btz162-B32","author":"Rautiainen","year":"2017"},{"key":"2023013108135586900_btz162-B33","doi-asserted-by":"crossref","first-page":"D423","DOI":"10.1093\/nar\/gku1161","article-title":"The IPD and IMGT\/HLA database: allele variant databases","volume":"43","author":"Robinson","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023013108135586900_btz162-B34","doi-asserted-by":"crossref","first-page":"3506","DOI":"10.1093\/bioinformatics\/btu538","article-title":"Lordec: accurate and efficient long read error correction","volume":"30","author":"Salmela","year":"2014","journal-title":"Bioinformatics"},{"key":"2023013108135586900_btz162-B35","first-page":"359","article-title":"The theory and computation of evolutionary distances: pattern recognition","volume":"1","author":"Sellers","year":"1980","journal-title":"J. Algorithm Comput. Technol"},{"key":"2023013108135586900_btz162-B36","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol. Syst. Biol"},{"key":"2023013108135586900_btz162-B37","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023013108135586900_btz162-B38","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1016\/0196-6774(85)90023-9","article-title":"Finding approximate patterns in strings","volume":"6","author":"Ukkonen","year":"1985","journal-title":"J. Algorithms"},{"key":"2023013108135586900_btz162-B39","author":"Vaddadi","year":"2017"},{"key":"2023013108135586900_btz162-B40","doi-asserted-by":"crossref","first-page":"3350","DOI":"10.1093\/bioinformatics\/btv383","article-title":"Bandage: interactive visualization of de novo genome assemblies","volume":"31","author":"Wick","year":"2015","journal-title":"Bioinformatics"},{"key":"2023013108135586900_btz162-B41","author":"Zhang","year":"2018"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz162\/28492158\/btz162.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/19\/3599\/48976030\/bioinformatics_35_19_3599.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/19\/3599\/48976030\/bioinformatics_35_19_3599.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T09:48:17Z","timestamp":1675158497000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/19\/3599\/5372677"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,3,9]]},"references-count":41,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2019,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz162","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/323063","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,10,1]]},"published":{"date-parts":[[2019,3,9]]}}}