{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T18:09:06Z","timestamp":1764785346153,"version":"3.37.3"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T00:00:00Z","timestamp":1719532800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Swiss National Research Programme"},{"DOI":"10.13039\/501100001711","name":"SNSF","doi-asserted-by":"publisher","award":["167331"],"award-info":[{"award-number":["167331"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Monique Dornonville de la Cour Foundation"},{"name":"Personalized Health and Related Technologies"},{"name":"Transition Postdoc Fellowship"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,6,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Exponential growth in sequencing databases has motivated scalable De Bruijn graph-based (DBG) indexing for searching these data, using annotations to label nodes with sample IDs. Low-depth sequencing samples correspond to fragmented subgraphs, complicating finding the long contiguous walks required for alignment queries. Aligners that target single-labelled subgraphs reduce alignment lengths due to fragmentation, leading to low recall for long reads. While some (e.g. label-free) aligners partially overcome fragmentation by combining information from multiple samples, biologically irrelevant combinations in such approaches can inflate the search space or reduce accuracy.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce a new scoring model, \u2018multi-label alignment\u2019 (MLA), for annotated DBGs. MLA leverages two new operations: To promote biologically relevant sample combinations, \u2018Label Change\u2019 incorporates more informative global sample similarity into local scores. To improve connectivity, \u2018Node Length Change\u2019 dynamically adjusts the DBG node length during traversal. Our fast, approximate, yet accurate MLA implementation has two key steps: a single-label seed-chain-extend aligner (SCA) and a multi-label chainer (MLC). SCA uses a traditional scoring model adapting recent chaining improvements to assembly graphs and provides a curated pool of alignments. MLC extracts seed anchors from SCAs alignments, produces multi-label chains using MLA scoring, then finally forms multi-label alignments. We show via substantial improvements in taxonomic classification accuracy that MLA produces biologically relevant alignments, decreasing average weighted UniFrac errors by 63.1%\u201366.8% and covering 45.5%\u201347.4% (median) more long-read query characters than state-of-the-art aligners. MLAs runtimes are competitive with label-combining alignment and substantially faster than single-label alignment.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The data, scripts, and instructions for generating our results are available at https:\/\/github.com\/ratschlab\/mla.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae226","type":"journal-article","created":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:29:42Z","timestamp":1719566982000},"page":"i337-i346","source":"Crossref","is-referenced-by-count":3,"title":["Label-guided seed-chain-extend alignment on annotated De Bruijn graphs"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2125-6086","authenticated-orcid":false,"given":"Harun","family":"Mustafa","sequence":"first","affiliation":[{"name":"Department of Computer Science, ETH Zurich , Zurich, 8092, Switzerland"},{"name":"Biomedical Informatics Group, University Hospital Zurich , Zurich, 8091, Switzerland"},{"name":"Biomedical Informatics, Swiss Institute of Bioinformatics , Zurich, 8092, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6200-5972","authenticated-orcid":false,"given":"Mikhail","family":"Karasikov","sequence":"additional","affiliation":[{"name":"Department of Computer Science, ETH Zurich , Zurich, 8092, Switzerland"},{"name":"Biomedical Informatics Group, University Hospital Zurich , Zurich, 8091, Switzerland"},{"name":"Biomedical Informatics, Swiss Institute of Bioinformatics , Zurich, 8092, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0833-0042","authenticated-orcid":false,"given":"Nika","family":"Mansouri Ghiasi","sequence":"additional","affiliation":[{"name":"Department of Information Technology and Electrical Engineering, ETH Zurich , Zurich, 8092, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gunnar","family":"R\u00e4tsch","sequence":"additional","affiliation":[{"name":"Department of Computer Science, ETH Zurich , Zurich, 8092, Switzerland"},{"name":"Biomedical Informatics Group, University Hospital Zurich , Zurich, 8091, Switzerland"},{"name":"Biomedical Informatics, Swiss Institute of Bioinformatics , Zurich, 8092, Switzerland"},{"name":"ETH AI Center , Zurich, 8092, Switzerland"},{"name":"Department of Biology, ETH Zurich , Zurich, 8093, Switzerland"},{"name":"The LOOP Zurich\u2014Medical Research Center , Zurich, 8044, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3411-0692","authenticated-orcid":false,"given":"Andr\u00e9","family":"Kahles","sequence":"additional","affiliation":[{"name":"Department of Computer Science, ETH Zurich , Zurich, 8092, Switzerland"},{"name":"Biomedical Informatics Group, University Hospital Zurich , Zurich, 8091, Switzerland"},{"name":"Biomedical Informatics, Swiss Institute of Bioinformatics , Zurich, 8092, Switzerland"},{"name":"The LOOP Zurich\u2014Medical Research Center , Zurich, 8044, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,6,28]]},"reference":[{"key":"2024062809052300000_btae226-B1","doi-asserted-by":"crossref","first-page":"i169","DOI":"10.1093\/bioinformatics\/bty292","article-title":"A space and time-efficient index for the compacted colored De Bruijn graph","volume":"34","author":"Almodaresi","year":"2018","journal-title":"Bioinformatics"},{"year":"2022","author":"Avila","key":"2024062809052300000_btae226-B2"},{"key":"2024062809052300000_btae226-B3","doi-asserted-by":"crossref","first-page":"1075","DOI":"10.1038\/s41587-022-01220-6","article-title":"Multiplex De Bruijn graphs enable genome assembly from long, high-fidelity reads","volume":"40","author":"Bankevich","year":"2022","journal-title":"Nat Biotechnol"},{"first-page":"383","year":"2015","author":"Boucher","key":"2024062809052300000_btae226-B4"},{"key":"2024062809052300000_btae226-B5","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1007\/978-3-642-33122-0_18","volume-title":"Algorithms in Bioinformatics","author":"Bowe","year":"2012"},{"key":"2024062809052300000_btae226-B6","doi-asserted-by":"crossref","first-page":"1182","DOI":"10.1089\/cmb.2023.0186","article-title":"Gap-sensitive colinear chaining algorithms for acyclic pangenome graphs","volume":"30","author":"Chandra","year":"2023","journal-title":"J Comput Biol"},{"year":"2023","author":"Chandra","key":"2024062809052300000_btae226-B7"},{"key":"2024062809052300000_btae226-B8","doi-asserted-by":"crossref","first-page":"i146","DOI":"10.1093\/bioinformatics\/btaa446","article-title":"Distance indexing and seed clustering in sequence graphs","volume":"36","author":"Chang","year":"2020","journal-title":"Bioinformatics"},{"key":"2024062809052300000_btae226-B9","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1186\/1748-7188-8-22","article-title":"Space-efficient and exact De Bruijn graph representation based on a Bloom filter","volume":"8","author":"Chikhi","year":"2013","journal-title":"Algorithms Mol Biol"},{"key":"2024062809052300000_btae226-B10","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1186\/s13059-021-02473-1","article-title":"Pandora: nucleotide-resolution bacterial pan-genomics with reference graphs","volume":"22","author":"Colquhoun","year":"2021","journal-title":"Genome Biol"},{"key":"2024062809052300000_btae226-B11","doi-asserted-by":"crossref","first-page":"3376","DOI":"10.1016\/j.cell.2021.05.002","article-title":"A global metagenomic map of urban microbiomes and antimicrobial resistance","volume":"184","author":"Danko","year":"2021","journal-title":"Cell"},{"key":"2024062809052300000_btae226-B12","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1186\/s12859-020-03590-7","article-title":"SPAligner: alignment of long diverged molecular sequences to assembly graphs","volume":"21","author":"Dvorkina","year":"2020","journal-title":"BMC Bioinformatics"},{"year":"2022","author":"Eizenga","key":"2024062809052300000_btae226-B13"},{"key":"2024062809052300000_btae226-B14","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/s13015-024-00251-9","article-title":"Fulgor: a fast and compact k-mer index for large-scale matching and color queries","volume":"19","author":"Fan","year":"2024","journal-title":"Algorithms Mol Biol"},{"key":"2024062809052300000_btae226-B15","doi-asserted-by":"crossref","first-page":"408","DOI":"10.1093\/bioinformatics\/btz576","article-title":"How sequence alignment scores correspond to probability models","volume":"36","author":"Frith","year":"2019","journal-title":"Bioinformatics"},{"key":"2024062809052300000_btae226-B16","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1038\/nbt.4227","article-title":"Variation graph toolkit improves read mapping by representing genetic variation in the reference","volume":"36","author":"Garrison","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2024062809052300000_btae226-B17","doi-asserted-by":"crossref","first-page":"1377","DOI":"10.1089\/cmb.2022.0411","article-title":"On the hardness of sequence alignment on De Bruijn graphs","volume":"29","author":"Gibney","year":"2022","journal-title":"J Comput Biol"},{"key":"2024062809052300000_btae226-B18","doi-asserted-by":"crossref","first-page":"D82","DOI":"10.1093\/nar\/gkaa1028","article-title":"The European Nucleotide Archive in 2020","volume":"49","author":"Harrison","year":"2020","journal-title":"Nucleic Acids Res"},{"first-page":"683","year":"2013","author":"Heule","key":"2024062809052300000_btae226-B19"},{"key":"2024062809052300000_btae226-B20","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1186\/s12859-018-2319-7","article-title":"BrownieAligner: accurate alignment of Illumina sequencing data to De Bruijn graphs","volume":"19","author":"Heydari","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2024062809052300000_btae226-B21","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1186\/s13059-020-1941-7","article-title":"Genotyping structural variants in pangenome graphs using the vg toolkit","volume":"21","author":"Hickey","year":"2020","journal-title":"Genome Biol"},{"key":"2024062809052300000_btae226-B22","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1186\/s13059-020-02135-8","article-title":"Bifrost: highly parallel construction and indexing of colored and compacted De Bruijn graphs","volume":"21","author":"Holley","year":"2020","journal-title":"Genome Biol"},{"key":"2024062809052300000_btae226-B23","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","article-title":"ART: a next-generation sequencing read simulator","volume":"28","author":"Huang","year":"2011","journal-title":"Bioinformatics"},{"key":"2024062809052300000_btae226-B24","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De novo assembly and genotyping of variants using colored De Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat Genet"},{"first-page":"104","year":"2020","author":"Ivanov","key":"2024062809052300000_btae226-B25"},{"first-page":"306","year":"2022","author":"Ivanov","key":"2024062809052300000_btae226-B26"},{"key":"2024062809052300000_btae226-B27","doi-asserted-by":"crossref","first-page":"1237","DOI":"10.1089\/cmb.2022.0266","article-title":"Algorithms for colinear chaining with overlaps and gap costs","volume":"29","author":"Jain","year":"2022","journal-title":"J Comput Biol"},{"key":"2024062809052300000_btae226-B28","first-page":"1208","article-title":"Aligning distant sequences to graphs using long seed sketches","volume":"33","author":"Joudaki","year":"2023","journal-title":"Genome Res"},{"year":"2020","author":"Karasikov","key":"2024062809052300000_btae226-B29"},{"key":"2024062809052300000_btae226-B30","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1101\/gr.276607.122","article-title":"Lossless indexing with counting De Bruijn graphs","volume":"32","author":"Karasikov","year":"2022","journal-title":"Genome Res"},{"key":"2024062809052300000_btae226-B31","doi-asserted-by":"crossref","first-page":"D387","DOI":"10.1093\/nar\/gkab1053","article-title":"The Sequence Read Archive: a decade more of explosive growth","volume":"50","author":"Katz","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024062809052300000_btae226-B32","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1093\/bioinformatics\/btab749","article-title":"Population-scale detection of non-reference sequence variants using colored De Bruijn graphs","volume":"38","author":"Krannich","year":"2021","journal-title":"Bioinformatics"},{"key":"2024062809052300000_btae226-B33","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1093\/bioinformatics\/18.3.452","article-title":"Multiple sequence alignment using partial order graphs","volume":"18","author":"Lee","year":"2002","journal-title":"Bioinformatics"},{"key":"2024062809052300000_btae226-B34","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2024062809052300000_btae226-B35","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1186\/s13059-020-02168-z","article-title":"The design and construction of reference pangenome graphs with minigraph","volume":"21","author":"Li","year":"2020","journal-title":"Genome Biol"},{"key":"2024062809052300000_btae226-B36","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1186\/s12859-016-1103-9","article-title":"Read mapping on De Bruijn graphs","volume":"17","author":"Limasset","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2024062809052300000_btae226-B37","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1186\/s13059-020-02237-3","article-title":"BlastFrost: fast querying of 100,000s of bacterial genomes in Bifrost graphs","volume":"22","author":"Luhmann","year":"2021","journal-title":"Genome Biol"},{"key":"2024062809052300000_btae226-B38","doi-asserted-by":"crossref","first-page":"btad460","DOI":"10.1093\/bioinformatics\/btad460","article-title":"Chaining for accurate alignment of erroneous long reads to acyclic variation graphs","volume":"39","author":"Ma","year":"2023","journal-title":"Bioinformatics"},{"key":"2024062809052300000_btae226-B39","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3301312","article-title":"Sparse dynamic programming on DAGs with small width","volume":"15","author":"M\u00e4kinen","year":"2019","journal-title":"ACM Trans Algorithms"},{"key":"2024062809052300000_btae226-B40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1101\/gr.260604.119","article-title":"Data structures based on k-mers for querying large collections of sequencing data sets","volume":"31","author":"Marchet","year":"2021","journal-title":"Genome Res"},{"key":"2024062809052300000_btae226-B41","doi-asserted-by":"crossref","first-page":"i177","DOI":"10.1093\/bioinformatics\/btaa487","article-title":"REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets","volume":"36","author":"Marchet","year":"2020","journal-title":"Bioinformatics"},{"key":"2024062809052300000_btae226-B42","doi-asserted-by":"crossref","first-page":"1028","DOI":"10.1089\/cmb.2006.13.1028","article-title":"A fast and symmetric dust implementation to mask low-complexity DNA sequences","volume":"13","author":"Morgulis","year":"2006","journal-title":"J Comput Biol"},{"key":"2024062809052300000_btae226-B43","doi-asserted-by":"crossref","first-page":"lqac092","DOI":"10.1093\/nargab\/lqac092","article-title":"PBSIM3: a simulator for all types of PacBio and ONT long reads","volume":"4","author":"Ono","year":"2022","journal-title":"NAR Genom Bioinform"},{"key":"2024062809052300000_btae226-B44","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1016\/j.cels.2018.05.021","article-title":"Mantis: a fast, small, and exact large-scale sequence-search index","volume":"7","author":"Pandey","year":"2018","journal-title":"Cell Syst"},{"key":"2024062809052300000_btae226-B45","doi-asserted-by":"crossref","first-page":"1746","DOI":"10.1101\/gr.276601.122","article-title":"Assembler artifacts include misassembly because of unsafe unitigs and underassembly because of bidirected graphs","volume":"32","author":"Rahman","year":"2022","journal-title":"Genome Res"},{"first-page":"12:1","year":"2023","author":"Rajput","key":"2024062809052300000_btae226-B46"},{"key":"2024062809052300000_btae226-B47","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1186\/s13059-020-02157-2","article-title":"GraphAligner: rapid and versatile sequence-to-graph alignment","volume":"21","author":"Rautiainen","year":"2020","journal-title":"Genome Biol"},{"key":"2024062809052300000_btae226-B48","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1038\/s41586-021-03451-0","article-title":"Towards complete and error-free genome assemblies of all vertebrate species","volume":"592","author":"Rhie","year":"2021","journal-title":"Nature"},{"key":"2024062809052300000_btae226-B49","doi-asserted-by":"crossref","first-page":"i118","DOI":"10.1093\/bioinformatics\/btx236","article-title":"Modelling haplotypes with respect to reference cohort variation graphs","volume":"33","author":"Rosen","year":"2017","journal-title":"Bioinformatics"},{"key":"2024062809052300000_btae226-B50","doi-asserted-by":"crossref","first-page":"D29","DOI":"10.1093\/nar\/gkac1032","article-title":"Database resources of the National Center for Biotechnology Information in 2023","volume":"51","author":"Sayers","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024062809052300000_btae226-B51","doi-asserted-by":"crossref","first-page":"2266","DOI":"10.1093\/bioinformatics\/btab077","article-title":"Detecting high-scoring local alignments in pangenome graphs","volume":"37","author":"Schulz","year":"2021","journal-title":"Bioinformatics"},{"key":"2024062809052300000_btae226-B52","doi-asserted-by":"crossref","first-page":"1175","DOI":"10.1101\/gr.277637.122","article-title":"Proving sequence aligners can guarantee accuracy in almost O(m log n) time through an average-case analysis of the seed-chain-extend heuristic","volume":"33","author":"Shaw","year":"2023","journal-title":"Genome Res"},{"first-page":"13","year":"2017","author":"Sir\u00e9n","key":"2024062809052300000_btae226-B53"},{"key":"2024062809052300000_btae226-B54","doi-asserted-by":"crossref","first-page":"abg8871","DOI":"10.1126\/science.abg8871","article-title":"Pangenomics enables genotyping of known structural variants in 5202 diverse genomes","volume":"374","author":"Sir\u00e9n","year":"2021","journal-title":"Science"},{"key":"2024062809052300000_btae226-B55","doi-asserted-by":"crossref","first-page":"2556","DOI":"10.1093\/bioinformatics\/bty157","article-title":"Integrating long-range connectivity information into De Bruijn graphs","volume":"34","author":"Turner","year":"2018","journal-title":"Bioinformatics"},{"first-page":"15:1","year":"2022","author":"Wei","key":"2024062809052300000_btae226-B56"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i337\/58354999\/btae226.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i337\/58354999\/btae226.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:30:30Z","timestamp":1719567030000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/40\/Supplement_1\/i337\/7700889"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,28]]},"references-count":56,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2024,6,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae226","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,6,28]]}}}