{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,9]],"date-time":"2026-02-09T21:43:56Z","timestamp":1770673436609,"version":"3.49.0"},"reference-count":31,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T00:00:00Z","timestamp":1764892800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Computational Biology"],"published-print":{"date-parts":[[2026,1]]},"abstract":"<jats:p>\n                    Given a set of exact matches, chaining consists of finding those which are colinear with respect to the reference. However, this requires superlinear time in the number of matches and is not suitable for approaches such as read classification using a pangenome due to the increased multiplicity of matches. While compressed full-text indexes enable efficient read classification against a pangenome or tree-of-life index, past work on compressed index classification captures only fine-grained colinearity of exact matches. This fails to capture whether seeds appear colinearly in the reference, which we denote as \u201ccoarse-grained colinearity,\u201d as traditionally described by chaining. We present a novel data structure,\n                    <jats:monospace>col-BWT<\/jats:monospace>\n                    (for colinear Burrows\u2013Wheeler Transform [BWT]), that additionally obtains coarse-grained colinearity (\u201cchain\u201d) statistics. We start with a collection of strings, avoiding the multiple-alignment step required by graph approaches. We rapidly compute multi-maximal unique matches (multi-MUMs) and identify BWT sub-runs that correspond to these multi-MUMs. From these, we select those that can be \u201ctunneled,\u201d in that they satisfy the concept of a tunnel in BWT compression, and mark them with their corresponding multi-MUM identifier. We call this approach the\n                    <jats:monospace>col-BWT<\/jats:monospace>\n                    due to its ability to recognize colinearity with respect to the reference. This yields an\n                    <jats:italic toggle=\"yes\">O<\/jats:italic>\n                    (\n                    <jats:italic toggle=\"yes\">r\u2009+\u2009n\/d<\/jats:italic>\n                    )-space index for a collection of\n                    <jats:italic toggle=\"yes\">d<\/jats:italic>\n                    sequences having a length-\n                    <jats:italic toggle=\"yes\">n<\/jats:italic>\n                    BWT consisting of\n                    <jats:italic toggle=\"yes\">r<\/jats:italic>\n                    maximal equal-character runs. Using\n                    <jats:monospace>col-BWT<\/jats:monospace>\n                    , we simultaneously compute both fine-grained statistics and coarse-grained chain statistics in linear time with respect to query length, faster than traditional superlinear chaining approaches. We found that this substantially improves classification accuracy compared with past compressed-indexing approaches and reaches the same level of accuracy as less efficient alignment-based methods.\n                  <\/jats:p>","DOI":"10.1177\/15578666251400784","type":"journal-article","created":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T05:48:56Z","timestamp":1764913736000},"page":"168-183","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Col-BWT: Pangenomic Seed Chaining with Maximal Matches Improves Read Classification"],"prefix":"10.1177","volume":"33","author":[{"given":"Nathaniel K.","family":"Brown","sequence":"first","affiliation":[{"name":"Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vikram S.","family":"Shivakumar","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ben","family":"Langmead","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2025,12,5]]},"reference":[{"key":"e_1_3_4_2_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-023-02958-1"},{"key":"e_1_3_4_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.isci.2021.102696"},{"key":"e_1_3_4_4_1","unstructured":"Baier U. On Undetected Redundancy in the Burrows-Wheeler Transform. In: 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018) volume 105. CPM; 2018 pp. 1\u20133."},{"key":"e_1_3_4_5_1","doi-asserted-by":"crossref","unstructured":"Baier U Dede K. BWT tunnel planning is hard but manageable. In: 2019 Data Compression Conference (DCC). 2019; pp. 142\u2013151.","DOI":"10.1109\/DCC.2019.00022"},{"key":"e_1_3_4_6_1","doi-asserted-by":"crossref","unstructured":"Bal\u00e1\u017e A Gagie T Goga A et al. Wheeler maps. In: Latin American Symposium on Theoretical Informatics. 2024; pp. 178\u2013192.","DOI":"10.1007\/978-3-031-55598-5_12"},{"key":"e_1_3_4_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2019.08.005"},{"key":"e_1_3_4_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ic.2024.105155"},{"key":"e_1_3_4_9_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13015-019-0148-5"},{"key":"e_1_3_4_10_1","article-title":"A block-sorting lossless data compression algorithm","author":"Burrows M","year":"1994","unstructured":"Burrows M, , Wheeler DJ. A block-sorting lossless data compression algorithm. Digital Equipment Corporation, 1994.","journal-title":"Digital Equipment Corporation"},{"key":"e_1_3_4_11_1","unstructured":"Deogun JS Yang J Ma F. Emagen: An efficient approach to multiple whole genome alignment. In: Proceedings of the second conference on Asia-Pacific bioinformatics Volume 29. 2004; pp. 113\u2013122."},{"key":"e_1_3_4_12_1","first-page":"2025.02.25.6401","article-title":"Run-length compressed metagenomic read classification with smem-finding and tagging","author":"Depuydt L","year":"2025","unstructured":"Depuydt L, , Ahmed OY, , Fostier J, et al. Run-length compressed metagenomic read classification with smem-finding and tagging. bioRxiv, 2025:2025.02.25.640119.","journal-title":"bioRxiv"},{"key":"e_1_3_4_13_1","first-page":"2025.05.12.6535","article-title":"Lossless pangenome indexing using tag arrays","author":"Eskandar P","year":"2025","unstructured":"Eskandar P, , Paten B, , Siren J. Lossless pangenome indexing using tag arrays. bioRxiv, 2025:2025.05.12.653561.","journal-title":"bioRxiv"},{"key":"e_1_3_4_14_1","doi-asserted-by":"crossref","unstructured":"Gagie TIT Manzini G Navarro G et al. Practical random access to SLP-compressed texts. In: International Symposium on String Processing and Information Retrieval. 2020; pp. 221\u2013231.","DOI":"10.1007\/978-3-030-59212-7_16"},{"key":"e_1_3_4_15_1","doi-asserted-by":"crossref","unstructured":"Gagie T Navarro G Prezza N. Optimal-time text indexing in BWT-runs bounded space. In: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. 2018; pp. 1459\u20131477.","DOI":"10.1137\/1.9781611975031.96"},{"key":"e_1_3_4_16_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkad988"},{"key":"e_1_3_4_17_1","doi-asserted-by":"crossref","unstructured":"Kuhnle A Mun T Boucher C et al. Efficient construction of a complete index for pan-genomics read alignment. In: International Conference on Research in Computational Molecular Biology. 2019; pp. 158\u2013173.","DOI":"10.1007\/978-3-030-17083-7_10"},{"key":"e_1_3_4_18_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bty191"},{"key":"e_1_3_4_19_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-023-05896-x"},{"key":"e_1_3_4_20_1","doi-asserted-by":"publisher","DOI":"10.1137\/0222058"},{"key":"e_1_3_4_21_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41587-020-0422-6"},{"key":"e_1_3_4_22_1","unstructured":"Muthukrishnan S. Efficient algorithms for document retrieval problems. In: SODA volume 2. 2002 pp. 657\u2013666."},{"key":"e_1_3_4_23_1","unstructured":"Nishimoto T Tabei Y. Optimal-Time Queries on BWT-Runs Compressed Indexes. In: 48th International Colloquium on Automata Languages and Programming (ICALP 2021) volume 198. 2021; p. 101."},{"key":"e_1_3_4_24_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.abj6987"},{"key":"e_1_3_4_25_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btaf104"},{"key":"e_1_3_4_26_1","doi-asserted-by":"publisher","DOI":"10.1093\/nargab\/lqac092"},{"key":"e_1_3_4_27_1","doi-asserted-by":"publisher","DOI":"10.1089\/cmb.2021.0290"},{"key":"e_1_3_4_28_1","doi-asserted-by":"publisher","DOI":"10.26502\/jbb.2642-91280067"},{"key":"e_1_3_4_29_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btae213"},{"key":"e_1_3_4_30_1","article-title":"Mumemto: Efficient maximal matching across pangenomes","author":"Shivakumar VS","year":"2025","unstructured":"Shivakumar VS, , Langmead B. Mumemto: Efficient maximal matching across pangenomes. bioRxiv, 2025.","journal-title":"bioRxiv"},{"key":"e_1_3_4_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0190(83)90075-3"},{"key":"e_1_3_4_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.isci.2024.111464"}],"container-title":["Journal of Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/15578666251400784","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/15578666251400784","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/15578666251400784","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,9]],"date-time":"2026-02-09T05:47:02Z","timestamp":1770616022000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.1177\/15578666251400784"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,5]]},"references-count":31,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1]]}},"alternative-id":["10.1177\/15578666251400784"],"URL":"https:\/\/doi.org\/10.1177\/15578666251400784","relation":{},"ISSN":["1066-5277","1557-8666"],"issn-type":[{"value":"1066-5277","type":"print"},{"value":"1557-8666","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,5]]}}}