{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T03:25:40Z","timestamp":1780543540450,"version":"3.54.1"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T00:00:00Z","timestamp":1719532800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"French ANR","award":["AGATE ANR-21-CE45-0012"],"award-info":[{"award-number":["AGATE ANR-21-CE45-0012"]}]},{"name":"French ANR","award":["Find-RNA ANR-23-CE45-0003\u201301"],"award-info":[{"award-number":["Find-RNA ANR-23-CE45-0003\u201301"]}]},{"name":"ENS Rennes"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,6,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Summary<\/jats:title>\n                  <jats:p>In this article, we introduce the Conway\u2013Bromage\u2013Lyndon (CBL) structure, a compressed, dynamic and exact method for representing k-mer sets. Originating from Conway and Bromage\u2019s concept, CBL innovatively employs the smallest cyclic rotations of k-mers, akin to Lyndon words, to leverage lexicographic redundancies. In order to support dynamic operations and set operations, we propose a dynamic bit vector structure that draws a parallel with Elias-Fano\u2019s scheme. This structure is encapsulated in a Rust library, demonstrating a balanced blend of construction efficiency, cache locality, and compression. Our findings suggest that CBL outperforms existing dynamic k-mer set methods. Unique to this work, CBL stands out as the only known exact k-mer structure offering in-place set operations. Its different combined abilities position it as a flexible Swiss knife structure for k-mer set management.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/imartayan\/CBL.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae217","type":"journal-article","created":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:34:26Z","timestamp":1719567266000},"page":"i48-i57","source":"Crossref","is-referenced-by-count":9,"title":["Conway\u2013Bromage\u2013Lyndon (CBL): an exact, dynamic representation of <i>k<\/i>-mer sets"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-3007-6168","authenticated-orcid":false,"given":"Igor","family":"Martayan","sequence":"first","affiliation":[{"name":"Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL , Lille, F-59000, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1761-4354","authenticated-orcid":false,"given":"Bastien","family":"Cazaux","sequence":"additional","affiliation":[{"name":"Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL , Lille, F-59000, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0669-4141","authenticated-orcid":false,"given":"Antoine","family":"Limasset","sequence":"additional","affiliation":[{"name":"Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL , Lille, F-59000, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7235-7346","authenticated-orcid":false,"given":"Camille","family":"Marchet","sequence":"additional","affiliation":[{"name":"Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL , Lille, F-59000, France"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2024,6,28]]},"reference":[{"key":"2024062809073926100_btae217-B1","doi-asserted-by":"crossref","first-page":"4363","DOI":"10.21105\/joss.04363","article-title":"RedOak: A reference-free and alignment-free structurefor indexing a collection of similar genomes","volume":"7","author":"Agret","year":"2022","journal-title":"JOSS"},{"key":"2024062809073926100_btae217-B2","doi-asserted-by":"crossref","first-page":"4067","DOI":"10.1016\/j.csbj.2021.06.047","article-title":"Buffering updates enables efficient dynamic de Bruijn graphs","volume":"19","author":"Alanko","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2024062809073926100_btae217-B3","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1137\/1.9781611977714.20","article-title":"Small searchable \u03ba-spectra via subset rank queries on the spectral Burrows\u2013Wheeler transform","volume-title":"SIAM Conference on Applied and Computational Discrete Algorithms (ACDA23)","author":"Alanko","year":"2023"},{"key":"2024062809073926100_btae217-B4","doi-asserted-by":"crossref","first-page":"i260","DOI":"10.1093\/bioinformatics\/btad233","article-title":"Themisto: a scalable colored k-mer index for sensitive pseudoalignment against hundreds of thousands of bacterial genomes","volume":"39","author":"Alanko","year":"2023","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B5","doi-asserted-by":"crossref","first-page":"1946","DOI":"10.1093\/bioinformatics\/btaa546","article-title":"Succinct dynamic de Bruijn graphs","volume":"37","author":"Alipanahi","year":"2021","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B6","doi-asserted-by":"crossref","first-page":"3155","DOI":"10.1093\/bioinformatics\/btac142","article-title":"An incrementally updatable and scalable system for large-scale sequence search using the bentley-saxe transformation","volume":"38","author":"Almodaresi","year":"2022","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B7","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1186\/s13059-023-03098-2","article-title":"Comparing methods for constructing and representing human pangenome graphs","volume":"24","author":"Andreace","year":"2023","journal-title":"Genome Biol"},{"key":"2024062809073926100_btae217-B8","doi-asserted-by":"crossref","first-page":"2117","DOI":"10.14778\/3598581.3598586","article-title":"Text indexing for long patterns: anchors are all you need","volume":"16","author":"Ayad","year":"2023","journal-title":"Proc VLDB Endow"},{"key":"2024062809073926100_btae217-B9","author":"Bille"},{"key":"2024062809073926100_btae217-B10","author":"Bowe"},{"key":"2024062809073926100_btae217-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1748-7188-8-22","article-title":"Space-efficient and exact de Bruijn graph representation based on a bloom filter","volume":"8","author":"Chikhi","year":"2013","journal-title":"Algorithms Mol Biol"},{"key":"2024062809073926100_btae217-B12","author":"Chikhi"},{"key":"2024062809073926100_btae217-B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3445967","article-title":"Data structures to represent a set of k-long DNA sequences","volume":"54","author":"Chikhi","year":"2021","journal-title":"ACM Comput Surv"},{"key":"2024062809073926100_btae217-B14","doi-asserted-by":"crossref","first-page":"1937","DOI":"10.1093\/bioinformatics\/bts297","article-title":"Gossamer\u2014a resource-efficient de novo assembler","volume":"28","author":"Conway","year":"2012","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B15","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1093\/bioinformatics\/btq697","article-title":"Succinct data structures for assembling large genomes","volume":"27","author":"Conway","year":"2011","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B16","doi-asserted-by":"crossref","first-page":"4189","DOI":"10.1093\/bioinformatics\/bty500","article-title":"Practical dynamic de Bruijn graphs","volume":"34","author":"Crawford","year":"2018","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B17","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1109\/DCC52660.2022.00033","volume-title":"2022 Data Compression Conference (DCC), Snowbird, UT, USA","author":"D\u00f6nges","year":"2022"},{"key":"2024062809073926100_btae217-B18","author":"Fan"},{"key":"2024062809073926100_btae217-B19","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/s13015-024-00251-9","article-title":"Fulgor: a fast and compact k-mer index for large-scale matching and color queries","volume":"19","author":"Fan","year":"2024","journal-title":"Algorithms Mol Biol"},{"key":"2024062809073926100_btae217-B20","doi-asserted-by":"crossref","first-page":"1287","DOI":"10.1002\/spe.2198","article-title":"Optimized succinct data structures for massive data","volume":"44","author":"Gog","year":"2014","journal-title":"Softw Pract Exp"},{"key":"2024062809073926100_btae217-B21","doi-asserted-by":"crossref","first-page":"2157","DOI":"10.1109\/TCBB.2019.2913932","article-title":"degsm: memory scalable construction of large scale de Bruijn graph","volume":"18","author":"Guo","year":"2021","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2024062809073926100_btae217-B22","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1186\/s13059-020-02135-8","article-title":"Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs","volume":"21","author":"Holley","year":"2020","journal-title":"Genome Biol"},{"key":"2024062809073926100_btae217-B23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13015-016-0066-8","article-title":"Bloom filter trie: an alignment-free and reference-free data structure for pan-genome storage","volume":"11","author":"Holley","year":"2016","journal-title":"Algorithms Mol Biol"},{"key":"2024062809073926100_btae217-B24","author":"Karasikov","year":"2020"},{"key":"2024062809073926100_btae217-B25","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1186\/s13059-022-02743-6","article-title":"Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with cuttlefish 2","volume":"23","author":"Khan","year":"2022","journal-title":"Genome Biol"},{"key":"2024062809073926100_btae217-B26","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1109\/ICDE.2013.6544812","volume-title":"2013 IEEE 29th International Conference on Data Engineering (ICDE)","author":"Leis","year":"2013"},{"key":"2024062809073926100_btae217-B27","doi-asserted-by":"crossref","first-page":"vbac029","DOI":"10.1093\/bioadv\/vbac029","article-title":"Kmtricks: efficient and flexible construction of bloom filters for large sequencing data collections","volume":"2","author":"Lemane","year":"2022","journal-title":"Bioinform Adv"},{"key":"2024062809073926100_btae217-B28","author":"Limasset"},{"key":"2024062809073926100_btae217-B29","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511566097","volume-title":"Combinatorics on Words","author":"Lothaire","year":"1997"},{"key":"2024062809073926100_btae217-B30","author":"Loukides"},{"key":"2024062809073926100_btae217-B31","doi-asserted-by":"crossref","first-page":"i177","DOI":"10.1093\/bioinformatics\/btaa487","article-title":"Reindeer: efficient indexing of k-mer presence and abundance in sequencing datasets","volume":"36","author":"Marchet","year":"2020","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B32","doi-asserted-by":"crossref","first-page":"1184","DOI":"10.1002\/spe.2791","article-title":"Compact Fenwick trees for dynamic ranking and selection","volume":"50","author":"Marchini","year":"2020","journal-title":"Softw Pract Exp"},{"key":"2024062809073926100_btae217-B33","doi-asserted-by":"crossref","first-page":"3476","DOI":"10.1093\/bioinformatics\/btu756","article-title":"Splitmem: a graphical algorithm for pan-genome analysis with suffix skips","volume":"30","author":"Marcus","year":"2014","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B34","doi-asserted-by":"crossref","first-page":"i51","DOI":"10.1093\/bioinformatics\/btz350","article-title":"Building large updatable colored de Bruijn graphs via merging","volume":"35","author":"Muggli","year":"2019","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B35","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1126\/science.abj6987","article-title":"The complete sequence of a human genome","volume":"376","author":"Nurk","year":"2022","journal-title":"Science"},{"key":"2024062809073926100_btae217-B36","doi-asserted-by":"crossref","first-page":"568","DOI":"10.1093\/bioinformatics\/btx636","article-title":"Squeakr: an exact and approximate k-mer counting system","volume":"34","author":"Pandey","year":"2018","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B37","doi-asserted-by":"crossref","first-page":"i185","DOI":"10.1093\/bioinformatics\/btac245","article-title":"Sparse and skew hashing of k-mers","volume":"38","author":"Pibiri","year":"2022","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B38","doi-asserted-by":"crossref","first-page":"101756","DOI":"10.1016\/j.is.2021.101756","article-title":"Rank\/select queries over mutable bitmaps","volume":"99","author":"Pibiri","year":"2021","journal-title":"Inf Syst"},{"key":"2024062809073926100_btae217-B39","author":"Pibiri"},{"key":"2024062809073926100_btae217-B40","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1016\/j.jda.2017.01.003","article-title":"Practical algorithms to rank necklaces, Lyndon words, and de Bruijn sequences","volume":"43","author":"Sawada","year":"2017","journal-title":"J Discret Algorithms"},{"key":"2024062809073926100_btae217-B41","author":"Shibuya"},{"key":"2024062809073926100_btae217-B42","author":"Sladk\u1ef3","year":"2023"},{"key":"2024062809073926100_btae217-B43","author":"Vigna"},{"key":"2024062809073926100_btae217-B44","doi-asserted-by":"crossref","first-page":"e87","DOI":"10.24072\/pcjournal.323","article-title":"General encoding of canonical k-mers","volume":"3","author":"Wittler","year":"2023","journal-title":"Peer Community J"},{"key":"2024062809073926100_btae217-B45","doi-asserted-by":"crossref","first-page":"i119","DOI":"10.1093\/bioinformatics\/btaa472","article-title":"Improved design and analysis of practical minimizers","volume":"36","author":"Zheng","year":"2020","journal-title":"Bioinformatics"},{"key":"2024062809073926100_btae217-B46","author":"Zhou"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i48\/58354678\/btae217.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i48\/58354678\/btae217.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:35:02Z","timestamp":1719567302000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/40\/Supplement_1\/i48\/7700845"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,28]]},"references-count":46,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2024,6,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae217","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,6,28]]}}}