{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,2]],"date-time":"2025-09-02T11:04:39Z","timestamp":1756811079968,"version":"3.41.2"},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2025,2,3]],"date-time":"2025-02-03T00:00:00Z","timestamp":1738540800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100020806","name":"ANR","doi-asserted-by":"publisher","award":["ANR-22-CE45-0007","ANR-19-CE45-0008","PIA\/ANR16-CONV-0005","ANR-19-P3IA-0001","ANR-21-CE46-0012-03"],"award-info":[{"award-number":["ANR-22-CE45-0007","ANR-19-CE45-0008","PIA\/ANR16-CONV-0005","ANR-19-P3IA-0001","ANR-21-CE46-0012-03"]}],"id":[{"id":"10.13039\/100020806","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100018693","name":"Horizon Europe","doi-asserted-by":"publisher","award":["872539","956229","101047160","101088572"],"award-info":[{"award-number":["872539","956229","101047160","101088572"]}],"id":[{"id":"10.13039\/100018693","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Summary<\/jats:title>\n                  <jats:p>MUSET is a novel set of utilities designed to efficiently construct abundance unitig matrices from sequencing data. Unitig matrices extend the concept of k-mer matrices by merging overlapping k-mers that unambiguously belong to the same sequence. MUSET addresses the limitations of current software by integrating k-mer counting and unitig extraction to generate unitig matrices containing abundance values, as opposed to only presence\u2013absence in previous tools. 
These matrices preserve variations between samples while reducing disk space and the number of rows compared to k-mer matrices. We evaluated MUSET\u2019s performance using datasets derived from a 618-GB collection of ancient oral sequencing samples, producing a filtered unitig matrix that records abundances in &lt;10\u2009h and 20 GB memory.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>MUSET is open source and publicly available under the AGPL-3.0 licence in GitHub at https:\/\/github.com\/CamilaDuitama\/muset. Source code is implemented in C++ and provided with kmat_tools, a collection of tools for processing k-mer matrices. Version v0.5.1 is available on Zenodo with DOI 10.5281\/zenodo.14164801.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf054","type":"journal-article","created":{"date-parts":[[2025,2,3]],"date-time":"2025-02-03T14:48:11Z","timestamp":1738594091000},"source":"Crossref","is-referenced-by-count":1,"title":["MUSET: set of utilities for constructing abundance unitig matrices from sequencing data"],"prefix":"10.1093","volume":"41","author":[{"given":"Riccardo","family":"Vicedomini","sequence":"first","affiliation":[{"name":"GenScale, Universit\u00e9 de Rennes, Inria RBA, CNRS UMR 6074 , F-35000 Rennes,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0566-200X","authenticated-orcid":false,"given":"Francesco","family":"Andreace","sequence":"additional","affiliation":[{"name":"Institut Pasteur, Universit\u00e9 Paris Cit\u00e9 , Sequence Bioinformatics Unit, F-75015 Paris,","place":["France"]},{"name":"Sorbonne Universit\u00e9 , Coll\u00e8ge Doctoral, F-75005 Paris,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0930-8920","authenticated-orcid":false,"given":"Yoann","family":"Dufresne","sequence":"additional","affiliation":[{"name":"Institut Pasteur, Universit\u00e9 
Paris Cit\u00e9 , Sequence Bioinformatics Unit, F-75015 Paris,","place":["France"]},{"name":"Sorbonne Universit\u00e9 , Coll\u00e8ge Doctoral, F-75005 Paris,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1099-8735","authenticated-orcid":false,"given":"Rayan","family":"Chikhi","sequence":"additional","affiliation":[{"name":"Institut Pasteur, Universit\u00e9 Paris Cit\u00e9 , Sequence Bioinformatics Unit, F-75015 Paris,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6805-2331","authenticated-orcid":false,"given":"Camila","family":"Duitama Gonz\u00e1lez","sequence":"additional","affiliation":[{"name":"Institut Pasteur, Universit\u00e9 Paris Cit\u00e9 , Sequence Bioinformatics Unit, F-75015 Paris,","place":["France"]}]}],"member":"286","published-online":{"date-parts":[[2025,2,3]]},"reference":[{"key":"2025031206274357500_btaf054-B1","doi-asserted-by":"publisher","first-page":"274","DOI":"10.1186\/s13059-023-03098-2","article-title":"Comparing methods for constructing and representing human pangenome graphs","volume":"24","author":"Andreace","year":"2023","journal-title":"Genome Biol"},{"key":"2025031206274357500_btaf054-B2","doi-asserted-by":"crossref","first-page":"560","DOI":"10.1186\/s12864-023-09667-w","article-title":"Harmonization of supervised machine learning practices for efficient source attribution of Listeria monocytogenes based on genomic data","volume":"24","author":"Castelli","year":"2023","journal-title":"BMC Genomics"},{"key":"2025031206274357500_btaf054-B3","doi-asserted-by":"crossref","first-page":"i201","DOI":"10.1093\/bioinformatics\/btw279","article-title":"Compacting de Bruijn graphs from sequencing data quickly and in low memory","volume":"32","author":"Chikhi","year":"2016","journal-title":"Bioinformatics"},{"key":"2025031206274357500_btaf054-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3445967","article-title":"Data structures to represent a set of k-long DNA 
sequences","volume":"54","author":"Chikhi","year":"2022","journal-title":"ACM Comput Surv"},{"year":"2024","author":"Chikhi","key":"2025031206274357500_btaf054-B5","doi-asserted-by":"publisher","DOI":"10.1101\/2024.07.30.605881"},{"key":"2025031206274357500_btaf054-B6","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1038\/nbt.2023","article-title":"How to apply de Bruijn graphs to genome assembly","volume":"29","author":"Compeau","year":"2011","journal-title":"Nat Biotechnol"},{"key":"2025031206274357500_btaf054-B7","first-page":"1198","article-title":"Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT","volume":"33","author":"Cracco","year":"2023","journal-title":"Genome Res"},{"key":"2025031206274357500_btaf054-B8","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1186\/s40168-023-01670-3","article-title":"decOM: similarity-based microbial source tracking of ancient oral samples using k-mer-based methods","volume":"11","author":"Duitama Gonz\u00e1lez","year":"2023","journal-title":"Microbiome"},{"author":"Frouin","key":"2025031206274357500_btaf054-B9","doi-asserted-by":"publisher","DOI":"10.1101\/2023.03.28.534531"},{"key":"2025031206274357500_btaf054-B10","doi-asserted-by":"crossref","first-page":"108057","DOI":"10.1016\/j.isci.2023.108057","article-title":"aKmerBroom: ancient oral DNA decontamination using Bloom filters on k-mer sets","volume":"26","author":"Gonz\u00e1lez","year":"2023","journal-title":"Iscience"},{"key":"2025031206274357500_btaf054-B11","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1186\/s13059-020-02135-8","article-title":"Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs","volume":"21","author":"Holley","year":"2020","journal-title":"Genome 
Biol"},{"year":"2024","author":"Hunt","key":"2025031206274357500_btaf054-B12","doi-asserted-by":"publisher","DOI":"10.1101\/2024.03.08.584059"},{"key":"2025031206274357500_btaf054-B13","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De novo assembly and genotyping of variants using colored de Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat Genet"},{"key":"2025031206274357500_btaf054-B14","doi-asserted-by":"crossref","first-page":"e1007758","DOI":"10.1371\/journal.pgen.1007758","article-title":"A fast and agnostic method for bacterial genome-wide association studies: bridging the gap between k-mers and genetic events","volume":"14","author":"Jaillard","year":"2018","journal-title":"PLoS Genet"},{"key":"2025031206274357500_btaf054-B15","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1186\/s13059-022-02743-6","article-title":"Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2","volume":"23","author":"Khan","year":"2022","journal-title":"Genome Biol"},{"key":"2025031206274357500_btaf054-B16","doi-asserted-by":"crossref","first-page":"vbac029","DOI":"10.1093\/bioadv\/vbac029","article-title":"Kmtricks: efficient and flexible construction of Bloom filters for large sequencing data collections","volume":"2","author":"Lemane","year":"2022","journal-title":"Bioinform Adv"},{"key":"2025031206274357500_btaf054-B17","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1038\/s43588-024-00596-6","article-title":"Indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets with kmindex and ORA","volume":"4","author":"Lemane","year":"2024","journal-title":"Nat Comput Sci"},{"key":"2025031206274357500_btaf054-B18","doi-asserted-by":"publisher","first-page":"e148","DOI":"10.1093\/nar\/gkw655","article-title":"SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis 
and impact on the protein sequence","volume":"44","author":"Lopez-Maestre","year":"2016","journal-title":"Nucleic Acids Res"},{"author":"Marchet","key":"2025031206274357500_btaf054-B19"},{"key":"2025031206274357500_btaf054-B20","doi-asserted-by":"publisher","first-page":"i177","DOI":"10.1093\/bioinformatics\/btaa487","article-title":"REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets","volume":"36","author":"Marchet","year":"2020","journal-title":"Bioinformatics"},{"key":"2025031206274357500_btaf054-B21","doi-asserted-by":"crossref","first-page":"i185","DOI":"10.1093\/bioinformatics\/btac245","article-title":"Sparse and skew hashing of k-mers","volume":"38","author":"Pibiri","year":"2022","journal-title":"Bioinformatics"},{"key":"2025031206274357500_btaf054-B22","doi-asserted-by":"publisher","DOI":"10.1101\/2023.01.31.23285241","article-title":"A k-mer based transcriptomics analysis for NPM1-mutated AML","volume-title":"medRxiv","author":"Silva"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf054\/61740685\/btaf054.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/3\/btaf054\/61740685\/btaf054.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/3\/btaf054\/61740685\/btaf054.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,12]],"date-time":"2025-03-12T06:28:00Z","timestamp":1741760880000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf054\/7997265"}},"subtitle":[],"editor":[{"given":"Anthony","f
amily":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,2,3]]},"references-count":22,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf054","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2025,3]]},"published":{"date-parts":[[2025,2,3]]},"article-number":"btaf054"}}