{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,8]],"date-time":"2025-04-08T05:23:34Z","timestamp":1744089814386},"reference-count":34,"publisher":"World Scientific Pub Co Pte Ltd","issue":"05","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Bioinform. Comput. Biol."],"published-print":{"date-parts":[[2019,10]]},"abstract":"<jats:p>Many bioinformatics tools rely heavily on k-mer dictionaries to describe the composition of sequences and to allow for faster reference-free algorithms or look-ups. Unfortunately, naive k-mer dictionaries are very memory-inefficient, requiring a very large amount of storage space to store each k-mer. This problem is generally worsened by the need for an index to support fast queries. In this work, we discuss how to build an indexed linear reference containing a set of input k-mers and its application to the compression of quality scores in FASTQ files. Most of the entropy of sequencing data lies in the quality scores, which are therefore difficult to compress. Here, we present an application to improve the compressibility of quality values while preserving the information for SNP calling. 
We show how a dictionary of significant k-mers, obtained from SNP databases and multiple genomes, can be indexed in linear space and used to improve the compression of quality values.<\/jats:p><jats:p>Availability: The software is freely available at https:\/\/github.com\/yhhshb\/yalff .<\/jats:p>","DOI":"10.1142\/s0219720019400110","type":"journal-article","created":{"date-parts":[[2019,10,14]],"date-time":"2019-10-14T03:49:54Z","timestamp":1571024994000},"page":"1940011","source":"Crossref","is-referenced-by-count":5,"title":["Indexing <i>k<\/i>-mers in linear space for quality value compression"],"prefix":"10.1142","volume":"17","author":[{"given":"Yoshihiro","family":"Shibuya","sequence":"first","affiliation":[{"name":"Department of Information Engineering, University of Padua, via Gradenigo 6B, Padua, Italy"},{"name":"Laboratoire d\u2019Informatique Gaspard-Monge (LIGM), University Paris-Est Marne-la-Vall\u00e9e, B\u00e2timent Copernic - 5, bd Descartes, Champs sur Marne, France"}]},{"given":"Matteo","family":"Comin","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, University of Padua, via Gradenigo 6B, Padua, Italy"}]}],"member":"219","published-online":{"date-parts":[[2019,12,20]]},"reference":[{"key":"S0219720019400110BIB001","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-015-0709-7"},{"key":"S0219720019400110BIB003","volume-title":"Poster at HiTSeq 2017","author":"B\u0159inda 
K","year":"2017"},{"key":"S0219720019400110BIB004","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0059190"},{"key":"S0219720019400110BIB006","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btw279"},{"key":"S0219720019400110BIB007","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btu183"},{"key":"S0219720019400110BIB008","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-44753-6_1"},{"key":"S0219720019400110BIB009","doi-asserted-by":"publisher","DOI":"10.1186\/s13015-014-0029-x"},{"key":"S0219720019400110BIB010","doi-asserted-by":"publisher","DOI":"10.1038\/nature11632"},{"key":"S0219720019400110BIB011","doi-asserted-by":"publisher","DOI":"10.1101\/gr.8.3.175"},{"key":"S0219720019400110BIB012","doi-asserted-by":"publisher","DOI":"10.1109\/SFCS.2000.892127"},{"key":"S0219720019400110BIB013","doi-asserted-by":"publisher","DOI":"10.1145\/1082036.1082039"},{"key":"S0219720019400110BIB014","doi-asserted-by":"publisher","DOI":"10.1186\/s12864-017-4273-6"},{"key":"S0219720019400110BIB015","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-018-2415-8"},{"key":"S0219720019400110BIB016","doi-asserted-by":"publisher","DOI":"10.1186\/s13015-018-0125-4"},{"key":"S0219720019400110BIB017","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btw385"},{"key":"S0219720019400110BIB019","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btt257"},{"key":"S0219720019400110BIB020","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp324"},{"key":"S0219720019400110BIB021","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp698"},{"key":"S0219720019400110BIB022","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btv330"},{"key":"S0219720019400110BIB023","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btr011"},{"key":"S0219720019400110BIB024","doi-asserted-by":"publisher","DOI":"10.5220\/0006150500590067"},{"issue":"22","key":"S0219720019400110BIB025","doi-asserted-by":"crossref
","first-page":"3492","DOI":"10.1093\/bioinformatics\/btw397","volume":"32","author":"Mohamadi H","year":"2016","journal-title":"Bioinform"},{"key":"S0219720019400110BIB026","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-14-187"},{"issue":"2","key":"S0219720019400110BIB027","first-page":"183","volume":"18","author":"Ochoa I","year":"2017","journal-title":"Brief Bioinform"},{"key":"S0219720019400110BIB028","series-title":"Leibniz International Proceedings in Informatics (LIPIcs)","first-page":"3:1","volume-title":"18th Int Workshop Algorithms in Bioinformatics (WABI 2018)","volume":"113","author":"Prezza N","year":"2018"},{"key":"S0219720019400110BIB030","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-94806-5_12"},{"key":"S0219720019400110BIB031","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btt020"},{"issue":"1","key":"S0219720019400110BIB032","first-page":"41","volume":"9","author":"Schimd M","year":"2016","journal-title":"BMC Med Genom"},{"key":"S0219720019400110BIB033","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btw460"},{"key":"S0219720019400110BIB034","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/29.1.308"},{"key":"S0219720019400110BIB035","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-019-2883-5"},{"key":"S0219720019400110BIB036","first-page":"21","volume-title":"Proc 12th Int Joint Conf Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Vol 3: BIOINFORMATICS","author":"Shibuya Y"},{"key":"S0219720019400110BIB037","doi-asserted-by":"publisher","DOI":"10.1038\/nbt.3170"},{"key":"S0219720019400110BIB038","doi-asserted-by":"publisher","DOI":"10.1038\/nbt.2835"}],"container-title":["Journal of Bioinformatics and Computational 
Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0219720019400110","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,21]],"date-time":"2023-09-21T16:06:04Z","timestamp":1695312364000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0219720019400110"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10]]},"references-count":34,"journal-issue":{"issue":"05","published-print":{"date-parts":[[2019,10]]}},"alternative-id":["10.1142\/S0219720019400110"],"URL":"https:\/\/doi.org\/10.1142\/s0219720019400110","relation":{},"ISSN":["0219-7200","1757-6334"],"issn-type":[{"value":"0219-7200","type":"print"},{"value":"1757-6334","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10]]}}}