{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:44Z","timestamp":1772138084418,"version":"3.50.1"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"21","license":[{"start":{"date-parts":[[2017,7,3]],"date-time":"2017-07-03T00:00:00Z","timestamp":1499040000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["BBSRC-NSF\/BIO-156491"],"award-info":[{"award-number":["BBSRC-NSF\/BIO-156491"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The past decade has seen an exponential increase in biological sequencing capacity, and there has been a simultaneous effort to help organize and archive some of the vast quantities of sequencing data that are being generated. Although these developments are tremendous from the perspective of maximizing the scientific utility of available data, they come with heavy costs. The storage and transmission of such vast amounts of sequencing data is expensive.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present Quark, a semi-reference-based compression tool designed for RNA-seq data. Quark makes use of a reference sequence when encoding reads, but produces a representation that can be decoded independently, without the need for a reference. 
This allows Quark to achieve markedly better compression rates than existing reference-free schemes, while still relieving the burden of assuming a specific, shared reference sequence between the encoder and decoder. We demonstrate that Quark achieves state-of-the-art compression rates, and that, typically, only a small fraction of the reference sequence must be encoded along with the reads to allow reference-free decompression.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Quark is implemented in C++11, and is available under a GPLv3 license at www.github.com\/COMBINE-lab\/quark.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx428","type":"journal-article","created":{"date-parts":[[2017,6,29]],"date-time":"2017-06-29T15:29:40Z","timestamp":1498750180000},"page":"3380-3386","source":"Crossref","is-referenced-by-count":5,"title":["Quark enables semi-reference-based compression of RNA-seq data"],"prefix":"10.1093","volume":"33","author":[{"given":"Hirak","family":"Sarkar","sequence":"first","affiliation":[{"name":"Department of Computer Science, Stony Brook University Stony Brook, NY, USA"}]},{"given":"Rob","family":"Patro","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University Stony Brook, NY, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,7,3]]},"reference":[{"key":"2023051601055824700_btx428-B1","author":"Adjeroh","year":"2002"},{"key":"2023051601055824700_btx428-B2","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/s12859-015-0709-7","article-title":"Reference-free compression of high 
throughput sequencing data with a probabilistic de Bruijn graph","volume":"16","author":"Benoit","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023051601055824700_btx428-B3","first-page":"2818","author":"Bonfield","year":"2014"},{"key":"2023051601055824700_btx428-B4","doi-asserted-by":"crossref","first-page":"e59190.","DOI":"10.1371\/journal.pone.0059190","article-title":"Compression of fastq and sam format sequencing data","volume":"8","author":"Bonfield","year":"2013","journal-title":"PloS One"},{"key":"2023051601055824700_btx428-B5","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1038\/nbt.3519","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023051601055824700_btx428-B6","doi-asserted-by":"crossref","first-page":"2130","DOI":"10.1093\/bioinformatics\/btu183","article-title":"Lossy compression of quality scores in genomic data","volume":"30","author":"C\u00e1novas","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051601055824700_btx428-B7","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051601055824700_btx428-B8","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1101\/gr.114819.110","article-title":"Efficient storage of high throughput DNA sequencing data using reference-based compression","volume":"21","author":"Fritz","year":"2011","journal-title":"Genome Res"},{"key":"2023051601055824700_btx428-B9","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1101\/gr.114819.110","article-title":"Efficient storage of high throughput DNA sequencing data using reference-based compression","volume":"21","author":"Fritz","year":"2011","journal-title":"Genome 
Res"},{"key":"2023051601055824700_btx428-B10","doi-asserted-by":"crossref","first-page":"3051","DOI":"10.1093\/bioinformatics\/bts593","article-title":"SCALCE: boosting sequence compression algorithms using locally consistent encoding","volume":"28","author":"Hach","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051601055824700_btx428-B11","first-page":"btt257","article-title":"Adaptive reference-free compression of sequence quality scores","author":"Janin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051601055824700_btx428-B12","doi-asserted-by":"crossref","first-page":"e171\u2013e171.","DOI":"10.1093\/nar\/gks754","article-title":"Compression of next-generation sequencing reads aided by highly efficient de novo assembly","volume":"40","author":"Jones","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023051601055824700_btx428-B13","doi-asserted-by":"crossref","first-page":"1920","DOI":"10.1093\/bioinformatics\/btv071","article-title":"Reference-based compression of short-read sequences using path encoding","volume":"31","author":"Kingsford","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051601055824700_btx428-B14","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Methods"},{"key":"2023051601055824700_btx428-B15","author":"Li","year":"2013"},{"key":"2023051601055824700_btx428-B16","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1136\/amiajnl-2013-002147","article-title":"Hugo: hierarchical multi-reference genome compression for aligned reads","volume":"21","author":"Li","year":"2014","journal-title":"J. Am. Med. 
Informatics Assoc"},{"key":"2023051601055824700_btx428-B17","first-page":"btv330","article-title":"Qvz: lossy compression of quality values","author":"Malysa","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051601055824700_btx428-B18","first-page":"btv248","article-title":"Data-dependent bucketing improves reference-free compression of sequencing reads","author":"Patro","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051601055824700_btx428-B19","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1038\/nbt.2862","article-title":"Sailfish enables alignment-free isoform quantification from rna-seq reads using lightweight algorithms","volume":"32","author":"Patro","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023051601055824700_btx428-B20","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1038\/nmeth.4197","article-title":"Salmon provides fast and bias-aware quantification of transcript expression","volume":"14","author":"Patro","year":"2017","journal-title":"Nat. 
Methods"},{"key":"2023051601055824700_btx428-B21","doi-asserted-by":"crossref","first-page":"e133","DOI":"10.1093\/nar\/gkw540","article-title":"Boiler: lossy compression of RNA-seq alignments using coverage vectors","volume":"44","author":"Pritt","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023051601055824700_btx428-B22","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1093\/bioinformatics\/btw277","article-title":"RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes","volume":"32","author":"Srivastava","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051601055824700_btx428-B23","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/gb-2011-12-2-r13","article-title":"Haplotype and isoform specific expression estimation using multi-mapping rna-seq reads","volume":"12","author":"Turro","year":"2011","journal-title":"Genome Biol"},{"key":"2023051601055824700_btx428-B24","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1038\/nbt.3511","article-title":"Compressive mapping for next-generation sequencing","volume":"34","author":"Yorukoglu","year":"2016","journal-title":"Nat. 
Biotechnol"},{"key":"2023051601055824700_btx428-B25","doi-asserted-by":"crossref","first-page":"S10.","DOI":"10.1186\/1471-2105-15-S15-S10","article-title":"Compression of next-generation sequencing quality scores using memetic algorithm","volume":"15","author":"Zhou","year":"2014","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/21\/3380\/50315486\/bioinformatics_33_21_3380.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/21\/3380\/50315486\/bioinformatics_33_21_3380.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T21:06:52Z","timestamp":1684184812000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/21\/3380\/3920524"}},"subtitle":[],"editor":[{"given":"Ivo","family":"Hofacker","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,7,3]]},"references-count":25,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2017,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx428","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/085878","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,11,1]]},"published":{"date-parts":[[2017,7,3]]}}}