{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:40:21Z","timestamp":1772908821012,"version":"3.50.1"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"19","funder":[{"name":"Stanford Graduate Fellowships Program in Science and Engineering"},{"DOI":"10.13039\/501100003086","name":"Basque Government","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003086","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["U01 CA198943"],"award-info":[{"award-number":["U01 CA198943"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Recent advancements in sequencing technology have led to a drastic reduction in the cost of sequencing a genome. This has generated an unprecedented amount of genomic data that must be stored, processed and transmitted. To facilitate this effort, we propose a new lossy compressor for the quality values presented in genomic data files (e.g. FASTQ and SAM files), which comprise roughly half of the storage space (in the uncompressed domain). Lossy compression allows for compression of data beyond its lossless limit.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The proposed algorithm QVZ exhibits better rate-distortion performance than the previously proposed algorithms, for several distortion metrics and for the lossless case. Moreover, it allows the user to define any quasi-convex distortion function to be minimized, a feature not supported by the previous algorithms. Finally, we show that QVZ-compressed data exhibit better performance in the genotyping than data compressed with previously proposed algorithms, in the sense that for a similar rate, a genotyping closer to that achieved with the original quality values is obtained.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>QVZ is written in C and can be downloaded from https:\/\/github.com\/mikelhernaez\/qvz.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Contact<\/jats:title>\n                  <jats:p>mhernaez@stanford.edu or gmalysa@stanford.edu or iochoa@stanford.edu<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btv330","type":"journal-article","created":{"date-parts":[[2015,5,30]],"date-time":"2015-05-30T00:26:28Z","timestamp":1432945588000},"page":"3122-3129","source":"Crossref","is-referenced-by-count":56,"title":["QVZ: lossy compression of quality values"],"prefix":"10.1093","volume":"31","author":[{"given":"Greg","family":"Malysa","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mikel","family":"Hernaez","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Idoia","family":"Ochoa","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Milind","family":"Rao","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Karthik","family":"Ganesan","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tsachy","family":"Weissman","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2015,5,28]]},"reference":[{"key":"2023020202301517400_btv330-B1","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1097\/GIM.0b013e318220aaba","article-title":"Deploying whole genome sequencing in clinical practice and public health: meeting the challenge one bin at a time","volume":"13","author":"Berg","year":"2011","journal-title":"Genet. Med."},{"key":"2023020202301517400_btv330-B2","doi-asserted-by":"crossref","first-page":"e59190","DOI":"10.1371\/journal.pone.0059190","article-title":"Compression of FASTQ and SAM format sequencing data","volume":"8","author":"Bonfield","year":"2013","journal-title":"PloS One"},{"key":"2023020202301517400_btv330-B3","doi-asserted-by":"crossref","first-page":"2130","DOI":"10.1093\/bioinformatics\/btu183","article-title":"Lossy compression of quality scores in genomic data","volume":"30","author":"C\u00e1novas","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020202301517400_btv330-B4","doi-asserted-by":"crossref","first-page":"1677","DOI":"10.1093\/bioinformatics\/bts256","article-title":"Onlinecall: fast online parameter estimation and base calling for illumina\u2019s next-generation sequencing","volume":"28","author":"Das","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020202301517400_btv330-B5","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet."},{"key":"2023020202301517400_btv330-B6","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1101\/gr.114819.110","article-title":"Efficient storage of high throughput DNA sequencing data using reference-based compression","volume":"21","author":"Fritz","year":"2011","journal-title":"Genome Res."},{"key":"2023020202301517400_btv330-B7","doi-asserted-by":"crossref","first-page":"3051","DOI":"10.1093\/bioinformatics\/bts593","article-title":"Scalce: boosting sequence compression algorithms using locally consistent encoding","volume":"28","author":"Hach","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020202301517400_btv330-B8","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1038\/507294a","article-title":"Technology: the $1\u2009000 genome","volume":"507","author":"Hayden","year":"2014","journal-title":"Nature"},{"key":"2023020202301517400_btv330-B9","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1093\/bioinformatics\/btt257","article-title":"Adaptive reference-free compression of sequence quality scores","volume":"30","author":"Janin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020202301517400_btv330-B10","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1089\/cmb.2010.0253","article-title":"Compressing genomic sequence fragments using slimgene","volume":"18","author":"Kozanitis","year":"2011","journal-title":"J. Comput. Biol."},{"key":"2023020202301517400_btv330-B11","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short DNA sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biol."},{"key":"2023020202301517400_btv330-B12","doi-asserted-by":"crossref","first-page":"2987","DOI":"10.1093\/bioinformatics\/btr509","article-title":"A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data","volume":"27","author":"Li","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020202301517400_btv330-B13","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020202301517400_btv330-B14","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The Sequence Alignment\/Map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020202301517400_btv330-B15","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","article-title":"Least squares quantization in PCM","volume":"28","author":"Lloyd","year":"1982","journal-title":"IEEE Trans. Inf. Theory"},{"key":"2023020202301517400_btv330-B16","first-page":"281","article-title":"Some methods for classification and analysis of multivariate observations","author":"MacQueen","year":"1967"},{"key":"2023020202301517400_btv330-B17","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nrg2626","article-title":"Sequencing technologies the next generation","volume":"11","author":"Metzker","year":"2010","journal-title":"Nat. Rev. Genet."},{"key":"2023020202301517400_btv330-B18","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1186\/1471-2105-14-187","article-title":"Qualcomp: a new lossy compressor for quality scores based on rate distortion theory","volume":"14","author":"Ochoa","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020202301517400_btv330-B19","doi-asserted-by":"crossref","first-page":"2213","DOI":"10.1093\/bioinformatics\/btu208","article-title":"DSRC 2\u2014industry-oriented compression of FASTQ files","volume":"30","author":"Roguski","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020202301517400_btv330-B20","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1109\/MSPEC.2013.6545119","article-title":"The DNA data deluge","volume":"50","author":"Schatz","year":"2013","journal-title":"IEEE Spectr."},{"key":"2023020202301517400_btv330-B21","doi-asserted-by":"crossref","first-page":"628","DOI":"10.1093\/bioinformatics\/btr689","article-title":"Transformations for the compression of FASTQ quality scores of next-generation sequencing data","volume":"28","author":"Wan","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020202301517400_btv330-B22","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-05269-4_31","article-title":"Traversing the k-mer landscape of NGS read datasets for quality score sparsification","volume-title":"Research in Computational Molecular Biology","author":"Yu","year":"2014"},{"key":"2023020202301517400_btv330-B23","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1534\/genetics.113.159715","article-title":"Sequencing and assembly of the 22-gb loblolly pine genome","volume":"196","author":"Zimin","year":"2014","journal-title":"Genetics"},{"key":"2023020202301517400_btv330-B24","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1038\/nbt.2835","article-title":"Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls","volume":"32","author":"Zook","year":"2014","journal-title":"Nat. Biotechnol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/19\/3122\/49035006\/bioinformatics_31_19_3122.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/19\/3122\/49035006\/bioinformatics_31_19_3122.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T03:49:12Z","timestamp":1675309752000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/19\/3122\/211178"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,5,28]]},"references-count":24,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2015,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv330","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2015,10]]},"published":{"date-parts":[[2015,5,28]]}}}