{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,4]],"date-time":"2024-08-04T13:20:11Z","timestamp":1722777611552},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"18","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: BigWig, a format to represent read density data, is one of the most popular data types. They can represent the peak intensity in ChIP-seq, the transcript expression in RNA-seq, the copy number variation in whole genome sequencing, etc. UCSC Encode project uses the bigWig format heavily for storage and visualization. Of 5.2 TB Encode hg19 database, 1.6 TB (31% of the total space) is used to store bigWig files. BigWig format not only saves a lot of space but also supports fast queries that are crucial for interactive analysis and browsing. In our benchmark, bigWig often has similar size to the gzipped raw data, while is still able to support \u223c5000 random queries per second.<\/jats:p>\n               <jats:p>Results: Although bigWig is good enough at the moment, both storage space and query time are expected to become limited when sequencing gets cheaper. This article describes a new method to store density data named CWig. 
The format uses on average one-third of the size of existing bigWig files and improves random query speed up to 100 times.<\/jats:p>\n               <jats:p>Availability and implementation: \u00a0http:\/\/genome.ddns.comp.nus.edu.sg\/\u223ccwig<\/jats:p>\n               <jats:p>Contact: \u00a0ksung@comp.nus.edu.sg<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu330","type":"journal-article","created":{"date-parts":[[2014,5,28]],"date-time":"2014-05-28T00:15:16Z","timestamp":1401236116000},"page":"2543-2550","source":"Crossref","is-referenced-by-count":4,"title":["CWig: compressed representation of Wiggle\/BedGraph format"],"prefix":"10.1093","volume":"30","author":[{"given":"Do","family":"Huy Hoang","sequence":"first","affiliation":[{"name":"1 Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672 and 2 Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417"}]},{"given":"Wing-Kin","family":"Sung","sequence":"additional","affiliation":[{"name":"1 Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672 and 2 Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417"}]}],"member":"286","published-online":{"date-parts":[[2014,5,27]]},"reference":[{"key":"2023012711554590600_btu330-B1","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1038\/nrg3273","article-title":"Analysing and interpreting DNA methylation data","volume":"13","author":"Bock","year":"2012","journal-title":"Nat. Rev. 
Genet."},{"key":"2023012711554590600_btu330-B2","doi-asserted-by":"crossref","first-page":"1767","DOI":"10.1093\/nar\/gkp1137","article-title":"The Sanger FASTQ file format for sequences with quality scores, and the Solexa\/Illumina FASTQ variants","volume":"38","author":"Cock","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012711554590600_btu330-B3","doi-asserted-by":"crossref","DOI":"10.1002\/0471200611","volume-title":"Elements of Information Theory","author":"Cover","year":"1991"},{"key":"2023012711554590600_btu330-B4","doi-asserted-by":"crossref","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and VCF tools","volume":"27","author":"Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012711554590600_btu330-B5","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1109\/TIT.1975.1055349","article-title":"Universal codeword sets and representations of the integers","volume":"21","author":"Elias","year":"1975","journal-title":"Inf. 
Theory IEEE Trans."},{"key":"2023012711554590600_btu330-B6","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1101\/gr.114819.110","article-title":"Efficient storage of high throughput DNA sequencing data using reference-based compression","volume":"21","author":"Fritz","year":"2011","journal-title":"Genome Res."},{"key":"2023012711554590600_btu330-B7","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1186\/1471-2105-12-494","article-title":"Identifying elemental genomic track types and representing them uniformly","volume":"12","author":"Gundersen","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012711554590600_btu330-B8","doi-asserted-by":"crossref","DOI":"10.1145\/602259.602266","article-title":"R-trees: a dynamic index structure for spatial searching","volume-title":"Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data","author":"Guttman","year":"1984"},{"key":"2023012711554590600_btu330-B9","doi-asserted-by":"crossref","first-page":"1458","DOI":"10.1093\/bioinformatics\/btq164","article-title":"The genomedata format for storing large-scale functional genomics data","volume":"26","author":"Hoffman","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012711554590600_btu330-B10","doi-asserted-by":"crossref","first-page":"e39","DOI":"10.1093\/nar\/gks1026","article-title":"DiffSplice: the genome-wide detection of differential splicing events with RNA-seq","volume":"41","author":"Hu","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023012711554590600_btu330-B11","first-page":"1098","article-title":"A method for the construction of minimum-redundancy codes","volume-title":"Proceedings of the I.R.E","author":"Huffman","year":"1952"},{"key":"2023012711554590600_btu330-B12","doi-asserted-by":"crossref","first-page":"D764","DOI":"10.1093\/nar\/gkt1168","article-title":"The UCSC genome browser database: 2014 update","volume":"42","author":"Karolchik","year":"2014","journal-title":"Nucleic 
Acids Res."},{"key":"2023012711554590600_btu330-B13","doi-asserted-by":"crossref","first-page":"2204","DOI":"10.1093\/bioinformatics\/btq351","article-title":"BigWig and BigBed: enabling browsing of large distributed datasets","volume":"26","author":"Kent","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012711554590600_btu330-B14","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map (SAM) format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012711554590600_btu330-B15","doi-asserted-by":"crossref","first-page":"R83","DOI":"10.1186\/gb-2011-12-8-r83","article-title":"Cistrome: an integrative platform for transcriptional regulation studies","volume":"12","author":"Liu","year":"2011","journal-title":"Genome Biol."},{"key":"2023012711554590600_btu330-B16","doi-asserted-by":"crossref","DOI":"10.1137\/1.9781611972870.6","article-title":"Practical entropy-compressed rank\/select dictionary","volume-title":"Workshop on Algorithm Engineering and Experiments (ALENEX)","author":"Okanohara","year":"2007"},{"key":"2023012711554590600_btu330-B17","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1109\/FOCS.2008.83","article-title":"Succincter","volume-title":"Foundations of Computer Science, 2008. FOCS\u201908. IEEE 49th Annual IEEE Symposium on","author":"Patrascu","year":"2008"},{"key":"2023012711554590600_btu330-B18","article-title":"Succinct indexable dictionaries with applications to encoding k-Ary trees and multisets","volume-title":"Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms","author":"Raman","year":"2002"},{"key":"2023012711554590600_btu330-B19","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1038\/nbt.1754","article-title":"Integrative genomics viewer","volume":"29","author":"Robinson","year":"2011","journal-title":"Nat. 
Biotechnol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/18\/2543\/48929584\/bioinformatics_30_18_2543.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/18\/2543\/48929584\/bioinformatics_30_18_2543.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T12:40:52Z","timestamp":1674823252000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/18\/2543\/2475454"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,5,27]]},"references-count":19,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2014,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu330","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,9,15]]},"published":{"date-parts":[[2014,5,27]]}}}