{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T03:53:16Z","timestamp":1772077996335,"version":"3.50.1"},"reference-count":18,"publisher":"Oxford University Press (OUP)","issue":"21","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Storing, transferring and maintaining genomic databases becomes a major challenge because of the rapid technology progress in DNA sequencing and correspondingly growing pace at which the sequencing data are being produced. Efficient compression, with support for extraction of arbitrary snippets of any sequence, is the key to maintaining those huge amounts of data.<\/jats:p>\n               <jats:p>Results: We present an LZ77-style compression scheme for relative compression of multiple genomes of the same species. While the solution bears similarity to known algorithms, it offers significantly higher compression ratios at compression speed over an order of magnitude greater. In particular, 69 differentially encoded human genomes are compressed over 400 times at fast compression, or even 1000 times at slower compression (the reference genome itself needs much more space). Adding fast random access to text snippets decreases the ratio to ~300.<\/jats:p>\n               <jats:p>Availability: GDC is available at http:\/\/sun.aei.polsl.pl\/gdc.<\/jats:p>\n               <jats:p>Contact: \u00a0sebastian.deorowicz@polsl.pl<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr505","type":"journal-article","created":{"date-parts":[[2011,9,7]],"date-time":"2011-09-07T00:24:53Z","timestamp":1315355093000},"page":"2979-2986","source":"Crossref","is-referenced-by-count":88,"title":["Robust relative compression of genomes with random access"],"prefix":"10.1093","volume":"27","author":[{"given":"Sebastian","family":"Deorowicz","sequence":"first","affiliation":[{"name":"1 Institute of Informatics, Silesian University of Technology, 44-100 Gliwice and 2Department of Computer Engineering, Technical University of \u0141\u00f3d\u017a, 90-924 \u0141\u00f3d\u017a, Poland"}]},{"given":"Szymon","family":"Grabowski","sequence":"additional","affiliation":[{"name":"1 Institute of Informatics, Silesian University of Technology, 44-100 Gliwice and 2Department of Computer Engineering, Technical University of \u0141\u00f3d\u017a, 90-924 \u0141\u00f3d\u017a, Poland"}]}],"member":"286","published-online":{"date-parts":[[2011,9,5]]},"reference":[{"key":"2023012511333142200_B1","doi-asserted-by":"crossref","first-page":"1731","DOI":"10.1093\/bioinformatics\/btp319","article-title":"Data structures and compression algorithms for genomic sequence data","volume":"25","author":"Brandon","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012511333142200_B2","first-page":"43","article-title":"A simple statistical algorithm for biological sequence compression","volume-title":"Proceedings of the DCC.","author":"Cao","year":"2007"},{"key":"2023012511333142200_B3","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1093\/bioinformatics\/btn582","article-title":"Human genomes as email attachments","volume":"25","author":"Christley","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012511333142200_B4","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1007\/978-3-540-89097-3_18","article-title":"Practical rank\/select queries over arbitrary sequences","volume":"5280","author":"Claude","year":"2008","journal-title":"Lect. Notes Comput. Sci."},{"key":"2023012511333142200_B5","first-page":"86","article-title":"Compressed q-gram indexing for highly repetitive biological sequences","volume-title":"Proceedings of the International Conference on Bioinformatics Bioengineering.","author":"Claude","year":"2010"},{"key":"2023012511333142200_B6","first-page":"768","article-title":"On the bit-complexity of Lempel\u2013Ziv compression","volume-title":"Proceedings of the SODA.","author":"Ferragina","year":"2009"},{"key":"2023012511333142200_B7","first-page":"1","article-title":"Engineering relative compression of genomes","author":"Grabowski","year":"2011","journal-title":"CoRR"},{"key":"2023012511333142200_B8","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511574931","volume-title":"Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology.","author":"Gusfield","year":"1997"},{"key":"2023012511333142200_B9","first-page":"239","article-title":"LZ77-like compression with fast random access","volume-title":"Proceedings of the DCC.","author":"Kreft","year":"2010"},{"key":"2023012511333142200_B10","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1007\/978-3-642-21458-5_6","article-title":"Self-Indexing based on LZ77","volume":"6661","author":"Kreft","year":"2011","journal-title":"Lect. Notes Comput. Sci."},{"key":"2023012511333142200_B11","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1007\/978-3-642-16321-0_20","article-title":"Relative Lempel\u2013Ziv compression of genomes for large-scale storage and retrieval","volume":"6393","author":"Kuruppu","year":"2010","journal-title":"Lect. Notes Comput. Sci."},{"key":"2023012511333142200_B12","article-title":"Iterative dictionary construction for compression of large DNA datasets","volume":"99","author":"Kuruppu","year":"2011","journal-title":"IEEE ACM Trans. Comput. Biol. Bioinformatics"},{"key":"2023012511333142200_B13","first-page":"91","article-title":"Optimized relative Lempel\u2013Ziv compression of genomes","volume-title":"Proceedings of the ACSC.","author":"Kuruppu","year":"2011"},{"key":"2023012511333142200_B14","article-title":"Reference sequence construction for relative compression of genomes","author":"Kuruppu","year":"2011","journal-title":"Proceedings of the SPIRE"},{"key":"2023012511333142200_B15","doi-asserted-by":"crossref","first-page":"1722","DOI":"10.1109\/5.892708","article-title":"Off-line dictionary-based compression","volume":"88","author":"Larsson","year":"2000","journal-title":"Proc. IEEE"},{"key":"2023012511333142200_B16","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1089\/cmb.2009.0169","article-title":"Storage and retrieval of highly repetitive sequence collections","volume":"17","author":"M\u00e4kinen","year":"2010","journal-title":"J. Comput. Biol."},{"key":"2023012511333142200_B17","doi-asserted-by":"crossref","first-page":"1397","DOI":"10.1002\/spe.619","article-title":"A simple and fast DNA compressor","volume":"34","author":"Manzini","year":"2004","journal-title":"Software Pract. Exper."},{"key":"2023012511333142200_B18","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1093\/nar\/gkr009","article-title":"A novel compression tool for efficient storage of genome resequencing data","volume":"39","author":"Wang","year":"2011","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/21\/2979\/48861795\/bioinformatics_27_21_2979.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/21\/2979\/48861795\/bioinformatics_27_21_2979.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T11:35:18Z","timestamp":1674646518000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/21\/2979\/217176"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,9,5]]},"references-count":18,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2011,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr505","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,11,1]]},"published":{"date-parts":[[2011,9,5]]}}}