{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T11:15:42Z","timestamp":1763810142671,"version":"build-2065373602"},"reference-count":98,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2019,11,2]],"date-time":"2019-11-02T00:00:00Z","timestamp":1572652800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["UID\/CEC\/00127\/2014, PTCD\/EEI-SII\/6608\/2014, UID\/CEC\/00127\/2019"],"award-info":[{"award-number":["UID\/CEC\/00127\/2014, PTCD\/EEI-SII\/6608\/2014, UID\/CEC\/00127\/2019"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>The development of efficient data compressors for DNA sequences is crucial not only for reducing the storage and the bandwidth for transmission, but also for analysis purposes. In particular, the development of improved compression models directly influences the outcome of anthropological and biomedical compression-based methods. In this paper, we describe a new lossless compressor with improved compression capabilities for DNA sequences representing different domains and kingdoms. The reference-free method uses a competitive prediction model to estimate, for each symbol, the best class of models to be used before applying arithmetic encoding. There are two classes of models: weighted context models (including substitutional tolerant context models) and weighted stochastic repeat models. Both classes of models use specific sub-programs to handle inverted repeats efficiently. The results show that the proposed method attains a higher compression ratio than state-of-the-art approaches, on a balanced and diverse benchmark, using a competitive level of computational resources. An efficient implementation of the method is publicly available, under the GPLv3 license.<\/jats:p>","DOI":"10.3390\/e21111074","type":"journal-article","created":{"date-parts":[[2019,11,4]],"date-time":"2019-11-04T04:13:08Z","timestamp":1572840788000},"page":"1074","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1176-552X","authenticated-orcid":false,"given":"Diogo","family":"Pratas","sequence":"first","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal"},{"name":"Department of Electronics, Telecomunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal"},{"name":"Department of Virology, University of Helsinki, 00100 Helsinki, Finland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8962-8985","authenticated-orcid":false,"given":"Morteza","family":"Hosseini","sequence":"additional","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6331-6091","authenticated-orcid":false,"given":"Jorge M.","family":"Silva","sequence":"additional","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9164-0016","authenticated-orcid":false,"given":"Armando J.","family":"Pinho","sequence":"additional","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal"},{"name":"Department of Electronics, Telecomunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2019,11,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1109\/MSPEC.2013.6545119","article-title":"The DNA data deluge","volume":"50","author":"Schatz","year":"2013","journal-title":"IEEE Spectrum"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1038\/nprot.2016.182","article-title":"DNA sequencing technologies: 2006\u20132016","volume":"12","author":"Mardis","year":"2017","journal-title":"Nat. Protocols"},{"key":"ref_3","unstructured":"Marco, D. (2010). Metagenomics: Theory, Methods and Applications, Horizon Scientific Press."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3407","DOI":"10.1016\/j.cub.2016.10.061","article-title":"17th century variola virus reveals the recent history of smallpox","volume":"26","author":"Duggan","year":"2016","journal-title":"Curr. Biol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1038\/nrg3094","article-title":"Emerging biomedical applications of synthetic biology","volume":"13","author":"Weber","year":"2012","journal-title":"Nat. Rev. Genet."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1038\/nrg.2017.65","article-title":"Harnessing ancient genomes to study the history of human adaptation","volume":"18","author":"Marciniak","year":"2017","journal-title":"Nat. Rev. Genet."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Stephens, Z.D., Lee, S.Y., Faghri, F., Campbell, R.H., Zhai, C., Efron, M.J., Iyer, R., Schatz, M.C., Sinha, S., and Robinson, G.E. (2015). Big data: Astronomical or genomical?. PLoS Biol., 13.","DOI":"10.1371\/journal.pbio.1002195"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1093\/bioinformatics\/btr014","article-title":"Compression of DNA sequence reads in FASTQ format","volume":"27","author":"Deorowicz","year":"2011","journal-title":"Bioinformatics"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"696","DOI":"10.1109\/TIT.2009.2037052","article-title":"Compression of whole genome alignments","volume":"56","author":"Hanus","year":"2010","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"3189","DOI":"10.1109\/TIT.2012.2236605","article-title":"A compression model for DNA multiple sequence alignment blocks","volume":"59","author":"Matos","year":"2013","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2527","DOI":"10.1093\/bioinformatics\/bts467","article-title":"DELIMINATE\u2014A fast and efficient method for loss-less compression of genomic sequences","volume":"28","author":"Mohammed","year":"2012","journal-title":"Bioinformatics"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1093\/bioinformatics\/btt594","article-title":"MFCompress: A compression tool for fasta and multi-fasta data","volume":"30","author":"Pinho","year":"2013","journal-title":"Bioinformatics"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1389","DOI":"10.1093\/bioinformatics\/btu844","article-title":"Disk-based compression of data from genome sequencing","volume":"31","author":"Grabowski","year":"2015","journal-title":"Bioinformatics"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3051","DOI":"10.1093\/bioinformatics\/bts593","article-title":"SCALCE: Boosting sequence compression algorithms using locally consistent encoding","volume":"28","author":"Hach","year":"2012","journal-title":"Bioinformatics"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1038\/nmeth.3654","article-title":"Efficient genotype compression and analysis of large genetic-variation data sets","volume":"13","author":"Layer","year":"2016","journal-title":"Nat. Methods"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Bonfield, J.K., and Mahoney, M.V. (2013). Compression of FASTQ and SAM format sequencing data. PLoS ONE, 8.","DOI":"10.1371\/journal.pone.0059190"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, R., Bai, Y., Chu, Y.S., Wang, Z., Wang, Y., Sun, M., Li, J., Zang, T., and Wang, Y. (2018, January 3\u20136). DeepDNA: A hybrid convolutional and recurrent neural network for compressing human mitochondrial genomes. Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain.","DOI":"10.1109\/BIBM.2018.8621140"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Benoit, G., Lemaitre, C., Lavenier, D., Drezen, E., Dayris, T., Uricaru, R., and Rizk, G. (2015). Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinform., 16.","DOI":"10.1186\/s12859-015-0709-7"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ochoa, I., Li, H., Baumgarte, F., Hergenrother, C., Voges, J., and Hernaez, M. (2019, January 26\u201329). AliCo: A new efficient representation for SAM files. Proceedings of the 2019 Data Compression Conference (DCC), Snowbird, UT, USA.","DOI":"10.1109\/DCC.2019.00017"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhang, C., and Ochoa, I. (2019). VEF: A Variant Filtering tool based on Ensemble methods. bioRxiv, 540286.","DOI":"10.1101\/540286"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"2674","DOI":"10.1093\/bioinformatics\/bty1015","article-title":"SPRING: A next-generation compressor for FASTQ data","volume":"35","author":"Chandak","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"825","DOI":"10.1089\/cmb.2018.0068","article-title":"Dynamic alignment-free and reference-free read compression","volume":"25","author":"Holley","year":"2018","journal-title":"J. Comput. Biol."},{"key":"ref_23","first-page":"4","article-title":"WBMFC: Efficient and Secure Storage of Genomic Data","volume":"26","author":"Kumar","year":"2018","journal-title":"Pertanika J. Sci. Technol."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Dougherty, E.R., Shmulevich, I., Chen, J., and Wang, Z.J. (2005). Genomic Signal Processing and Statistics, Hindawi Publishing Corporation.","DOI":"10.1155\/9789775945075"},{"key":"ref_25","unstructured":"Grumbach, S., and Tahi, F. (April, January 30). Compression of DNA sequences. Proceedings of the Data Compression Conference (DCC 1993), Snowbird, UT, USA."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1146\/annurev-biodatasci-072018-021229","article-title":"Genomic Data Compression","volume":"2","author":"Hernaez","year":"2019","journal-title":"Annu. Rev. Biomed. Data Sci."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1016\/S0169-5347(01)02187-5","article-title":"Chromosomal rearrangements and speciation","volume":"16","author":"Rieseberg","year":"2001","journal-title":"Trends Ecol. Evol."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1016\/0092-8674(80)90131-2","article-title":"DNA rearrangements associated with a transposable element in yeast","volume":"21","author":"Roeder","year":"1980","journal-title":"Cell"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"3439","DOI":"10.1073\/pnas.1418652112","article-title":"Evidence for recent, population-specific evolution of the human mutation rate","volume":"112","author":"Harris","year":"2015","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.gde.2014.06.011","article-title":"Adaptations to local environments in modern human populations","volume":"29","author":"Jeong","year":"2014","journal-title":"Curr. Opin. Genet. Dev."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"e00403-16","DOI":"10.1128\/mBio.00403-16","article-title":"Transcriptome remodeling contributes to epidemic disease caused by the human pathogen Streptococcus pyogenes","volume":"7","author":"Beres","year":"2016","journal-title":"mBio"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/j.coi.2014.05.001","article-title":"Human genome variability, natural selection and infectious diseases","volume":"30","author":"Fumagalli","year":"2014","journal-title":"Curr. Opin. Immunol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1038\/s41559-017-0425-y","article-title":"Evolutionary determinants of genome-wide nucleotide composition","volume":"2","author":"Long","year":"2018","journal-title":"Nat. Ecol. Evol."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Golan, A. (2017). Foundations of Info-Metrics: Modeling and Inference with Imperfect Information, Oxford University Press.","DOI":"10.1093\/oso\/9780199349524.001.0001"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Hosseini, M., Pratas, D., and Pinho, A.J. (2016). A survey on data compression methods for biological sequences. Information, 7.","DOI":"10.3390\/info7040056"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"e45","DOI":"10.1093\/nar\/gkr009","article-title":"A novel compression tool for efficient storage of genome resequencing data","volume":"39","author":"Wang","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"ref_37","unstructured":"Kuruppu, S., Puglisi, S.J., and Zobel, J. (2011, January 17\u201320). Optimized relative Lempel\u2013Ziv compression of genomes. Proceedings of the 34th Australian Computer Science Conference (ACSC-2011), Perth, Australia."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"2192","DOI":"10.1093\/bioinformatics\/btq346","article-title":"G-SQZ: Compact encoding of genomic sequence and quality data","volume":"26","author":"Tembe","year":"2010","journal-title":"Bioinformatics"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1101\/gr.114819.110","article-title":"Efficient storage of high throughput DNA sequencing data using reference-based compression","volume":"21","author":"Fritz","year":"2011","journal-title":"Genome Res."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1089\/cmb.2010.0253","article-title":"Compressing genomic sequence fragments using SlimGene","volume":"18","author":"Kozanitis","year":"2011","journal-title":"J. Comput. Biol."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"e27","DOI":"10.1093\/nar\/gkr1124","article-title":"GReEn: A tool for efficient compression of genome resequencing data","volume":"40","author":"Pinho","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1109\/TCBB.2013.122","article-title":"FRESCO: Referential compression of highly similar sequences","volume":"10","author":"Wandelt","year":"2013","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/srep11565","article-title":"GDC 2: Compression of large collections of genomes","volume":"5","author":"Deorowicz","year":"2015","journal-title":"Sci. Rep."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1093\/bioinformatics\/btu698","article-title":"iDoComp: A compression scheme for assembled genomes","volume":"31","author":"Ochoa","year":"2014","journal-title":"Bioinformatics"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"3364","DOI":"10.1093\/bioinformatics\/btx412","article-title":"High-speed and high-ratio referential genome compression","volume":"33","author":"Liu","year":"2017","journal-title":"Bioinformatics"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2058","DOI":"10.1093\/bioinformatics\/bty934","article-title":"High efficiency referential genome compression algorithm","volume":"35","author":"Shi","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"3405","DOI":"10.1093\/bioinformatics\/btw505","article-title":"NRGC: A novel referential genome compression algorithm","volume":"32","author":"Saha","year":"2016","journal-title":"Bioinformatics"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Tang, Y., Li, M., Sun, J., Zhang, T., Zhang, J., and Zheng, P. (2018). TRCMGene: A two-step referential compression method for the efficient storage of genetic data. PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0206521"},{"key":"ref_49","first-page":"1","article-title":"Three approaches to the quantitative definition of information","volume":"1","author":"Kolmogorov","year":"1965","journal-title":"Probl. Inf. Transm."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Pratas, D., and Pinho, A.J. (2017). On the Approximation of the Kolmogorov Complexity for DNA Sequences. Iberian Conference on Pattern Recognition and Image Analysis, Springer.","DOI":"10.1007\/978-3-319-58838-4_29"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Goyal, M., Tatwawadi, K., Chandak, S., and Ochoa, I. (2018). DeepZip: Lossless Data Compression using Recurrent Neural Networks. arXiv.","DOI":"10.1109\/DCC.2019.00087"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1109\/TIT.1977.1055714","article-title":"A universal algorithm for sequential data compression","volume":"23","author":"Ziv","year":"1977","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1016\/0306-4573(94)90014-0","article-title":"A new challenge for compression algorithms: Genetic sequences","volume":"30","author":"Grumbach","year":"1994","journal-title":"Inf. Process. Manag."},{"key":"ref_54","unstructured":"Rivals, E., Delahaye, J.P., Dauchet, M., and Delgrange, O. (April, January 31). A guaranteed compression scheme for repetitive DNA sequences. Proceedings of the Data Compression Conference (DCC \u201996), Snowbird, UT, USA."},{"key":"ref_55","unstructured":"Loewenstern, D., and Yianilos, P.N. (1997, January 25\u201327). Significantly lower entropy estimates for natural DNA sequences. Proceedings of the Data Compression Conference (DCC \u201997), Snowbird, UT, USA."},{"key":"ref_56","unstructured":"Allison, L., Edgoose, T., and Dix, T.I. (July, January 28). Compression of strings with approximate repeats. Proceedings of the Intelligent Systems in Molecular Biology (ISMB \u201998), Montr\u00e9al, QC, Canada."},{"key":"ref_57","unstructured":"Apostolico, A., and Lonardi, S. (2000, January 28\u201330). Compression of biological sequences by greedy offline textual substitution. Proceedings of the Data Compression Conference (DCC 2000), Snowbird, UT, USA."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Puri, A., and Chen, T. (2000). 263 (including H.263+) and other ITU-T video coding standards. Multimedia Systems, Standards, and Networks, Marcel Dekker.","DOI":"10.1201\/9780203908440"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"1696","DOI":"10.1093\/bioinformatics\/18.12.1696","article-title":"DNACompress: Fast and effective DNA sequence compression","volume":"18","author":"Chen","year":"2002","journal-title":"Bioinformatics"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1093\/bioinformatics\/18.3.440","article-title":"PatternHunter: Faster and more sensitive homology search","volume":"18","author":"Ma","year":"2002","journal-title":"Bioinformatics"},{"key":"ref_61","first-page":"43","article-title":"Biological sequence compression algorithms","volume":"11","author":"Matsumoto","year":"2000","journal-title":"Genome Inform."},{"key":"ref_62","unstructured":"Tabus, I., Korodi, G., and Rissanen, J. (2003, January 25\u201327). DNA sequence compression using the normalized maximum likelihood model for discrete regression. Proceedings of the Data Compression Conference (DCC 2003), Snowbird, UT, USA."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1145\/1055709.1055711","article-title":"An efficient normalized maximum likelihood algorithm for DNA sequence compression","volume":"23","author":"Korodi","year":"2005","journal-title":"ACM Trans. Inf. Sys."},{"key":"ref_64","unstructured":"Cherniavsky, N., and Ladner, R. (2004). Grammar-Based Compression of DNA Sequences, University of Washington. Technical Report."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"1397","DOI":"10.1002\/spe.619","article-title":"A simple and fast DNA compressor","volume":"34","author":"Manzini","year":"2004","journal-title":"Softw. Pract. Exp."},{"key":"ref_66","unstructured":"Lee, A.J.T., and Chen, C. (2004). DNAC: An Efficient Compression Algorithm for DNA Sequences, National Taiwan University."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1007\/11496656_17","article-title":"DNA compression challenge revisited","volume":"Volume 3537","author":"Behzadi","year":"2005","journal-title":"Combinatorial Pattern Matching: Proceedings of CPM-2005"},{"key":"ref_68","unstructured":"Cao, M.D., Dix, T.I., Allison, L., and Mears, C. (2007, January 27\u201329). A simple statistical algorithm for biological sequence compression. Proceedings of the 2007 Data Compression Conference (DCC \u201907), Snowbird, UT, USA."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Vey, G. (2009). Differential direct coding: A compression algorithm for nucleotide sequence data. Database, 2009.","DOI":"10.1093\/database\/bap013"},{"key":"ref_70","first-page":"39","article-title":"An efficient horizontal and vertical method for online dna sequence compression","volume":"3","author":"Mishra","year":"2010","journal-title":"Int. J. Comput. Appl."},{"key":"ref_71","first-page":"25","article-title":"GENBIT Compress-Algorithm for repetitive and non repetitive DNA sequences","volume":"2","author":"Rajeswari","year":"2010","journal-title":"Int. J. Comput. Sci. Inf. Technol."},{"key":"ref_72","first-page":"245","article-title":"A novel approach for compressing DNA sequences using semi-statistical compressor","volume":"33","author":"Gupta","year":"2011","journal-title":"Int. J. Comput. Appl."},{"key":"ref_73","first-page":"99","article-title":"A scheme that facilitates searching and partial decompression of textual documents","volume":"1","author":"Gupta","year":"2008","journal-title":"Int. J. Adv. Comput. Eng."},{"key":"ref_74","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1109\/TEVC.2011.2160399","article-title":"DNA sequence compression using adaptive particle swarm optimization-based memetic algorithm","volume":"15","author":"Zhu","year":"2011","journal-title":"IEEE Trans. Evol. Comput."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Pinho, A.J., Pratas, D., and Ferreira, P.J.S.G. (2011, January 28\u201330). Bacteria DNA sequence compression using a mixture of finite-context models. Proceedings of the 2011 IEEE Statistical Signal Processing Workshop (SSP), Nice, France.","DOI":"10.1109\/SSP.2011.5967637"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Pinho, A.J., Ferreira, P.J.S.G., Neves, A.J.R., and Bastos, C.A.C. (2011). On the representability of complete genomes by multiple competing finite-context (Markov) models. PLoS ONE, 6.","DOI":"10.1371\/journal.pone.0021588"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Roy, S., Khatua, S., Roy, S., and Bandyopadhyay, S.K. (2012). An efficient biological sequence compression technique using lut and repeat in the sequence. arXiv.","DOI":"10.9790\/0661-0614250"},{"key":"ref_78","unstructured":"Satyanvesh, D., Balleda, K., Padyana, A., and Baruah, P. (2012, January 18\u201322). GenCodex\u2014A Novel Algorithm for Compressing DNA sequences on Multi-cores and GPUs. Proceedings of the IEEE 19th International Conference on High Performance Computing (HiPC), Pune, India."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1007\/s12038-012-9230-6","article-title":"BIND\u2014An algorithm for loss-less compression of nucleotide sequence data","volume":"37","author":"Bose","year":"2012","journal-title":"J. Biosci."},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Li, P., Wang, S., Kim, J., Xiong, H., Ohno-Machado, L., and Jiang, X. (2013). DNA-COMPACT: DNA Compression Based on a Pattern-Aware Contextual Modeling Technique. PLoS ONE, 8.","DOI":"10.1371\/journal.pone.0080377"},{"key":"ref_81","unstructured":"Pratas, D., and Pinho, A.J. (2014, January 1\u20135). Exploring deep Markov models in genomic data compression using sequence pre-analysis. Proceedings of the 22th European Signal Processing Conference (EUSIPCO 2014), Lisbon, Portugal."},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/j.ygeno.2014.08.007","article-title":"SeqCompress: An algorithm for biological sequence compression","volume":"104","author":"Sardaraz","year":"2014","journal-title":"Genomics"},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Guo, H., Chen, M., Liu, X., and Xie, M. (2015, January 29\u201331). Genome compression based on Hilbert space filling curve. Proceedings of the 3rd International Conference on Management, Education, Information and Control (MEICI 2015),  Shenyang, China.","DOI":"10.2991\/meici-15.2015.294"},{"key":"ref_84","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1109\/TCBB.2015.2430331","article-title":"CoGI: Towards compressing genomes as an image","volume":"12","author":"Xie","year":"2015","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"ref_85","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1109\/26.387415","article-title":"Binary image compression using efficient partitioning into rectanglar regions","volume":"43","author":"Mohamed","year":"1995","journal-title":"IEEE Trans. Commun."},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Pratas, D., Pinho, A.J., and Ferreira, P.J.S.G. (April, January 30). Efficient compression of genomic sequences. Proceedings of the 2016 Data Compression Conference (DCC 2016), Snowbird, UT, USA.","DOI":"10.1109\/DCC.2016.60"},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Pratas, D., Hosseini, M., and Pinho, A.J. (2017). Substitutional Tolerant Markov Models for Relative Compression of DNA Sequences. 11th International Conference on Practical Applications of Computational Biology & Bioinformatics, Springer.","DOI":"10.1007\/978-3-319-60816-7_32"},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Fdez-Riverola, F., Rocha, M., Mohamad, M.S., Zaki, N., and Castellanos-Garz\u00f3n, J.A. (2019). GeCo2: An optimized tool for lossless compression and analysis of DNA sequences. 13th International Conference on Practical Applications of Computational Biology and Bioinformatics, Springer International Publishing.","DOI":"10.1007\/978-3-030-23873-5"},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Chen, M., Shao, J., and Jia, X. (2017). Genome sequence compression based on optimized context weighting. Genet. Mol. Res. GMR, 16.","DOI":"10.4238\/gmr16026784"},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Mansouri, D., and Yuan, X. (2018). One-Bit DNA Compression Algorithm. International Conference on Neural Information Processing, Springer.","DOI":"10.1007\/978-3-030-04239-4_34"},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Pratas, D., and Pinho, A.J. (2018). A DNA Sequence Corpus for Compression Benchmark. International Conference on Practical Applications of Computational Biology & Bioinformatics, Springer.","DOI":"10.1007\/978-3-319-98702-6_25"},{"key":"ref_92","doi-asserted-by":"crossref","unstructured":"Sayood, K. (2017). Introduction to Data Compression, Morgan Kaufmann.","DOI":"10.1016\/B978-0-12-809474-7.00019-7"},{"key":"ref_93","unstructured":"Bell, T.C., Cleary, J.G., and Witten, I.H. (1990). Text Compression, Prentice Hall."},{"key":"ref_94","doi-asserted-by":"crossref","first-page":"2148","DOI":"10.1109\/TBME.2006.879477","article-title":"A three-state model for DNA protein-coding regions","volume":"53","author":"Pinho","year":"2006","journal-title":"IEEE Trans. Biomed. Eng."},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Hosseini, M., Pratas, D., and Pinho, A.J. (2017). On the role of inverted repeats in DNA sequence similarity. International Conference on Practical Applications of Computational Biology & Bioinformatics, Springer.","DOI":"10.1007\/978-3-319-60816-7_28"},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Miron, S. (2010). Finite-context models for DNA coding. Signal Processing, INTECH.","DOI":"10.5772\/3472"},{"key":"ref_97","doi-asserted-by":"crossref","unstructured":"Ferreira, P.J.S.G., and Pinho, A.J. (2014, January 4\u20139). Compression-based normal similarity measures for DNA sequences. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6853630"},{"key":"ref_98","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1145\/290159.290162","article-title":"Arithmetic coding revisited","volume":"16","author":"Moffat","year":"1998","journal-title":"ACM Trans. Inf. Syst."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/11\/1074\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:31:23Z","timestamp":1760189483000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/11\/1074"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,2]]},"references-count":98,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2019,11]]}},"alternative-id":["e21111074"],"URL":"https:\/\/doi.org\/10.3390\/e21111074","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2019,11,2]]}}}