{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:32:20Z","timestamp":1760243540863,"version":"build-2065373602"},"reference-count":19,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2013,8,30]],"date-time":"2013-08-30T00:00:00Z","timestamp":1377820800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>It is widely accepted that the advances in DNA sequencing techniques have contributed to an unprecedented growth of genomic data. This fact has increased the interest in DNA compression, not only from the information theory and biology points of view, but also from a practical perspective, since such sequences require storage resources. Several compression methods exist, and particularly, those using finite-context models (FCMs) have received increasing attention, as they have been proven to effectively compress DNA sequences with low bits-per-base, as well as low encoding\/decoding time-per-base. However, the amount of run-time memory required to store high-order finite-context models may become impractical, since a context-order as low as 16 requires a maximum of 17.2 \u00d7 10^9 memory entries. This paper presents a method to reduce such a memory requirement by using a novel application of artificial neural networks (ANN) to build such probabilistic models in a compact way and shows how to use them to estimate the probabilities. Such a system was implemented, and its performance compared against state-of-the-art compressors, such as XM-DNA (expert model) and FCM-Mx (mixture of finite-context models), as well as with general-purpose compressors. 
Using a combination of order-10 FCM and ANN, similar encoding results to those of FCM, up to order-16, are obtained using only 17 megabytes of memory, whereas the latter, even employing hash-tables, uses several hundreds of megabytes.<\/jats:p>","DOI":"10.3390\/e15093435","type":"journal-article","created":{"date-parts":[[2013,8,30]],"date-time":"2013-08-30T13:20:35Z","timestamp":1377868835000},"page":"3435-3448","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Bacterial DNA Sequence Compression Models Using Artificial Neural Networks"],"prefix":"10.3390","volume":"15","author":[{"given":"Manuel","family":"Duarte","sequence":"first","affiliation":[{"name":"Instituto de Telecomunica\u00e7\u00f5es \/ Departamento de Electr\u00f3nica, Telecomunica\u00e7\u00f5es e Inform\u00e1tica, Campus Universit\u00e1rio de Santiago, Aveiro 3810-193, Portugal"}]},{"given":"Armando","family":"Pinho","sequence":"additional","affiliation":[{"name":"Instituto de Engenharia Electr\u00f3nica e Telem\u00e1tica de Aveiro \/ Departamento de Electr\u00f3nica, Telecomunica\u00e7\u00f5es e Inform\u00e1tica, Campus Universit\u00e1rio de Santiago, Aveiro 3810-193, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2013,8,30]]},"reference":[{"key":"ref_1","unstructured":"Grumbach, S., and Tahi, F. (April, January 30). Compression of DNA Sequences. Proceedings of the Data Compression Conference (DCC-93), Snowbird, Utah, USA."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1109\/TIT.1977.1055714","article-title":"A universal algorithm for sequential data compression","volume":"23","author":"Ziv","year":"1977","journal-title":"IEEE Trans. Inf. 
Theory"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1016\/0306-4573(94)90014-0","article-title":"A new challenge for compression algorithms: Genetic sequences","volume":"30","author":"Grumbach","year":"1994","journal-title":"Inf. Process. Manag."},{"key":"ref_4","unstructured":"Rivals, E., Delahaye, J.P., Dauchet, M., and Delgrange, O. (April, January 31). A Guaranteed Compression Scheme for Repetitive DNA Sequences. Proceedings of the Data Compression Conference (DCC-96), Snowbird, Utah, USA."},{"key":"ref_5","unstructured":"Loewenstern, D., and Yianilos, P.N. (1997, January 25\u201327). Significantly Lower Entropy Estimates for Natural DNA Sequences. Proceedings of the Data Compression Conference (DCC-97), Snowbird, Utah, USA."},{"key":"ref_6","unstructured":"Matsumoto, T., Sadakane, K., and Imai, H. (2000, January 18\u201319). Biological Sequence Compression Algorithms. Proceedings of the 11th Workshop Genome Informatics 2000, Tokyo, Japan."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/51.940049","article-title":"A compression algorithm for DNA sequences","volume":"20","author":"Chen","year":"2001","journal-title":"IEEE Eng. Med. Biol. Mag."},{"key":"ref_8","unstructured":"Tabus, I., Korodi, G., and Rissanen, J. (2003, January 25\u201327). DNA Sequence Compression Using the Normalized Maximum Likelihood Model for Discrete Regression. Proceedings of the Data Compression Conference (DCC-2003), Snowbird, Utah, USA."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1397","DOI":"10.1002\/spe.619","article-title":"A simple and fast DNA compressor","volume":"34","author":"Manzini","year":"2004","journal-title":"Softw. Pract. Exp."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1145\/1055709.1055711","article-title":"An efficient normalized maximum likelihood algorithm for DNA sequence compression","volume":"23","author":"Korodi","year":"2005","journal-title":"ACM Trans. 
Inf. Syst."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Korodi, G., and Tabus, I. (2007, January 27\u201329). Normalized Maximum Likelihood Model of order-1 for the Compression of DNA Sequences. Proceedings of the Data Compression Conference (DCC-2007), Snowbird, Utah, USA.","DOI":"10.1109\/DCC.2007.60"},{"key":"ref_12","unstructured":"Cao, M.D., Dix, T.I., Allison, L., and Mears, C. (2007, January 27\u201329). A Simple Statistical Algorithm for Biological Sequence Compression. Proceedings of the Data Compression Conference (DCC-2007), Snowbird, Utah, USA."},{"key":"ref_13","unstructured":"Pinho, A.J., Neves, A.J.R., and Ferreira, P.J.S.G. (2008, January 25\u201329). Inverted-Repeats-Aware Finite-Context Models for DNA Coding. Proceedings of 16th European Signal Processing Conference (EUSIPCO-2008), Lausanne, Switzerland."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Pinho, A.J., Neves, A.J.R., Bastos, C.A.C., and Ferreira, P.J.S.G. (2009, January 19\u201324). DNA Coding using Finite-Context Models and Arithmetic Coding. Proceedings of IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP-2009), Taipei, Taiwan.","DOI":"10.1109\/ICASSP.2009.4959928"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Pinho, A.J., Pratas, D., and Ferreira, P.J.S.G. (2011, January 28\u201330). Bacteria DNA Sequence Compression Using a Mixture of Finite-Context Models. Proceedings of the IEEE Statistical Signal Processing Workshop (SSP), Nice, France.","DOI":"10.1109\/SSP.2011.5967637"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Pinho, A.J., Ferreira, P.J.S.G., Neves, A.J.R., and Bastos, C.A.C. (2011). On the representability of complete genomes by multiple competing finite-context (Markov) models. PLoS One, 6.","DOI":"10.1371\/journal.pone.0021588"},{"key":"ref_17","unstructured":"Mitchell, T. (1997). Machine Learning, McGraw Hill."},{"key":"ref_18","unstructured":"Bishop, C.M. (2007). 
Pattern Recognition and Machine Learning, Springer. [1st ed.]."},{"key":"ref_19","unstructured":"National Center for Biotechnology Information, Available online: ftp:\/\/ftp.ncbi.nih.gov\/genomes\/Bacteria\/."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/15\/9\/3435\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T21:48:58Z","timestamp":1760219338000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/15\/9\/3435"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8,30]]},"references-count":19,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2013,9]]}},"alternative-id":["e15093435"],"URL":"https:\/\/doi.org\/10.3390\/e15093435","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2013,8,30]]}}}