{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T05:40:18Z","timestamp":1775281218031,"version":"3.50.1"},"reference-count":89,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2009,12,29]],"date-time":"2009-12-29T00:00:00Z","timestamp":1262044800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms and study viral populations. 
Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in biological sequences.<\/jats:p>","DOI":"10.3390\/e12010034","type":"journal-article","created":{"date-parts":[[2009,12,29]],"date-time":"2009-12-29T11:20:23Z","timestamp":1262085623000},"page":"34-52","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":34,"title":["Data Compression Concepts and Algorithms and Their Applications to Bioinformatics"],"prefix":"10.3390","volume":"12","author":[{"given":"\u00d6zkan  U.","family":"Nalbantoglu","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, University of Nebraska-Lincoln, NE 68588-0511, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David  J.","family":"Russell","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, University of Nebraska-Lincoln, NE 68588-0511, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Khalid","family":"Sayood","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, University of Nebraska-Lincoln, NE 68588-0511, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2009,12,29]]},"reference":[{"key":"ref_1","unstructured":"Schrodinger, E. (1944). What is Life, Cambridge University Press."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/bioinformatics\/btp117","article-title":"Textual data compression in computational biology: A synopsis","volume":"25","author":"Giancarlo","year":"2009","journal-title":"Bioinformatics"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1016\/0022-5193(63)90083-3","article-title":"Triplet frequencies in DNA and the genetic program","volume":"5","author":"Gatlin","year":"1963","journal-title":"J. Theor. 
Biol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1016\/0022-5193(66)90127-5","article-title":"The information content of DNA","volume":"10","author":"Gatlin","year":"1966","journal-title":"J. Theor. Biol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1016\/0022-5193(68)90160-4","article-title":"The information content of DNA II","volume":"18","author":"Gatlin","year":"1968","journal-title":"J. Theor. Biol."},{"key":"ref_6","unstructured":"Gatlin, L. (1972). Information Theory and the Living System, Columbia University Press."},{"key":"ref_7","first-page":"379","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"AT&T Tech. J."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"65","DOI":"10.4161\/psb.2.2.4113","article-title":"Information and knowledge in biology: Time for reappraisal","volume":"2","author":"Kovac","year":"2007","journal-title":"Plant Signal. Behav."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1183","DOI":"10.1073\/pnas.86.4.1183","article-title":"Identifying protein-binding sites from unaligned DNA fragments","volume":"86","author":"Stormo","year":"1989","journal-title":"PNAS"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1016\/0022-2836(86)90165-8","article-title":"Information content of binding sites on nucleotide sequences","volume":"188","author":"Schneider","year":"1986","journal-title":"J. Mol. 
Biol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"6097","DOI":"10.1093\/nar\/18.20.6097","article-title":"Sequence logos: A new way to display consensus sequences","volume":"18","author":"Schneider","year":"1990","journal-title":"Nucleic Acids Res."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1016\/S0166-218X(96)00068-6","article-title":"Fast Multiple alignment of ungapped DNA sequences using information theory and a relaxation method","volume":"71","author":"Schneider","year":"1996","journal-title":"Discrete Appl. Math."},{"key":"ref_13","unstructured":"Bailey, T., and Elkan, C. (1994, January August). Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, Stanford, CA, USA."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1093\/bioinformatics\/14.1.48","article-title":"Combining evidence using p-values: application to sequence homology searches","volume":"14","author":"Bailey","year":"1998","journal-title":"Bioinformatics"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"W253","DOI":"10.1093\/nar\/gkm272","article-title":"STAMP: a web tool for exploring DNA-binding motif similarities","volume":"35","author":"Mahony","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1093\/nar\/27.3.882","article-title":"Using sequence logos and information analysis of Lrp DNA binding sites to investigate discrepancies between natural selection and SELEX","volume":"27","author":"Shultzaberger","year":"1999","journal-title":"Nucleic Acids Res."},{"key":"ref_17","first-page":"882","article-title":"Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription 
initiation","volume":"27","author":"Schneider","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3828","DOI":"10.1093\/nar\/gkn189","article-title":"Discovery of novel tumor suppressor p53 response elements using information theory","volume":"36","author":"Lyakhov","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"7176","DOI":"10.1073\/pnas.90.15.7176","article-title":"Covariation of mutations in the V3 loop of human immunodeficiency virus Type I envelope protein: An information theoretic analysis","volume":"90","author":"Korber","year":"1993","journal-title":"PNAS"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Sayood, K., Hoffman, F., and Wood, C. (2009, January September). Use of Average Mutual Information for Studying Changes in HIV Populations. Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Minneapolis, MN, USA.","DOI":"10.1109\/IEMBS.2009.5332579"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1415","DOI":"10.1089\/088922202320935492","article-title":"Phylogenetic and phenotypic analysis of HIV Type 1 Env gp120 in cases of Subtype C mother-to-child transmission","volume":"18","author":"Zhang","year":"2002","journal-title":"AIDS Res. Hum. Retrov."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1097\/QAD.0b013e3282f486af","article-title":"Genetic variation in mother-child acute seroconverter pairs from Zambia","volume":"22","author":"Hoffman","year":"2008","journal-title":"AIDS"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"6312","DOI":"10.1103\/PhysRevE.58.6312","article-title":"Analysis of correlations between sites in models of protein sequences","volume":"58","author":"Giraud","year":"1998","journal-title":"Phys. Rev. 
E"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"800","DOI":"10.1103\/PhysRevE.55.800","article-title":"Correlations in DNA sequences: The role of protein coding segments","volume":"55","author":"Herzel","year":"1997","journal-title":"Phys. Rev. E"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1016\/0031-3203(95)00145-X","article-title":"Application of information theory to DNA sequence analysis: A review","volume":"29","author":"Oliver","year":"1996","journal-title":"Pattern Recognit."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"4116","DOI":"10.1093\/bioinformatics\/bti671","article-title":"Using information theory to search for co-evolving residues in proteins","volume":"21","author":"Martin","year":"2005","journal-title":"Bioinformatics"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.plrev.2004.01.002","article-title":"Information Theory in Molecular Biology","volume":"1","author":"Adami","year":"2004","journal-title":"Phys. Life Rev."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"5624","DOI":"10.1103\/PhysRevE.61.5624","article-title":"Species independence of mutual information in coding and noncoding regions","volume":"61","author":"Grosse","year":"2000","journal-title":"Phys. Rev. E"},{"key":"ref_29","unstructured":"Bauer, M. (2001). A Distance Measure for DNA Sequences. [PhD thesis, University of Nebraska-Lincoln]."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Bauer, M., Schuster, S., and Sayood, K. The average mutual information profile as a genomic signature. BMC Bioinf., http:\/\/www.biomedcentral.com\/1471-2105\/9\/48.","DOI":"10.1186\/1471-2105-9-48"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1142\/S0219477504001574","article-title":"Mutual information for examining correlations in DNA","volume":"4","author":"Berryman","year":"2004","journal-title":"Fluct. 
Noise Lett."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"061913:1","DOI":"10.1103\/PhysRevE.67.061913","article-title":"Repeats and correlations in human DNA sequences","volume":"67","author":"Holste","year":"2003","journal-title":"Phys. Rev. E"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1093\/bioinformatics\/19.1.22","article-title":"A divide and conquer approach to sequence assembly","volume":"19","author":"Otu","year":"2003","journal-title":"Bioinformatics"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1109\/TCOM.1980.1094577","article-title":"An algorithm for vector quantization design","volume":"COM-28","author":"Linde","year":"1980","journal-title":"IEEE Trans. Commun."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Butte, A., and Kohane, I. (2000, January January). Mutual Information Relevance Networks: Functional Genomic Clustering Using Pairwise Entropy Measurements. Proceedings Pacific Symposium on Biocomputing 2000, Oahu, HI, USA.","DOI":"10.1142\/9789814447331_0040"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"S231","DOI":"10.1093\/bioinformatics\/18.suppl_2.S231","article-title":"The mutual information: Detecting and evaluating dependencies between variables","volume":"18","author":"Steuer","year":"2002","journal-title":"Bioinformatics"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1038\/35076576","article-title":"Computational analysis of microarray data","volume":"2","author":"Quackenbush","year":"2001","journal-title":"Nat. Rev. Genet."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Chen, X., Kwong, S., and Li, M. (2000, January April). A compression algorithm for DNA sequences and its applications in Genome comparison. 
Proceedings of the Fourth Annual International Conference on Computational Molecular Biology, Tokyo, Japan.","DOI":"10.1145\/332306.332352"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1109\/TIT.1977.1055714","article-title":"A Universal Algorithm for Data Compression","volume":"IT-23","author":"Ziv","year":"1977","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_40","unstructured":"Grumbach, A., and Tahi, F. (2,, January March). Compression of DNA Sequences. Proceedings of the IEEE Data Compression Conference, Snowbird, UT, USA."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1696","DOI":"10.1093\/bioinformatics\/18.12.1696","article-title":"DNA compress: fast and effective DNA sequence compression","volume":"18","author":"Chen","year":"2002","journal-title":"Bioinformatics"},{"key":"ref_42","first-page":"43","article-title":"Biological sequence compression algorithms","volume":"11","author":"Matsumoto","year":"2000","journal-title":"Genome Inform."},{"key":"ref_43","unstructured":"Behzadi, B., and Fessant, F.L. (2005). Lect. Notes Comput. SC, Springer."},{"key":"ref_44","unstructured":"Cao, M., Dix, T.I., Allison, L., and Mears, C. (2007, January March). A Simple Statistical Algorithm for Biological Sequence Compression. Proceedings of the IEEE Data Compression Conference, Snowbird, UT, USA."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1109\/TIT.1978.1055934","article-title":"Compression of individual sequences via variable-rate coding","volume":"IT-24","author":"Ziv","year":"1978","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"048702:1","DOI":"10.1103\/PhysRevLett.88.048702","article-title":"Language trees and zipping","volume":"88","author":"Benedetto","year":"2002","journal-title":"Phys. Rev. 
Lett."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1016\/S0167-2789(03)00047-2","article-title":"Data compression and learning in time sequence analysis","volume":"180","author":"Puglisi","year":"2003","journal-title":"Physica D"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1017\/S0140525X00081061","article-title":"Natural-language and natural-selection","volume":"13","author":"Pinker","year":"1990","journal-title":"Behav. Brain Sci."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1111\/j.1749-6632.2009.04423.x","article-title":"The evolution of language","volume":"1156","author":"Corballis","year":"2009","journal-title":"Ann. N.Y. Acad. Sci."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"3250","DOI":"10.1109\/TIT.2004.838101","article-title":"The similarity metric","volume":"50","author":"Li","year":"2003","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Apostolico, A., Comin, M., and Parida, L. Mining, compressing and classifying with extensible motifs. Algorithm. Mol. Biol., http:\/\/www.almob.org\/content\/1\/1\/4.","DOI":"10.1186\/1748-7188-1-4"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"2122","DOI":"10.1093\/bioinformatics\/btg295","article-title":"A new sequence distance measure for phylogenetic tree construction","volume":"19","author":"Otu","year":"2003","journal-title":"Bioinformatics"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1017\/S0953756203009079","article-title":"Utilization of the relative complexity measure to construct a phylogenetic tree for fungi","volume":"108","author":"Bastola","year":"2004","journal-title":"Mycol. 
Res."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1109\/TIT.1976.1055501","article-title":"On the complexity of finite sequences","volume":"IT-22","author":"Lempel","year":"1976","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Weeks, K., Chuzhanova, N., Donnison, I., and Scott, I. Evolutionary hierarchies of conserved blocks in 5\u2019-noncoding sequences of dicot rbcS genes. BMC Evol. Biol., http:\/\/www.biomedcentral.com\/1471-2148\/7\/51.","DOI":"10.1186\/1471-2148-7-51"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Russell, D., Otu, H., and Sayood, K. Grammar-based distance in progressive multiple sequence alignment. BMC Bioinf., http:\/\/www.biomedcentral.com\/1471-2105\/9\/306.","DOI":"10.1186\/1471-2105-9-306"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"1015","DOI":"10.1093\/bioinformatics\/bth031","article-title":"Measuring the similarity of protein structures by means of the universal similarity metric","volume":"20","author":"Krasnogor","year":"2004","journal-title":"Bioinformatics"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1093\/bioinformatics\/bti806","article-title":"Application of compression-based distance measures to protein sequence classification: a methodological study","volume":"22","author":"Kocsor","year":"2005","journal-title":"Bioinformatics"},{"key":"ref_59","unstructured":"Pelta, D., Gonzales, J.R., and Krasnogor, N. (2005, January September). Protein Structure Comparison Through Fuzzy Contact Maps and the Universal Similarity Metric. Proceedings of the Joint 4th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT) and the 11th Rencontres Francophones sur la Logique Floue et ses Applications (LFA), Barcelona, Spain."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Ferragina, P., Giancarlo, R., Greco, V., Manzini, G., and Valiente, G. 
Compression-based classification of biological sequences and structures via the universal similarity metric: Experimental assessment. BMC Bioinf., http:\/\/www.biomedcentral.com\/1471-2105\/8\/252.","DOI":"10.1186\/1471-2105-8-252"},{"key":"ref_61","unstructured":"Loewenstern, D., Hirsh, H., Yianilos, P., and Noordewier, M. (1995). DNA Sequence Classification Using Compression-Based Induction, Rutgers University. DIMACS Technical Report 95-04."},{"key":"ref_62","unstructured":"Rocha, J., Rossello, F., and Segura, J. Compression ratios based on the universal similarity metric still yield protein distances far from CATH distances. http:\/\/arxiv.org\/abs\/q-bio\/0603007."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"2000","DOI":"10.1109\/18.841160","article-title":"Grammar based codes: A new class of universal lossless source codes","volume":"46","author":"Kieffer","year":"2000","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_64","unstructured":"Chomsky, N. (1955). Logical Structure of Linguistic Theory. [PhD thesis, University of Pennsylvania]."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1016\/S0019-9958(59)90362-6","article-title":"On certain formal properties of grammars","volume":"2","author":"Chomsky","year":"1959","journal-title":"Inform. Control"},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"1077","DOI":"10.1089\/cmb.2006.13.1077","article-title":"Grammatical representations of macromolecular structure","volume":"13","author":"Chiang","year":"2006","journal-title":"J. Comput. 
Biol."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"5112","DOI":"10.1093\/nar\/22.23.5112","article-title":"Stochastic context-free grammars for tRNA modeling","volume":"22","author":"Sakakibara","year":"1994","journal-title":"Nucleic Acids Res."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1038\/nature01255","article-title":"The language of genes","volume":"420","author":"Searls","year":"2002","journal-title":"Nature"},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1002\/cfg.364","article-title":"A formal language-based approach in biology","volume":"5","author":"Gheorghe","year":"2004","journal-title":"Comp. Funct. Genom."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"2561","DOI":"10.1093\/nar\/12.5.2561","article-title":"Genome structure described by formal languages","volume":"12","author":"Brendel","year":"1984","journal-title":"Nucleic Acids Res."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1016\/S0092-8240(87)90018-8","article-title":"Formal language theory and DNA: An analysis of the generative capacity of specific recombinant behaviors","volume":"49","author":"Head","year":"1987","journal-title":"Bull. Math. Biol."},{"key":"ref_72","first-page":"579","article-title":"The linguistics of DNA","volume":"80","author":"Searls","year":"1992","journal-title":"Am. Sci."},{"key":"ref_73","unstructured":"Lusk, E., and Overbeek, R. Investigating the Linguistics of DNA with Definite Clause Grammars. Logic Programming: Proceedings North American Conference."},{"key":"ref_74","unstructured":"Searls, D.B. (1993). Artificial Intelligence and Molecular Biology, AAAI Press. Chapter 2."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1023\/A:1007477814995","article-title":"Predicting protein secondary structure using stochastic tree grammars","volume":"29","author":"Abe","year":"1997","journal-title":"Mach. 
Learn."},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1110\/ps.24701","article-title":"Recursive domains in proteins","volume":"11","author":"Przytycka","year":"2002","journal-title":"Protein Sci."},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-5193(89)80156-0","article-title":"A transformational-grammar approach to the study of the regulation of gene expression","volume":"136","year":"1989","journal-title":"J. Theor. Biol."},{"key":"ref_78","first-page":"415","article-title":"Syntactic recognition of regulatory regions in Escherichia coli","volume":"12","author":"Rosenblueth","year":"1996","journal-title":"Comput. Appl. Biosci."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1093\/bioinformatics\/17.3.226","article-title":"Basic gene grammars and DNA-chart parser for language processing of Escherichia coli promotor DNA sequences","volume":"17","author":"Leung","year":"2001","journal-title":"Bioinformatics"},{"key":"ref_80","unstructured":"Nevill-Manning, C.G. (1996). Inferring Sequential Structure. [PhD thesis, University of Waikato]."},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1613\/jair.374","article-title":"Identifying hierarchical structure in sequences: A linear-time algorithm","volume":"7","author":"Witten","year":"1997","journal-title":"J. Artif. Intell. 
Res."},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"1372","DOI":"10.1016\/j.patcog.2004.03.021","article-title":"Learning context-free grammars using tabular representations","volume":"38","author":"Sakakibara","year":"2005","journal-title":"Pattern Recognit."},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"1384","DOI":"10.1016\/j.patcog.2005.01.004","article-title":"Incremental learning of context free grammars based on bottom-up parsing and search","volume":"38","author":"Nakamura","year":"2005","journal-title":"Pattern Recognit."},{"key":"ref_84","unstructured":"Cherniavsky, N., and Ladner, R.E. (2004, January August). Grammar-based Compression of DNA Sequences. Presented at the DIMACS Working Group on the Burrows-Wheeler Transform, DIMACS Center, Rutgers University, Piscataway, NJ, USA. http:\/\/www.cs.washington.edu\/homes\/nchernia\/dnasequitur\/dnasequitur.pdf."},{"key":"ref_85","unstructured":"Nawrocki, E.P., and Eddy, S.R. Computational Identification of Functional RNA Homologs in Metagenomic Data. ftp:\/\/selab.janelia.org\/pub\/publications\/NawrockiEddy09\/NawrockiEddy09-preprint.pdf."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1101\/sqb.2006.71.003","article-title":"Computational analysis of RNAs","volume":"71","author":"Eddy","year":"2006","journal-title":"Cold Spring Harb. Sym."},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Eddy, S.R. A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinf., http:\/\/www.biomedcentral.com\/1471-2105\/3\/18\/.","DOI":"10.1186\/1471-2105-3-18"},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). 
Biological Sequence Analysis, Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press.","DOI":"10.1017\/CBO9780511790492"},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"Sup329","DOI":"10.3395\/reciis.v1i2.Sup.104en","article-title":"Grammatical inference applied to linguistic modeling of biological regulation networks","volume":"1","author":"Bareinboim","year":"2007","journal-title":"RECIIS"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/12\/1\/34\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T22:12:09Z","timestamp":1760220729000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/12\/1\/34"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,12,29]]},"references-count":89,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2010,1]]}},"alternative-id":["e12010034"],"URL":"https:\/\/doi.org\/10.3390\/e12010034","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,12,29]]}}}