{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,20]],"date-time":"2025-11-20T12:00:01Z","timestamp":1763640001426,"version":"build-2065373602"},"reference-count":41,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2015,3,27]],"date-time":"2015-03-27T00:00:00Z","timestamp":1427414400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>This study delves further into the analysis of genomic data by computing a variety of complexity measures. We analyze the effect of window size and evaluate the precision and recall of the prediction of gene zones, aided with a much larger dataset (full chromosomes). A technique based on the separation of two cases (gene-containing and non-gene-containing) has been developed as a basic gene predictor for automated DNA analysis. This predictor was tested on various sequences of human DNA obtained from public databases, in a set of three experiments. The first one covers window size and other parameters; the second one corresponds to an analysis of a full human chromosome (198 million nucleic acids); and the last one tests subject variability (with five different individual subjects). All three experiments have high-quality results, in terms of recall and precision, thus indicating the effectiveness of the predictor.<\/jats:p>","DOI":"10.3390\/e17041673","type":"journal-article","created":{"date-parts":[[2015,3,27]],"date-time":"2015-03-27T17:12:08Z","timestamp":1427476328000},"page":"1673-1689","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Analysis of Data Complexity in Human DNA for Gene-Containing Zone Prediction"],"prefix":"10.3390","volume":"17","author":[{"given":"Ricardo","family":"Monge","sequence":"first","affiliation":[{"name":"Escuela de Ciencias de la Computaci\u00f3n y de la Inform\u00e1tica, Universidad de Costa Rica, San Pedro de Montes de Oca, San Jos\u00e9, C\u00f3digo Postal 2060-San Jos\u00e9, Costa Rica"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Juan","family":"Crespo","sequence":"additional","affiliation":[{"name":"Escuela de Ingenier\u00eda El\u00e9ctrica, Universidad de Costa Rica, San Pedro de Montes de Oca, San Jos\u00e9, C\u00f3digo Postal 2060-San Jos\u00e9, Costa Rica"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2015,3,27]]},"reference":[{"unstructured":"Holzinger, A., H\u00f6rtenhuber, M., Mayer, C., Bachler, M., Wassertheurer, S., Pinho, A.J., and Koslicki, D. (2014). Interactive Knowledge Discovery and Data Mining in Biomedical Informatics, Springer.","key":"ref_1"},{"doi-asserted-by":"crossref","unstructured":"Monge, R.E., and Crespo, J.L. (2014, January 16\u201318). Comparison of complexity measures for DNA sequence analysis. Liberia, Costa Rica.","key":"ref_2","DOI":"10.1109\/IWOBI.2014.6913941"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1101\/gr.200901","article-title":"The KA\/KS ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study","volume":"12","author":"Nekrutenko","year":"2002","journal-title":"Genome Res."},{"unstructured":"Johnson, N. (2009). Simply Complexity: A Clear Guide to Complexity Theory, Oneworld Publications.","key":"ref_4"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/0092-8674(74)90003-8","article-title":"A measurement of the sequence complexity of polysomal messenger RNA in sea urchin embryos","volume":"2","author":"Galau","year":"1974","journal-title":"Cell"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/0167-4781(82)90184-1","article-title":"Genome size and DNA complexity of Plasmodium falciparum","volume":"698","author":"Howard","year":"1982","journal-title":"Biochim. Biophys. Acta Gene Struct. Expr."},{"unstructured":"Farach, M., Noordewier, M., Savari, S., Shepp, L., Wyner, A., and Ziv, J. (1995, January 22\u201324). On the entropy of DNA: Algorithms and measurements based on memory and rapid convergence. San Francisco, CA, USA.","key":"ref_7"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"994","DOI":"10.1093\/bioinformatics\/15.12.994","article-title":"On the complexity measures of genetic sequences","volume":"15","author":"Gusev","year":"1999","journal-title":"Bioinformatics"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"288","DOI":"10.1002\/bies.20544","article-title":"The relationship between non-protein-coding DNA and eukaryotic complexity","volume":"29","author":"Taft","year":"2007","journal-title":"Bioessays"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1006\/jtbi.1997.0493","article-title":"Estimating the entropy of DNA sequences","volume":"188","author":"Schmitt","year":"1997","journal-title":"J. Theor. Biol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1016\/S0097-8485(99)00009-1","article-title":"Zones of low entropy in genomic sequences","volume":"23","author":"Crochemore","year":"1999","journal-title":"Comput. Chem."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1089\/cmb.1999.6.125","article-title":"Significantly lower entropy estimates for natural DNA sequences","volume":"6","author":"Loewenstern","year":"1999","journal-title":"J. Comput. Biol."},{"unstructured":"Lanctot, J.K., Li, M., and Yang, E.H. (2000, January 9\u201311). Estimating DNA sequence entropy. San Francisco, CA, USA.","key":"ref_13"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1093\/bioinformatics\/btr077","article-title":"Topological entropy of DNA sequences","volume":"27","author":"Koslicki","year":"2011","journal-title":"Bioinformatics"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1093\/bioinformatics\/18.5.679","article-title":"Sequence complexity profiles of prokaryotic genomic sequencs: A fast algorithm for calculating linguistic complexity","volume":"18","author":"Troyanskaya","year":"2002","journal-title":"Bioinformatics"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1007\/s00726-004-0148-7","article-title":"Using complexity measure factor to predict protein subcellular location","volume":"28","author":"Xiao","year":"2005","journal-title":"Amino Acids"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1067","DOI":"10.1109\/TBME.2006.873543","article-title":"Comparison of entropy and complexity measures for the assessment of depth of sedation","volume":"53","author":"Ferenets","year":"2006","journal-title":"IEEE Trans. Biomed."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2282","DOI":"10.1109\/TBME.2006.883696","article-title":"Interpretation of the Lempel-Ziv complexity measure in the context of biomedical signal analysis","volume":"53","author":"Aboy","year":"2006","journal-title":"IEEE Trans. Biomed."},{"doi-asserted-by":"crossref","unstructured":"Berthelsen, C.L., Glazier, J.A., and Skolnick, M.H. (1992). Global fractal dimension of human DNA sequences treated as pseudorandom walks. Phys. Rev. A, 45.","key":"ref_19","DOI":"10.1103\/PhysRevA.45.8902"},{"doi-asserted-by":"crossref","unstructured":"Ercolini, E., Valle, F., Adamcik, J., Witz, G., Metzler, R., De Los Rios, P., Roca, J., and Dietler, G. (2007). Fractal dimension and localization of DNA knots. Phys. Rev. Lett., 98.","key":"ref_20","DOI":"10.1103\/PhysRevLett.98.058102"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"43","DOI":"10.15517\/rmta.v16i1.1418","article-title":"Indices of Regularity and Indices of Randomness for m-ary Strings","volume":"16","author":"Skliar","year":"2009","journal-title":"Revista de Matem\u00e1tica Teor\u00eda y Aplicaciones"},{"unstructured":"L\u00e1scaris-Comneno, T., Skliar, O., and Medina, V. Determinaci\u00f3n de valores del \u00edndice de m\u00e1xima regularidad correspondientes a diversas secuencias de bases de ADN: un nuevo m\u00e9todo computacional en gen\u00e9tica. Concepci\u00f3n, Chile.","key":"ref_22"},{"key":"ref_23","first-page":"103","article-title":"An\u00e1lisis de regularidad de genomas para detecci\u00f3n de tel\u00f3meros y secuencias aut\u00f3nomamente replicativas","volume":"24","author":"Morales","year":"2009","journal-title":"UNICIENCIA"},{"unstructured":"L\u00e1scaris-Comneno, T., Ugalde, A., and Morales, Y. An\u00e1lisis de regularidad para el reconocimiento de tel\u00f3meros en Candida parapsilosis mitochondrion y el cromosoma XVI de Saccharomyces cerevisae. In Spanish.","key":"ref_24"},{"unstructured":"Ugalde, A., and L\u00e1scaris-Comneno, T. (2010, January 16\u201319). Mathematics Applied to the Detection of Genetic Regularities in the Yeast Yarrowia lipolytica. San Jose, Costa Rica.","key":"ref_25"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.plrev.2004.01.002","article-title":"Information theory in molecular biology","volume":"1","author":"Adami","year":"2004","journal-title":"Phys. Life Rev."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1006\/jtbi.2000.2138","article-title":"Information content of protein sequences","volume":"206","author":"Weiss","year":"2000","journal-title":"J. Theor. Biol."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1109\/MCS.2001.939938","article-title":"Measures of complexity: A nonexhaustive list","volume":"21","author":"Lloyd","year":"2001","journal-title":"IEEE Control Syst."},{"key":"ref_29","first-page":"377","article-title":"Renyi continuous entropy of DNA sequences","volume":"231","author":"Vinga","year":"2004","journal-title":"J. Comput. Biol."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1016\/S0375-9601(97)00855-4","article-title":"Measures of statistical complexity: Why?","volume":"238","author":"Feldman","year":"1998","journal-title":"Phys. Lett. A"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/0375-9601(95)00867-5","article-title":"A statistical measure of complexity","volume":"209","author":"Mancini","year":"1995","journal-title":"Phys. Lett. A"},{"key":"ref_33","first-page":"3","article-title":"Three approaches to the quantitative definition of information","volume":"1","author":"Kolmogorov","year":"1965","journal-title":"Probl. Inf. Transm."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1145\/321356.321363","article-title":"On the length of programs for computing finite binary sequences","volume":"13","author":"Chaitin","year":"1966","journal-title":"JACM"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1109\/TIT.1976.1055501","article-title":"On the complexity of finite sequences","volume":"22","author":"Lempel","year":"1976","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"D501","DOI":"10.1093\/nar\/gki025","article-title":"NCBI Reference Sequence (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins","volume":"33","author":"Pruitt","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1146\/annurev-genom-082908-150105","article-title":"Genomic analyses of sex chromosome evolution","volume":"10","author":"Wilson","year":"2009","journal-title":"Annu. Rev. Genomics Hum. Genet."},{"unstructured":"Bull, J.J. (1983). Evolution of Sex Determining Mechanisms, Benjamin Cummings.","key":"ref_38"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"358","DOI":"10.1038\/81685","article-title":"Y chromosome sequence variation and the history of human populations","volume":"26","author":"Underhill","year":"2000","journal-title":"Nat. Genet."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"567","DOI":"10.1101\/gr.1971104","article-title":"Insertions and deletions are male biased too: A whole-genome analysis in rodents","volume":"14","author":"Makova","year":"2004","journal-title":"Genome Res."},{"doi-asserted-by":"crossref","unstructured":"Siva, N. (2008). 1000 Genomes project. Nat. Biotech., 26.","key":"ref_41","DOI":"10.1038\/nbt0308-256b"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/17\/4\/1673\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T20:44:01Z","timestamp":1760215441000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/17\/4\/1673"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,3,27]]},"references-count":41,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2015,4]]}},"alternative-id":["e17041673"],"URL":"https:\/\/doi.org\/10.3390\/e17041673","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2015,3,27]]}}}