{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T18:43:16Z","timestamp":1772822596430,"version":"3.50.1"},"reference-count":180,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,12,23]],"date-time":"2019-12-23T00:00:00Z","timestamp":1577059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["UID\/Multi\/04423\/2019"],"award-info":[{"award-number":["UID\/Multi\/04423\/2019"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["PTDC\/AAG-GLO\/6887\/2014"],"award-info":[{"award-number":["PTDC\/AAG-GLO\/6887\/2014"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["PTDC\/CTA-AMB\/31774\/2017"],"award-info":[{"award-number":["PTDC\/CTA-AMB\/31774\/2017"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Biomolecules"],"abstract":"<jats:p>Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful to detect remote homologs within the twilight zone of highly diverse gene\/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. 
Although a new set of graph theory-derived sequence\/structural descriptors has been gaining relevance in the detection of remote homology, these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first review the most popular AF approaches used for detecting homology signals within the twilight zone and then present the state-of-the-art tools that encode graph theory-derived sequence\/structure descriptors, along with their success in identifying remote homologs. We also highlight the trend of integrating AF features\/measures with AB ones, either within the same prediction model or by combining the predictions from different algorithms through voting\/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features\/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and other lesser-known ones, we report our own experience with the graphical\u2013numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. 
We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.<\/jats:p>","DOI":"10.3390\/biom10010026","type":"journal-article","created":{"date-parts":[[2019,12,24]],"date-time":"2019-12-24T05:56:15Z","timestamp":1577166975000},"page":"26","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Graph Theory-Based Sequence Descriptors as Remote Homology Predictors"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9908-2418","authenticated-orcid":false,"given":"Guillermin","family":"Ag\u00fcero-Chapin","sequence":"first","affiliation":[{"name":"CIIMAR\/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leix\u00f5es, Av. General Norton de Matos s\/n 4450-208 Porto, Portugal"},{"name":"Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal"}]},{"given":"Deborah","family":"Galpert","sequence":"additional","affiliation":[{"name":"Departamento de Ciencia de la Computaci\u00f3n. Universidad Central \u00a8Marta Abreu\u00a8 de Las Villas (UCLV), Santa Clara 54830, Cuba"}]},{"given":"Reinaldo","family":"Molina-Ruiz","sequence":"additional","affiliation":[{"name":"Centro de Bioactivos Qu\u00edmicos (CBQ), Universidad Central \u00a8Marta Abreu\u00a8 de Las Villas (UCLV), Santa Clara 54830, Cuba"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3478-6035","authenticated-orcid":false,"given":"Evys","family":"Ancede-Gallardo","sequence":"additional","affiliation":[{"name":"Programa de Doctorado en Fisicoqu\u00edmica Molecular, Facultad de Ciencias Exactas, Universidad Andr\u00e9s Bello, Av. Rep\u00fablica 239, Santiago 8370146, Chile"}]},{"given":"Gisselle","family":"P\u00e9rez-Machado","sequence":"additional","affiliation":[{"name":"EpiDisease S.L. 
Spin-Off of Centro de Investigaci\u00f3n Biom\u00e9dica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain"}]},{"given":"Gustavo A.","family":"De la Riva","sequence":"additional","affiliation":[{"name":"Laboratorio de Biotecnolog\u00eda Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carap\u00e1n, km 3.5, La Piedad, Michoac\u00e1n 59300, Mexico"},{"name":"Tecnol\u00f3gico Nacional de M\u00e9xico, Instituto Tecnol\u00f3gico de la Piedad, Av. Ricardo Guzm\u00e1n Romero, Santa Fe, La Piedad de Cavadas, Michoac\u00e1n 59370, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1328-1732","authenticated-orcid":false,"given":"Agostinho","family":"Antunes","sequence":"additional","affiliation":[{"name":"CIIMAR\/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leix\u00f5es, Av. General Norton de Matos s\/n 4450-208 Porto, Portugal"},{"name":"Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2019,12,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/0471250953.bi0301s42","article-title":"An introduction to sequence similarity (\u201chomology\u201d) searching","volume":"42","author":"Pearson","year":"2013","journal-title":"Curr. Protoc. Bioinform."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic Local Alignment Search Tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. 
Biol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1501","DOI":"10.1006\/jmbi.1994.1104","article-title":"Hidden Markov models in computational biology. Applications to protein modeling","volume":"235","author":"Krogh","year":"1994","journal-title":"J. Mol. Biol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1002\/prot.10474","article-title":"Enriching the sequence substitution matrix by structural information","volume":"54","author":"Teodorescu","year":"2004","journal-title":"Proteins"},{"key":"ref_6","first-page":"pdb","article-title":"Using BLOSUM in Sequence Alignments","volume":"2008","author":"Mount","year":"2008","journal-title":"Csh. Protoc."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1093\/bib\/6.1.6","article-title":"The many faces of sequence alignment","volume":"6","author":"Batzoglou","year":"2005","journal-title":"Brief. Bioinform."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Chatzou, M., Magis, C., Chang, J.-M., Kemena, C., Bussotti, G., Erb, I., and Notredame, C. (2015). Multiple sequence alignment modeling: Methods and applications. Brief. 
Bioinform., bbv099.","DOI":"10.1093\/bib\/bbv099"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.genrep.2016.02.004","article-title":"Fast and exact sequence alignment with the Smith\u2013Waterman algorithm: The SwissAlign webserver","volume":"4","author":"Ivan","year":"2016","journal-title":"Gene Rep."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"W79","DOI":"10.1093\/nar\/gkn275","article-title":"WAR: Webserver for aligning structural RNAs","volume":"36","author":"Torarinsson","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkr367","article-title":"HMMER web server: Interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"W5","DOI":"10.1093\/nar\/gkn201","article-title":"NCBI BLAST: A better web interface","volume":"36","author":"Johnson","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1093\/protein\/12.2.85","article-title":"Twilight zone of protein sequence alignments","volume":"12","author":"Rost","year":"1999","journal-title":"Protein Eng."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Schwarz, R.F., Fletcher, W., F\u00f6rster, F., Merget, B., Wolf, M., Schultz, J., and Markowetz, F. (2010). Evolutionary Distances in the Twilight Zone\u2014A Rational Kernel Approach. PLoS ONE.","DOI":"10.1371\/journal.pone.0015788"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.sbi.2005.05.005","article-title":"The limits of protein sequence comparison?","volume":"15","author":"Pearson","year":"2005","journal-title":"Curr. Opin. Struct. 
Biol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison\u2014a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1186\/s13059-017-1319-7","article-title":"Alignment-free sequence comparison: Benefits, applications, and tools","volume":"18","author":"Zielezinski","year":"2017","journal-title":"Genome Biol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1093\/bib\/bbu005","article-title":"Editorial: Alignment-free methods in computational biology","volume":"15","author":"Vinga","year":"2014","journal-title":"Brief. Bioinform."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"217","DOI":"10.2174\/157016408786733770","article-title":"Alignment-Independent Techniques for Protein Classification","volume":"5","author":"Davies","year":"2008","journal-title":"Curr. Proteom."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ag\u00fcero-Chapin, G., S\u00e1nchez-Rodr\u00edguez, A., Hidalgo-Yanes, P.I., P\u00e9rez-Castillo, Y., Molina-Ruiz, R., Marchal, K., Vasconcelos, V., and Antunes, A. (2011). An alignment-free approach for eukaryotic ITS2 annotation and phylogenetic inference. PLoS ONE, 6.","DOI":"10.1371\/journal.pone.0026638"},{"key":"ref_21","unstructured":"Evans, S.B. (2016). Alignment-Free Methods for the Detection and Specificity Prediction of Adenylation Domains. Nonribosomal Peptide and Polyketide Biosynthesis: Methods and Protocols, Springer New York."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Ag\u00fcero-Chapin, G., Molina-Ruiz, R., P\u00e9rez-Machado, G., Vasconcelos, V., Rodr\u00edguez-Negrin, Z., and Antunes, A. (2016). TI2BioP\u2014Topological Indices to BioPolymers. A Graphical\u2013Numerical Approach for Bioinformatics. 
Recent Advances in Biopolymers, IntechOpen.","DOI":"10.5772\/61887"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1016\/j.bmcl.2005.10.057","article-title":"QSAR study for mycobacterial promoters with low sequence homology","volume":"16","author":"Uriarte","year":"2006","journal-title":"Bioorg. Med. Chem. Lett."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"750","DOI":"10.1002\/pmic.200700638","article-title":"Proteomics, networks and connectivity indices","volume":"8","author":"Santana","year":"2008","journal-title":"Proteomics"},{"key":"ref_25","first-page":"476","article-title":"Enzymes\/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices","volume":"254","author":"Munteanu","year":"2008","journal-title":"J. Biol."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/j.jtbi.2015.03.026","article-title":"Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes","volume":"374","author":"Barigye","year":"2015","journal-title":"J. Theor. Biol."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ruiz-Blanco, Y.B., Paz, W., Green, J., and Marrero-Ponce, Y. (2015). ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins. BMC Bioinform., 16.","DOI":"10.1186\/s12859-015-0586-0"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1396","DOI":"10.1093\/bioinformatics\/btv006","article-title":"Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification","volume":"31","author":"Borozan","year":"2015","journal-title":"Bioinformatics"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Galpert, D., Fernandez, A., Herrera, F., Antunes, A., Molina-Ruiz, R., and Aguero-Chapin, G. (2018). 
Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers. BMC Bioinform., 19.","DOI":"10.1186\/s12859-018-2148-8"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2296","DOI":"10.1093\/bioinformatics\/btn436","article-title":"Markov model plus k-word distributions: A synergy that produces novel statistical measures for sequence comparison","volume":"24","author":"Dai","year":"2008","journal-title":"Bioinformatics"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1002\/prot.340090107","article-title":"Database of homology-derived protein structures and the structural meaning of sequence alignment","volume":"9","author":"Sander","year":"1991","journal-title":"Proteins"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Capriotti, E., and Marti-Renom, M.A. (2010). Quantifying the relationship between sequence and three-dimensional structure conservation in RNA. BMC Bioinform., 11.","DOI":"10.1186\/1471-2105-11-322"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"2433","DOI":"10.1093\/nar\/gki541","article-title":"A benchmark of multiple sequence alignment programs upon structural RNAs","volume":"33","author":"Gardner","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Bremges, A., Schirmer, S., and Giegerich, R. (2010). Fine-tuning structural RNA alignments in the twilight zone. Bmc Bioinform., 11.","DOI":"10.1186\/1471-2105-11-222"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Xiong, J. (2006). 
Essential Bioinformatics, Cambridge University Press.","DOI":"10.1017\/CBO9780511806087"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1093\/bioinformatics\/14.2.157","article-title":"Rose: Generating sequence families","volume":"14","author":"Stoye","year":"1998","journal-title":"Bioinformatics"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1879","DOI":"10.1093\/molbev\/msp098","article-title":"INDELible: A flexible simulator of biological sequence evolution","volume":"26","author":"Fletcher","year":"2009","journal-title":"Mol. Biol. Evol."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ag\u00fcero-Chapin, G., Molina-Ruiz, R., Maldonado, E., de la Riva, G., S\u00e1nchez-Rodr\u00edguez, A., Vasconcelos, V., and Antunes, A. (2013). Exploring the adenylation domain repertoire of nonribosomal peptide synthetases using an ensemble of sequence-search methods. PLoS ONE, 8.","DOI":"10.1371\/journal.pone.0065926"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Ruiz-Blanco, Y.B., Aguero-Chapin, G., Garcia-Hernandez, E., Alvarez, O., Antunes, A., and Green, J. (2017). Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone. BMC Bioinform., 18.","DOI":"10.1186\/s12859-017-1758-x"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1758","DOI":"10.1093\/bioinformatics\/btx055","article-title":"Accurate prediction of human essential genes using only nucleotide composition and association information","volume":"33","author":"Guo","year":"2017","journal-title":"Bioinformatics"},{"key":"ref_41","first-page":"121","article-title":"COPid: Composition based protein identification","volume":"8","author":"Kumar","year":"2008","journal-title":"In Silico Biol."},{"key":"ref_42","first-page":"236","article-title":"Some remarks on protein attribute prediction and pseudo amino acid composition","volume":"273","author":"Chou","year":"2011","journal-title":"J. 
Biol."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1016\/j.jtbi.2014.05.016","article-title":"Extraction of high quality k-words for alignment-free sequence comparison","volume":"358","author":"Gunasinghe","year":"2014","journal-title":"J. Theor. Biol."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1991","DOI":"10.1093\/bioinformatics\/btu177","article-title":"Fast alignment-free sequence comparison using spaced-word frequencies","volume":"30","author":"Leimeister","year":"2014","journal-title":"Bioinformatics"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.ab.2014.04.001","article-title":"PseKNC: A flexible web server for generating pseudo K-tuple nucleotide composition","volume":"456","author":"Chen","year":"2014","journal-title":"Anal. Biochem."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1093\/protein\/15.9.713","article-title":"A study on the correlation of G-protein-coupled receptor types with amino acid composition","volume":"15","author":"Elrod","year":"2002","journal-title":"Protein Eng."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1007\/978-94-007-7423-0_10","article-title":"Bioinformatics tools for predicting GPCR gene functions","volume":"796","author":"Suwa","year":"2014","journal-title":"Adv. Exp. Med. Biol."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"559","DOI":"10.2174\/092986610791112693","article-title":"Prediction of G-protein-coupled receptor classes in low homology using Chou\u2019s pseudo amino acid composition with approximate entropy and hydrophobicity patterns","volume":"17","author":"Gu","year":"2010","journal-title":"Protein Pept. 
Lett."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1016\/j.ab.2009.04.009","article-title":"Prediction of G-protein-coupled receptor classes based on the concept of Chou\u2019s pseudo amino acid composition: An approach from discrete wavelet transform","volume":"390","author":"Qiu","year":"2009","journal-title":"Anal. Biochem."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1002\/prot.1035","article-title":"Prediction of protein cellular attributes using pseudo-amino acid composition","volume":"43","author":"Chou","year":"2001","journal-title":"Proteins Struct. Funct. Bioinform."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1016\/j.ab.2007.10.012","article-title":"PseAAC: A flexible web server for generating various kinds of protein pseudo amino acid composition","volume":"373","author":"Shen","year":"2008","journal-title":"Anal. Biochem."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Shen, H.B., and Chou, K.C. (2007). EzyPred: A top-down approach for predicting enzyme functional classes and subclasses. Biochem. Biophys. Res. Commun.","DOI":"10.1016\/j.bbrc.2007.09.098"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"811","DOI":"10.2174\/092986607781483778","article-title":"Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network","volume":"14","author":"Ding","year":"2007","journal-title":"Protein Pept. Lett."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1002\/minf.201300084","article-title":"Protein Remote Homology Detection by Combining Chou\u2019s Pseudo Amino Acid Composition and Profile-Based Protein Representation","volume":"32","author":"Liu","year":"2013","journal-title":"Mol. 
Inf."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1038\/nbt.2023","article-title":"How to apply de Bruijn graphs to genome assembly","volume":"29","author":"Compeau","year":"2011","journal-title":"Nat. Biotechnol."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"2253","DOI":"10.1093\/bioinformatics\/btt389","article-title":"Scalable metagenomic taxonomy classification using a reference genome database","volume":"29","author":"Ames","year":"2013","journal-title":"Bioinformatics"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Ounit, R., Wanamaker, S., Close, T.J., and Lonardi, S. (2015). CLARK: Fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genom., 16.","DOI":"10.1186\/s12864-015-1419-2"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.tibtech.2004.04.006","article-title":"Codon bias and heterologous protein expression","volume":"22","author":"Gustafsson","year":"2004","journal-title":"Trends Biotechnol"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"3316","DOI":"10.1093\/bioinformatics\/bts599","article-title":"Real time metagenomics: Using k-mers to annotate metagenomes","volume":"28","author":"Edwards","year":"2012","journal-title":"Bioinformatics"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Dai, Q., and Wang, T. (2008). Comparison study on k-word statistical measures for protein: From sequence to \u2018sequence space\u2019. 
Bmc Bioinform., 9.","DOI":"10.1186\/1471-2105-9-394"},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"2224","DOI":"10.1093\/bioinformatics\/btl376","article-title":"Remote homology detection based on oligomer distances","volume":"22","author":"Lingner","year":"2006","journal-title":"Bioinformatics"},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"388","DOI":"10.2174\/092986612799789350","article-title":"Predicting protein structural class by incorporating patterns of over-represented k-mers into the general form of Chou\u2019s PseAAC","volume":"19","author":"Qin","year":"2012","journal-title":"Protein Pept. Lett."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"1466","DOI":"10.1093\/bioinformatics\/btr176","article-title":"Alignment-free detection of local similarity among viral and bacterial genomes","volume":"27","author":"Haubold","year":"2011","journal-title":"Bioinformatics"},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1080\/10635150701294741","article-title":"Is multiple-sequence alignment required for accurate inference of phylogeny?","volume":"56","author":"Hohl","year":"2007","journal-title":"Syst. Biol."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/1745-6150-8-3","article-title":"Next-generation phylogenomics","volume":"8","author":"Chan","year":"2013","journal-title":"Biol. Direct."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"W45","DOI":"10.1093\/nar\/gkh362","article-title":"CVTree: A phylogenetic tree reconstruction tool based on whole genomes","volume":"32","author":"Qi","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Kang, Y., Yang, X., Lin, J., and Ye, K. (2019). PVTree: A Sequential Pattern Mining Method for Alignment Independent Phylogeny Reconstruction. 
Genes (Basel).","DOI":"10.3390\/genes10020073"},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1089\/cmb.2012.0228","article-title":"Alignment-free sequence comparison based on next-generation sequencing reads","volume":"20","author":"Song","year":"2013","journal-title":"J. Comput. Biol."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1093\/bib\/bbt067","article-title":"New developments of alignment-free sequence comparison: Measures, statistics and next-generation sequencing","volume":"15","author":"Song","year":"2014","journal-title":"Brief. Bioinform."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"i249","DOI":"10.1093\/bioinformatics\/btm211","article-title":"A statistical method for alignment-free comparison of regulatory sequences","volume":"23","author":"Kantorovitz","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"2391","DOI":"10.1093\/bioinformatics\/btq453","article-title":"An alignment-free model for comparison of regulatory sequences","volume":"26","author":"Koohy","year":"2010","journal-title":"Bioinformatics"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Li, M., and Vit\u00e1nyi, P.M.B. (2008). An Introduction to Kolmogorov Complexity and its Applications, Springer. [3rd ed.].","DOI":"10.1007\/978-0-387-49820-1"},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1109\/TIT.1976.1055501","article-title":"On the complexity of finite sequences","volume":"22","author":"Lempel","year":"1976","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_74","doi-asserted-by":"crossref","first-page":"2122","DOI":"10.1093\/bioinformatics\/btg295","article-title":"A new sequence distance measure for phylogenetic tree construction","volume":"19","author":"Otu","year":"2003","journal-title":"Bioinformatics"},{"key":"ref_75","unstructured":"Li, M., Chen, X., Li, X., Ma, B., and Vit\u00e1nyi, P. 
(2003, January 12\u201314). The similarity metric. Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Baltimore, MD, USA."},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1093\/bioinformatics\/bti806","article-title":"Application of compression-based distance measures to protein sequence classification: A methodological study","volume":"22","author":"Kocsor","year":"2006","journal-title":"Bioinformatics"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Ferragina, P., Giancarlo, R., Greco, V., Manzini, G., and Valiente, G. (2007). Compression-based classification of biological sequences and structures via the Universal Similarity Metric: Experimental assessment. BMC Bioinform., 8.","DOI":"10.1186\/1471-2105-8-252"},{"key":"ref_78","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1093\/bioinformatics\/17.2.149","article-title":"An information-based sequence distance and its application to whole mitochondrial genome phylogeny","volume":"17","author":"Li","year":"2001","journal-title":"Bioinformatics"},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"1015","DOI":"10.1093\/bioinformatics\/bth031","article-title":"Measuring the similarity of protein structures by means of the universal similarity metric","volume":"20","author":"Krasnogor","year":"2004","journal-title":"Bioinformatics"},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/S0006-3495(96)79210-X","article-title":"The Shannon information entropy of protein sequences","volume":"71","author":"Strait","year":"1996","journal-title":"Biophys. J."},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On information and sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_82","unstructured":"Nan, F., and Adjeroh, D. (2004, January 19). On complexity measures for biological sequences. 
Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, Stanford, CA, USA."},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1145\/2500124.2500126","article-title":"Information entropy based methods for genome comparison","volume":"3","author":"Jani","year":"2013","journal-title":"ACM Sigbioinformatics Rec."},{"key":"ref_84","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1752-0509-6-S2-S4","article-title":"MISCORE: A new scoring function for characterizing DNA regulatory motifs in promoter sequences","volume":"6","author":"Wang","year":"2012","journal-title":"BMC Syst. Biol."},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Comin, M., and Antonelli, M. (2015). Fast Alignment-free Comparison for Regulatory Sequences using Multiple Resolution Entropic Profiles. Proceedings of BIOINFORMATICS, Methods and Algorithms (BIOSTEC 2015), SciTePress.","DOI":"10.5220\/0005251001710177"},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Erill, I., and O\u2019Neill, M.C. (2009). A reexamination of information theory-based methods for DNA-binding site identification. BMC Bioinform., 10.","DOI":"10.1186\/1471-2105-10-57"},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Xu, M., and Su, Z. (2010). A novel alignment-free method for comparing transcription factor binding site motifs. PLoS ONE, 5.","DOI":"10.1371\/journal.pone.0008797"},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1093\/bib\/bbt068","article-title":"Information theory applications for biological sequence analysis","volume":"15","author":"Vinga","year":"2014","journal-title":"Brief. Bioinform."},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1006\/bbrc.1999.1325","article-title":"A key driving force in determination of protein structural classes","volume":"264","author":"Chou","year":"1999","journal-title":"Biochem. Biophys. Res. 
Commun."},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"773","DOI":"10.1110\/ps.03328504","article-title":"Sensitivity and selectivity in protein structure comparison","volume":"13","author":"Sierk","year":"2004","journal-title":"Protein Sci."},{"key":"ref_91","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/s00453-003-1045-2","article-title":"Finding the consensus shape for a protein family","volume":"38","author":"Chew","year":"2004","journal-title":"Algorithmica"},{"key":"ref_92","doi-asserted-by":"crossref","first-page":"857","DOI":"10.1089\/106652703322756113","article-title":"Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships","volume":"10","author":"Liao","year":"2003","journal-title":"J. Comput. Biol."},{"key":"ref_93","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1093\/bioinformatics\/btg431","article-title":"Mismatch string kernels for discriminative protein classification","volume":"20","author":"Leslie","year":"2004","journal-title":"Bioinformatics"},{"key":"ref_94","doi-asserted-by":"crossref","first-page":"790","DOI":"10.1021\/cr800198j","article-title":"Graphical representation of proteins","volume":"111","author":"Randic","year":"2011","journal-title":"Chem. Rev."},{"key":"ref_95","unstructured":"Biggs, N., Lloyd, E., and Wilson, R. (1986). Graph Theory, Oxford University Press."},{"key":"ref_96","doi-asserted-by":"crossref","first-page":"1573","DOI":"10.2174\/0929867013371923","article-title":"Recent advances on the role of topological indices in drug discovery research","volume":"8","author":"Estrada","year":"2001","journal-title":"Curr. Med. 
Chem."},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"211","DOI":"10.3998\/ark.5550190.0007.907","article-title":"Mathematical descriptors of DNA sequences: Development and applications","volume":"9","author":"Nandy","year":"2006","journal-title":"Arkivoc"},{"key":"ref_98","first-page":"136","article-title":"Generalized lattice graphs for 2D-visualization of biological information","volume":"261","author":"Paniagua","year":"2009","journal-title":"J. Biol."},{"key":"ref_99","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1016\/j.cplett.2005.03.086","article-title":"Four-color map representation of DNA or RNA sequences and their numerical characterization","volume":"407","author":"Randic","year":"2005","journal-title":"Chem. Phys. Lett."},{"key":"ref_100","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1016\/j.jmgm.2006.12.006","article-title":"On representation of proteins by star-like graphs","volume":"26","author":"Randic","year":"2007","journal-title":"J. Mol. Graph. Model."},{"key":"ref_101","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1016\/j.febslet.2005.12.072","article-title":"2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L.","volume":"580","author":"Molina","year":"2006","journal-title":"Febs. Lett."},{"key":"ref_102","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.1002\/jcc.20576","article-title":"2D-RNA-coupling numbers: A new computational chemistry approach to link secondary structure topology with biological function","volume":"28","author":"Varona","year":"2007","journal-title":"J. Comput. 
Chem."},{"key":"ref_103","doi-asserted-by":"crossref","first-page":"2122","DOI":"10.1021\/pr800867y","article-title":"Alignment-free prediction of polygalacturonases with pseudofolding topological indices: Experimental isolation from Coffea arabica and prediction of a new sequence","volume":"8","author":"Antunes","year":"2009","journal-title":"J. Proteome Res."},{"key":"ref_104","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1016\/j.bmc.2004.10.024","article-title":"Predicting stability of Arc repressor mutants with protein stochastic moments","volume":"13","author":"Uriarte","year":"2005","journal-title":"Bioorg. Med. Chem."},{"key":"ref_105","doi-asserted-by":"crossref","first-page":"1124","DOI":"10.3390\/91201124","article-title":"Protein quadratic indices of the \u201cMacromolecular Pseudograph\u2019s \u03b1-Carbon Atom Adjacency Matrix\u201d. 1. Prediction of Arc repressor alanine-mutant\u2019s stability","volume":"9","author":"Ponce","year":"2004","journal-title":"Molecules"},{"key":"ref_106","doi-asserted-by":"crossref","first-page":"1676","DOI":"10.2174\/156802608786786543","article-title":"Predicting antimicrobial drugs and targets with the MARCH-INSIDE approach","volume":"8","author":"Ubeira","year":"2008","journal-title":"Curr. Top Med. Chem."},{"key":"ref_107","doi-asserted-by":"crossref","first-page":"276","DOI":"10.3390\/i5110276","article-title":"Nucleic acid quadratic indices of the \u201cmacromolecular graph\u2019s nucleotides adjacency matrix\u201d modeling of footprints after the interaction of paromomycin with the HIV-1 \u03a8-RNA Packaging Region","volume":"5","author":"Ponce","year":"2004","journal-title":"Int. J. Mol. Sci."},{"key":"ref_108","doi-asserted-by":"crossref","first-page":"1716","DOI":"10.1039\/c2mb25039j","article-title":"Naive Bayes QSDR classification based on spiral-graph Shannon entropies for protein biomarkers in human colon cancer","volume":"8","author":"Munteanu","year":"2012","journal-title":"Mol. 
Biosyst."},{"key":"ref_109","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0009-2614(02)01784-0","article-title":"Novel 2-D graphical representation of DNA sequences and their numerical characterization","volume":"368","year":"2003","journal-title":"Chem. Phys. Lett."},{"key":"ref_110","first-page":"55","article-title":"Two-dimensional graphical representation of DNA sequences and intron-exon discrimination in intron-rich sequences","volume":"12","author":"Nandy","year":"1996","journal-title":"Comput. Appl. Biosci."},{"key":"ref_111","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1016\/j.jmgm.2008.10.004","article-title":"Graphical representation of proteins as four-color maps and their numerical characterization","volume":"27","author":"Randic","year":"2009","journal-title":"J. Mol. Graph. Model."},{"key":"ref_112","doi-asserted-by":"crossref","first-page":"2265","DOI":"10.1021\/ci8001809","article-title":"Comparative study of topological indices of macro\/supramolecular RNA complex networks","volume":"48","author":"Antunes","year":"2008","journal-title":"J. Chem. Inf. Model."},{"key":"ref_113","doi-asserted-by":"crossref","first-page":"619","DOI":"10.1021\/tx700296t","article-title":"3D-MEDNEs: An alternative \u201cin silico\u201d technique for chemical research in toxicology. 2. quantitative proteome-toxicity relationships (QPTR) based on mass spectrum spiral entropy","volume":"21","author":"Borges","year":"2008","journal-title":"Chem. Res. Toxicol."},{"key":"ref_114","unstructured":"Gonz\u00e1lez-D\u00edaz, H., Molina-Ruiz, R., and Hernandez, I. MARCH-INSIDE v3.0 (MARkov CHains INvariants for SImulation & DEsign) 3.0 2007. p. 
Windows supported version under request to the main author contact email: gonzalezdiazh@yahoo.es."},{"key":"ref_115","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1007\/s00894-002-0088-7","article-title":"Markovian chemicals \u201cin silico\u201d design (MARCH-INSIDE), a promising approach for computer aided molecular design II: Experimental and theoretical assessment of a novel method for virtual screening of fasciolicides","volume":"8","author":"Olazabal","year":"2002","journal-title":"J. Mol. Model."},{"key":"ref_116","doi-asserted-by":"crossref","first-page":"844","DOI":"10.1021\/ci950187r","article-title":"Spectral Moments of the Edge Adjacency Matrix in Molecular Graphs. 1. Definition and Applications to the Prediction of Physical Properties of Alkanes","volume":"36","author":"Estrada","year":"1996","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_117","doi-asserted-by":"crossref","first-page":"715","DOI":"10.1002\/prot.20159","article-title":"Markovian Backbone Negentropies: Molecular descriptors for protein research. I. Predicting protein stability in Arc repressor mutants","volume":"56","author":"Molina","year":"2004","journal-title":"Proteins"},{"key":"ref_118","doi-asserted-by":"crossref","first-page":"4815","DOI":"10.1016\/j.bmc.2004.07.017","article-title":"Stochastic-based descriptors studying peptides biological properties: Modeling the bitter tasting threshold of dipeptides","volume":"12","author":"Molina","year":"2004","journal-title":"Bioorg. Med. Chem."},{"key":"ref_119","doi-asserted-by":"crossref","first-page":"4691","DOI":"10.1016\/j.bmcl.2004.06.100","article-title":"Markov entropy backbone electrostatic descriptors for predicting proteins biological activity","volume":"14","author":"Molina","year":"2004","journal-title":"Bioorg. Med. Chem. Lett."},{"key":"ref_120","doi-asserted-by":"crossref","first-page":"2079","DOI":"10.1093\/bioinformatics\/btg285","article-title":"Markovian negentropies in bioinformatics. 1. 
A picture of footprints after the interaction of the HIV-1 Psi-RNA packaging region with drugs","volume":"19","author":"Molina","year":"2003","journal-title":"Bioinformatics"},{"key":"ref_121","doi-asserted-by":"crossref","unstructured":"Wang, F., Sun, X., Shi, X., Zhai, H., Tian, C., Kong, F., Liu, B., and Yuan, X. (2016). A Global Analysis of the Polygalacturonase Gene Family in Soybean (Glycine max). PLoS ONE, 11.","DOI":"10.1371\/journal.pone.0163012"},{"key":"ref_122","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1021\/ci7003225","article-title":"MMM-QSAR recognition of ribonucleases without alignment: Comparison with an HMM model and isolation from Schizosaccharomyces pombe, prediction, and experimental assay of a new sequence","volume":"48","author":"Rodriguez","year":"2008","journal-title":"J. Chem. Inf. Model."},{"key":"ref_123","doi-asserted-by":"crossref","first-page":"2231","DOI":"10.1074\/jbc.M309324200","article-title":"Evaluation of the RNA determinants for bacterial and yeast RNase III binding and cleavage","volume":"279","author":"Lamontagne","year":"2004","journal-title":"J. Biol. Chem."},{"key":"ref_124","doi-asserted-by":"crossref","first-page":"2377","DOI":"10.1093\/nar\/24.12.2377","article-title":"Purification and characterization of the Pac1 ribonuclease of Schizosaccharomyces pombe","volume":"24","author":"Rotondo","year":"1996","journal-title":"Nucleic Acids Res."},{"key":"ref_125","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1016\/j.biotechadv.2006.03.003","article-title":"Outlook for cellulase improvement: Screening and selection strategies","volume":"24","author":"Himmel","year":"2006","journal-title":"Biotechnol. 
Adv."},{"key":"ref_126","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/0378-1119(89)90339-9","article-title":"Cellulase families revealed by hydrophobic cluster analysis","volume":"81","author":"Henrissat","year":"1989","journal-title":"Gene"},{"key":"ref_127","unstructured":"Munteanu, C.R., and Gonz\u00e1lez-D\u00edaz, H. (2010). Network entropies classification of fungi and bacteria cellulases of interest for biotechnology. Topological Indices for Medicinal Chemistry, Biology, Parasitology, Neurological and Social Networks, Transworld Research Network."},{"key":"ref_128","doi-asserted-by":"crossref","first-page":"429","DOI":"10.2174\/1574893611308040005","article-title":"S2Snet: A tool for transforming characters and numeric sequences into star network topological indices in chemoinformatics, bioinformatics, biomedical, and social-legal sciences","volume":"8","author":"Pazos","year":"2013","journal-title":"Curr. Bioinform."},{"key":"ref_129","first-page":"458","article-title":"Alignment-free prediction of mycobacterial DNA promoters based on pseudo-folding lattice network or star-graph topological indices","volume":"256","author":"Munteanu","year":"2009","journal-title":"J. Biol."},{"key":"ref_130","doi-asserted-by":"crossref","first-page":"1510","DOI":"10.1002\/jcc.21170","article-title":"Computational chemistry study of 3D-structure-function relationships for enzymes based on Markov models for protein electrostatic, HINT, and van der Waals potentials","volume":"30","author":"Concu","year":"2009","journal-title":"J. Comput. Chem."},{"key":"ref_131","first-page":"775","article-title":"Natural\/random protein classification models based on star network topological indices","volume":"254","author":"Munteanu","year":"2008","journal-title":"J. 
Biol."},{"key":"ref_132","doi-asserted-by":"crossref","first-page":"771","DOI":"10.1016\/S0022-2836(03)00628-4","article-title":"Distinguishing Enzyme Structures from Non-enzymes Without Alignments","volume":"330","author":"Dobson","year":"2003","journal-title":"J. Mol. Biol."},{"key":"ref_133","doi-asserted-by":"crossref","first-page":"107","DOI":"10.6026\/97320630002107","article-title":"Prediction of enzymes and non-enzymes from protein sequences based on sequence derived features and PSSM matrix using artificial neural network","volume":"2","author":"Naik","year":"2007","journal-title":"Bioinformation"},{"key":"ref_134","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1007\/s00726-010-0653-9","article-title":"TI2BioP: Topological Indices to BioPolymers. Its practical use to unravel cryptic bacteriocin-like domains","volume":"40","author":"Vasconcelos","year":"2011","journal-title":"Amino Acids"},{"key":"ref_135","first-page":"167","article-title":"Non-linear models based on simple topological indices to identify RNase III protein members","volume":"273","author":"Vasconcelos","year":"2011","journal-title":"J. Biol."},{"key":"ref_136","doi-asserted-by":"crossref","unstructured":"Cotter, P., Hill, C., and Ross, R. (2006). What\u2019s in a name? Class distinction for bacteriocins. Nat. Rev. 
Microbiol., 4.","DOI":"10.1038\/nrmicro1273-c2"},{"key":"ref_137","doi-asserted-by":"crossref","first-page":"1425","DOI":"10.1016\/j.peptides.2003.10.028","article-title":"Peptide signal molecules and bacteriocins in Gram-negative bacteria: A genome-wide in silico screening for peptides containing a double-glycine leader sequence and their cognate transporters","volume":"25","author":"Dirix","year":"2004","journal-title":"Peptides"},{"key":"ref_138","doi-asserted-by":"crossref","first-page":"W116","DOI":"10.1093\/nar\/gki442","article-title":"InterProScan: Protein domains identifier","volume":"33","author":"Quevillon","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"ref_139","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1016\/j.febslet.2004.06.021","article-title":"Cryptic endotoxic nature of Bacillus thuringiensis Cry1Ab insecticidal crystal protein","volume":"570","author":"Aguero","year":"2004","journal-title":"Febs. Lett."},{"key":"ref_140","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1261\/rna.7204505","article-title":"A common core of secondary structure of the internal transcribed spacer 2 (ITS2) throughout the Eukaryota","volume":"11","author":"Schultz","year":"2005","journal-title":"RNA"},{"key":"ref_141","doi-asserted-by":"crossref","first-page":"2246","DOI":"10.1093\/bioinformatics\/bti349","article-title":"Predicting a set of minimal free energy RNA secondary structures common to two sequences","volume":"21","author":"Mathews","year":"2005","journal-title":"Bioinformatics"},{"key":"ref_142","unstructured":"Kirk, P.M., Cannon, P.F., and Stalpers, J.A. (2008). The Dictionary of the Fungi, CABI. [10th ed.]."},{"key":"ref_143","doi-asserted-by":"crossref","first-page":"874","DOI":"10.1039\/b810283j","article-title":"Bioinformatic perspectives on NRPS\/PKS megasynthases: Advances and challenges","volume":"26","author":"Dittmann","year":"2009","journal-title":"Nat. Prod. 
Rep."},{"key":"ref_144","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1007\/s10822-004-5171-y","article-title":"TOMOCOMD-CARDD, a novel approach for computer-aided \u2018rational\u2019 drug design: I. Theoretical and experimental assessment of a promising method for computational screening and in silico design of new anthelmintic compounds","volume":"18","author":"Olazabal","year":"2004","journal-title":"J. Comput. Aided Mol. Des."},{"key":"ref_145","doi-asserted-by":"crossref","unstructured":"Marrero-Ponce, Y., Marrero, R.M., Torrens, F., Martinez, Y., Bernal, M.G., Zaldivar, V.R., Castro, E.A., and Abalo, R.G. (2005). Non-stochastic and stochastic linear indices of the molecular pseudograph\u2019s atom-adjacency matrix: A novel approach for computational in silico screening and \u201crational\u201d selection of new lead antibacterial agents. J. Mol. Model, 1\u201317.","DOI":"10.1007\/s00894-005-0024-8"},{"key":"ref_146","doi-asserted-by":"crossref","first-page":"3397","DOI":"10.1016\/j.bmc.2005.03.010","article-title":"Linear indices of the \u201cmacromolecular graph\u2019s nucleotides adjacency matrix\u201d as a promising approach for bioinformatics studies. Part 1: Prediction of paromomycin\u2019s affinity constant with HIV-1 \u03a8-RNA packaging region","volume":"13","author":"Nodarse","year":"2005","journal-title":"Bioorg. Med. Chem."},{"key":"ref_147","doi-asserted-by":"crossref","first-page":"3003","DOI":"10.1016\/j.bmc.2005.01.062","article-title":"Protein linear indices of the \u2018macromolecular pseudograph alpha-carbon atom adjacency matrix\u2019 in bioinformatics. Part 1: Prediction of protein stability effects of a complete set of alanine substitutions in Arc repressor","volume":"13","author":"Torrens","year":"2005","journal-title":"Bioorg. Med. 
Chem."},{"key":"ref_148","doi-asserted-by":"crossref","first-page":"3118","DOI":"10.1111\/j.1742-4658.2010.07711.x","article-title":"TOMOCOMD-CAMPS and protein bilinear indices--novel bio-macromolecular descriptors for protein research: I. Predicting protein stability effects of a complete set of alanine substitutions in the Arc repressor","volume":"277","author":"Diaz","year":"2010","journal-title":"Febs. J."},{"key":"ref_149","doi-asserted-by":"crossref","first-page":"533","DOI":"10.2174\/1574893610666151008011457","article-title":"Optimum search strategies or novel 3D molecular descriptors: Is there a stalemate?","volume":"10","year":"2015","journal-title":"Curr. Bioinform."},{"key":"ref_150","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1186\/s13321-016-0122-x","article-title":"Examining the predictive accuracy of the novel 3D N-linear algebraic molecular codifications on benchmark datasets","volume":"8","author":"Barigye","year":"2016","journal-title":"J. Cheminform."},{"key":"ref_151","doi-asserted-by":"crossref","unstructured":"Ter\u00e1n, J.E., Marrero-Ponce, Y., Contreras-Torres, E., Garc\u00eda-Jacas, C.R., Vivas-Reyes, R., Ter\u00e1n, E., and Torres, F.J. (2019). Tensor Algebra-based Geometrical (3D) Biomacro-Molecular Descriptors for Protein Research: Theory, Applications and Comparison with other Methods. Sci. Rep., 9.","DOI":"10.1038\/s41598-019-47858-2"},{"key":"ref_152","first-page":"359","article-title":"The Autocorrelation of a topological structure. A new molecular descriptor","volume":"4","author":"Moreau","year":"1980","journal-title":"Nouv. J. Chim."},{"key":"ref_153","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1023\/A:1015952613760","article-title":"An electrotopological-state index for atoms in molecules","volume":"7","author":"Kier","year":"1990","journal-title":"Pharm. 
Res."},{"key":"ref_154","first-page":"1","article-title":"Building\u2013Block Computation of the Ivanciuc\u2013Balaban Indices for the Virtual Screening of Combinatorial Libraries","volume":"1","author":"Ivanciuc","year":"2002","journal-title":"Internet Electron. J. Mol. Des."},{"key":"ref_155","doi-asserted-by":"crossref","unstructured":"Todeschini, R., and Consonni, V. (2000). Handbook of Molecular Descriptors, Wiley-VCH. [1st ed.].","DOI":"10.1002\/9783527613106"},{"key":"ref_156","doi-asserted-by":"crossref","first-page":"1118","DOI":"10.1038\/nbt749","article-title":"Genome sequence of the dissimilatory metal ion\u2013reducing bacterium Shewanella oneidensis","volume":"20","author":"Heidelberg","year":"2002","journal-title":"Nat. Biotechnol."},{"key":"ref_157","doi-asserted-by":"crossref","first-page":"1734","DOI":"10.1002\/pro.3673","article-title":"ProtDCal-Suite: A web server for the numerical codification and functional analysis of proteins","volume":"28","author":"Green","year":"2019","journal-title":"Protein Sci."},{"key":"ref_158","unstructured":"Biggar, K.K., Ruiz-Blanco, Y.B., Charih, F., Fang, Q., Connolly, J., Frensemier, K., Adhikary, H., Li, S.S., and Green, J.R. (2018). MethylSight: Taking a wider view of lysine methylation through computer-aided discovery to provide insight into the human methyl-lysine proteome. bioRxiv, 274688."},{"key":"ref_159","doi-asserted-by":"crossref","first-page":"1255","DOI":"10.1021\/ci050507z","article-title":"Amino Acid Sequence Autocorrelation vectors and ensembles of Bayesian-Regularized Genetic Neural Networks for prediction of conformational stability of human lysozyme mutants","volume":"46","author":"Caballero","year":"2006","journal-title":"J. Chem. Inf. 
Model."},{"key":"ref_160","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1002\/prot.21349","article-title":"Amino acid sequence autocorrelation vectors and Bayesian-regularized genetic neural networks for modeling protein conformational stability: Gene V protein mutants","volume":"67","author":"Fernandez","year":"2007","journal-title":"Proteins"},{"key":"ref_161","doi-asserted-by":"crossref","unstructured":"Fernandez, M., Kumagai, Y., Standley, D.M., Sarai, A., Mizuguchi, K., and Ahmad, S. (2011). Prediction of dinucleotide-specific RNA-binding sites in proteins. BMC Bioinform., 12.","DOI":"10.1186\/1471-2105-12-S13-S5"},{"key":"ref_162","doi-asserted-by":"crossref","first-page":"241","DOI":"10.2174\/157489310794072490","article-title":"Graphical Representations of Protein Sequences for Alignment-Free Comparative and Predictive Studies. Recognition of Protease Inhibition Pattern from H-Depleted Molecular Graph Representation of Protease Sequences","volume":"5","author":"Fernandez","year":"2010","journal-title":"Curr. Bioinform."},{"key":"ref_163","first-page":"1442","article-title":"A Survey on Protein Sequence Classification with Data Mining Techniques","volume":"7","author":"Nandini","year":"2016","journal-title":"Int. J. Sci. Eng. Res."},{"key":"ref_164","doi-asserted-by":"crossref","first-page":"1682","DOI":"10.1093\/bioinformatics\/bth141","article-title":"Protein homology detection using string alignment kernels","volume":"20","author":"Saigo","year":"2004","journal-title":"Bioinformatics"},{"key":"ref_165","doi-asserted-by":"crossref","unstructured":"Salichos, L., and Rokas, A. (2011). Evaluating ortholog prediction algorithms in a yeast model clade. PLoS ONE, 6.","DOI":"10.1371\/journal.pone.0018755"},{"key":"ref_166","doi-asserted-by":"crossref","unstructured":"Mahmood, K., Webb, G.I., Song, J., Whisstock, J.C., and Konagurthu, A.S. (2012). 
Efficient large-scale protein sequence comparison and gene matching to identify orthologs and co-orthologs. Nucleic Acids Res., 40.","DOI":"10.1093\/nar\/gkr1261"},{"key":"ref_167","doi-asserted-by":"crossref","unstructured":"Byma, S., Dhasade, A., Altenhoff, A., Dessimoz, C., and Larus, J.R. (2019). Parallel and Scalable Precise Clustering for Homologous Protein Discovery. bioRxiv.","DOI":"10.1101\/751214"},{"key":"ref_168","doi-asserted-by":"crossref","unstructured":"Glover, N., Dessimoz, C., Ebersberger, I., Forslund, S.K., Gabald\u00f3n, T., Huerta-Cepas, J., Maria-Jesus, M., Muffato, M., Patricio, M., and Pereira, C. (2019). Advances and Applications in the Quest for Orthologs. Mol. Biol. Evol., 10.","DOI":"10.1093\/molbev\/msz150"},{"key":"ref_169","doi-asserted-by":"crossref","unstructured":"Chen, J., Liu, B., and Huang, D. (2016). Protein Remote Homology Detection Based on an Ensemble Learning Approach. Biomed Res. Int. Hindawi Publ. Corp., 11.","DOI":"10.1155\/2016\/5813645"},{"key":"ref_170","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1038\/nature02340","article-title":"Community structure and metabolism through reconstruction of microbial genomes from the environment","volume":"428","author":"Tyson","year":"2004","journal-title":"Nature"},{"key":"ref_171","first-page":"1235","article-title":"Mllib: Machine learning in apache spark","volume":"17","author":"Meng","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_172","unstructured":"Kashyap, H., Ahmed, H.A., Hoque, N., Roy, S., and Bhattacharyya, D.K. (2015). Big data analytics in bioinformatics: A machine learning perspective. arXiv."},{"key":"ref_173","unstructured":"Galpert, D., Garc\u00eda, S.d.R., Herrera, F., Ancede-Gallardo, E., Antunes, A., and Ag\u00fcero-Chapin, G. (2017). Big Data Supervised Pairwise Ortholog Detection in Yeasts. Yeast-Industrial Applications, IntechOpen."},{"key":"ref_174","doi-asserted-by":"crossref","unstructured":"Elloumi, M., and Zomaya, A.Y. 
(2011). Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, John Wiley & Sons.","DOI":"10.1002\/9780470892107"},{"key":"ref_175","doi-asserted-by":"crossref","unstructured":"Cattaneo, G., Petrillo, U.F., Giancarlo, R., and Roscigno, G. (2015, January 1\u20134). Alignment-free sequence comparison over Hadoop for computational biology. Proceedings of the 44th International Conference on Parallel Processing Workshops, Washington, DC, USA.","DOI":"10.1109\/ICPPW.2015.28"},{"key":"ref_176","doi-asserted-by":"crossref","unstructured":"Matsunaga, A., Tsugawa, M., and Fortes, J. (2008, January 7\u201312). Cloudblast: Combining mapreduce and virtualization on distributed resources for bioinformatics applications. Proceedings of the 2008 IEEE Fourth International Conference on eScience, Indianapolis, IN, USA.","DOI":"10.1109\/eScience.2008.62"},{"key":"ref_177","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat. Biotechnol."},{"key":"ref_178","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nmeth.3176","article-title":"Fast and sensitive protein alignment using DIAMOND","volume":"12","author":"Buchfink","year":"2015","journal-title":"Nat. Methods"},{"key":"ref_179","doi-asserted-by":"crossref","first-page":"748681","DOI":"10.1155\/2015\/748681","article-title":"An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species","volume":"2015","author":"Galpert","year":"2015","journal-title":"Biomed Res. 
Int."},{"key":"ref_180","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1186\/s13059-019-1755-7","article-title":"Benchmarking of alignment-free sequence comparison methods","volume":"20","author":"Zielezinski","year":"2019","journal-title":"Genome Biol."}],"container-title":["Biomolecules"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-273X\/10\/1\/26\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:45:04Z","timestamp":1760190304000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-273X\/10\/1\/26"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,23]]},"references-count":180,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,1]]}},"alternative-id":["biom10010026"],"URL":"https:\/\/doi.org\/10.3390\/biom10010026","relation":{},"ISSN":["2218-273X"],"issn-type":[{"value":"2218-273X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12,23]]}}}