{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T20:12:28Z","timestamp":1765483948559},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,3,1]]},"abstract":"<jats:p>Motivation: Self-organizing maps (SOMs) are readily available bioinformatics methods for clustering and visualizing high-dimensional data, provided that such biological information is previously transformed to fixed-size, metric-based vectors. To increase the usefulness of SOM-based approaches for the analysis of genomic sequence data, novel representation methods are required that automatically and bijectively transform aligned nucleotide sequences into numeric vectors, dealing with both nucleotide ambiguity and gaps derived from sequence alignment.<\/jats:p><jats:p>Results: Six different codification variants based on Euclidean space, just like SOM processing, have been tested using two SOM models: the classical Kohonen\u2019s SOM and growing cell structures. They have been applied to two different sets of sequences: 32 sequences of small sub-unit ribosomal RNA from organisms belonging to the three domains of life, and 44 sequences of the reverse transcriptase region of the pol gene of human immunodeficiency virus type 1 belonging to different groups and sub-types. Our results show that the most important factor affecting the accuracy of sequence clustering is the assignment of an extra weight to the presence of alignment-derived gaps. 
Although each of the codification variants shows a different level of taxonomic consistency, the results are in agreement with sequence-based phylogenetic reconstructions and anticipate a broad applicability of this codification method.<\/jats:p><jats:p>Contact: \u00a0sole@eui.upm.es<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary Data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu708","type":"journal-article","created":{"date-parts":[[2014,10,25]],"date-time":"2014-10-25T03:54:31Z","timestamp":1414209271000},"page":"736-744","source":"Crossref","is-referenced-by-count":21,"title":["A novel representation of genomic sequences for taxonomic clustering and visualization by means of self-organizing maps"],"prefix":"10.1093","volume":"31","author":[{"given":"Soledad","family":"Delgado","sequence":"first","affiliation":[{"name":"1 \u00a01Department of Information Structure and Organization, Universidad Polit\u00e9cnica (UPM), Madrid 28031, 2Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, 3Department of Computer Architecture and Computer Technology, Universidad de Granada (UGR), Granada 18071, Spain, 4CITIC, Campanillas, Malaga 29590, Spain, 5Department of Molecular Evolution, Centro de Astrobiolog\u00eda (CSIC-INTA), Torrej\u00f3n de Ardoz, Madrid 28850 and 6Centro de Investigaci\u00f3n Biom\u00e9dica en Red de enfermedades hep\u00e1ticas y digestivas (CIBERehd), Barcelona 08036, Spain"}]},{"given":"Federico","family":"Mor\u00e1n","sequence":"additional","affiliation":[{"name":"1 \u00a01Department of Information Structure and Organization, Universidad Polit\u00e9cnica (UPM), Madrid 28031, 2Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, 3Department of Computer Architecture and Computer Technology, Universidad de Granada (UGR), Granada 18071, Spain, 4CITIC, Campanillas, Malaga 29590, Spain, 5Department of Molecular 
Evolution, Centro de Astrobiolog\u00eda (CSIC-INTA), Torrej\u00f3n de Ardoz, Madrid 28850 and 6Centro de Investigaci\u00f3n Biom\u00e9dica en Red de enfermedades hep\u00e1ticas y digestivas (CIBERehd), Barcelona 08036, Spain"}]},{"given":"Antonio","family":"Mora","sequence":"additional","affiliation":[{"name":"1 \u00a01Department of Information Structure and Organization, Universidad Polit\u00e9cnica (UPM), Madrid 28031, 2Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, 3Department of Computer Architecture and Computer Technology, Universidad de Granada (UGR), Granada 18071, Spain, 4CITIC, Campanillas, Malaga 29590, Spain, 5Department of Molecular Evolution, Centro de Astrobiolog\u00eda (CSIC-INTA), Torrej\u00f3n de Ardoz, Madrid 28850 and 6Centro de Investigaci\u00f3n Biom\u00e9dica en Red de enfermedades hep\u00e1ticas y digestivas (CIBERehd), Barcelona 08036, Spain"}]},{"given":"Juan Juli\u00e1n","family":"Merelo","sequence":"additional","affiliation":[{"name":"1 \u00a01Department of Information Structure and Organization, Universidad Polit\u00e9cnica (UPM), Madrid 28031, 2Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, 3Department of Computer Architecture and Computer Technology, Universidad de Granada (UGR), Granada 18071, Spain, 4CITIC, Campanillas, Malaga 29590, Spain, 5Department of Molecular Evolution, Centro de Astrobiolog\u00eda (CSIC-INTA), Torrej\u00f3n de Ardoz, Madrid 28850 and 6Centro de Investigaci\u00f3n Biom\u00e9dica en Red de enfermedades hep\u00e1ticas y digestivas (CIBERehd), Barcelona 08036, Spain"}]},{"given":"Carlos","family":"Briones","sequence":"additional","affiliation":[{"name":"1 \u00a01Department of Information Structure and Organization, Universidad Polit\u00e9cnica (UPM), Madrid 28031, 2Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, 3Department of Computer Architecture and Computer 
Technology, Universidad de Granada (UGR), Granada 18071, Spain, 4CITIC, Campanillas, Malaga 29590, Spain, 5Department of Molecular Evolution, Centro de Astrobiolog\u00eda (CSIC-INTA), Torrej\u00f3n de Ardoz, Madrid 28850 and 6Centro de Investigaci\u00f3n Biom\u00e9dica en Red de enfermedades hep\u00e1ticas y digestivas (CIBERehd), Barcelona 08036, Spain"}]}],"member":"286","published-online":{"date-parts":[[2014,10,24]]},"reference":[{"key":"2023020116170498300_btu708-B1","doi-asserted-by":"crossref","first-page":"3064","DOI":"10.1093\/bioinformatics\/btp546","article-title":"Genome analysis with inter-nucleotide distances","volume":"25","author":"Afreixo","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020116170498300_btu708-B2","doi-asserted-by":"crossref","first-page":"100.","DOI":"10.1186\/1471-2105-10-100","article-title":"Biological sequences as pictures\u2014a genetic two dimensional solution for iterated maps","volume":"10","author":"Almeida","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023020116170498300_btu708-B3","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1007\/s004220050357","article-title":"Classification of protein families and detection of the determinant residues with an improved self-organizing map","volume":"76","author":"Andrade","year":"1997","journal-title":"Biol. 
Cybern."},{"key":"2023020116170498300_btu708-B4","doi-asserted-by":"crossref","first-page":"4566","DOI":"10.1016\/j.watres.2007.06.030","article-title":"Comparison of self-organizing maps classification approach with cluster and principal components analysis for large environmental data sets","volume":"41","author":"Astel","year":"2007","journal-title":"Water Res."},{"key":"2023020116170498300_btu708-B5","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1038\/nature13668","article-title":"Comparative analysis of regulatory information and circuits across distant species","volume":"512","author":"Boyle","year":"2014","journal-title":"Nature"},{"key":"2023020116170498300_btu708-B6","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1016\/j.ympev.2004.10.020","article-title":"Reconstructing evolutionary relationships from functional data: a consistent classification of organisms based on translation inhibition response","volume":"34","author":"Briones","year":"2005","journal-title":"Mol. Phylogenet. Evol."},{"key":"2023020116170498300_btu708-B7","doi-asserted-by":"crossref","first-page":"e93233","DOI":"10.1371\/journal.pone.0093233","article-title":"Discovery of possible gene relationships through the application of self-organizing maps to DNA microarray databases","volume":"9","author":"Chavez-Alvarez","year":"2014","journal-title":"PLoS One"},{"key":"2023020116170498300_btu708-B8","doi-asserted-by":"crossref","first-page":"2624","DOI":"10.1016\/j.neucom.2011.03.021","article-title":"A combined measure for quantifying and qualifying the topology preservation of growing self-organizing maps","volume":"74","author":"Delgado","year":"2011","journal-title":"Neurocomputing"},{"key":"2023020116170498300_btu708-B9","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1128\/MMBR.05023-11","article-title":"Viral quasispecies evolution","volume":"76","author":"Domingo","year":"2012","journal-title":"Microbiol. Mol. Biol. 
Rev."},{"key":"2023020116170498300_btu708-B10","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1007\/978-3-319-07695-9_8","article-title":"Visualization and classification of DNA sequences using pareto learning self organizing maps based on frequency and correlation coefficient","volume":"295","author":"Dozono","year":"2014","journal-title":"Adv. Intell. Syst. Comput."},{"key":"2023020116170498300_btu708-B11","doi-asserted-by":"crossref","first-page":"1846","DOI":"10.1093\/bioinformatics\/bti299","article-title":"Identification of GPI anchor attachment signals by Kohonen self-organizing map","volume":"21","author":"Fankhauser","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020116170498300_btu708-B12","doi-asserted-by":"crossref","first-page":"1441","DOI":"10.1016\/0893-6080(94)90091-4","article-title":"Growing cell structures\u2014a self-organizing network for unsupervised and supervised learning","volume":"7","author":"Fritzke","year":"1994","journal-title":"Neural Netw."},{"key":"2023020116170498300_btu708-B13","first-page":"173","article-title":"Median strings: a review. Data Mining in time series databases","volume":"57","author":"Jiang","year":"2004","journal-title":"World Sci."},{"key":"2023020116170498300_btu708-B14","first-page":"809","article-title":"Comparing self-organizing maps","volume-title":"Intl. Conf. Artif. Neural Netw. (ICANN)","author":"Kaski","year":"1996"},{"key":"2023020116170498300_btu708-B15","first-page":"307","article-title":"Numerical representation of DNA sequences","author":"Kwan","year":"2009"},{"key":"2023020116170498300_btu708-B16","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1007\/BF01731581","article-title":"A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences","volume":"16","author":"Kimura","year":"1980","journal-title":"J. Mol. 
Evol."},{"key":"2023020116170498300_btu708-B17","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-56927-2","volume-title":"Self-Organizing Maps","author":"Kohonen","year":"2001","edition":"3rd edn"},{"key":"2023020116170498300_btu708-B18","doi-asserted-by":"crossref","first-page":"945","DOI":"10.1016\/S0893-6080(02)00069-2","article-title":"How to make large self-organizing maps for nonvectorial data","volume":"15","author":"Kohonen","year":"2002","journal-title":"Neural Netw."},{"key":"2023020116170498300_btu708-B19","first-page":"1723","article-title":"Global visualization and comparison of DNA sequences by use of three-dimensional trajectories","volume":"23","author":"Lo","year":"2007","journal-title":"J. Inf. Sci. Eng."},{"key":"2023020116170498300_btu708-B20","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1007\/BF03040854","article-title":"The Kohonen self-organizing map method: an assessment","volume":"12","author":"Murtagh","year":"1995","journal-title":"J. Classific."},{"key":"2023020116170498300_btu708-B21","first-page":"74","article-title":"A practical overview of quantitative structure-activity relationship","volume":"8","author":"Nantasenamat","year":"2009","journal-title":"EXCLI J."},{"key":"2023020116170498300_btu708-B22","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023020116170498300_btu708-B23","first-page":"425","volume-title":"The New Foundations of Evolution: On the Tree of Life","author":"Sapp","year":"2009"},{"key":"2023020116170498300_btu708-B24","first-page":"404","article-title":"Generalized vs set median strings for histograms-based distances: algorithms and classification results in the image domain","volume":"4538","author":"Solnon","year":"2007","journal-title":"LNCS"},{"key":"2023020116170498300_btu708-B25","doi-asserted-by":"crossref","first-page":"586","DOI":"10.1109\/72.846731","article-title":"Clustering of the self-organizing map","volume":"11","author":"Vesanto","year":"2000","journal-title":"IEEE Trans. Neural Netw."},{"key":"2023020116170498300_btu708-B26","doi-asserted-by":"crossref","first-page":"5088","DOI":"10.1073\/pnas.74.11.5088","article-title":"Phylogenetic structure of the prokaryotic domain: the primary kingdoms","volume":"74","author":"Woese","year":"1977","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020116170498300_btu708-B27","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1109\/TNN.2005.845141","article-title":"Survey of clustering algorithms","volume":"16","author":"Xu","year":"2005","journal-title":"IEEE Trans. 
Neural Netw."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/5\/736\/49011537\/bioinformatics_31_5_736.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/5\/736\/49011537\/bioinformatics_31_5_736.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,29]],"date-time":"2023-07-29T16:15:16Z","timestamp":1690647316000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/5\/736\/2748205"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,10,24]]},"references-count":27,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2015,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu708","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,10,24]]}}}