{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T23:47:18Z","timestamp":1776469638388,"version":"3.51.2"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2019,6,21]],"date-time":"2019-06-21T00:00:00Z","timestamp":1561075200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Philipps-University of Marburg"},{"name":"the Paul Ehrlich Institute"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Classification of protein sequences is one big task in bioinformatics and has many applications. Different machine learning methods exist and are applied on these problems, such as support vector machines (SVM), random forests (RF) and neural networks (NN). All of these methods have in common that protein sequences have to be made machine-readable and comparable in the first step, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. However, due to the outstanding performance of deep neural networks (DNN) on image recognition, we used frequency matrix chaos game representation (FCGR) for encoding of protein sequences into images. In this study, we compare the performance of SVMs, RFs and DNNs, trained on FCGR encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to work also for protein sequences, resulting in n-flakes representation, an image with several icosagons.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We could show that all applied machine learning techniques (RF, SVM and DNN) show promising results compared to the state-of-the-art methods on our benchmark datasets, with DNNs outperforming the other methods and that FCGR is a promising new encoding method for protein sequences.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>https:\/\/cran.r-project.org\/.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz493","type":"journal-article","created":{"date-parts":[[2019,6,14]],"date-time":"2019-06-14T08:33:35Z","timestamp":1560501215000},"page":"272-279","source":"Crossref","is-referenced-by-count":65,"title":["Deep learning on chaos game representation for proteins"],"prefix":"10.1093","volume":"36","author":[{"given":"Hannah F","family":"L\u00f6chel","sequence":"first","affiliation":[{"name":"Department of Mathematics and Computer Science, Philipps-University of Marburg , Marburg 35032, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dominic","family":"Eger","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, Philipps-University of Marburg , Marburg 35032, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Theodor","family":"Sperlea","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, Philipps-University of Marburg , Marburg 35032, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dominik","family":"Heider","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, Philipps-University of Marburg , Marburg 35032, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2019,6,21]]},"reference":[{"key":"2023013109502991200_btz493-B1","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1080\/15427951.2011.604548","article-title":"Keypathwayminer: detecting case-specific biological pathways using expression data","volume":"7","author":"Alcaraz","year":"2011","journal-title":"Internet Math"},{"key":"2023013109502991200_btz493-B2","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1093\/bioinformatics\/17.5.429","article-title":"Analysis of genomic sequences by chaos game representation","volume":"17","author":"Almeida","year":"2001","journal-title":"Bioinformatics"},{"key":"2023013109502991200_btz493-B3","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1016\/j.ins.2018.06.052","article-title":"A two-tiered 2d visual tool for assessing classifier performance","volume":"463","author":"Armano","year":"2018","journal-title":"Inf. Sci"},{"key":"2023013109502991200_btz493-B4","first-page":"7.","article-title":"Phi-delta-diagrams: software implementation of a visual tool for assessing classifier and feature performance","volume":"1","author":"Armano","year":"2018","journal-title":"Mach. Learn. Knowl. Extract"},{"key":"2023013109502991200_btz493-B5","volume-title":"Fractals Everywhere: New Edition","author":"Barnsley","year":"2012"},{"key":"2023013109502991200_btz493-B6","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1016\/S1093-3263(97)00106-X","article-title":"Chaos game representation of proteins","volume":"15","author":"Basu","year":"1997","journal-title":"J. Mol. Graph. Modell"},{"key":"2023013109502991200_btz493-B7","doi-asserted-by":"crossref","first-page":"3850","DOI":"10.1093\/nar\/gkg575","article-title":"Geno2pheno: estimating phenotypic drug resistance from hiv-1 genotypes","volume":"31","author":"Beerenwinkel","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023013109502991200_btz493-B8","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023013109502991200_btz493-B9","doi-asserted-by":"crossref","first-page":"4977","DOI":"10.1021\/jm4004285","article-title":"QSAR modeling: where have you been? Where are you going to?","volume":"57","author":"Cherkasov","year":"2014","journal-title":"J. Med. Chem"},{"key":"2023013109502991200_btz493-B10","doi-asserted-by":"crossref","first-page":"1391","DOI":"10.1093\/oxfordjournals.molbev.a026048","article-title":"Genomic signature: characterization and classification of species assessed by chaos game representation of sequences","volume":"16","author":"Deschavanne","year":"1999","journal-title":"Mol. Biol. Evol"},{"key":"2023013109502991200_btz493-B11","doi-asserted-by":"crossref","first-page":"26.","DOI":"10.1186\/1756-0381-4-26","article-title":"Improved bevirimat resistance prediction by combination of structural and sequence-based classifiers","volume":"4","author":"Dybowski","year":"2011","journal-title":"BioData Min"},{"key":"2023013109502991200_btz493-B12","doi-asserted-by":"crossref","first-page":"16.","DOI":"10.1186\/1756-0381-4-16","article-title":"Interpol: an r package for preprocessing of protein sequences","volume":"4","author":"Heider","year":"2011","journal-title":"BioData Min"},{"key":"2023013109502991200_btz493-B13","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1177\/153303460900800503","article-title":"A computational approach for the identification of small GTPases based on preprocessed amino acid sequences","volume":"8","author":"Heider","year":"2009","journal-title":"Technol. Cancer Res. Treat"},{"key":"2023013109502991200_btz493-B14","doi-asserted-by":"crossref","first-page":"94.","DOI":"10.1186\/1756-0500-4-94","article-title":"Machine learning on normalized protein sequences","volume":"4","author":"Heider","year":"2011","journal-title":"BMC Res. Notes"},{"key":"2023013109502991200_btz493-B15","doi-asserted-by":"crossref","first-page":"7211","DOI":"10.1021\/bi00147a001","article-title":"Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks","volume":"31","author":"Hirst","year":"1992","journal-title":"Biochemistry"},{"key":"2023013109502991200_btz493-B16","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.ygeno.2016.08.002","article-title":"Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison","volume":"108","author":"Hoang","year":"2016","journal-title":"Genomics"},{"key":"2023013109502991200_btz493-B17","doi-asserted-by":"crossref","first-page":"837","DOI":"10.1002\/prot.22192","article-title":"Predicting drug resistance of the HIV-1 protease using molecular interaction energy components","volume":"74","author":"Hou","year":"2009","journal-title":"Proteins Struct. Funct. Bioinform"},{"key":"2023013109502991200_btz493-B18","doi-asserted-by":"crossref","first-page":"2163","DOI":"10.1093\/nar\/18.8.2163","article-title":"Chaos game representation of gene structure","volume":"18","author":"Jeffrey","year":"1990","journal-title":"Nucleic Acids Res"},{"key":"2023013109502991200_btz493-B19","doi-asserted-by":"crossref","first-page":"243.","DOI":"10.1186\/1471-2105-7-243","article-title":"Chaos game representation for comparison of whole genomes","volume":"7","author":"Joseph","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023013109502991200_btz493-B20","doi-asserted-by":"crossref","first-page":"BBI","DOI":"10.4137\/BBI.S3382","article-title":"A rough set-based model of hiv-1 reverse transcriptase resistome","volume":"3","author":"Kierczak","year":"2009","journal-title":"Bioinform. Biol. Insights"},{"key":"2023013109502991200_btz493-B21","doi-asserted-by":"crossref","first-page":"2575","DOI":"10.1093\/bioinformatics\/bty170","article-title":"Scotch: subtype a coreceptor tropism classification in HIV-1","volume":"34","author":"L\u00f6chel","year":"2018","journal-title":"Bioinformatics"},{"key":"2023013109502991200_btz493-B22","doi-asserted-by":"crossref","first-page":"2804","DOI":"10.1110\/ps.051597405","article-title":"A novel representation of protein sequences for prediction of subcellular location using support vector machines","volume":"14","author":"Matsuda","year":"2005","journal-title":"Protein Sci"},{"key":"2023013109502991200_btz493-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-7-S2-S2","article-title":"A fourier transformation based method to mine peptide space for antimicrobial activity","volume":"7","author":"Nagarajan","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023013109502991200_btz493-B25","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1093\/nar\/gkg100","article-title":"Human immunodeficiency virus reverse transcriptase and protease sequence database","volume":"31","author":"Rhee","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023013109502991200_btz493-B26","doi-asserted-by":"crossref","first-page":"17355","DOI":"10.1073\/pnas.0607274103","article-title":"Genotypic predictors of human immunodeficiency virus type 1 drug resistance","volume":"103","author":"Rhee","year":"2006","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023013109502991200_btz493-B27","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1145\/2983468.2983489","volume-title":"Proceedings of the 17th International Conference on Computer Systems and Technologies 2016","author":"Rizzo","year":"2016"},{"key":"2023013109502991200_btz493-B28","doi-asserted-by":"crossref","first-page":"77.","DOI":"10.1186\/1471-2105-12-77","article-title":"PROC: an open-source package for r and s+ to analyze and compare roc curves","volume":"12","author":"Robin","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023013109502991200_btz493-B30","doi-asserted-by":"crossref","first-page":"7881.","DOI":"10.1093\/bioinformatics\/bti623","article-title":"Rocr: visualizing classifier performance in r","volume":"21","author":"Sing","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013109502991200_btz493-B31","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1002\/(SICI)1097-0134(20000201)38:2<149::AID-PROT4>3.0.CO;2-#","article-title":"Optimized representations and maximal information in proteins","volume":"38","author":"Solis","year":"2000","journal-title":"Proteins Struct. Funct. Bioinform"},{"key":"2023013109502991200_btz493-B32","doi-asserted-by":"crossref","first-page":"7.","DOI":"10.1186\/s13040-019-0196-x","article-title":"Encodings and models for antimicrobial peptide classification for multi-resistant pathogens","volume":"12","author":"Sp\u00e4nig","year":"2019","journal-title":"BioData Min"},{"key":"2023013109502991200_btz493-B33","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1080\/00029890.2000.12005199","article-title":"Evaluating integrals using self-similarity","volume":"107","author":"Strichartz","year":"2000","journal-title":"Am. Math. Mon"},{"key":"2023013109502991200_btz493-B34","author":"Tzanov","year":"2015"},{"key":"2023013109502991200_btz493-B35","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1086\/377453","article-title":"Enhanced prediction of lopinavir resistance from genotype by use of artificial neural networks","volume":"188","author":"Wang","year":"2003","journal-title":"J. Infect. Dis"},{"key":"2023013109502991200_btz493-B36","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1016\/j.gene.2004.10.021","article-title":"The spectrum of genomic signatures: from dinucleotides to chaos game representation","volume":"346","author":"Wang","year":"2005","journal-title":"Gene"},{"key":"2023013109502991200_btz493-B37","doi-asserted-by":"crossref","first-page":"618","DOI":"10.1016\/j.jtbi.2008.12.027","article-title":"Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation","volume":"257","author":"Yang","year":"2009","journal-title":"J. Theor. Biol"},{"key":"2023013109502991200_btz493-B38","first-page":"342","volume-title":"Proceedings of the 2013 SIAM International Conference on Data Mining","author":"Yu","year":"2013"},{"key":"2023013109502991200_btz493-B39","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.jtbi.2003.09.009","article-title":"Chaos game representation of protein sequences based on the detailed hp model and their multifractal and correlation analyses","volume":"226","author":"Yu","year":"2004","journal-title":"J. Theor. Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz493\/28914116\/btz493.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/1\/272\/48981566\/bioinformatics_36_1_272.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/1\/272\/48981566\/bioinformatics_36_1_272.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T13:32:27Z","timestamp":1675171947000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/1\/272\/5521624"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2019,6,21]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz493","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/575324","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,1,1]]},"published":{"date-parts":[[2019,6,21]]}}}