{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T22:28:12Z","timestamp":1761863292828},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2016,12,28]],"date-time":"2016-12-28T00:00:00Z","timestamp":1482883200000},"content-version":"vor","delay-in-days":1031,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Various approaches based on features extracted from protein sequences and often machine learning methods have been used in the prediction of protein folds. Finding an efficient technique for integrating these different protein features has received increasing attention. In particular, kernel methods are an interesting class of techniques for integrating heterogeneous data. Various methods have been proposed to fuse multiple kernels. Most techniques for multiple kernel learning focus on learning a convex linear combination of base kernels. In addition to the limitation of linear combinations, working with such approaches could cause a loss of potentially useful information.<\/jats:p>\n               <jats:p>Results: We design several techniques to combine kernel matrices by taking more involved, geometry inspired means of these matrices instead of convex linear combinations. We consider various sequence-based protein features including information extracted directly from position-specific scoring matrices and local sequence alignment. We evaluate our methods for classification on the SCOP PDB-40D benchmark dataset for protein fold recognition. The best overall accuracy on the protein fold recognition test set obtained by our methods is \u223c86.7%. This is an improvement over the results of the best existing approach. Moreover, our computational model has been developed by incorporating the functional domain composition of proteins through a hybridization model. It is observed that by using our proposed hybridization model, the protein fold recognition accuracy is further improved to 89.30%. Furthermore, we investigate the performance of our approach on the protein remote homology detection problem by fusing multiple string kernels.<\/jats:p>\n               <jats:p>Availability and implementation: The MATLAB code used for our proposed geometric kernel fusion frameworks are publicly available at http:\/\/people.cs.kuleuven.be\/\u223craf.vandebril\/homepage\/software\/geomean.php?menu=5\/<\/jats:p>\n               <jats:p>Contact: \u00a0pooyapaydar@gmail.com or yves.moreau@esat.kuleuven.be<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu118","type":"journal-article","created":{"date-parts":[[2014,3,4]],"date-time":"2014-03-04T05:09:20Z","timestamp":1393909760000},"page":"1850-1857","source":"Crossref","is-referenced-by-count":23,"title":["Protein fold recognition using geometric kernel data fusion"],"prefix":"10.1093","volume":"30","author":[{"given":"Pooya","family":"Zakeri","sequence":"first","affiliation":[{"name":"1 Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, 2 iMinds Medical IT and 3 Department of Computer Science, KU Leuven, 3001 Leuven, Belgium"},{"name":"1 Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, 2 iMinds Medical IT and 3 Department of Computer Science, KU Leuven, 3001 Leuven, Belgium"}]},{"given":"Ben","family":"Jeuris","sequence":"additional","affiliation":[{"name":"1 Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, 2 iMinds Medical IT and 3 Department of Computer Science, KU Leuven, 3001 Leuven, Belgium"}]},{"given":"Raf","family":"Vandebril","sequence":"additional","affiliation":[{"name":"1 Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, 2 iMinds Medical IT and 3 Department of Computer Science, KU Leuven, 3001 Leuven, Belgium"}]},{"given":"Yves","family":"Moreau","sequence":"additional","affiliation":[{"name":"1 Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, 2 iMinds Medical IT and 3 Department of Computer Science, KU Leuven, 3001 Leuven, Belgium"},{"name":"1 Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, 2 iMinds Medical IT and 3 Department of Computer Science, KU Leuven, 3001 Leuven, Belgium"},{"name":"1 Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, 2 iMinds Medical IT and 3 Department of Computer Science, KU Leuven, 3001 Leuven, Belgium"}]}],"member":"286","published-online":{"date-parts":[[2014,3,3]]},"reference":[{"key":"2023012711155629200_btu118-B1","doi-asserted-by":"crossref","DOI":"10.1515\/9781400830244","volume-title":"Optimization Algorithms on Matrix Manifolds","author":"Absil","year":"2008"},{"key":"2023012711155629200_btu118-B2","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1093\/nar\/29.1.37","article-title":"The interpro database, an integrated documentation resource for protein families, domains and functional sites","volume":"29","author":"Apweiler","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012711155629200_btu118-B3","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1137\/050637996","article-title":"Geometric means in a novel vector space structure on symmetric positive-definite matrices","volume":"29","author":"Arsigny","year":"2007","journal-title":"SIAM J. Matrix Anal. Appl."},{"key":"2023012711155629200_btu118-B4","doi-asserted-by":"crossref","DOI":"10.1145\/1015330.1015424","article-title":"Multiple kernel learning, conic duality, and the SMO algorithm","volume-title":"Proceedings of the 21st International Conference on Machine Learning (ICML)","author":"Bach","year":"2004"},{"key":"2023012711155629200_btu118-B5","volume-title":"Positive Definite Matrices. Princeton Series in Applied Mathematics","author":"Bhatia","year":"2007"},{"key":"2023012711155629200_btu118-B6","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1002\/jcb.10030","article-title":"Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect","volume":"84","author":"Cai","year":"2002","journal-title":"J. Cell. Biochem."},{"key":"2023012711155629200_btu118-B7","doi-asserted-by":"crossref","first-page":"3257","DOI":"10.1016\/S0006-3495(03)70050-2","article-title":"Support vector machines for predicting membrane protein types by using functional domain composition","volume":"84","author":"Cai","year":"2003","journal-title":"Biophys. J."},{"key":"2023012711155629200_btu118-B8","doi-asserted-by":"crossref","DOI":"10.1145\/1961189.1961199","article-title":"Libsvm: a library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"2023012711155629200_btu118-B9","doi-asserted-by":"crossref","first-page":"2843","DOI":"10.1093\/bioinformatics\/btm475","article-title":"Pfres: protein fold classification by using evolutionary information and predicted secondary structure","volume":"23","author":"Chen","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012711155629200_btu118-B10","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1002\/prot.1035","article-title":"Prediction of protein cellular attributes using pseudo-amino acid composition","volume":"43","author":"Chou","year":"2001","journal-title":"Proteins"},{"key":"2023012711155629200_btu118-B11","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1016\/j.bbrc.2004.07.059","article-title":"Predicting protein structural class by functional domain composition","volume":"321","author":"Chou","year":"2004","journal-title":"Biochem. Biophys. Res. Commun."},{"key":"2023012711155629200_btu118-B12","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1186\/gm39","article-title":"A kernel-based integration of genome-wide data for clinical decision support","volume":"1","author":"Daemen","year":"2009","journal-title":"Genome Med."},{"key":"2023012711155629200_btu118-B13","doi-asserted-by":"crossref","first-page":"1264","DOI":"10.1093\/bioinformatics\/btn112","article-title":"Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection","volume":"24","author":"Damoulas","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012711155629200_btu118-B14","doi-asserted-by":"crossref","first-page":"i125","DOI":"10.1093\/bioinformatics\/btm187","article-title":"Kernel-based data fusion for gene prioritization","volume":"23","author":"De Bie","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012711155629200_btu118-B15","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1093\/bioinformatics\/17.4.349","article-title":"Multi-class protein fold recognition using support vector machines and neural networks","volume":"17","author":"Ding","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012711155629200_btu118-B16","doi-asserted-by":"crossref","first-page":"8700","DOI":"10.1073\/pnas.92.19.8700","article-title":"Prediction of protein folding class using global description of amino acid sequence","volume":"92","author":"Dubchak","year":"1995","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012711155629200_btu118-B17","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1002\/(SICI)1097-0134(19990601)35:4<401::AID-PROT3>3.0.CO;2-K","article-title":"Recognition of a protein fold in the context of the scop classification","volume":"35","author":"Dubchak","year":"1999","journal-title":"Proteins"},{"key":"2023012711155629200_btu118-B18","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1090\/S0025-5718-1984-0725993-3","article-title":"The arithmetic-harmonic mean","volume":"42","author":"Foster","year":"1984","journal-title":"Math. Comput."},{"key":"2023012711155629200_btu118-B19","first-page":"2211","article-title":"Multiple kernel learning algorithms","volume":"12","author":"G\u00f6nen","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"2023012711155629200_btu118-B20","doi-asserted-by":"crossref","first-page":"D306","DOI":"10.1093\/nar\/gkr948","article-title":"Interpro in 2011: new developments in the family and domain prediction database","volume":"40","author":"Hunter","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012711155629200_btu118-B21","first-page":"379","article-title":"A survey and comparison of contemporary algorithms for computing the matrix geometric mean","volume":"39","author":"Jeuris","year":"2012","journal-title":"Electron. Trans. Numer. Anal."},{"key":"2023012711155629200_btu118-B22","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1110\/ps.0241703","article-title":"A neural-network based method for prediction of gamma-turns in proteins from multiple sequence alignment","volume":"12","author":"Kaur","year":"2003","journal-title":"Protein Sci."},{"key":"2023012711155629200_btu118-B23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.compbiolchem.2010.12.001","article-title":"A protein fold classifier formed by fusing different modes of pseudo amino acid composition via {PSSM}","volume":"35","author":"Kavousi","year":"2011","journal-title":"Comput. Biol. Chem."},{"key":"2023012711155629200_btu118-B24","first-page":"27","article-title":"Learning the kernel matrix with semi-definite programming","volume":"5","author":"Lanckriet","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"2023012711155629200_btu118-B25","doi-asserted-by":"crossref","first-page":"2626","DOI":"10.1093\/bioinformatics\/bth294","article-title":"A statistical framework for genomic data fusion","volume":"20","author":"Lanckriet","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012711155629200_btu118-B26","doi-asserted-by":"crossref","first-page":"857","DOI":"10.1089\/106652703322756113","article-title":"Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships","volume":"10","author":"Liao","year":"2003","journal-title":"J. Comput. Biol."},{"key":"2023012711155629200_btu118-B27","doi-asserted-by":"crossref","first-page":"e56499","DOI":"10.1371\/journal.pone.0056499","article-title":"Hierarchical classification of protein folds using a novel ensemble classifier","volume":"8","author":"Lin","year":"2013","journal-title":"PLoS One"},{"key":"2023012711155629200_btu118-B28","doi-asserted-by":"crossref","first-page":"D237","DOI":"10.1093\/nar\/gkl951","article-title":"CDD: a conserved domain database for interactive domain family analysis","volume":"35","author":"Marchler-Bauer","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012711155629200_btu118-B29","doi-asserted-by":"crossref","first-page":"D348","DOI":"10.1093\/nar\/gks1243","article-title":"Cdd: conserved domains and protein three-dimensional structure","volume":"41","author":"Marchler-Bauer","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023012711155629200_btu118-B30","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"Scop: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol."},{"key":"2023012711155629200_btu118-B31","doi-asserted-by":"crossref","first-page":"2434","DOI":"10.1016\/j.neucom.2006.01.026","article-title":"A novel ensemble of classifiers for protein fold recognition","volume":"69","author":"Nanni","year":"2006","journal-title":"Neurocomputing"},{"key":"2023012711155629200_btu118-B32","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1186\/1472-6807-9-51","article-title":"A generic method for assignment of reliability scores applied to solvent accessibility predictions","volume":"9","author":"Petersen","year":"2009","journal-title":"BMC Struct. Biol."},{"key":"2023012711155629200_btu118-B33","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1109\/TCBB.2008.139","article-title":"A framework for multiple kernel support vector regression and its applications to sirna efficacy prediction","volume":"6","author":"Qiu","year":"2009","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"2023012711155629200_btu118-B34","author":"Rakotomamonjy","year":"2008"},{"key":"2023012711155629200_btu118-B35","doi-asserted-by":"crossref","first-page":"4239","DOI":"10.1093\/bioinformatics\/bti687","article-title":"Profile-based direct kernels for remote homology detection and fold recognition","volume":"21","author":"Rangwala","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012711155629200_btu118-B36","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1186\/1471-2105-8-337","article-title":"Support vector machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs","volume":"8","author":"Rashid","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023012711155629200_btu118-B37","doi-asserted-by":"crossref","first-page":"2994","DOI":"10.1093\/nar\/29.14.2994","article-title":"Improving the accuracy of psi-blast protein database searches with composition-based statistics and other refinements","volume":"29","author":"Schaffer","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012711155629200_btu118-B38","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1016\/j.jtbi.2012.12.008","article-title":"A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition","volume":"320","author":"Sharma","year":"2013","journal-title":"J. Theor. Biol."},{"key":"2023012711155629200_btu118-B39","doi-asserted-by":"crossref","first-page":"1717","DOI":"10.1093\/bioinformatics\/btl170","article-title":"Ensemble classifier for protein fold pattern recognition","volume":"22","author":"Shen","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012711155629200_btu118-B40","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1093\/protein\/gzm057","article-title":"Nuc-ploc: a new web-server for predicting protein subnuclear localization by fusing pseaa composition and psepssm","volume":"20","author":"Shen","year":"2007","journal-title":"Protein Eng. Des. Sel."},{"key":"2023012711155629200_btu118-B41","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1016\/j.jtbi.2008.10.007","article-title":"Predicting protein fold pattern with functional domain and sequential evolution information","volume":"256","author":"Shen","year":"2009","journal-title":"J. Theor. Biol."},{"key":"2023012711155629200_btu118-B42","first-page":"1531","article-title":"Large scale multiple kernel learning","volume":"7","author":"Sonnenburg","year":"2006","journal-title":"J. Mach. Learn. Res."},{"key":"2023012711155629200_btu118-B43","article-title":"Multiple kernel learning and the SMO algorithm","volume-title":"Advances in Neural Information Processing Systems","author":"Vishwanathan","year":"2010"},{"key":"2023012711155629200_btu118-B44","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1037\/0033-2909.83.2.213","article-title":"Estimating coefficients in linear models: it don\u2019t make no nevermind","volume":"83","author":"Wainer","year":"1976","journal-title":"Psychol. Bull."},{"key":"2023012711155629200_btu118-B45","doi-asserted-by":"crossref","first-page":"W105","DOI":"10.1093\/nar\/gki359","article-title":"Locsvmpsi: a web server for subcellular localization of eukaryotic proteins using svm and profile of psi-blast","volume":"33","author":"Xie","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012711155629200_btu118-B46","doi-asserted-by":"crossref","first-page":"2053","DOI":"10.1002\/prot.23025","article-title":"Improving taxonomy-based protein fold recognition by using global and local features","volume":"79","author":"Yang","year":"2011","journal-title":"Proteins"},{"key":"2023012711155629200_btu118-B47","doi-asserted-by":"crossref","first-page":"12348","DOI":"10.1016\/j.eswa.2011.04.014","article-title":"Margin-based ensemble classifier for protein fold recognition","volume":"38","author":"Yang","year":"2011","journal-title":"Expert Syst. Appl."},{"key":"2023012711155629200_btu118-B48","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1186\/1471-2105-10-267","article-title":"Enhanced protein fold recognition through a novel data integration approach","volume":"10","author":"Ying","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012711155629200_btu118-B49","volume-title":"Kernel-based Data Fusion for Machine Learning - Methods and Applications in Bioinformatics and Text Mining","author":"Yu","year":"2011"},{"key":"2023012711155629200_btu118-B50","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1016\/j.jtbi.2010.10.026","article-title":"Prediction of protein submitochondria locations based on data fusion of various features of sequences","volume":"269","author":"Zakeri","year":"2011","journal-title":"J. Theor. Biol."},{"key":"2023012711155629200_btu118-B51","doi-asserted-by":"crossref","DOI":"10.1145\/1273496.1273646","article-title":"Multiclass multiple kernel learning","volume-title":"Proceedings of the 24th international conference on Machine learning","author":"Zien","year":"2007"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/13\/1850\/48925568\/bioinformatics_30_13_1850.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/13\/1850\/48925568\/bioinformatics_30_13_1850.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T11:53:24Z","timestamp":1674820404000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/13\/1850\/2422171"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,3,3]]},"references-count":51,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2014,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu118","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,7,1]]},"published":{"date-parts":[[2014,3,3]]}}}