{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T16:30:46Z","timestamp":1773246646782,"version":"3.50.1"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"22","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2006,11,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Drawing inferences from large, heterogeneous sets of biological data requires a theoretical framework that is capable of representing, e.g. DNA and protein sequences, protein structures, microarray expression data, various types of interaction networks, etc. Recently, a class of algorithms known as kernel methods has emerged as a powerful framework for combining diverse types of data. The support vector machine (SVM) algorithm is the most popular kernel method, due to its theoretical underpinnings and strong empirical performance on a wide variety of classification tasks. Furthermore, several recently described extensions allow the SVM to assign relative weights to various datasets, depending upon their utilities in performing a given classification task.<\/jats:p><jats:p>Results: In this work, we empirically investigate the performance of the SVM on the task of inferring gene functional annotations from a combination of protein sequence and structure data. Our results suggest that the SVM is quite robust to noise in the input datasets. Consequently, in the presence of only two types of data, an SVM trained from an unweighted combination of datasets performs as well or better than a more sophisticated algorithm that assigns weights to individual data types. Indeed, for this simple case, we can demonstrate empirically that no solution is significantly better than the naive, unweighted average of the two datasets. 
On the other hand, when multiple noisy datasets are included in the experiment, then the naive approach fares worse than the weighted approach. Our results suggest that for many applications, a naive unweighted sum of kernels may be sufficient.<\/jats:p><jats:p>Availability: \u00a0<\/jats:p><jats:p>Contact: \u00a0noble@gs.washington.edu<\/jats:p><jats:p>Supplementary information: Supplementary Data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btl475","type":"journal-article","created":{"date-parts":[[2006,9,12]],"date-time":"2006-09-12T00:30:08Z","timestamp":1158021008000},"page":"2753-2760","source":"Crossref","is-referenced-by-count":82,"title":["Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure"],"prefix":"10.1093","volume":"22","author":[{"given":"Darrin P.","family":"Lewis","sequence":"first","affiliation":[{"name":"Department of Computer Science, Columbia University, New York, NY, 10027"}]},{"given":"Tony","family":"Jebara","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Columbia University, New York, NY, 10027"}]},{"given":"William Stafford","family":"Noble","sequence":"additional","affiliation":[{"name":"Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, Seattle, WA, 98195, USA"}]}],"member":"286","published-online":{"date-parts":[[2006,9,11]]},"reference":[{"key":"2023012408405945600_b1","first-page":"6","article-title":"Multiple kernel learning, conic duality and the SMO algorithm","author":"Bach","year":"2004"},{"key":"2023012408405945600_b2","article-title":"Computing regularization paths for learning multiple kernels","volume-title":"Advances in Neural Information Processing 
Systems","author":"Bach","year":"2004"},{"key":"2023012408405945600_b3","doi-asserted-by":"crossref","first-page":"i38","DOI":"10.1093\/bioinformatics\/bti1016","article-title":"Kernel methods for predicting protein\u2013protein interactions","volume":"21","author":"Ben-Hur","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012408405945600_b4","doi-asserted-by":"crossref","first-page":"i47","DOI":"10.1093\/bioinformatics\/bti1007","article-title":"Protein function prediction via graph kernels","volume":"21","author":"Borgwardt","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012408405945600_b5","first-page":"144","article-title":"A training algorithm for optimal margin classifiers","volume-title":"Proceedings of the 5th Annual ACM Workshop on COLT","author":"Boser","year":"1992"},{"key":"2023012408405945600_b6","volume-title":"An Introduction to Support Vector Machines","author":"Cristianini","year":"2000","edition":"1st edn"},{"key":"2023012408405945600_b7","first-page":"25","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Gene Ontology Consortium","year":"2000","journal-title":"Nat. Genet."},{"key":"2023012408405945600_b8","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1006\/jmbi.1993.1489","article-title":"Protein structure comparison by alignment of distance matrices","volume":"233","author":"Holm","year":"1993","journal-title":"J. Mol. 
Biol."},{"key":"2023012408405945600_b9","first-page":"55","article-title":"Multi-task feature and kernel selection for SVMs","author":"Jebara","year":"2004"},{"key":"2023012408405945600_b10","first-page":"323","article-title":"Learning the kernel matrix with semi-definite programming","author":"Lanckriet","year":"2002"},{"key":"2023012408405945600_b11","doi-asserted-by":"crossref","first-page":"2626","DOI":"10.1093\/bioinformatics\/bth294","article-title":"A statistical framework for genomic data fusion","volume":"20","author":"Lanckriet","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012408405945600_b12","first-page":"300","article-title":"Kernel-based data fusion and its application to protein function prediction in yeast","author":"Lanckriet","year":"2004"},{"key":"2023012408405945600_b13","first-page":"27","article-title":"Learning the kernel matrix with semidefinite programming","volume":"5","author":"Lanckriet","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"2023012408405945600_b14","first-page":"564","article-title":"The spectrum kernel: a string kernel for SVM protein classification","author":"Leslie","year":"2002"},{"key":"2023012408405945600_b15","first-page":"1441","article-title":"Mismatch string kernels for SVM protein classification","volume-title":"Advances in Neural Information Processing Systems","author":"Leslie","year":"2003"},{"key":"2023012408405945600_b16","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1093\/bioinformatics\/17.3.282","article-title":"Clustering of highly homologous sequences to reduce the size of large protein databases","volume":"17","author":"Li","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012408405945600_b17","first-page":"225","article-title":"Combining pairwise sequence similarity and support vector machines for remote protein homology 
detection","author":"Liao","year":"2002"},{"key":"2023012408405945600_b18","doi-asserted-by":"crossref","first-page":"71","DOI":"10.7551\/mitpress\/4057.003.0005","article-title":"Support vector machine applications in computational biology","volume-title":"Kernel methods in computational biology","author":"Noble","year":"2004"},{"key":"2023012408405945600_b19","first-page":"1043","article-title":"Learning the kernel with hyperkernels","volume":"6","author":"Ong","year":"2005","journal-title":"J. Mach. Learn. Res."},{"key":"2023012408405945600_b20","doi-asserted-by":"crossref","first-page":"2606","DOI":"10.1110\/ps.0215902","article-title":"MAMMOTH (Matching molecular models obtained from theory): an automated method for model comparison","volume":"11","author":"Ortiz","year":"2002","journal-title":"Protein Sci."},{"key":"2023012408405945600_b21","first-page":"242","article-title":"Gene functional classification from heterogeneous data","author":"Pavlidis","year":"2001"},{"key":"2023012408405945600_b22","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1089\/10665270252935539","article-title":"Learning gene functional classifications from multiple data types","volume":"9","author":"Pavlidis","year":"2002","journal-title":"J. Comput. Biol."},{"key":"2023012408405945600_b23","volume-title":"Advances in Kernel Methods: Support Vector Learning","author":"Sch\u00f6lkopf","year":"1999"},{"key":"2023012408405945600_b24","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1093\/protein\/11.9.739","article-title":"Protein structure alignment by incremental combinatorial extension (ce) of the optimal path","volume":"11","author":"Shindyalov","year":"1998","journal-title":"Protein Eng."},{"key":"2023012408405945600_b25","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. 
Biol."},{"key":"2023012408405945600_b26","article-title":"A general and efficient multiple kernel learning algorithm","volume-title":"Advances in Neural Information Processing Systems 18","author":"Sonnenburg","year":"2006"},{"key":"2023012408405945600_b27","first-page":"S1","article-title":"Learning interpretable SVMs for biological sequence classification","volume":"7","author":"Sonnenburg","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012408405945600_b28","article-title":"Large scale multiple kernel learning","author":"Sonnenburg","year":"2006","journal-title":"J. Mach. Learn. Res."},{"key":"2023012408405945600_b29","first-page":"83","article-title":"Support vector classification with asymmetric kernel function","volume-title":"Proceedings ESANN","author":"Tsuda","year":"1999"},{"key":"2023012408405945600_b30","volume-title":"Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control","author":"Vapnik","year":"1998"},{"key":"2023012408405945600_b31","article-title":"An automated combination of sequence motif kernels for predicting protein subcellular localization","volume-title":"Technical Report 
TR-146","author":"Zien","year":"2006"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/22\/2753\/48838684\/bioinformatics_22_22_2753.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/22\/2753\/48838684\/bioinformatics_22_22_2753.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T11:33:22Z","timestamp":1707219202000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/22\/22\/2753\/197460"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,9,11]]},"references-count":31,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2006,11,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btl475","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2006,11,15]]},"published":{"date-parts":[[2006,9,11]]}}}