{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T15:40:31Z","timestamp":1759333231871},"reference-count":21,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Predicting a protein's structural or functional class from its amino acid sequence or structure is a fundamental problem in computational biology. Recently, there has been considerable interest in using discriminative learning algorithms, in particular support vector machines (SVMs), for classification of proteins. However, because sufficiently many positive examples are required to train such classifiers, all SVM-based methods are hampered by limited coverage.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In this study, we develop a hybrid machine learning approach for classifying proteins, and we apply the method to the problem of assigning proteins to structural categories based on their sequences or their 3D structures. The method combines a full-coverage but lower accuracy nearest neighbor method with higher accuracy but reduced coverage multiclass SVMs to produce a full coverage classifier with overall improved accuracy. The hybrid approach is based on the simple idea of \"punting\" from one method to another using a learned threshold.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>In cross-validated experiments on the SCOP hierarchy, the hybrid methods consistently outperform the individual component methods at all levels of coverage.<\/jats:p><jats:p>Code and data sets are available at<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/noble.gs.washington.edu\/proj\/sabretooth\" ext-link-type=\"uri\">http:\/\/noble.gs.washington.edu\/proj\/sabretooth<\/jats:ext-link><\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-9-389","type":"journal-article","created":{"date-parts":[[2008,9,26]],"date-time":"2008-09-26T18:16:55Z","timestamp":1222453015000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Combining classifiers for improved classification of proteins from sequence or structure"],"prefix":"10.1186","volume":"9","author":[{"given":"Iain","family":"Melvin","sequence":"first","affiliation":[]},{"given":"Jason","family":"Weston","sequence":"additional","affiliation":[]},{"given":"Christina S","family":"Leslie","sequence":"additional","affiliation":[]},{"given":"William S","family":"Noble","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,9,22]]},"reference":[{"key":"2374_CR1","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1093\/protein\/11.9.739","volume":"11","author":"IN Shindyalov","year":"1998","unstructured":"Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering 1998, 11: 739\u2013747. 10.1093\/protein\/11.9.739","journal-title":"Protein Engineering"},{"key":"2374_CR2","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1006\/jmbi.1993.1489","volume":"233","author":"L Holm","year":"1993","unstructured":"Holm L, Sander C: Protein Structure Comparison by Alignment of Distance Matrices. Journal of Molecular Biology 1993, 233: 123\u2013138. 10.1006\/jmbi.1993.1489","journal-title":"Journal of Molecular Biology"},{"key":"2374_CR3","doi-asserted-by":"publisher","first-page":"2606","DOI":"10.1110\/ps.0215902","volume":"11","author":"AR Ortiz","year":"2002","unstructured":"Ortiz AR, Strauss CEM, Olmea O: MAMMOTH (Matching molecular models obtained from theory): An automated method for model comparison. Protein Science 2002, 11: 2606\u20132621. 10.1110\/ps.0215902","journal-title":"Protein Science"},{"key":"2374_CR4","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","volume":"147","author":"T Smith","year":"1981","unstructured":"Smith T, Waterman M: Identification of common molecular subsequences. Journal of Molecular Biology 1981, 147: 195\u2013197. 10.1016\/0022-2836(81)90087-5","journal-title":"Journal of Molecular Biology"},{"key":"2374_CR5","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","volume":"215","author":"SF Altschul","year":"1990","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: A basic local alignment search tool. Journal of Molecular Biology 1990, 215: 403\u2013410.","journal-title":"Journal of Molecular Biology"},{"issue":"10","key":"2374_CR6","doi-asserted-by":"publisher","first-page":"846","DOI":"10.1093\/bioinformatics\/14.10.846","volume":"14","author":"K Karplus","year":"1998","unstructured":"Karplus K, Barrett C, Hughey R: Hidden Markov models for detecting remote protein homologies. Bioinformatics 1998, 14(10):846\u201356. 10.1093\/bioinformatics\/14.10.846","journal-title":"Bioinformatics"},{"key":"2374_CR7","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research 1997, 25: 3389\u20133402. 10.1093\/nar\/25.17.3389","journal-title":"Nucleic Acids Research"},{"key":"2374_CR8","first-page":"149","volume-title":"Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology","author":"T Jaakkola","year":"1999","unstructured":"Jaakkola T, Diekhans M, Haussler D: Using the Fisher kernel method to detect remote protein homologies. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology. Menlo Park, CA: AAAI Press; 1999:149\u2013158."},{"key":"2374_CR9","first-page":"144","volume-title":"5th Annual ACM Workshop on COLT","author":"BE Boser","year":"1992","unstructured":"Boser BE, Guyon IM, Vapnik VN: A Training Algorithm for Optimal Margin Classifiers. In 5th Annual ACM Workshop on COLT. Edited by: Haussler D. Pittsburgh, PA: ACM Press; 1992:144\u2013152."},{"key":"2374_CR10","first-page":"536","volume":"247","author":"AG Murzin","year":"1995","unstructured":"Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 1995, 247: 536\u2013540.","journal-title":"Journal of Molecular Biology"},{"key":"2374_CR11","doi-asserted-by":"crossref","first-page":"71","DOI":"10.7551\/mitpress\/4057.003.0005","volume-title":"Kernel methods in computational biology","author":"WS Noble","year":"2004","unstructured":"Noble WS: Support vector machine applications in computational biology. In Kernel methods in computational biology. Edited by: Schoelkopf B, Tsuda K, Vert JP. Cambridge, MA: MIT Press; 2004:71\u201392."},{"issue":"Suppl 4","key":"2374_CR12","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/1471-2105-8-S4-S2","volume":"8","author":"I Melvin","year":"2007","unstructured":"Melvin I, Ie E, Kuang R, Weston J, Noble WS, Leslie C: SVM-fold: a tool for discriminative multi-class protein fold and superfamily recognition. BMC Bioinformatics 2007, 8(Suppl 4):S2. 10.1186\/1471-2105-8-S4-S2","journal-title":"BMC Bioinformatics"},{"key":"2374_CR13","first-page":"1557","volume":"8","author":"I Melvin","year":"2007","unstructured":"Melvin I, Ie E, Weston J, Noble WS, Leslie C: Multi-class protein classification using adaptive codes. Journal of Machine Learning Research 2007, 8: 1557\u20131581.","journal-title":"Journal of Machine Learning Research"},{"issue":"7","key":"2374_CR14","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1186\/1471-2105-7-455","volume":"16","author":"H Rangwala","year":"2006","unstructured":"Rangwala H, Karypis G: Building multiclass classifiers for remote homology detection and fold recognition. BMC Bioinformatics 2006, 16(7):455. 10.1186\/1471-2105-7-455","journal-title":"BMC Bioinformatics"},{"issue":"24","key":"2374_CR15","doi-asserted-by":"publisher","first-page":"3320","DOI":"10.1093\/bioinformatics\/btm527","volume":"23","author":"MT Shamim","year":"2007","unstructured":"Shamim MT, Anwaruddin M, Nagarajaram HA: Support vector machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs. Bioinformatics 2007, 23(24):3320\u20133327. 10.1093\/bioinformatics\/btm527","journal-title":"Bioinformatics"},{"issue":"3","key":"2374_CR16","doi-asserted-by":"publisher","first-page":"527","DOI":"10.1142\/S021972000500120X","volume":"3","author":"R Kuang","year":"2005","unstructured":"Kuang R, Ie E, Wang K, Wang K, Siddiqi M, Freund Y, Leslie C: Profile-based string kernels for remote homology detection and motif extraction. Journal of Bioinformatics and Computational Biology 2005, 3(3):527\u2013550. 10.1142\/S021972000500120X","journal-title":"Journal of Bioinformatics and Computational Biology"},{"key":"2374_CR17","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1016\/j.jmb.2004.10.024","volume":"345","author":"PD Dobson","year":"2005","unstructured":"Dobson PD, Doig AJ: Predicting Enzyme Class From Protein Structure Without Alignments. Journal of Molecular Biology 2005, 345: 187\u2013199. 10.1016\/j.jmb.2004.10.024","journal-title":"Journal of Molecular Biology"},{"issue":"Suppl 1","key":"2374_CR18","doi-asserted-by":"publisher","first-page":"i47","DOI":"10.1093\/bioinformatics\/bti1007","volume":"21","author":"K Borgwardt","year":"2005","unstructured":"Borgwardt K, Ong CS, Schoenauer S, Vishwanathan S, Smola A, Kriegel HP: Protein Function Prediction via Graph Kernels. Bioinformatics 2005, 21(Suppl 1):i47-i56. 10.1093\/bioinformatics\/bti1007","journal-title":"Bioinformatics"},{"issue":"9","key":"2374_CR19","doi-asserted-by":"publisher","first-page":"1090","DOI":"10.1093\/bioinformatics\/btl642","volume":"23","author":"J Qiu","year":"2007","unstructured":"Qiu J, Hue M, Ben-Hur A, Vert JP, Noble WS: A structural alignment kernel for protein structures. Bioinformatics 2007, 23(9):1090\u20131098. 10.1093\/bioinformatics\/btl642","journal-title":"Bioinformatics"},{"issue":"2","key":"2374_CR20","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","volume":"5","author":"D Wolpert","year":"1992","unstructured":"Wolpert D: Stacked generalization. Neural Networks 1992, 5(2):241\u2013259. 10.1016\/S0893-6080(05)80023-1","journal-title":"Neural Networks"},{"issue":"10","key":"2374_CR21","doi-asserted-by":"publisher","first-page":"1203","DOI":"10.1093\/bioinformatics\/btm089","volume":"23","author":"E Jan","year":"2007","unstructured":"Jan E, Gewehr VH, Zimmer R: AutoSCOP: Automated Prediction of SCOP Classifications using Unique Pattern-Class Mappings. Bioinformatics 2007, 23(10):1203\u20131210. 10.1093\/bioinformatics\/btm089","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-389.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,1]],"date-time":"2024-03-01T02:49:52Z","timestamp":1709261392000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-389"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,9,22]]},"references-count":21,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["2374"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-389","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,9,22]]},"assertion":[{"value":"1 February 2008","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 September 2008","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 September 2008","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"389"}}