{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T15:53:48Z","timestamp":1762271628042},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1\/3 to 1\/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/mleg.cse.sc.edu\/LRensemble\/cgi-bin\/predict.cgi\" ext-link-type=\"uri\">http:\/\/mleg.cse.sc.edu\/LRensemble\/cgi-bin\/predict.cgi<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-13-157","type":"journal-article","created":{"date-parts":[[2012,7,3]],"date-time":"2012-07-03T18:13:47Z","timestamp":1341339227000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":34,"title":["Minimalist ensemble algorithms for genome-wide protein localization prediction"],"prefix":"10.1186","volume":"13","author":[{"given":"Jhih-Rong","family":"Lin","sequence":"first","affiliation":[]},{"given":"Ananda Mohan","family":"Mondal","sequence":"additional","affiliation":[]},{"given":"Rong","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Jianjun","family":"Hu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2012,7,3]]},"reference":[{"issue":"5","key":"5285_CR1","first-page":"604","volume":"16","author":"J Assfalg","year":"2010","unstructured":"Assfalg J, Gong J, Kriegel HP, Pryakhin A, Wei TD, Zimek A: Investigating a Correlation between Subcellular Localization and Fold of Proteins. J Univers Comput Sci. 2010, 16 (5): 604-621.","journal-title":"J Univers Comput Sci"},{"issue":"22","key":"5285_CR2","doi-asserted-by":"publisher","first-page":"3970","DOI":"10.1002\/pmic.201000274","volume":"10","author":"K Imai","year":"2010","unstructured":"Imai K, Nakai K: Prediction of subcellular locations of proteins: where to proceed?. Proteomics. 2010, 10 (22): 3970-3983. 10.1002\/pmic.201000274.","journal-title":"Proteomics"},{"issue":"Suppl 5","key":"5285_CR3","doi-asserted-by":"publisher","first-page":"S3","DOI":"10.1186\/1471-2105-7-S5-S3","volume":"7","author":"J Sprenger","year":"2006","unstructured":"Sprenger J, Fink JL, Teasdale RD: Evaluation and comparison of mammalian subcellular localization prediction methods. BMC Bioinformatics. 2006, 7 (Suppl 5): S3-10.1186\/1471-2105-7-S5-S3.","journal-title":"BMC Bioinformatics"},{"issue":"15","key":"5285_CR4","doi-asserted-by":"publisher","first-page":"e96","DOI":"10.1093\/nar\/gkm562","volume":"35","author":"J Liu","year":"2007","unstructured":"Liu J, Kang S, Tang C, Ellis LB, Li T: Meta-prediction of protein subcellular localization with reduced voting. Nucleic Acids Res. 2007, 35 (15): e96-10.1093\/nar\/gkm562.","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"5285_CR5","doi-asserted-by":"publisher","first-page":"975","DOI":"10.1007\/s00726-010-0724-y","volume":"40","author":"K Laurila","year":"2010","unstructured":"Laurila K, Vihinen M: PROlocalizer: integrated web service for protein subcellular localization prediction. Amino Acids. 2010, 40 (3): 975-980.","journal-title":"Amino Acids"},{"issue":"7","key":"5285_CR6","doi-asserted-by":"publisher","first-page":"3367","DOI":"10.1021\/pr900018z","volume":"8","author":"S Park","year":"2009","unstructured":"Park S, Yang JS, Jang SK, Kim S: Construction of functional interaction networks through consensus localization predictions of the human proteome. J Proteome Res. 2009, 8 (7): 3367-3376. 10.1021\/pr900018z.","journal-title":"J Proteome Res"},{"issue":"2","key":"5285_CR7","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1142\/S0219720009004072","volume":"7","author":"J Assfalg","year":"2009","unstructured":"Assfalg J, Gong J, Kriegel HP, Pryakhin A, Wei T, Zimek A: Supervised ensembles of prediction methods for subcellular localization. J Bioinform Comput Biol. 2009, 7 (2): 269-285. 10.1142\/S0219720009004072.","journal-title":"J Bioinform Comput Biol"},{"key":"5285_CR8","doi-asserted-by":"publisher","first-page":"420","DOI":"10.1186\/1471-2105-8-420","volume":"8","author":"YQ Shen","year":"2007","unstructured":"Shen YQ, Burger G: 'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools. BMC Bioinformatics. 2007, 8: 420-10.1186\/1471-2105-8-420.","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"5285_CR9","doi-asserted-by":"publisher","first-page":"444","DOI":"10.1016\/j.mito.2010.12.016","volume":"11","author":"KT Lythgow","year":"2011","unstructured":"Lythgow KT, Hudson G, Andras P, Chinnery PF: A critical analysis of the combined usage of protein localization prediction methods: Increasing the number of independent data sets can reduce the accuracy of predicted mitochondrial localization. Mitochondrion. 2011, 11 (3): 444-449. 10.1016\/j.mito.2010.12.016.","journal-title":"Mitochondrion"},{"issue":"9","key":"5285_CR10","doi-asserted-by":"publisher","first-page":"1232","DOI":"10.1093\/bioinformatics\/btq115","volume":"26","author":"S Briesemeister","year":"2010","unstructured":"Briesemeister S, Rahnenfuhrer J, Kohlbacher O: Going from where to why\u2013interpretable prediction of protein subcellular localization. Bioinformatics. 2010, 26 (9): 1232-1238. 10.1093\/bioinformatics\/btq115.","journal-title":"Bioinformatics"},{"key":"5285_CR11","doi-asserted-by":"publisher","first-page":"274","DOI":"10.1186\/1471-2105-10-274","volume":"10","author":"T Blum","year":"2009","unstructured":"Blum T, Briesemeister S, Kohlbacher O: MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction. BMC Bioinformatics. 2009, 10: 274-10.1186\/1471-2105-10-274.","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 15","key":"5285_CR12","doi-asserted-by":"publisher","first-page":"S8","DOI":"10.1186\/1471-2105-10-S15-S8","volume":"10","author":"HN Lin","year":"2009","unstructured":"Lin HN, Chen CT, Sung TY, Ho SY, Hsu WL: Protein subcellular localization prediction of eukaryotes using a knowledge-based approach. BMC Bioinformatics. 2009, 10 (Suppl 15): S8-10.1186\/1471-2105-10-S15-S8.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"5285_CR13","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1007\/s11030-008-9073-0","volume":"12","author":"B Niu","year":"2008","unstructured":"Niu B, Jin YH, Feng KY, Lu WC, Cai YD, Li GZ: Using AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins. Mol Divers. 2008, 12 (1): 41-45. 10.1007\/s11030-008-9073-0.","journal-title":"Mol Divers"},{"key":"5285_CR14","doi-asserted-by":"publisher","first-page":"W585","DOI":"10.1093\/nar\/gkm259","volume":"35","author":"P Horton","year":"2007","unstructured":"Horton P, Park KJ, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, Nakai K: WoLF PSORT: protein localization predictor. Nucleic Acids Res. 2007, 35: W585-W587. 10.1093\/nar\/gkm259.","journal-title":"Nucleic Acids Res"},{"issue":"14","key":"5285_CR15","doi-asserted-by":"publisher","first-page":"e408","DOI":"10.1093\/bioinformatics\/btl222","volume":"22","author":"A Pierleoni","year":"2006","unstructured":"Pierleoni A, Martelli PL, Fariselli P, Casadio R: BaCelLo: a balanced subcellular localization predictor. Bioinformatics. 2006, 22 (14): e408-416. 10.1093\/bioinformatics\/btl222.","journal-title":"Bioinformatics"},{"issue":"3","key":"5285_CR16","doi-asserted-by":"publisher","first-page":"643","DOI":"10.1002\/prot.21018","volume":"64","author":"CS Yu","year":"2006","unstructured":"Yu CS, Chen YC, Lu CH, Hwang JK: Prediction of protein subcellular localization. Proteins. 2006, 64 (3): 643-651. 10.1002\/prot.21018.","journal-title":"Proteins"},{"issue":"8","key":"5285_CR17","doi-asserted-by":"publisher","first-page":"721","DOI":"10.1093\/bioinformatics\/17.8.721","volume":"17","author":"S Hua","year":"2001","unstructured":"Hua S, Sun Z: Support vector machine approach for protein subcellular localization prediction. Bioinformatics. 2001, 17 (8): 721-728. 10.1093\/bioinformatics\/17.8.721.","journal-title":"Bioinformatics"},{"key":"5285_CR18","first-page":"142","volume-title":"BIBM","author":"MM Ananda","year":"2010","unstructured":"Ananda MM, Jianjun H: NetLoc: Network based protein localization prediction using protein-protein interaction and co-expression networks. BIBM. 2010, 142-148."},{"issue":"6","key":"5285_CR19","doi-asserted-by":"publisher","first-page":"523","DOI":"10.1002\/yea.706","volume":"18","author":"H Hishigaki","year":"2001","unstructured":"Hishigaki H, Nakai K, Ono T, Tanigami A, Takagi T: Assessment of prediction accuracy of protein function from protein\u2013protein interaction data. Yeast. 2001, 18 (6): 523-531. 10.1002\/yea.706.","journal-title":"Yeast"},{"issue":"20","key":"5285_CR20","doi-asserted-by":"publisher","first-page":"e136","DOI":"10.1093\/nar\/gkn619","volume":"36","author":"K Lee","year":"2008","unstructured":"Lee K, Chuang HY, Beyer A, Sung MK, Huh WK, Lee B, Ideker T: Protein networks markedly improve prediction of subcellular localization in multiple eukaryotic species. Nucleic Acids Res. 2008, 36 (20): e136-10.1093\/nar\/gkn619.","journal-title":"Nucleic Acids Res"},{"key":"5285_CR21","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1186\/1752-0509-3-28","volume":"3","author":"CJ Shin","year":"2009","unstructured":"Shin CJ, Wong S, Davis MJ, Ragan MA: Protein-protein interaction as a predictor of subcellular location. BMC Syst Biol. 2009, 3: 28-10.1186\/1752-0509-3-28.","journal-title":"BMC Syst Biol"},{"issue":"Database issue","key":"5285_CR22","doi-asserted-by":"publisher","first-page":"D535","DOI":"10.1093\/nar\/gkj109","volume":"34","author":"C Stark","year":"2006","unstructured":"Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 34 (Database issue): D535-539.","journal-title":"Nucleic Acids Res"},{"key":"5285_CR23","first-page":"871","volume-title":"Proc of KDD","author":"Z Lu XW","year":"2010","unstructured":"Lu XW Z, Zhu X, Bongard J: Ensemble pruning via individual contribution ordering. Proc of KDD. 2010, 871-880."},{"key":"5285_CR24","volume-title":"Correlation-based feature subset selection for machine learning.Dissertation","author":"MA Hall","year":"1999","unstructured":"Hall MA: Correlation-based feature subset selection for machine learning.Dissertation. 1999, University of Waikato, Hamilton, New Zealand"},{"issue":"6959","key":"5285_CR25","doi-asserted-by":"publisher","first-page":"686","DOI":"10.1038\/nature02026","volume":"425","author":"WK Huh","year":"2003","unstructured":"Huh WK, Falvo JV, Gerke LC, Carroll AS, Howson RW, Weissman JS, O'Shea EK: Global analysis of protein localization in budding yeast. Nature. 2003, 425 (6959): 686-691. 10.1038\/nature02026.","journal-title":"Nature"},{"issue":"Database issue","key":"5285_CR26","first-page":"D230","volume":"36","author":"J Sprenger","year":"2008","unstructured":"Sprenger J, Lynn Fink J, Karunaratne S, Hanson K, Hamilton NA, Teasdale RD: LOCATE: a mammalian protein subcellular localization database. Nucleic Acids Res. 2008, 36 (Database issue): D230-233.","journal-title":"Nucleic Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-13-157.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T19:31:39Z","timestamp":1630524699000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-13-157"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,7,3]]},"references-count":26,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["5285"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-13-157","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,7,3]]},"assertion":[{"value":"26 December 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 July 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 July 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"157"}}