{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T10:12:32Z","timestamp":1761559952843,"version":"3.32.0"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Predicting the subcellular localization of proteins is important for determining the function of proteins. Previous works focused on predicting protein localization in Gram-negative bacteria obtained good results. However, these methods had relatively low accuracies for the localization of extracellular proteins. This paper studies ways to improve the accuracy for predicting extracellular localization in Gram-negative bacteria.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We have developed a system for predicting the subcellular localization of proteins for Gram-negative bacteria based on amino acid subalphabets and a combination of multiple support vector machines. The recall of the extracellular site and overall recall of our predictor reach 86.0% and 89.8%, respectively, in 5-fold cross-validation. To the best of our knowledge, these are the most accurate results for predicting subcellular localization in Gram-negative bacteria.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>Clustering 20 amino acids into a few groups by the proposed greedy algorithm provides a new way to extract features from protein sequences to cover more adjacent amino acids and hence reduce the dimensionality of the input vector of protein features. It was observed that a good amino acid grouping leads to an increase in prediction performance. Furthermore, a proper choice of a subset of complementary support vector machines constructed by different features of proteins maximizes the prediction accuracy.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-6-174","type":"journal-article","created":{"date-parts":[[2005,7,13]],"date-time":"2005-07-13T06:41:27Z","timestamp":1121236887000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":60,"title":["Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines"],"prefix":"10.1186","volume":"6","author":[{"given":"Jiren","family":"Wang","sequence":"first","affiliation":[]},{"given":"Wing-Kin","family":"Sung","sequence":"additional","affiliation":[]},{"given":"Arun","family":"Krishnan","sequence":"additional","affiliation":[]},{"given":"Kuo-Bin","family":"Li","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2005,7,13]]},"reference":[{"issue":"4","key":"499_CR1","doi-asserted-by":"publisher","first-page":"1005","DOI":"10.1006\/jmbi.2000.3903","volume":"300","author":"O Emanuelsson","year":"2000","unstructured":"Emanuelsson O, Nielsen H, Brunak S, von Heijne G: Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 2000, 300(4):1005\u20131016. 10.1006\/jmbi.2000.3903","journal-title":"J Mol Biol"},{"issue":"8","key":"499_CR2","doi-asserted-by":"publisher","first-page":"721","DOI":"10.1093\/bioinformatics\/17.8.721","volume":"17","author":"S Hua","year":"2001","unstructured":"Hua S, Sun Z: Support vector machine approach for protein subcellular localization prediction. Bioinformatics 2001, 17(8):721\u2013728. 10.1093\/bioinformatics\/17.8.721","journal-title":"Bioinformatics"},{"key":"499_CR3","first-page":"147","volume":"5","author":"P Horton","year":"1997","unstructured":"Horton P, Nakai K: Better prediction of protein cellular localization sites with the k nearest neighbors classifier. Proc Int Conf Intell Syst Mol Biol 1997, 5: 147\u2013152.","journal-title":"Proc Int Conf Intell Syst Mol Biol"},{"issue":"1","key":"499_CR4","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1006\/jmbi.1994.1267","volume":"238","author":"H Nakashima","year":"1994","unstructured":"Nakashima H, Nishikawa K: Discrimination of Intracellular and Extracellular Proteins Using Amino Acid Composition and Residue-pair Frequencies. J Mol Biol 1994, 238(1):54\u201361. 10.1006\/jmbi.1994.1267","journal-title":"J Mol Biol"},{"key":"499_CR5","doi-asserted-by":"publisher","first-page":"425","DOI":"10.1016\/j.bbrc.2004.08.113","volume":"323","author":"YD Cai","year":"2004","unstructured":"Cai YD, Chou KC: Predicting 22 protein localizations in budding yeast. Biochem Biophys Res Commun 2004, 323: 425\u2013428. 10.1016\/j.bbrc.2004.08.113","journal-title":"Biochem Biophys Res Commun"},{"key":"499_CR6","doi-asserted-by":"publisher","first-page":"246","DOI":"10.1002\/prot.1035","volume":"43","author":"KC Chou","year":"2001","unstructured":"Chou KC: Prediction of protein cellular attributes using pseudo-amino acid composition. PROTEINS: Proteins 2001, 43: 246\u2013255. (Erratum: ibid., 2001, 44: 60) 10.1002\/prot.1035","journal-title":"PROTEINS: Proteins"},{"key":"499_CR7","doi-asserted-by":"publisher","first-page":"743","DOI":"10.1016\/j.bbrc.2003.10.062","volume":"311","author":"KC Chou","year":"2003","unstructured":"Chou KC, Cai YD: A new hybrid approach to predict subcellular localization of proteins by incorporating Gene ontology. Biochem Biophys Res Commun 2003, 311: 743\u2013747. 10.1016\/j.bbrc.2003.10.062","journal-title":"Biochem Biophys Res Commun"},{"key":"499_CR8","doi-asserted-by":"publisher","first-page":"1250","DOI":"10.1002\/jcb.10719","volume":"90","author":"KC Chou","year":"2003","unstructured":"Chou KC, Cai YD: Prediction and classification of protein subcellular localization: sequence-order effect and pseudo amino acid composition. Journal of Cellular Biochemistry 2003, 90: 1250\u20131260. (Addendum, ibid. 2004, 91(5): 1085) 10.1002\/jcb.10719","journal-title":"Journal of Cellular Biochemistry"},{"key":"499_CR9","doi-asserted-by":"publisher","first-page":"1197","DOI":"10.1002\/jcb.10790","volume":"91","author":"KC Chou","year":"2004","unstructured":"Chou KC, Cai YD: Predicting subcellular localization of proteins by hybridizing functional domain composition and pseudo-amino acid composition. Journal of Cellular Biochemistry 2004, 91: 1197\u20131203. 10.1002\/jcb.10790","journal-title":"Journal of Cellular Biochemistry"},{"key":"499_CR10","doi-asserted-by":"publisher","first-page":"1236","DOI":"10.1016\/j.bbrc.2004.06.073","volume":"320","author":"KC Chou","year":"2004","unstructured":"Chou KC, Cai YD: Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Bioch Biophys Res Commun 2004, 320: 1236\u20131239. 10.1016\/j.bbrc.2004.06.073","journal-title":"Bioch Biophys Res Commun"},{"key":"499_CR11","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1002\/1097-0282(20010415)58:5<491::AID-BIP1024>3.0.CO;2-I","volume":"58","author":"ZP Feng","year":"2001","unstructured":"Feng ZP: Prediction of the subcellular location of prokaryotic proteins based on a new representation of the amino acid composition. Biopolymers 2001, 58: 491\u2013499. 10.1002\/1097-0282(20010415)58:5<491::AID-BIP1024>3.0.CO;2-I","journal-title":"Biopolymers"},{"key":"499_CR12","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1023\/A:1007091128394","volume":"19","author":"ZP Feng","year":"2000","unstructured":"Feng ZP, Zhang CT: Prediction of membrane protein types based on the hydrophobic index of amino acids. Journal of Protein Chemistry 2000, 19: 269\u2013275. 10.1023\/A:1007091128394","journal-title":"Journal of Protein Chemistry"},{"key":"499_CR13","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1016\/S0141-8130(01)00121-0","volume":"28","author":"ZP Feng","year":"2001","unstructured":"Feng ZP, Zhang CT: Prediction of the subcellular location of prokaryotic proteins based on the hydrophobicity index of amino acids. Int J Biol Macromol 2001, 28: 255\u2013261. 10.1016\/S0141-8130(01)00121-0","journal-title":"Int J Biol Macromol"},{"key":"499_CR14","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1023\/A:1025350409648","volume":"22","author":"YX Pan","year":"2003","unstructured":"Pan YX, Zhang ZZ, Guo ZM, Feng GY, Huang ZD, He L: Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach. Journal of Protein Chemistry 2003, 22: 395\u2013402. 10.1023\/A:1025350409648","journal-title":"Journal of Protein Chemistry"},{"key":"499_CR15","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1093\/protein\/gzh061","volume":"17","author":"M Wang","year":"2004","unstructured":"Wang M, Yang J, Liu GP, Xu ZJ, Chou KC: Weighted-support vector machines for predicting membrane protein types based on pseudo amino acid composition. Protein Eng Des Sel 2004, 17: 509\u2013516. 10.1093\/protein\/gzh061","journal-title":"Protein Eng Des Sel"},{"key":"499_CR16","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1016\/j.jtbi.2004.07.023","volume":"232","author":"M Wang","year":"2005","unstructured":"Wang M, Yang J, Xu ZJ, Chou KC: SLLE for predicting membrane protein types. J Theor Biol 2005, 232: 7\u201315. 10.1016\/j.jtbi.2004.07.023","journal-title":"J Theor Biol"},{"issue":"1","key":"499_CR17","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1007\/s00726-004-0148-7","volume":"28","author":"X Xiao","year":"2005","unstructured":"Xiao X, Shao S, Ding Y, Huang Z, Huang Y, Chou KC: Using complexity measure factor to predict protein subcellular location. Amino Acids 2005, 28(1):57\u201361. 10.1007\/s00726-004-0148-7","journal-title":"Amino Acids"},{"key":"499_CR18","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1016\/S0014-5793(99)00506-2","volume":"451","author":"Z Yuan","year":"1999","unstructured":"Yuan Z: Prediction of protein subcellular locations using Markov chain models. FEBS Letters 1999, 451: 23\u201326. 10.1016\/S0014-5793(99)00506-2","journal-title":"FEBS Letters"},{"key":"499_CR19","doi-asserted-by":"publisher","first-page":"729","DOI":"10.1023\/A:1020713915365","volume":"17","author":"GP Zhou","year":"1998","unstructured":"Zhou GP: An intriguing controversy over protein structural class prediction. Journal of Protein Chemistry 1998, 17: 729\u2013738. 10.1023\/A:1020713915365","journal-title":"Journal of Protein Chemistry"},{"key":"499_CR20","first-page":"57","volume-title":"Some insights into protein structural class prediction","author":"GP Zhou","year":"2001","unstructured":"Zhou GP, Assa-Munt N: Some insights into protein structural class prediction. 2001, 44: 57\u201359. 10.1002\/prot.1071"},{"key":"499_CR21","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1002\/prot.10251","volume":"50","author":"GP Zhou","year":"2003","unstructured":"Zhou GP, Doctor K: Subcellular location prediction of apoptosis proteins. Proteins 2003, 50: 44\u201348. 10.1002\/prot.10251","journal-title":"Proteins"},{"key":"499_CR22","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1016\/S0065-3233(00)54009-1","volume":"54","author":"K Nakai","year":"2000","unstructured":"Nakai K: Protein sorting signals and prediction of subcellular localization. Adv Protein Chem 2000, 54: 277\u2013344.","journal-title":"Adv Protein Chem"},{"issue":"2","key":"499_CR23","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1002\/prot.340110203","volume":"11","author":"K Nakai","year":"1991","unstructured":"Nakai K, Kanehisa M: Expert system for predicting protein localization sites in Gram-negative bacteria. Proteins 1991, 11(2):95\u2013110. 10.1002\/prot.340110203","journal-title":"Proteins"},{"key":"499_CR24","doi-asserted-by":"publisher","first-page":"3613","DOI":"10.1093\/nar\/gkg602","volume":"31","author":"Gardy L Jennifer","year":"2003","unstructured":"Jennifer GardyL, Cory Spencer , Ke Wang , Martin Ester , Gabor TusnadyE, Istvan Simon , Sujun Hua , Katalin deFays , Christophe Lambert , Kenta Nakai , Fiona BrinkmanSL: PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Research 2003, 31: 3613\u201317. 10.1093\/nar\/gkg602","journal-title":"Nucleic Acids Research"},{"issue":"5","key":"499_CR25","doi-asserted-by":"publisher","first-page":"1402","DOI":"10.1110\/ps.03479604","volume":"13","author":"C-S Yu","year":"2004","unstructured":"Yu C-S, Lin C-J, Hwang J-K: Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Science 2004, 13(5):1402\u20131406. 10.1110\/ps.03479604","journal-title":"Protein Science"},{"key":"499_CR26","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1093\/nar\/28.1.45","volume":"28","author":"A Bairoch","year":"2000","unstructured":"Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research 2000, 28: 45\u201348. 10.1093\/nar\/28.1.45","journal-title":"Nucleic Acids Research"},{"issue":"2","key":"499_CR27","doi-asserted-by":"publisher","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","volume":"405","author":"BW Matthews","year":"1975","unstructured":"Matthews BW: Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 1975, 405(2):442\u2013451.","journal-title":"Biochim Biophys Acta"},{"issue":"1","key":"499_CR28","first-page":"97","volume":"25","author":"CAF Andersen","year":"2004","unstructured":"Andersen CAF, Brunak S: Representation of protein-sequence information by amino acid subalphabets. AI Magazine 2004, 25(1):97\u2013104. [http:\/\/portal.acm.org\/citation.cfm?id=996927]","journal-title":"AI Magazine"},{"key":"499_CR29","first-page":"322","volume-title":"Multivariate Analysis","author":"KV Mardia","year":"1979","unstructured":"Mardia KV, Kent JT, Bibby JM: Multivariate Analysis. London: Academic Press; 1979:322\u2013381."},{"key":"499_CR30","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1111\/j.2517-6161.1974.tb00994.x","volume":"36","author":"M Stone","year":"1974","unstructured":"Stone M: Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society 1974, 36: 111\u2013147.","journal-title":"Journal of the Royal Statistical Society"},{"key":"499_CR31","volume-title":"PhD thesis","author":"R Kohavi","year":"1995","unstructured":"Kohavi R: Wrappers for performance enhancement and oblivious decision graphs. PhD thesis. Stanford University; 1995."},{"key":"499_CR32","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1073\/pnas.97.1.262","volume":"97","author":"MPS Brown","year":"2000","unstructured":"Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. PNAS 2000, 97: 262\u2013267. 10.1073\/pnas.97.1.262","journal-title":"PNAS"},{"key":"499_CR33","doi-asserted-by":"publisher","first-page":"1132","DOI":"10.1093\/bioinformatics\/btg102","volume":"19","author":"Y Lee","year":"2003","unstructured":"Lee Y, Lee C-K: Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics 2003, 19: 1132\u20131139. 10.1093\/bioinformatics\/btg102","journal-title":"Bioinformatics"},{"key":"499_CR34","doi-asserted-by":"publisher","first-page":"1650","DOI":"10.1093\/bioinformatics\/btg223","volume":"19","author":"JJ Ward","year":"2003","unstructured":"Ward JJ, McGuffin LJ, Buxton BF, Jones DT: Secondary structure prediction with support vector machines. Bioinformatics 2003, 19: 1650\u20131655. 10.1093\/bioinformatics\/btg223","journal-title":"Bioinformatics"},{"key":"499_CR35","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The nature of statistical learning theory","author":"V Vapnik","year":"1995","unstructured":"Vapnik V: The nature of statistical learning theory. Springer-Verlag, New York; 1995."},{"key":"499_CR36","volume-title":"Statistical learning theory","author":"V Vapnik","year":"1998","unstructured":"Vapnik V: Statistical learning theory. John-Wiley, New York; 1998."},{"key":"499_CR37","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1023\/A:1012427100071","volume":"46","author":"C-W Hsu","year":"2002","unstructured":"Hsu C-W, Lin C-J: A simple decomposition method for support vector machines. Machine Learning 2002, 46: 291\u2013314. 10.1023\/A:1012427100071","journal-title":"Machine Learning"},{"key":"499_CR38","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1093\/nar\/28.1.374","volume":"28","author":"S Kawashima","year":"2000","unstructured":"Kawashima S, Kanehisa M: AAindex: amino acid index database. Nucleic Acids Res 2000, 28: 374. 10.1093\/nar\/28.1.374","journal-title":"Nucleic Acids Res"},{"key":"499_CR39","volume-title":"Prentice Hall","author":"SJ Russel","year":"2003","unstructured":"Russel SJ, Norvig P: Artificial intelligence: a modern approach. Prentice Hall 2003."},{"issue":"13","key":"499_CR40","doi-asserted-by":"publisher","first-page":"1656","DOI":"10.1093\/bioinformatics\/btg222","volume":"19","author":"K-J Park","year":"2003","unstructured":"Park K-J, Kanehisa M: Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs. Bioinformatics 2003, 19(13):1656\u20131663. 10.1093\/bioinformatics\/btg222","journal-title":"Bioinformatics"},{"issue":"1\u20132","key":"499_CR41","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1016\/S0004-3702(97)00043-X","volume":"97","author":"R Kohavi","year":"1997","unstructured":"Kohavi R, John GH: Wrappers for feature subset selection. Artificial intelligence 1997, 97(1\u20132):273\u2013324. 10.1016\/S0004-3702(97)00043-X","journal-title":"Artificial intelligence"},{"issue":"4","key":"499_CR42","doi-asserted-by":"publisher","first-page":"275","DOI":"10.3109\/10409239509083488","volume":"30","author":"KC Chou","year":"1995","unstructured":"Chou KC, Zhang CT: Review: Prediction of protein structural classes. Crit Rev Biochem Mol Biol 1995, 30(4):275\u2013349.","journal-title":"Crit Rev Biochem Mol Biol"},{"key":"499_CR43","unstructured":"Protein subcellular localization prediction for Gram-negative bacteria[http:\/\/protein.bii.a-star.edu.sg\/localization\/gram-negative\/]"},{"key":"499_CR44","unstructured":"BSVM[http:\/\/www.csie.ntu.edu.tw\/~cjlin\/bsvm\/index.html]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-174.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,2]],"date-time":"2025-01-02T13:23:37Z","timestamp":1735824217000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-174"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,7,13]]},"references-count":44,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2005,12]]}},"alternative-id":["499"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-174","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2005,7,13]]},"assertion":[{"value":"14 February 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 July 2005","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 July 2005","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"174"}}