{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T15:03:11Z","timestamp":1761058991075},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"S15","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2009,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The study of protein subcellular localization (PSL) is important for elucidating protein functions involved in various cellular processes. However, determining the localization sites of a protein through wet-lab experiments can be time-consuming and labor-intensive. Thus, computational approaches are highly desirable. Most PSL prediction systems are designed for single-localized proteins. However, a significant number of eukaryotic proteins are known to be localized to multiple subcellular organelles. Many studies have shown that proteins may simultaneously reside in, or move between, different cellular compartments and be involved in different biological processes with different roles.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>In this study, we propose a knowledge-based method, called KnowPred<jats:sub>site<\/jats:sub>, to predict the localization site(s) of both single-localized and multi-localized proteins. Based on local similarity, we identify the \"related sequences\" for prediction. We construct a knowledge base to record the possible sequence variations for protein sequences. When predicting the localization annotation of a query protein, we search the knowledge base and use a scoring mechanism to determine the predicted sites. 
We downloaded the dataset from ngLOC, which consisted of ten distinct subcellular organelles from 1923 species, and performed ten-fold cross-validation experiments to evaluate KnowPred<jats:sub>site<\/jats:sub>'s performance. The experimental results show that KnowPred<jats:sub>site<\/jats:sub> achieves higher prediction accuracy than ngLOC and the Blast-hit method. For single-localized proteins, the overall accuracy of KnowPred<jats:sub>site<\/jats:sub> is 91.7%. For multi-localized proteins, the overall accuracy of KnowPred<jats:sub>site<\/jats:sub> is 72.1%, which is significantly higher (by 12.4%) than that of ngLOC. Notably, half of the proteins in the dataset for which no Blast hit sequence above a specified threshold can be found can still be correctly predicted by KnowPred<jats:sub>site<\/jats:sub>.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>KnowPred<jats:sub>site<\/jats:sub> demonstrates the power of identifying related sequences in the knowledge base. The experimental results show that even when sequence similarity is low, local similarity is effective for prediction. They also show that KnowPred<jats:sub>site<\/jats:sub> is a highly accurate prediction method for both single- and multi-localized proteins. It is worth mentioning that the prediction process of KnowPred<jats:sub>site<\/jats:sub> is transparent and biologically interpretable, and that it shows the set of template sequences used to generate the prediction result. 
The KnowPred<jats:sub>site<\/jats:sub> prediction server is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/bio-cluster.iis.sinica.edu.tw\/kbloc\/\" ext-link-type=\"uri\">http:\/\/bio-cluster.iis.sinica.edu.tw\/kbloc\/<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-10-s15-s8","type":"journal-article","created":{"date-parts":[[2009,12,3]],"date-time":"2009-12-03T20:26:51Z","timestamp":1259872011000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Protein subcellular localization prediction of eukaryotes using a knowledge-based approach"],"prefix":"10.1186","volume":"10","author":[{"given":"Hsin-Nan","family":"Lin","sequence":"first","affiliation":[]},{"given":"Ching-Tai","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Ting-Yi","family":"Sung","sequence":"additional","affiliation":[]},{"given":"Shinn-Ying","family":"Ho","sequence":"additional","affiliation":[]},{"given":"Wen-Lian","family":"Hsu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2009,12,3]]},"reference":[{"issue":"4","key":"3453_CR1","doi-asserted-by":"publisher","first-page":"917","DOI":"10.1002\/prot.10507","volume":"53","author":"R Nair","year":"2003","unstructured":"Nair R, Rost B: Better prediction of sub-cellular localization by combining evolutionary and structural information. Proteins 2003, 53(4):917\u2013930. 10.1002\/prot.10507","journal-title":"Proteins"},{"issue":"5","key":"3453_CR2","doi-asserted-by":"publisher","first-page":"617","DOI":"10.1093\/bioinformatics\/bti057","volume":"21","author":"JL Gardy","year":"2005","unstructured":"Gardy JL, Laird MR, Chen F, Rey S, Walsh CJ, Ester M, Brinkman FS: PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. 
Bioinformatics 2005, 21(5):617\u2013623. 10.1093\/bioinformatics\/bti057","journal-title":"Bioinformatics"},{"issue":"10","key":"3453_CR3","doi-asserted-by":"publisher","first-page":"1158","DOI":"10.1093\/bioinformatics\/btl002","volume":"22","author":"A Hoglund","year":"2006","unstructured":"Hoglund A, Donnes P, Blum T, Adolph HW, Kohlbacher O: MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition. Bioinformatics 2006, 22(10):1158\u20131165. 10.1093\/bioinformatics\/btl002","journal-title":"Bioinformatics"},{"key":"3453_CR4","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1186\/1471-2105-6-174","volume":"6","author":"JR Wang","year":"2005","unstructured":"Wang JR, Sung WK, Krishnan A, Li KB: Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines. BMC Bioinformatics 2005, 6: 174. 10.1186\/1471-2105-6-174","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"3453_CR5","doi-asserted-by":"publisher","first-page":"643","DOI":"10.1002\/prot.21018","volume":"64","author":"CS Yu","year":"2006","unstructured":"Yu CS, Chen YC, Lu CH, Hwang JK: Prediction of protein subcellular localization. Proteins 2006, 64(3):643\u2013651. 10.1002\/prot.21018","journal-title":"Proteins"},{"issue":"5","key":"3453_CR6","doi-asserted-by":"publisher","first-page":"1402","DOI":"10.1110\/ps.03479604","volume":"13","author":"CS Yu","year":"2004","unstructured":"Yu CS, Lin CJ, Hwang JK: Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci 2004, 13(5):1402\u20131406. 
10.1110\/ps.03479604","journal-title":"Protein Sci"},{"issue":"2","key":"3453_CR7","doi-asserted-by":"publisher","first-page":"693","DOI":"10.1002\/prot.21944","volume":"72","author":"JM Chang","year":"2008","unstructured":"Chang JM, Su EC, Lo A, Chiu HS, Sung TY, Hsu WL: PSLDoc: Protein subcellular localization prediction based on gapped-dipeptides and probabilistic latent semantic analysis. Proteins 2008, 72(2):693\u2013710. 10.1002\/prot.21944","journal-title":"Proteins"},{"issue":"10","key":"3453_CR8","doi-asserted-by":"publisher","first-page":"2522","DOI":"10.1093\/bioinformatics\/bti309","volume":"21","author":"M Bhasin","year":"2005","unstructured":"Bhasin M, Garg A, Raghava GP: PSLpred: prediction of subcellular localization of bacterial proteins. Bioinformatics 2005, 21(10):2522\u20132524. 10.1093\/bioinformatics\/bti309","journal-title":"Bioinformatics"},{"issue":"7","key":"3453_CR9","doi-asserted-by":"publisher","first-page":"944","DOI":"10.1093\/bioinformatics\/bti104","volume":"21","author":"KC Chou","year":"2005","unstructured":"Chou KC, Cai YD: Predicting protein localization in budding yeast. Bioinformatics 2005, 21(7):944\u2013950. 10.1093\/bioinformatics\/bti104","journal-title":"Bioinformatics"},{"issue":"13","key":"3453_CR10","doi-asserted-by":"publisher","first-page":"3613","DOI":"10.1093\/nar\/gkg602","volume":"31","author":"JL Gardy","year":"2003","unstructured":"Gardy JL, Spencer C, Wang K, Ester M, Tusnady GE, Simon I, Hua S, deFays K, Lambert C, Nakai K, et al.: PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Res 2003, 31(13):3613\u20133617. 
10.1093\/nar\/gkg602","journal-title":"Nucleic Acids Res"},{"issue":"17","key":"3453_CR11","doi-asserted-by":"publisher","first-page":"4655","DOI":"10.1093\/nar\/gkl638","volume":"34","author":"K Lee","year":"2006","unstructured":"Lee K, Kim DW, Na D, Lee KH, Lee D: PLPD: reliable protein localization prediction from imbalanced and overlapped datasets. Nucleic Acids Res 2006, 34(17):4655\u20134666. 10.1093\/nar\/gkl638","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"3453_CR12","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1016\/j.jmb.2005.02.025","volume":"348","author":"R Nair","year":"2005","unstructured":"Nair R, Rost B: Mimicking cellular sorting improves prediction of subcellular localization. J Mol Biol 2005, 348(1):85\u2013100. 10.1016\/j.jmb.2005.02.025","journal-title":"J Mol Biol"},{"key":"3453_CR13","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1186\/1471-2105-9-80","volume":"9","author":"WL Huang","year":"2008","unstructured":"Huang WL, Tung CW, Ho SW, Hwang SF, Ho SY: ProLoc-GO: utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization. BMC Bioinformatics 2008, 9: 80. 10.1186\/1471-2105-9-80","journal-title":"BMC Bioinformatics"},{"issue":"22","key":"3453_CR14","doi-asserted-by":"publisher","first-page":"12115","DOI":"10.1073\/pnas.220399497","volume":"97","author":"EM Marcotte","year":"2000","unstructured":"Marcotte EM, Xenarios I, Bliek AM, Eisenberg D: Localizing proteins in the cell from their phylogenetic profiles. Proc Natl Acad Sci USA 2000, 97(22):12115\u201312120. 10.1073\/pnas.220399497","journal-title":"Proc Natl Acad Sci USA"},{"issue":"8","key":"3453_CR15","doi-asserted-by":"publisher","first-page":"1168","DOI":"10.1101\/gr.96802","volume":"12","author":"R Mott","year":"2002","unstructured":"Mott R, Schultz J, Bork P, Ponting CP: Predicting protein cellular localization using a domain projection method. Genome Res 2002, 12(8):1168\u20131174. 
10.1101\/gr.96802","journal-title":"Genome Res"},{"key":"3453_CR16","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1186\/1471-2105-8-330","volume":"8","author":"EC Su","year":"2007","unstructured":"Su EC, Chiu HS, Lo A, Hwang JK, Sung TY, Hsu WL: Protein subcellular localization prediction based on compartment-specific features and structure conservation. BMC Bioinformatics 2007, 8: 330. 10.1186\/1471-2105-8-330","journal-title":"BMC Bioinformatics"},{"issue":"2","key":"3453_CR17","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1110\/ps.9.2.232","volume":"9","author":"L Rychlewski","year":"2000","unstructured":"Rychlewski L, Jaroszewski L, Li WZ, Godzik A: Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science 2000, 9(2):232\u2013241.","journal-title":"Protein Science"},{"issue":"1","key":"3453_CR18","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1016\/S0022-2836(02)01371-2","volume":"326","author":"R Sadreyev","year":"2003","unstructured":"Sadreyev R, Grishin N: COMPASS: A tool for comparison of multiple protein alignments with assessment of statistical significance. Journal of Molecular Biology 2003, 326(1):317\u2013336. 10.1016\/S0022-2836(02)01371-2","journal-title":"Journal of Molecular Biology"},{"issue":"7","key":"3453_CR19","doi-asserted-by":"publisher","first-page":"2238","DOI":"10.1093\/nar\/gkm107","volume":"35","author":"D Przybylski","year":"2007","unstructured":"Przybylski D, Rost B: Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments. Nucleic Acids Research 2007, 35(7):2238\u20132246. 
10.1093\/nar\/gkm107","journal-title":"Nucleic Acids Research"},{"issue":"19","key":"3453_CR20","doi-asserted-by":"publisher","first-page":"3836","DOI":"10.1093\/nar\/24.19.3836","volume":"24","author":"S Pietrokovski","year":"1996","unstructured":"Pietrokovski S: Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Research 1996, 24(19):3836\u20133845. 10.1093\/nar\/24.19.3836","journal-title":"Nucleic Acids Research"},{"issue":"5","key":"3453_CR21","doi-asserted-by":"publisher","first-page":"1257","DOI":"10.1006\/jmbi.2001.5293","volume":"315","author":"G Yona","year":"2002","unstructured":"Yona G, Levitt M: Within the twilight zone: A sensitive profile-profile comparison tool based on information theory. Journal of Molecular Biology 2002, 315(5):1257\u20131275. 10.1006\/jmbi.2001.5293","journal-title":"Journal of Molecular Biology"},{"key":"3453_CR22","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1186\/1471-2105-9-127","volume":"9","author":"S Zhang","year":"2008","unstructured":"Zhang S, Xia X, Shen J, Zhou Y, Sun Z: DBMLoc: a Database of proteins with multiple subcellular localizations. BMC Bioinformatics 2008, 9: 127. 10.1186\/1471-2105-9-127","journal-title":"BMC Bioinformatics"},{"key":"3453_CR23","doi-asserted-by":"crossref","unstructured":"King BR, Guda C: ngLOC: an n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes. Genome Biology 2007., 8(5): 10.1186\/gb-2007-8-5-r68","DOI":"10.1186\/gb-2007-8-5-r68"},{"issue":"15","key":"3453_CR24","doi-asserted-by":"publisher","first-page":"3227","DOI":"10.1093\/bioinformatics\/bti524","volume":"21","author":"HN Lin","year":"2005","unstructured":"Lin HN, Chang JM, Wu KP, Sung TY, Hsu WL: HYPROSP II--a knowledge-based hybrid method for protein secondary structure prediction based on local prediction confidence. Bioinformatics 2005, 21(15):3227\u20133233. 
10.1093\/bioinformatics\/bti524","journal-title":"Bioinformatics"},{"issue":"17","key":"3453_CR25","doi-asserted-by":"publisher","first-page":"5059","DOI":"10.1093\/nar\/gkh836","volume":"32","author":"KP Wu","year":"2004","unstructured":"Wu KP, Lin HN, Chang JM, Sung TY, Hsu WL: HYPROSP: a hybrid protein secondary structure prediction algorithm--a knowledge-based approach. Nucleic Acids Res 2004, 32(17):5059\u20135065. 10.1093\/nar\/gkh836","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"3453_CR26","doi-asserted-by":"publisher","first-page":"1287","DOI":"10.1142\/S0219720006002466","volume":"4","author":"CT Chen","year":"2006","unstructured":"Chen CT, Lin HN, Sung TY, Hsu WL: HYPLOSP: a knowledge-based approach to protein local structure prediction. J Bioinform Comput Biol 2006, 4(6):1287\u20131307. 10.1142\/S0219720006002466","journal-title":"J Bioinform Comput Biol"},{"issue":"10","key":"3453_CR27","doi-asserted-by":"publisher","first-page":"935","DOI":"10.1093\/bioinformatics\/17.10.935","volume":"17","author":"E Bolten","year":"2001","unstructured":"Bolten E, Schliep A, Schneckener S, Schomburg D, Schrader R: Clustering protein sequences-structure prediction by transitive homology. Bioinformatics 2001, 17(10):935\u2013941. 10.1093\/bioinformatics\/17.10.935","journal-title":"Bioinformatics"},{"issue":"3","key":"3453_CR28","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1016\/S0968-0004(01)02039-4","volume":"27","author":"DT Jones","year":"2002","unstructured":"Jones DT, Swindells MB: Getting the most from PSI-BLAST. Trends in Biochemical Sciences 2002, 27(3):161\u2013164. 10.1016\/S0968-0004(01)02039-4","journal-title":"Trends in Biochemical Sciences"},{"issue":"15","key":"3453_CR29","doi-asserted-by":"publisher","first-page":"1681","DOI":"10.1093\/bioinformatics\/btn312","volume":"24","author":"K Forslund","year":"2008","unstructured":"Forslund K, Sonnhammer ELL: Predicting protein function from domain content. 
Bioinformatics 2008, 24(15):1681\u20131687. 10.1093\/bioinformatics\/btn312","journal-title":"Bioinformatics"},{"issue":"1","key":"3453_CR30","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1016\/S0968-0004(98)01336-X","volume":"24","author":"K Nakai","year":"1999","unstructured":"Nakai K, Horton P: PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 1999, 24(1):34\u201336. 10.1016\/S0968-0004(98)01336-X","journal-title":"Trends Biochem Sci"},{"issue":"24","key":"3453_CR31","doi-asserted-by":"publisher","first-page":"4434","DOI":"10.1093\/bioinformatics\/bti758","volume":"21","author":"C Guda","year":"2005","unstructured":"Guda C, Subramaniam S: pTARGET: a new method for predicting protein subcellular localization in eukaryotes. Bioinformatics 2005, 21(24):4434\u20134434. 10.1093\/bioinformatics\/bti758","journal-title":"Bioinformatics"},{"issue":"13","key":"3453_CR32","doi-asserted-by":"publisher","first-page":"1656","DOI":"10.1093\/bioinformatics\/btg222","volume":"19","author":"KJ Park","year":"2003","unstructured":"Park KJ, Kanehisa M: Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs. Bioinformatics 2003, 19(13):1656\u20131663. 
10.1093\/bioinformatics\/btg222","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-10-S15-S8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,31]],"date-time":"2021-08-31T21:33:19Z","timestamp":1630445599000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-10-S15-S8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,12]]},"references-count":32,"journal-issue":{"issue":"S15","published-print":{"date-parts":[[2009,12]]}},"alternative-id":["3453"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-10-s15-s8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,12]]},"assertion":[{"value":"3 December 2009","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S8"}}