{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,8]],"date-time":"2026-05-08T07:55:28Z","timestamp":1778226928831,"version":"3.51.4"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"S22","license":[{"start":{"date-parts":[[2019,12,1]],"date-time":"2019-12-01T00:00:00Z","timestamp":1575158400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2019,12,30]],"date-time":"2019-12-30T00:00:00Z","timestamp":1577664000000},"content-version":"vor","delay-in-days":29,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Subcellular localization prediction of protein is an important component of bioinformatics, which has great importance for drug design and other applications. A multitude of computational tools for proteins subcellular location have been developed in the recent decades, however, existing methods differ in the protein sequence representation techniques and classification algorithms adopted.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In this paper, we firstly introduce two kinds of protein sequences encoding schemes: dipeptide information with space and Gapped k-mer information. 
Then, the Gapped k-mer calculation method which is based on quad-tree is also introduced.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>From the prediction results, this method not only reduces the dimension, but also improves the prediction precision of protein subcellular localization.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-019-3232-4","type":"journal-article","created":{"date-parts":[[2019,12,30]],"date-time":"2019-12-30T08:02:27Z","timestamp":1577692947000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Protein sequence information extraction and subcellular localization prediction with gapped k-Mer method"],"prefix":"10.1186","volume":"20","author":[{"given":"Yu-hua","family":"Yao","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ya-ping","family":"Lv","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ling","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hui-min","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bin-bin","family":"Ji","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jing","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chun","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bo","family":"Liao","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xu-ying","family":"Nan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2019,12,30]]},"reference":[{"key":"3232_CR
1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0020592","volume":"6","author":"X Xiao","year":"2011","unstructured":"Xiao X, Wu ZC, Chou KC. A multi-label classifier for predicting the subcellular localization of gram-negative bacterial proteins with both single and multiple sites. PLoS One. 2011;6:e20592.","journal-title":"PLoS One"},{"key":"3232_CR2","doi-asserted-by":"publisher","unstructured":"Liu G, Zhang WB, Qian G, Wang B, Mao B, Bichindaritz I. Bioimage-based prediction of protein subcellular location in human tissue with ensemble features and deep networks. IEEE\/ACM Trans Comput Biol Bioinform. 2019 May 20; https:\/\/doi.org\/10.1109\/TCBB.2019.2917429.","DOI":"10.1109\/TCBB.2019.2917429"},{"issue":"3","key":"3232_CR3","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1080\/1062936X.2019.1576222","volume":"30","author":"S Zhang","year":"2019","unstructured":"Zhang S, Zhang T, Liu C. Prediction of apoptosis protein subcellular localization via heterogeneous features and hierarchical extreme learning machine. SAR QSAR Environ Res. 2019;30(3):209\u201328.","journal-title":"SAR QSAR Environ Res"},{"key":"3232_CR4","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1016\/j.artmed.2017.05.007","volume":"78","author":"Q Xiang","year":"2017","unstructured":"Xiang Q, Liao B, Li X, Xu H, Chen J, Shi Z, Dai Q, Yao Y. Subcellular localization prediction of apoptosis proteins based on evolutionary information and support vector machine. Artif Intell Med. 2017 May;78:41\u20136.","journal-title":"Artif Intell Med"},{"issue":"Suppl 4","key":"3232_CR5","doi-asserted-by":"publisher","first-page":"S1","DOI":"10.1186\/1471-2105-16-S4-S1","volume":"16","author":"A Dehzangi","year":"2015","unstructured":"Dehzangi A, Sohrabi S, Heffernan R, Sharma A, Lyons J, Paliwal K, Sattar A. Gram-positive and Gram-negative subcellular localization using rotation forest and physicochemical-based features. BMC Bioinform. 
2015;16(Suppl 4):S1.","journal-title":"BMC Bioinform"},{"issue":"26","key":"3232_CR6","doi-asserted-by":"publisher","first-page":"6169","DOI":"10.1016\/j.febslet.2006.10.017","volume":"580","author":"ZH Zhang","year":"2006","unstructured":"Zhang ZH, Wang ZH, Zhang ZR, Wang YX. A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine. FEBS Lett. 2006;580(26):6169\u201374.","journal-title":"FEBS Lett"},{"issue":"2","key":"3232_CR7","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1016\/j.jtbi.2007.05.019","volume":"248","author":"YL Chen","year":"2007","unstructured":"Chen YL, Li QZ. Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition. J Theor Biol. 2007;248(2):377\u201381.","journal-title":"J Theor Biol"},{"issue":"6","key":"3232_CR8","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1093\/bioinformatics\/btw723","volume":"33","author":"H Zhou","year":"2006","unstructured":"Zhou H, Yang Y, Shen HB. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features. Bioinformatics. 2006;33(6):843\u201353.","journal-title":"Bioinformatics"},{"key":"3232_CR9","doi-asserted-by":"publisher","first-page":"366","DOI":"10.1016\/j.bbrc.2007.03.139","volume":"357","author":"PL Jia","year":"2007","unstructured":"Jia PL, Qian ZL, Zeng ZB, Cai YD, Li XY. Prediction of subcellular protein localization based on functional domain composition. Biochem Biophys Res Commun. 2007;357:366\u201370.","journal-title":"Biochem Biophys Res Commun"},{"issue":"7","key":"3232_CR10","doi-asserted-by":"publisher","first-page":"944","DOI":"10.1093\/bioinformatics\/bti104","volume":"21","author":"KC Chou","year":"2005","unstructured":"Chou KC, Cai YD. Predicting protein localization in budding yeast. Bioinformatics. 
2005;21(7):944\u201350.","journal-title":"Bioinformatics"},{"issue":"1","key":"3232_CR11","doi-asserted-by":"publisher","first-page":"478","DOI":"10.1186\/s12864-018-4849-9","volume":"19","author":"B Yu","year":"2018","unstructured":"Yu B, Li S, Qiu W, Wang M, Du J, Zhang Y, Chen X. Prediction of subcellular location of apoptosis proteins by incorporating PsePSSM and DCCA coefficient based on LFDA dimensionality reduction. BMC Genomics. 2018;19(1):478.","journal-title":"BMC Genomics"},{"issue":"48","key":"3232_CR12","doi-asserted-by":"publisher","first-page":"45765","DOI":"10.1074\/jbc.M204161200","volume":"277","author":"KC Chou","year":"2002","unstructured":"Chou KC, Cai YD. Using functional domain composition and support vector machines for prediction of protein subcellular location. J Biol Chem. 2002;277(48):45765\u20139.","journal-title":"J Biol Chem"},{"key":"3232_CR13","doi-asserted-by":"crossref","unstructured":"Cheng X, Xiao X, Chou KC. pLoc-mGneg: predict subcellular localization of gram-negative bacterial proteins by deep gene ontology learning via general PseAAC. Genomics. 2017;S0888754317301027","DOI":"10.1016\/j.ygeno.2017.10.002"},{"issue":"9","key":"3232_CR14","doi-asserted-by":"publisher","first-page":"1722","DOI":"10.1039\/C7MB00267J","volume":"13","author":"X Cheng","year":"2017","unstructured":"Cheng X, Xiao X, Chou KC. pLoc-mPlant: predict subcellular localization of multi-location plant proteins by incorporating the optimal GO information into general PseAAC. Mol BioSyst. 2017;13(9):1722\u20137.","journal-title":"Mol BioSyst"},{"key":"3232_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.compbiolchem.2016.09.009","volume":"65","author":"SB Zhang","year":"2016","unstructured":"Zhang SB, Tang QR. Predicting protein subcellular localization based on information content of gene ontology terms. Comput Biol Chem. 
2016;65:1\u20137.","journal-title":"Comput Biol Chem"},{"issue":"457","key":"3232_CR16","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1016\/j.jtbi.2018.08.042","volume":"14","author":"S Zhang","year":"2018","unstructured":"Zhang S, Liang Y. Predicting apoptosis protein subcellular localization by integrating auto-cross correlation and PSSM into Chou\u2019s PseAAC. J Theor Biol. 2018;14(457):163\u20139.","journal-title":"J Theor Biol"},{"issue":"5","key":"3232_CR17","doi-asserted-by":"publisher","first-page":"pii: E919","DOI":"10.3390\/molecules24050919","volume":"24","author":"B Li","year":"2019","unstructured":"Li B, Cai L, Liao B, Fu X, Bing P, Yang J. Prediction of protein subcellular localization based on fusion of multi-view features. Molecules. 2019;24(5):pii: E919.","journal-title":"Molecules"},{"key":"3232_CR18","doi-asserted-by":"publisher","first-page":"1887","DOI":"10.1016\/j.patrec.2008.06.007","volume":"29","author":"YS Ding","year":"2008","unstructured":"Ding YS, Zhang TL. Using Chou\u2019s pseudo amino acid composition to predict subcellular localization of apoptosis proteins: an approach with immune genetic algorithm-based ensemble classifier. Pattern Recogn Lett. 2008;29:1887\u201392.","journal-title":"Pattern Recogn Lett"},{"issue":"3","key":"3232_CR19","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1007\/s10441-008-9067-4","volume":"57","author":"H Lin","year":"2009","unstructured":"Lin H, Wang H, Ding H, Chen YL, Li QZ. Prediction of subcellular localization of apoptosis protein using Chou\u2019s pseudo amino acid composition. Acta Biotheor. 2009;57(3):321\u201330.","journal-title":"Acta Biotheor"},{"issue":"14","key":"3232_CR20","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1093\/bioinformatics\/btz337","volume":"35","author":"Z Yan","year":"2019","unstructured":"Yan Z, L\u00e9cuyer E, Blanchette M. Prediction of mRNA subcellular localization using deep recurrent neural networks. Bioinformatics. 
2019;35(14):333\u201342.","journal-title":"Bioinformatics."},{"issue":"21","key":"3232_CR21","doi-asserted-by":"publisher","first-page":"3387","DOI":"10.1093\/bioinformatics\/btx431","volume":"33","author":"JJ Almagro Armenteros","year":"2017","unstructured":"Almagro Armenteros JJ, S\u00f8nderby CK, S\u00f8nderby SK, Nielsen H, Winther O. DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics. 2017;33(21):3387\u201395.","journal-title":"Bioinformatics."},{"issue":"5","key":"3232_CR22","doi-asserted-by":"publisher","first-page":"268","DOI":"10.2174\/1566523218666180913110949","volume":"18","author":"L Zhao","year":"2018","unstructured":"Zhao L, Wang J, Nabil MM, Zhang J. Deep Forest-based prediction of protein subcellular localization. Curr Gene Ther. 2018;18(5):268\u201374.","journal-title":"Curr Gene Ther"},{"key":"3232_CR23","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1016\/j.ab.2014.10.014","volume":"473","author":"SB Wan","year":"2015","unstructured":"Wan SB, Mak MW, Kung SY. mPLR-Loc: an adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction. Anal Biochem. 2015;473:14\u201327.","journal-title":"Anal Biochem"},{"key":"3232_CR24","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1016\/j.jtbi.2014.06.031","volume":"360","author":"SB Wan","year":"2014","unstructured":"Wan SB, Mak MW, Kung SY. R3P-Loc: a compact multi-label predictor using ridge regression and random projection for protein subcellular localization. J Theor Biol. 2014;360:34\u201345.","journal-title":"J Theor Biol"},{"key":"3232_CR25","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1016\/j.compbiomed.2011.11.006","volume":"42","author":"RP Liang","year":"2012","unstructured":"Liang RP, Huang SY, Shi SP, Sun XY, Luo SB, Qiu JD. 
A novel algorithm combining support vector machine with the discrete wavelet transform for the prediction of protein subcellular localization. Comput Biol Med. 2012;42:180\u20137.","journal-title":"Comput Biol Med"},{"key":"3232_CR26","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1007\/s00726-006-0475-y","volume":"33","author":"JY Shi","year":"2007","unstructured":"Shi JY, Zhang SW, Pan Q, Chen YM, Xie J. Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition. Amino Acids. 2007;33:69\u201374.","journal-title":"Amino Acids"},{"key":"3232_CR27","doi-asserted-by":"publisher","first-page":"3257","DOI":"10.1016\/S0006-3495(03)70050-2","volume":"84","author":"YD Cai","year":"2003","unstructured":"Cai YD, Zhou GP, Chou KC. Support vector machines for predicting membrane protein types by using functional domain composition. Biophys J. 2003;84:3257\u201363.","journal-title":"Biophys J"},{"key":"3232_CR28","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1016\/j.jtbi.2015.07.034","volume":"384","author":"F Ali","year":"2015","unstructured":"Ali F, Hayat M. Classification of membrane protein types using voting feature interval in combination with Chou\u2019s pseudo amino acid composition. J Theor Biol. 2015;384:78\u201383.","journal-title":"J Theor Biol"},{"key":"3232_CR29","doi-asserted-by":"publisher","first-page":"1957","DOI":"10.1101\/gr.2650004","volume":"14","author":"MS Scott","year":"2014","unstructured":"Scott MS, Thomas DY, Hallett MT. Predicting subcellular localization via protein motif co-occurrence. Genome Res. 2014;14:1957\u201366.","journal-title":"Genome Res"},{"key":"3232_CR30","doi-asserted-by":"publisher","first-page":"441","DOI":"10.1109\/TCBB.2009.82","volume":"8","author":"TH Lin","year":"2011","unstructured":"Lin TH, Murphy RF, Barjoseph Z. Discriminative motif finding for predicting protein subcellular localization. 
IEEE\/ACM Trans Comput Biol Bioinforma. 2011;8:441\u201351.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinforma"},{"key":"3232_CR31","doi-asserted-by":"publisher","first-page":"978","DOI":"10.1110\/ps.8.5.978","volume":"8","author":"O Emanuelsson","year":"1999","unstructured":"Emanuelsson O, Nielsen H, Heijne GV. ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 1999;8:978\u201384.","journal-title":"Protein Sci"},{"issue":"1","key":"3232_CR32","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1002\/prot.10251","volume":"50","author":"GP Zhou","year":"2003","unstructured":"Zhou GP, Doctor K. Subcellular location prediction of apoptosis proteins. Proteins. 2003;50(1):44\u20138.","journal-title":"Proteins."},{"issue":"4","key":"3232_CR33","doi-asserted-by":"publisher","first-page":"775","DOI":"10.1016\/j.jtbi.2006.11.010","volume":"245","author":"YL Chen","year":"2007","unstructured":"Chen YL, Li QZ. Prediction of the subcellular location of apoptosis proteins. J Theor Biol. 2007;245(4):775\u201383.","journal-title":"J Theor Biol"},{"issue":"10","key":"3232_CR34","doi-asserted-by":"publisher","first-page":"1263","DOI":"10.2174\/092986610792231528","volume":"17","author":"TG Liu","year":"2010","unstructured":"Liu TG, Zheng XQ, Wang CH, Wang J. Prediction of subcellular location of apoptosis proteins using pseudo amino acid composition: an approach from auto covariance transformation. Protein Peptide Lett. 2010;17(10):1263\u20139.","journal-title":"Protein Peptide Lett"},{"key":"3232_CR35","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The nature of statistical learning theory","author":"V Vapnik","year":"1995","unstructured":"Vapnik V. The nature of statistical learning theory. New York: Springer; 1995."},{"key":"3232_CR36","volume-title":"Statistical learning theory","author":"V Vapnik","year":"1998","unstructured":"Vapnik V. 
Statistical learning theory. New York: Wiley; 1998."},{"key":"3232_CR37","doi-asserted-by":"crossref","unstructured":"Kre\u00dfel UH. Pairwise classification and support vector machines. Adv Kernel Meth. 1999:255\u201368.","DOI":"10.7551\/mitpress\/1130.003.0020"},{"key":"3232_CR38","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1016\/j.biochi.2014.06.001","volume":"104","author":"L Li","year":"2014","unstructured":"Li L, Yu S, Xiao W, Li Y, Li M, Huang L, Zheng X, Zhou S, Yang H. Prediction of bacterial protein subcellular localization by incorporating various features into Chou\u2019s PseAAC and a backward feature selection approach. Biochimie. 2014;104:100\u20137.","journal-title":"Biochimie."},{"key":"3232_CR39","first-page":"326","volume":"264","author":"HB Shen","year":"2010","unstructured":"Shen HB, Chou KC. Gneg-mPLoc: a top-down strategy to enhance the quality of predicting subcellular localization of gram-negative bacterial proteins. J Theor Biol. 2010;264:326\u201333.","journal-title":"J Theor Biol"}],"container-title":["BMC 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3232-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-019-3232-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3232-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,24]],"date-time":"2023-09-24T13:19:34Z","timestamp":1695561574000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-019-3232-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12]]},"references-count":39,"journal-issue":{"issue":"S22","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["3232"],"URL":"https:\/\/doi.org\/10.1186\/s12859-019-3232-4","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12]]},"assertion":[{"value":"30 December 2019","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"719"}}