{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T09:52:33Z","timestamp":1780653153166,"version":"3.54.1"},"reference-count":60,"publisher":"Springer Science and Business Media LLC","issue":"S3","license":[{"start":{"date-parts":[[2021,6,23]],"date-time":"2021-06-23T00:00:00Z","timestamp":1624406400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,6,23]],"date-time":"2021-06-23T00:00:00Z","timestamp":1624406400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62062067"],"award-info":[{"award-number":["62062067"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11661081"],"award-info":[{"award-number":["11661081"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11971421"],"award-info":[{"award-number":["11971421"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005273","name":"Natural Science Foundation of Yunnan Province","doi-asserted-by":"publisher","award":["2017FA032"],"award-info":[{"award-number":["2017FA032"]}],"id":[{"id":"10.13039\/501100005273","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Training Plan for Young and Middle-aged Academic Leaders of Yunnan 
Province","award":["2018HB031"],"award-info":[{"award-number":["2018HB031"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Antifreeze proteins (AFPs) are a group of proteins that inhibit body fluids from growing to ice crystals and thus improve biological antifreeze ability. It is vital to the survival of living organisms in extremely cold environments. However, little research is performed on sequences feature extraction and selection for antifreeze proteins classification in the structure and function prediction, which is of great significance.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>In this paper, to predict the antifreeze proteins, a feature representation of weighted generalized dipeptide composition (W-GDipC) and an ensemble feature selection based on two-stage and multi-regression method (LRMR-Ri) are proposed. Specifically, four feature selection algorithms: Lasso regression, Ridge regression, Maximal information coefficient and Relief are used to select the feature sets, respectively, which is the first stage of LRMR-Ri method. If there exists a common feature subset among the above four sets, it is the optimal subset; otherwise we use Ridge regression to select the optimal subset from the public set pooled by the four sets, which is the second stage of LRMR-Ri. The LRMR-Ri method combined with W-GDipC was performed both on the antifreeze proteins dataset (binary classification), and on the membrane protein dataset (multiple classification). Experimental results show that this method has good performance in support vector machine (SVM), decision tree (DT) and stochastic gradient descent (SGD). 
The values of ACC, RE and MCC of LRMR-Ri and W-GDipC with antifreeze proteins dataset and SVM classifier have reached as high as 95.56%, 97.06% and 0.9105, respectively, much higher than those of each single method: Lasso, Ridge, Mic and Relief, nearly 13% higher than single Lasso for ACC.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>The experimental results show that the proposed LRMR-Ri and W-GDipC method can significantly improve the accuracy of antifreeze proteins prediction compared with other similar single feature methods. In addition, our method has also achieved good results in the classification and prediction of membrane proteins, which verifies its wide reliability to a certain extent.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-021-04251-z","type":"journal-article","created":{"date-parts":[[2021,6,23]],"date-time":"2021-06-23T10:04:00Z","timestamp":1624442640000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Predicting antifreeze proteins with weighted generalized dipeptide composition and multi-regression feature selection 
ensemble"],"prefix":"10.1186","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1927-8753","authenticated-orcid":false,"given":"Shunfang","family":"Wang","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lin","family":"Deng","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xinnan","family":"Xia","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zicheng","family":"Cao","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yu","family":"Fei","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,6,23]]},"reference":[{"issue":"1","key":"4251_CR1","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1007\/s13369-017-2738-1","volume":"43","author":"MM Tab","year":"2018","unstructured":"Tab MM, Hashim NHF, Najimudin N, et al. Large-scale production of Glaciozyma antarctica antifreeze protein 1 (Afp1) by fed-batch fermentation of Pichia pastoris. Arab J Sci Eng. 2018;43(1):133\u201341.","journal-title":"Arab J Sci Eng."},{"issue":"2","key":"4251_CR2","doi-asserted-by":"publisher","first-page":"327","DOI":"10.1111\/j.1399-3054.1997.tb04790.x","volume":"100","author":"M Griffith","year":"1997","unstructured":"Griffith M, Antikainen M, Hon WC, et al. Antifreeze proteins in winter rye. Physiol Plant. 1997;100(2):327\u201332.","journal-title":"Physiol Plant"},{"issue":"1","key":"4251_CR3","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1002\/jcp.1030490103","volume":"49","author":"PF Scholander","year":"2010","unstructured":"Scholander PF, Dam LV, Kanwisher JW, et al. Supercooling and osmoregulation in arctic fish. J Cell Physiol. 
2010;49(1):5\u201324.","journal-title":"J Cell Physiol"},{"issue":"8","key":"4251_CR4","doi-asserted-by":"publisher","first-page":"3485","DOI":"10.1073\/pnas.94.8.3485","volume":"94","author":"JM Logsdon","year":"1997","unstructured":"Logsdon JM, Doolittle WF. Origin of antifreeze protein genes: a cool tale in molecular evolution. Proc Natl Acad Sci. 1997;94(8):3485\u20137.","journal-title":"Proc Natl Acad Sci"},{"issue":"1423","key":"4251_CR5","doi-asserted-by":"publisher","first-page":"927","DOI":"10.1098\/rstb.2002.1081","volume":"357","author":"PL Davies","year":"2002","unstructured":"Davies PL, Baardsnes J, Kuiper MJ, et al. Structure and function of antifreeze proteins. Philos Trans R Soc Lond. 2002;357(1423):927\u201335.","journal-title":"Philos Trans R Soc Lond"},{"issue":"4","key":"4251_CR6","doi-asserted-by":"publisher","first-page":"1950029","DOI":"10.1142\/S021972001950029X","volume":"17","author":"F Yuan","year":"2019","unstructured":"Yuan F, Liu G, Yang XW, Wang SF, Wang XR. Prediction of oxidoreductase subfamily classes based on RFE-SND-CC-PSSM and machine learning methods. J Bioinform Comput Biol. 2019;17(4):1950029.","journal-title":"J Bioinform Comput Biol"},{"issue":"1","key":"4251_CR7","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1093\/bfgp\/elz036","volume":"19","author":"SW Sun","year":"2020","unstructured":"Sun SW, Wang CY, Ding H, Zou Q. Machine learning and its applications in plant molecular studies. Brief Funct Genomics. 2020;19(1):40\u20138.","journal-title":"Brief Funct Genomics"},{"issue":"17","key":"4251_CR8","doi-asserted-by":"publisher","first-page":"2756","DOI":"10.1093\/bioinformatics\/btx302","volume":"33","author":"J Wang","year":"2017","unstructured":"Wang J, Yang B, Revote J, et al. POSSUM: a bioinformatics toolkit for generating numerical sequence feature descriptors based on PSSM profiles. Bioinformatics. 
2017;33(17):2756\u20138.","journal-title":"Bioinformatics"},{"issue":"25","key":"4251_CR9","doi-asserted-by":"publisher","first-page":"701","DOI":"10.1186\/s12859-019-3276-5","volume":"20","author":"S Wang","year":"2019","unstructured":"Wang S, Wang X. Prediction of protein structural classes by different feature expressions based on 2-D wavelet denoising and fusion. BMC Bioinform. 2019;20(25):701.","journal-title":"BMC Bioinform"},{"key":"4251_CR10","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1016\/j.cplett.2012.02.030","volume":"531","author":"HJ Yu","year":"2012","unstructured":"Yu HJ, Huang DS. Novel 20-D descriptors of protein sequences and it\u2019s applications in similarity analysis. Chem Phys Lett. 2012;531:261\u20136.","journal-title":"Chem Phys Lett"},{"issue":"3","key":"4251_CR11","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1109\/TCBB.2019.2930993","volume":"17","author":"S Wang","year":"2020","unstructured":"Wang S, Cao Z, Li M, et al. G-DipC: an improved feature representation method for short sequences to predict the type of cargo in cell-penetrating peptides. IEEE\/ACM Trans Comput Biol Bioinf. 2020;17(3):739\u201347.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinf"},{"key":"4251_CR12","doi-asserted-by":"publisher","first-page":"212","DOI":"10.1016\/j.jpdc.2017.08.009","volume":"117","author":"LY Wei","year":"2018","unstructured":"Wei LY, Ding YJ, Su R, Tang JJ, Zou Q. Prediction of human protein subcellular localization using deep learning. J Parallel Distrib Comput. 2018;117:212\u20137.","journal-title":"J Parallel Distrib Comput"},{"key":"4251_CR13","doi-asserted-by":"publisher","unstructured":"Huang DS, Chi ZR. Finding complex roots of polynomials by feedforward neural networks. 2001;A13\u2013A18. 
https:\/\/doi.org\/10.1109\/IJCNN.2001.1016716.","DOI":"10.1109\/IJCNN.2001.1016716"},{"issue":"3","key":"4251_CR14","doi-asserted-by":"publisher","first-page":"428","DOI":"10.1016\/j.jtbi.2008.08.028","volume":"256","author":"RB Huang","year":"2009","unstructured":"Huang RB, Du QS, Wei YT, et al. Physics and chemistry-driven artificial neural network for predicting bioactivity of peptides and proteins and their design. J Theor Biol. 2009;256(3):428\u201335.","journal-title":"J Theor Biol"},{"key":"4251_CR15","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1016\/j.compbiolchem.2019.107094","volume":"81","author":"SF Wang","year":"2019","unstructured":"Wang SF, Li MY, Guo L, Cao ZC, Fei Y. Efficient utilization on PSSM combining with recurrent neural network for membrane protein types prediction. Comput Biol Chem. 2019;81:9\u201315.","journal-title":"Comput Biol Chem"},{"issue":"2","key":"4251_CR16","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1016\/j.jtbi.2008.02.031","volume":"253","author":"A Anand","year":"2008","unstructured":"Anand A, Pugalenthi G, Suganthan PN. Predicting protein structural class by SVM with class-wise optimized features and decision probabilities. J Theor Biol. 2008;253(2):375\u201380.","journal-title":"J Theor Biol"},{"issue":"3","key":"4251_CR17","doi-asserted-by":"publisher","first-page":"282","DOI":"10.1504\/IJDMB.2013.056078","volume":"8","author":"Q Jiang","year":"2013","unstructured":"Jiang Q, Wang G, Jin S, et al. Predicting human microRNA-disease associations based on support vector machine. Int J Data Min Bioinform. 2013;8(3):282\u201393.","journal-title":"Int J Data Min Bioinform"},{"issue":"4","key":"4251_CR18","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1016\/j.jtbi.2008.10.026","volume":"256","author":"JD Qiu","year":"2009","unstructured":"Qiu JD, Luo SH, Huang JH, et al. Using support vector machines to distinguish enzymes: approached by incorporating wavelet transform. J Theor Biol. 
2009;256(4):625\u201331.","journal-title":"J Theor Biol"},{"issue":"2","key":"4251_CR19","doi-asserted-by":"publisher","first-page":"478","DOI":"10.1093\/bioinformatics\/btz609","volume":"36","author":"Z Wen","year":"2020","unstructured":"Wen Z, He J, Huang SY. Topology-independent and global protein structure alignment through an FFT-based algorithm. Bioinformatics. 2020;36(2):478\u201386.","journal-title":"Bioinformatics"},{"issue":"12","key":"4251_CR20","doi-asserted-by":"publisher","first-page":"30343","DOI":"10.3390\/ijms161226237","volume":"16","author":"S Wang","year":"2015","unstructured":"Wang S, Liu S. Protein sub-nuclear localization based on effective fusion representations and dimension reduction algorithm LDA. Int J Mol Sci. 2015;16(12):30343\u201361.","journal-title":"Int J Mol Sci"},{"issue":"2","key":"4251_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.bbrc.2007.01.011","volume":"354","author":"H Lin","year":"2007","unstructured":"Lin H, Li QZ. Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant. Biochem Biophys Res Commun. 2007;354(2):1\u2013551.","journal-title":"Biochem Biophys Res Commun"},{"issue":"12","key":"4251_CR22","doi-asserted-by":"publisher","first-page":"2718","DOI":"10.3390\/ijms18122718","volume":"18","author":"S Wang","year":"2017","unstructured":"Wang S, Nie B, Yue K, et al. Protein subcellular localization with Gaussian kernel discriminant analysis and its kernel parameter selection. Int J Mol Sci. 2017;18(12):2718.","journal-title":"Int J Mol Sci"},{"issue":"1","key":"4251_CR23","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1109\/TCBB.2014.2351821","volume":"12","author":"G Yu","year":"2015","unstructured":"Yu G, Rangwala H, Domeniconi C, et al. Predicting protein function using multiple kernels. IEEE\/ACM Trans Comput Biol Bioinform. 
2015;12(1):219\u201333.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"4251_CR24","doi-asserted-by":"crossref","unstructured":"Zhang T. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the twenty-first international conference on machine learning. 2004; p. 116.","DOI":"10.1145\/1015330.1015332"},{"key":"4251_CR25","doi-asserted-by":"crossref","unstructured":"Kabir F, Siddique S, Kotwal MRA, et al. Bangla text document categorization using stochastic gradient descent (sgd) classifier. In: 2015 International conference on cognitive computing and information processing (CCIP). IEEE, 2015; p. 1\u20134.","DOI":"10.1109\/CCIP.2015.7100687"},{"issue":"1","key":"4251_CR26","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1016\/j.jtbi.2010.10.037","volume":"270","author":"KK Kandaswamy","year":"2011","unstructured":"Kandaswamy KK, Chou KC, Martinetz T, et al. AFP-Pred: a random forest approach for predicting antifreeze proteins from sequence-derived properties. J Theor Biol. 2011;270(1):56\u201362.","journal-title":"J Theor Biol"},{"issue":"12","key":"4251_CR27","doi-asserted-by":"publisher","first-page":"2196","DOI":"10.3390\/ijms13022196","volume":"13","author":"X Zhao","year":"2012","unstructured":"Zhao X, Ma Z, Yin M. Using support vector machine and evolutionary profiles to predict antifreeze protein sequences. Int J Mol Sci. 2012;13(12):2196\u2013207.","journal-title":"Int J Mol Sci"},{"key":"4251_CR28","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1016\/j.jtbi.2014.04.006","volume":"356","author":"S Mondal","year":"2014","unstructured":"Mondal S, Pai PP. Chou\u2019s pseudo amino acid composition improves sequence-based antifreeze protein prediction. J Theor Biol. 
2014;356:30\u20135.","journal-title":"J Theor Biol"},{"issue":"9","key":"4251_CR29","doi-asserted-by":"publisher","first-page":"21191","DOI":"10.3390\/ijms160921191","volume":"16","author":"Y Runtao","year":"2015","unstructured":"Runtao Y, Chengjin Z, Rui G, et al. An effective antifreeze protein predictor with ensemble classifiers and comprehensive sequence descriptors. Int J Mol Sci. 2015;16(9):21191\u2013214.","journal-title":"Int J Mol Sci"},{"issue":"6","key":"4251_CR30","doi-asserted-by":"publisher","first-page":"1005","DOI":"10.1007\/s00232-015-9811-z","volume":"248","author":"X He","year":"2015","unstructured":"He X, Han K, Hu J, Yan H, Yang JY, Shen HB, Yu DJ. TargetFreeze: identifying antifreeze proteins via a combination of weights using sequence evolutionary information and pseudo amino acid composition. J Membr Biol. 2015;248(6):1005\u201314.","journal-title":"J Membr Biol"},{"issue":"6","key":"4251_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s00232-016-9935-9","volume":"249","author":"X Xiao","year":"2016","unstructured":"Xiao X, Hui M, Liu Z. iAFP-Ense: an ensemble classifier for identifying antifreeze protein by incorporating grey model and PSSM into PseAAC. J Membr Biol. 2016;249(6):1\u201310.","journal-title":"J Membr Biol"},{"key":"4251_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2017\/9861752","volume":"2017","author":"R Pratiwi","year":"2017","unstructured":"Pratiwi R, Malik AA, Schaduangrat N, et al. CryoProtect: a web server for classifying antifreeze proteins from nonantifreeze proteins. J Chem. 2017;2017:1\u201315.","journal-title":"J Chem."},{"issue":"1","key":"4251_CR33","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1109\/TCBB.2016.2617337","volume":"15","author":"S Khan","year":"2016","unstructured":"Khan S, Naseem I, Togneri R, et al. RAFP-Pred: robust prediction of antifreeze proteins using localized analysis of n-peptide compositions. IEEE\/ACM Trans Comput Biol Bioinform. 
2016;15(1):244\u201350.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"issue":"10","key":"4251_CR34","doi-asserted-by":"publisher","first-page":"294","DOI":"10.1016\/j.neucom.2017.07.004","volume":"272","author":"A Nath","year":"2018","unstructured":"Nath A, Subbiah K. The role of pertinently diversified and balanced training as well as testing datasets in achieving the true performance of classifiers in predicting the antifreeze proteins. Neurocomputing. 2018;272(10):294\u2013305.","journal-title":"Neurocomputing"},{"key":"4251_CR35","doi-asserted-by":"crossref","unstructured":"Wang LY, Wang D, Chen YH. Prediction of protein subcellular multisite localization using a new feature extraction method. Genet Mol Res. 2016;15(3):gmr.15039013.","DOI":"10.4238\/gmr.15039013"},{"issue":"1","key":"4251_CR36","first-page":"1","volume":"21","author":"Q Zou","year":"2020","unstructured":"Zou Q, Lin G, Jiang XP, Liu XR, Zeng XX. Sequence clustering in bioinformatics: an empirical study. Brief Bioinform. 2020;21(1):1\u201310.","journal-title":"Brief Bioinform"},{"issue":"4","key":"4251_CR37","doi-asserted-by":"publisher","first-page":"0195636","DOI":"10.1371\/journal.pone.0195636","volume":"13","author":"SF Wang","year":"2018","unstructured":"Wang SF, Yue YT, Li XT. Protein subnuclear localization based on a new effective representation and intelligent kernel linear discriminant analysis by dichotomous greedy genetic algorithm. PLoS ONE. 2018;13(4):0195636.","journal-title":"PLoS ONE"},{"issue":"4","key":"4251_CR38","doi-asserted-by":"publisher","first-page":"2899","DOI":"10.1007\/s13369-018-03713-6","volume":"44","author":"S Lalwani","year":"2019","unstructured":"Lalwani S, Sharma H, Satapathy SC, et al. A survey on parallel particle swarm optimization algorithms. Arab J Sci Eng. 2019;44(4):2899\u2013923.","journal-title":"Arab J Sci Eng."},{"key":"4251_CR39","unstructured":"Zhang J, Huang DS, Liu KH. 
Multi-sub-swarm particle swarm optimization algorithm for multimodal function optimization. In: IEEE congress on evolutionary computation, 2007. CEC 2007. IEEE, 2007."},{"issue":"4","key":"4251_CR40","doi-asserted-by":"publisher","first-page":"1316","DOI":"10.1016\/j.patcog.2007.08.016","volume":"41","author":"DS Huang","year":"2008","unstructured":"Huang DS, Jia W, Zhang D. Palmprint verification based on principal lines. Pattern Recognit. 2008;41(4):1316\u201328.","journal-title":"Pattern Recognit"},{"issue":"9","key":"4251_CR41","doi-asserted-by":"publisher","first-page":"e56","DOI":"10.1093\/nar\/gky113","volume":"46","author":"Y Yan","year":"2018","unstructured":"Yan Y, Wen Z, Zhang D, et al. Determination of an effective scoring function for RNA\u2013RNA interactions with a physics-based double-iterative method. Nucleic Acids Res. 2018;46(9):e56\u2013e56.","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"4251_CR42","doi-asserted-by":"publisher","first-page":"1276","DOI":"10.1002\/med.21658","volume":"40","author":"S Basith","year":"2020","unstructured":"Basith S, Manavalan B, Hwan Shin T, et al. Machine intelligence in peptide therapeutics: a next-generation tool for rapid disease screening. Med Res Rev. 2020;40(4):1276\u2013314.","journal-title":"Med Res Rev."},{"key":"4251_CR43","doi-asserted-by":"publisher","first-page":"882","DOI":"10.1016\/j.omtn.2020.05.006","volume":"20","author":"J Yan","year":"2020","unstructured":"Yan J, Bhadra P, Li A, et al. Deep-AmPEP30: improve short antimicrobial peptides prediction with deep learning. Mol Ther-Nucleic Acids. 2020;20:882\u201394.","journal-title":"Mol Ther-Nucleic Acids."},{"issue":"2","key":"4251_CR44","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.bbrc.2007.06.027","volume":"360","author":"KC Chou","year":"2007","unstructured":"Chou KC, Shen HB. MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. 
Biochem Biophys Res Commun. 2007;360(2):1\u2013345.","journal-title":"Biochem Biophys Res Commun"},{"issue":"3","key":"4251_CR45","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1002\/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L","volume":"28","author":"EL Sonnhammer","year":"1997","unstructured":"Sonnhammer EL, Eddy SR, Durbin R. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins. 1997;28(3):405\u201320.","journal-title":"Proteins"},{"issue":"17","key":"4251_CR46","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389\u2013402.","journal-title":"Nucleic Acids Res"},{"issue":"13","key":"4251_CR47","doi-asserted-by":"publisher","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","volume":"22","author":"W Li","year":"2006","unstructured":"Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658.","journal-title":"Bioinformatics"},{"issue":"D1","key":"4251_CR48","doi-asserted-by":"publisher","first-page":"D1098","DOI":"10.1093\/nar\/gkv1266","volume":"44","author":"P Agrawal","year":"2016","unstructured":"Agrawal P, Bhalla S, Usmani SS, et al. CPPsite 2.0: a repository of experimentally validated cell-penetrating peptides. Nucleic Acids Res. 2016;44(D1):D1098\u2013103.","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"4251_CR49","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1016\/j.ygeno.2013.05.006","volume":"102","author":"J Zahiri","year":"2013","unstructured":"Zahiri J, Yaghoubi O, Mohammad-Noori M, et al. PPIevo: protein-protein interaction prediction from PSSM based evolutionary information. 
Genomics. 2013;102(4):237\u201342.","journal-title":"Genomics"},{"key":"4251_CR50","first-page":"86","volume":"2014","author":"X Wang","year":"2014","unstructured":"Wang X, Li GZ, Zhang QW, Huang DS. MultiP-SChlo: multi-label protein subchloroplast localization prediction. IEEE. 2014;2014:86\u20139.","journal-title":"IEEE"},{"key":"4251_CR51","first-page":"129","volume":"2","author":"K Kira","year":"1992","unstructured":"Kira K, Rendell LA. The feature selection problem: traditional methods and a new algorithm. Aaai. 1992;2:129\u201334.","journal-title":"Aaai"},{"key":"4251_CR52","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1016\/j.future.2018.01.006","volume":"82","author":"W Peng","year":"2018","unstructured":"Peng W, Chen A, Chen J. Using general master equation for feature fusion. Future Gen Comput Syst. 2018;82:119\u201326.","journal-title":"Future Gen Comput Syst"},{"issue":"2","key":"4251_CR53","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","volume":"5","author":"DH Wolpert","year":"1992","unstructured":"Wolpert DH. Stacked generalization. Neural Netw. 1992;5(2):241\u201359.","journal-title":"Neural Netw"},{"issue":"1","key":"4251_CR54","first-page":"273","volume":"73","author":"RJ Tibshirani","year":"1996","unstructured":"Tibshirani RJ. Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B Methodol. 1996;73(1):273\u201382.","journal-title":"J R Stat Soc Ser B Methodol"},{"issue":"1","key":"4251_CR55","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1080\/00401706.1970.10488634","volume":"12","author":"AE Hoerl","year":"1970","unstructured":"Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 
1970;12(1):55\u201367.","journal-title":"Technometrics"},{"issue":"6062","key":"4251_CR56","doi-asserted-by":"publisher","first-page":"1518","DOI":"10.1126\/science.1205438","volume":"334","author":"DN Reshef","year":"2011","unstructured":"Reshef DN, Reshef YA, Finucane HK, et al. Detecting novel associations in large datasets. Science. 2011;334(6062):1518\u201324.","journal-title":"Science"},{"key":"4251_CR57","doi-asserted-by":"crossref","unstructured":"Kononenko, I. Estimating attributes: analysis and extensions of RELIEF. In: Proceedings of European conference on machine learning, 1994; p. 171\u201382.","DOI":"10.1007\/3-540-57868-4_57"},{"issue":"1","key":"4251_CR58","doi-asserted-by":"publisher","first-page":"236","DOI":"10.1016\/j.jtbi.2010.12.024","volume":"273","author":"KC Chou","year":"2011","unstructured":"Chou KC. Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol. 2011;273(1):236\u201347.","journal-title":"J Theor Biol"},{"issue":"8","key":"4251_CR59","doi-asserted-by":"publisher","first-page":"1304","DOI":"10.1109\/TNNLS.2012.2199516","volume":"23","author":"JG Moreno-Torres","year":"2012","unstructured":"Moreno-Torres JG, Saez JA, Herrera F. Study on the impact of partition-induced dataset shift on k-fold cross-validation. IEEE Trans Neural Netw Learn Syst. 2012;23(8):1304\u201312.","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"6","key":"4251_CR60","doi-asserted-by":"publisher","first-page":"1448","DOI":"10.3390\/molecules23061448","volume":"23","author":"Z Jian","year":"2018","unstructured":"Jian Z, Haiting C, Song G, et al. High-throughput identification of mammalian secreted proteins using species-specific scheme and application to human proteome. Molecules. 
2018;23(6):1448.","journal-title":"Molecules"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04251-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-021-04251-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04251-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,4]],"date-time":"2023-02-04T08:05:29Z","timestamp":1675497929000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-04251-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,23]]},"references-count":60,"journal-issue":{"issue":"S3","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["4251"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-04251-z","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,23]]},"assertion":[{"value":"4 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 June 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 June 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"340"}}