{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T21:51:35Z","timestamp":1772833895998,"version":"3.50.1"},"reference-count":55,"publisher":"World Scientific Pub Co Pte Ltd","issue":"03","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Wavelets Multiresolut Inf. Process."],"published-print":{"date-parts":[[2023,5]]},"abstract":"<jats:p>A gene is a basic unit of congenital traits and a sequence of nucleotides in deoxyribonucleic acid that encrypts protein synthesis. Proteins are made up of amino acid residue and are classified for use in protein-related research, which includes identifying changes in genes, finding associations with diseases and phenotypes, and identifying potential drug targets. To this end, proteins are studied and classified, based on the family. For family prediction, however, a computational rather than an experimental approach is introduced, owing to the time involved in the latter process. Computational approaches to protein family prediction involve two important processes, feature selection and classification. Existing approaches to protein family prediction are alignment-based and alignment-free. The drawback of the former is that it searches for protein signatures by aligning every available sequence. Consequently, the latter alignment-free approach is taken for study, given that it only needs sequence-based features to predict the protein family and is far more efficient than the former. Nevertheless, the sequence-based characteristics taken for study have additional features to offer. There is, thus, a need to select the best features of all. When comes to classification still there is no perfection in classifying the protein. So, a comparison of different approaches is done to find the best feature selection technique and classification technique for protein family prediction. From the study, the feature subset selected provides the best classification accuracy of 96% for filter-based feature selection technique and the random forest classifier.<\/jats:p>","DOI":"10.1142\/s021969132250045x","type":"journal-article","created":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T06:40:11Z","timestamp":1674628811000},"source":"Crossref","is-referenced-by-count":2,"title":["Calibrating the classifier for protein family prediction with protein sequence using machine learning techniques: An empirical investigation"],"prefix":"10.1142","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6339-8165","authenticated-orcid":false,"given":"T.","family":"Idhaya","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Abhishekapatti, Tirunelveli, Tamil Nadu, India"}]},{"given":"A.","family":"Suruliandi","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Abhishekapatti, Tirunelveli, Tamil Nadu, India"}]},{"given":"Dragos","family":"Calitoiu","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Carleton University, Canada"}]},{"given":"S. P.","family":"Raja","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India"}]}],"member":"219","published-online":{"date-parts":[[2023,1,25]]},"reference":[{"key":"S021969132250045XBIB001","volume-title":"Molecular Biology of the Cell","author":"Alberts B.","year":"2002","edition":"4"},{"key":"S021969132250045XBIB002","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1186\/1471-2105-7-389","volume":"7","author":"Beckstette M.","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"S021969132250045XBIB003","doi-asserted-by":"crossref","first-page":"23262","DOI":"10.1074\/jbc.M401932200","volume":"279","author":"Bhasin M.","year":"2004","journal-title":"J. Biol. Chem."},{"key":"S021969132250045XBIB004","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1111\/j.1399-3011.1988.tb01258.x","volume":"32","author":"Bhaskaran R.","year":"1988","journal-title":"Int. J. Pept. Protein Res."},{"key":"S021969132250045XBIB005","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/0022-5193(67)90004-5","volume":"16","author":"Bigelow C. C.","year":"1967","journal-title":"J. Theor. Biol."},{"key":"S021969132250045XBIB006","first-page":"71","volume":"19","author":"Broto P.","year":"1984","journal-title":"Eur. J. Med. Chem."},{"issue":"1","key":"S021969132250045XBIB007","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1477-5956-10-1","volume":"10","author":"Caragea C.","year":"2012","journal-title":"Proteome Sci."},{"issue":"1","key":"S021969132250045XBIB008","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.compeleceng.2013.11.024","volume":"40","author":"Chandrashekar G.","year":"2014","journal-title":"Comput. Electr. Eng."},{"key":"S021969132250045XBIB009","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1016\/0022-5193(81)90377-5","volume":"91","author":"Charton M.","year":"1981","journal-title":"J. Theor. Biol."},{"key":"S021969132250045XBIB010","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1016\/0022-5193(82)90191-6","volume":"99","author":"Charton M.","year":"1982","journal-title":"J. Theor. Biol."},{"key":"S021969132250045XBIB011","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/0022-2836(76)90191-1","volume":"105","author":"Chothia C.","year":"1976","journal-title":"J. Mol. Biol."},{"key":"S021969132250045XBIB012","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1006\/bbrc.2000.3815","volume":"278","author":"Chou K. C.","year":"2000","journal-title":"Biochem. Biophys. Res. Commun."},{"key":"S021969132250045XBIB013","doi-asserted-by":"crossref","first-page":"1236","DOI":"10.1016\/j.bbrc.2004.06.073","volume":"320","author":"Chou K. C.","year":"2004","journal-title":"Biochem. Biophys. Res. Commun."},{"key":"S021969132250045XBIB014","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1093\/protein\/5.5.373","volume":"5","author":"Cid H.","year":"1992","journal-title":"Protein Eng."},{"key":"S021969132250045XBIB015","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1007\/978-3-540-30116-5_14","volume-title":"Knowledge Discovery in Databases","volume":"3202","author":"Cohen I.","year":"2004"},{"issue":"20","key":"S021969132250045XBIB016","doi-asserted-by":"crossref","first-page":"jcs226639","DOI":"10.1242\/jcs.226639","volume":"132","author":"Cruz-Acu\u00f1a R.","year":"2019","journal-title":"J. Cell Sci."},{"issue":"16","key":"S021969132250045XBIB017","doi-asserted-by":"crossref","first-page":"2800","DOI":"10.1002\/pmic.200700093","volume":"7","author":"Davies M. N.","year":"2007","journal-title":"Proteomic"},{"key":"S021969132250045XBIB018","first-page":"363","volume-title":"Altas of Protein Sequence and Structure","volume":"5","author":"Dayhoff H.","year":"1978"},{"key":"S021969132250045XBIB019","first-page":"163","volume-title":"Computational Intelligence in Data Mining","volume":"2","author":"Dongardive J.","year":"2015"},{"key":"S021969132250045XBIB020","doi-asserted-by":"crossref","first-page":"8700","DOI":"10.1073\/pnas.92.19.8700","volume":"92","author":"Dubchak I.","year":"1995","journal-title":"Proc. Natl. Acad. Sci. U. S. A."},{"key":"S021969132250045XBIB021","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1002\/(SICI)1097-0134(19990601)35:4<401::AID-PROT3>3.0.CO;2-K","volume":"35","author":"Dubchak I.","year":"1999","journal-title":"Proteins"},{"key":"S021969132250045XBIB022","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1023\/A:1007091128394","volume":"19","author":"Feng Z. P.","year":"2000","journal-title":"J. Protein Chem."},{"issue":"2","key":"S021969132250045XBIB023","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/S0021-9673(98)00721-3","volume":"826","author":"Fountoulakis M.","year":"1998","journal-title":"J. Chromatogr. A"},{"issue":"2","key":"S021969132250045XBIB024","first-page":"129","volume":"8","author":"Garg A.","year":"2008","journal-title":"Silico Biol."},{"key":"S021969132250045XBIB025","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2015\/978193","volume":"2015","author":"Geng H.","year":"2015","journal-title":"Biochem. Res. Int."},{"key":"S021969132250045XBIB026","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1126\/science.185.4154.862","volume":"185","author":"Grantham R.","year":"1974","journal-title":"Science"},{"key":"S021969132250045XBIB027","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1002\/bip.360270308","volume":"27","author":"Horne D. S.","year":"1988","journal-title":"Biopolymers"},{"key":"S021969132250045XBIB028","first-page":"79","volume":"8","author":"Hu J.","year":"2012","journal-title":"Evolut. Bioinformatics"},{"key":"S021969132250045XBIB029","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1093\/bioinformatics\/17.8.721","volume":"17","author":"Hua S.","year":"2001","journal-title":"Bioinformatics"},{"key":"S021969132250045XBIB030","first-page":"985","volume-title":"Proc. IEEE Int. Joint Conf. Neural Networks","author":"Huang G.-B.","year":"2004"},{"issue":"1","key":"S021969132250045XBIB031","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1016\/j.neucom.2005.12.126","volume":"70","author":"Huang G.-B.","year":"2006","journal-title":"Neurocomputing"},{"key":"S021969132250045XBIB032","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/1471-2105-13-S17-S3","volume":"13","author":"Huang H.-L.","year":"2012","journal-title":"BMC Bioinformatics"},{"issue":"4196","key":"S021969132250045XBIB033","doi-asserted-by":"crossref","first-page":"50-1","DOI":"10.1126\/science.237322","volume":"189","author":"Jukes T. H.","year":"1975","journal-title":"Science"},{"key":"S021969132250045XBIB034","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","volume":"28","author":"Kanehisa M.","year":"2000","journal-title":"Nucl. Acids Res."},{"key":"S021969132250045XBIB035","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1093\/nar\/28.1.374","volume":"28","author":"Kawashima S.","year":"2000","journal-title":"Nucl. Acids Res."},{"key":"S021969132250045XBIB036","first-page":"S166","author":"Leo Dencelin X.","year":"2016","journal-title":"Biomed. Res."},{"issue":"8","key":"S021969132250045XBIB037","doi-asserted-by":"crossref","first-page":"e0155290","DOI":"10.1371\/journal.pone.0155290","volume":"11","author":"Li Y. H.","year":"2016","journal-title":"PLoS"},{"key":"S021969132250045XBIB038","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1023\/A:1010967008838","volume":"20","author":"Lin Z.","year":"2001","journal-title":"J. Protein Chem."},{"key":"S021969132250045XBIB039","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1002\/prot.20605","volume":"62","author":"Lin H. H.","year":"2006","journal-title":"Proteins"},{"key":"S021969132250045XBIB041","doi-asserted-by":"crossref","first-page":"215","DOI":"10.3389\/fbioe.2019.00215","volume":"7","author":"Lv Z.","year":"2019","journal-title":"Front. Bioeng. Biotechnol."},{"issue":"15","key":"S021969132250045XBIB042","doi-asserted-by":"crossref","first-page":"1841","DOI":"10.1093\/bioinformatics\/btq302","volume":"26","author":"Murakami Y.","year":"2010","journal-title":"Bioinformatics"},{"issue":"1","key":"S021969132250045XBIB043","volume":"11","author":"Nijil R. N.","year":"2018","journal-title":"Biomed. Pharmacol. J."},{"key":"S021969132250045XBIB044","doi-asserted-by":"crossref","first-page":"3.1.1","DOI":"10.1002\/0471250953.bi0301s42","volume":"42","author":"Pearson W. R.","year":"2013","journal-title":"Curr. Protoc. Bioinformatics"},{"issue":"19","key":"S021969132250045XBIB047","doi-asserted-by":"crossref","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","volume":"23","author":"Saeys Y.","year":"2007","journal-title":"Bioinformatics"},{"key":"S021969132250045XBIB048","volume-title":"Biochemistry, Primary Protein Structure","author":"Sanvictores T.","year":"2022"},{"key":"S021969132250045XBIB049","series-title":"Studies in Computational Intelligence","volume-title":"Decision Tree Classifier for Classification of Proteins Using the Protein Data Bank. Integrated Intelligent Computing, Communication and Security","volume":"771","author":"Satpute B. S.","year":"2019"},{"key":"S021969132250045XBIB050","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1016\/S0006-3495(94)80782-9","volume":"66","author":"Schneider G.","year":"1994","journal-title":"Biophys. J."},{"key":"S021969132250045XBIB051","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1002\/prot.10290","volume":"50","author":"Shepherd A. J.","year":"2003","journal-title":"Proteins"},{"key":"S021969132250045XBIB052","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1002\/ajpa.20250","volume":"129","author":"Sokal R. R.","year":"2006","journal-title":"Am. J. Phys. Anthropol."},{"key":"S021969132250045XBIB053","doi-asserted-by":"crossref","first-page":"16380","DOI":"10.1038\/s41598-019-52532-8","volume":"9","author":"Trivedi R.","year":"2019","journal-title":"Sci. Rep."},{"key":"S021969132250045XBIB054","doi-asserted-by":"crossref","first-page":"898090","DOI":"10.1155\/2013\/898090","volume":"2013","author":"Vipsita S.","year":"2013","journal-title":"Comput. Biol. J."},{"key":"S021969132250045XBIB055","first-page":"1406","volume-title":"Proc. Int. Joint Conf. Neural Networks (IJCNN\u201905)","volume":"3","author":"Wang D.","year":"2005"},{"issue":"1","key":"S021969132250045XBIB056","first-page":"53","volume":"1","author":"Wang D.","year":"2003","journal-title":"Inf. Process. Lett. Rev."},{"key":"S021969132250045XBIB057","first-page":"764","volume-title":"Proc. 9th Int. Conf. Neural Information Processing","volume":"2","author":"Wang D.","year":"2002"},{"key":"S021969132250045XBIB058","first-page":"177","volume-title":"Proc. 6th Asia-Pacific Bioinformatics Conf. (APBC\u201908)","volume":"6","author":"Yang Y.","year":"2008"}],"container-title":["International Journal of Wavelets, Multiresolution and Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S021969132250045X","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T15:17:57Z","timestamp":1701789477000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S021969132250045X"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,25]]},"references-count":55,"journal-issue":{"issue":"03","published-print":{"date-parts":[[2023,5]]}},"alternative-id":["10.1142\/S021969132250045X"],"URL":"https:\/\/doi.org\/10.1142\/s021969132250045x","relation":{},"ISSN":["0219-6913","1793-690X"],"issn-type":[{"value":"0219-6913","type":"print"},{"value":"1793-690X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,25]]},"article-number":"2250045"}}