{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T09:07:56Z","timestamp":1781773676071,"version":"3.54.5"},"reference-count":55,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2022,5,21]],"date-time":"2022-05-21T00:00:00Z","timestamp":1653091200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,20]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Proteins\/peptides have shown to be promising therapeutic agents for a variety of diseases. However, toxicity is one of the obstacles in protein\/peptide-based therapy. The current study describes a web-based tool, ToxinPred2, developed for predicting the toxicity of proteins. This is an update of ToxinPred developed mainly for predicting toxicity of peptides and small proteins. The method has been trained, tested and evaluated on three datasets curated from the recent release of the SwissProt. To provide unbiased evaluation, we performed internal validation on 80% of the data and external validation on the remaining 20% of data. We have implemented the following techniques for predicting protein toxicity; (i) Basic Local Alignment Search Tool-based similarity, (ii) Motif-EmeRging and with Classes-Identification-based motif search and (iii) Prediction models. Similarity and motif-based techniques achieved a high probability of correct prediction with poor sensitivity\/coverage, whereas models based on machine-learning techniques achieved balance sensitivity and specificity with reasonably high accuracy. Finally, we developed a hybrid method that combined all three approaches and achieved a maximum area under receiver operating characteristic curve around 0.99 with Matthews correlation coefficient 0.91 on the validation dataset. In addition, we developed models on alternate and realistic datasets. The best machine learning models have been implemented in the web server named \u2018ToxinPred2\u2019, which is available at https:\/\/webs.iiitd.edu.in\/raghava\/toxinpred2\/ and a standalone version at https:\/\/github.com\/raghavagps\/toxinpred2. This is a general method developed for predicting the toxicity of proteins regardless of their source of origin.<\/jats:p>","DOI":"10.1093\/bib\/bbac174","type":"journal-article","created":{"date-parts":[[2022,4,20]],"date-time":"2022-04-20T11:13:03Z","timestamp":1650453183000},"source":"Crossref","is-referenced-by-count":313,"title":["ToxinPred2: an improved method for predicting toxicity of proteins"],"prefix":"10.1093","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1765-3644","authenticated-orcid":false,"given":"Neelam","family":"Sharma","sequence":"first","affiliation":[{"name":"Department of Computational Biology , , Okhla Phase 3, New Delhi-110020 , India"},{"name":"Indraprastha Institute of Information Technology , , Okhla Phase 3, New Delhi-110020 , India"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2454-2862","authenticated-orcid":false,"given":"Leimarembi Devi","family":"Naorem","sequence":"additional","affiliation":[{"name":"Department of Computational Biology , , Okhla Phase 3, New Delhi-110020 , India"},{"name":"Indraprastha Institute of Information Technology , , Okhla Phase 3, New Delhi-110020 , India"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7045-5188","authenticated-orcid":false,"given":"Shipra","family":"Jain","sequence":"additional","affiliation":[{"name":"Department of Computational Biology , , Okhla Phase 3, New Delhi-110020 , India"},{"name":"Indraprastha Institute of Information Technology , , Okhla Phase 3, New Delhi-110020 , India"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8902-2876","authenticated-orcid":false,"given":"Gajendra P S","family":"Raghava","sequence":"additional","affiliation":[{"name":"Department of Computational Biology , , Okhla Phase 3, New Delhi-110020 , India"},{"name":"Indraprastha Institute of Information Technology , , Okhla Phase 3, New Delhi-110020 , India"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2022,5,21]]},"reference":[{"key":"2022092013200408900_ref1","first-page":"651","article-title":"Protein\/peptide drug delivery systems. Basic fundam","author":"Deb","year":"2019","journal-title":"Drug Deliv"},{"key":"2022092013200408900_ref2","doi-asserted-by":"crossref","first-page":"165","DOI":"10.4321\/S2340-98942015000300006","article-title":"Protein and peptide in drug targeting and its therapeutic approach","volume":"56","author":"Keservani","year":"2015","journal-title":"Ars Pharm"},{"key":"2022092013200408900_ref3","doi-asserted-by":"crossref","first-page":"1443","DOI":"10.4155\/tde.13.104","article-title":"Basics and recent advances in peptide and protein drug delivery","volume":"4","author":"Bruno","year":"2013","journal-title":"Ther Deliv"},{"key":"2022092013200408900_ref4","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1016\/j.drudis.2014.10.003","article-title":"Peptide therapeutics: current status and future directions","volume":"20","author":"Fosgerau","year":"2015","journal-title":"Drug Discov Today"},{"key":"2022092013200408900_ref5","doi-asserted-by":"crossref","first-page":"e0181748","DOI":"10.1371\/journal.pone.0181748","article-title":"THPdb: database of FDA-approved peptide and protein therapeutics","volume":"12","author":"Usmani","year":"2017","journal-title":"PLoS One"},{"key":"2022092013200408900_ref6","doi-asserted-by":"crossref","first-page":"62","DOI":"10.3389\/fchem.2014.00062","article-title":"Current challenges in peptide-based drug discovery","volume":"2","author":"Otvos","year":"2014","journal-title":"Front Chem"},{"key":"2022092013200408900_ref7","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1016\/j.tibs.2018.12.004","article-title":"Friends or foes? Emerging impacts of biological toxins","volume":"44","author":"Clark","year":"2019","journal-title":"Trends Biochem Sci"},{"key":"2022092013200408900_ref8","doi-asserted-by":"crossref","first-page":"903295","DOI":"10.1155\/2010\/903295","article-title":"Scorpion venom and the inflammatory response","volume":"2010","author":"Petricevich","year":"2010","journal-title":"Mediators Inflamm"},{"key":"2022092013200408900_ref9","doi-asserted-by":"crossref","first-page":"570","DOI":"10.1016\/j.tips.2020.05.006","article-title":"Causes and consequences of snake venom variation","volume":"41","author":"Casewell","year":"2020","journal-title":"Trends Pharmacol Sci"},{"key":"2022092013200408900_ref10","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1111\/bjh.14591","article-title":"Haemotoxic snake venoms: their functional activity, impact on snakebite victims and pharmaceutical promise","volume":"177","author":"Slagboom","year":"2017","journal-title":"Br J Haematol"},{"key":"2022092013200408900_ref11","first-page":"e1437","article-title":"Computational resources in healthcare. WIREs Data Min","author":"Sharma","year":"2021","journal-title":"Knowl Discov"},{"key":"2022092013200408900_ref12","doi-asserted-by":"crossref","first-page":"e1516","DOI":"10.1002\/wcms.1516","article-title":"Toxicity prediction based on artificial intelligence: a multidisciplinary overview","author":"P\u00e9rez Sant\u00edn","year":"2021","journal-title":"WIREs Comput Mol Sci"},{"key":"2022092013200408900_ref13","doi-asserted-by":"crossref","first-page":"80","DOI":"10.3389\/fenvs.2015.00080","article-title":"DeepTox: toxicity prediction using deep learning","volume":"3","author":"Mayr","year":"2016","journal-title":"Front Environ Sci"},{"key":"2022092013200408900_ref14","doi-asserted-by":"crossref","first-page":"W257","DOI":"10.1093\/nar\/gky318","article-title":"ProTox-II: a webserver for the prediction of toxicity of chemicals","volume":"46","author":"Banerjee","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2022092013200408900_ref15","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/s40360-018-0282-6","article-title":"eToxPred: a machine learning-based approach to estimate the toxicity of drug candidates","volume":"20","author":"Pu","year":"2019","journal-title":"BMC Pharmacol Toxicol"},{"key":"2022092013200408900_ref16","first-page":"405","article-title":"BTXpred: prediction of bacterial toxins","volume":"7","author":"Saha","year":"2007","journal-title":"In Silico Biol"},{"key":"2022092013200408900_ref17","first-page":"369","article-title":"Prediction of neurotoxins based on their function and source","volume":"7","author":"Saha","year":"2007","journal-title":"In Silico Biol"},{"key":"2022092013200408900_ref18","doi-asserted-by":"crossref","first-page":"W363","DOI":"10.1093\/nar\/gkp299","article-title":"ClanTox: a classifier of short animal toxins","volume":"37","author":"Naamati","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2022092013200408900_ref19","doi-asserted-by":"crossref","first-page":"e66279","DOI":"10.1371\/journal.pone.0066279","article-title":"SVM-based prediction of propeptide cleavage sites in spider toxins identifies toxin innovation in an Australian tarantula","volume":"8","author":"Wong","year":"2013","journal-title":"PLoS One"},{"key":"2022092013200408900_ref20","doi-asserted-by":"crossref","first-page":"e90","DOI":"10.7717\/peerj-cs.90","article-title":"Machine learning can differentiate venom toxins from other proteins having non-toxic physiological functions","volume":"2","author":"Gacesa","year":"2016","journal-title":"PeerJ Comput Sci"},{"key":"2022092013200408900_ref21","doi-asserted-by":"crossref","first-page":"e7200","DOI":"10.7717\/peerj.7200","article-title":"TOXIFY: a deep learning approach to classify animal venom proteins","volume":"7","author":"Cole","year":"2019","journal-title":"PeerJ"},{"key":"2022092013200408900_ref22","doi-asserted-by":"crossref","first-page":"5159","DOI":"10.1093\/bioinformatics\/btaa656","article-title":"ToxDL: deep learning using primary structure and domain embeddings for assessing protein toxicity","volume":"36","author":"Pan","year":"2021","journal-title":"Bioinformatics"},{"key":"2022092013200408900_ref23","doi-asserted-by":"crossref","first-page":"e73957","DOI":"10.1371\/journal.pone.0073957","article-title":"In silico approach for predicting toxicity of peptides and proteins","volume":"8","author":"Gupta","year":"2013","journal-title":"PLoS One"},{"key":"2022092013200408900_ref24","doi-asserted-by":"crossref","first-page":"17923","DOI":"10.1038\/s41598-019-54405-6","article-title":"NNTox: gene ontology-based protein toxicity prediction using neural network","volume":"9","author":"Jain","year":"2019","journal-title":"Sci Rep"},{"key":"2022092013200408900_ref25","doi-asserted-by":"crossref","first-page":"bbab041","DOI":"10.1093\/bib\/bbab041","article-title":"ATSE: a peptide toxicity predictor by exploiting structural and evolutionary information based on graph neural network and attention mechanism","volume":"5","author":"Wei","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022092013200408900_ref26","doi-asserted-by":"crossref","first-page":"1514","DOI":"10.1093\/bioinformatics\/btac006","article-title":"ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning","volume":"6","author":"Wei","year":"2022","journal-title":"Bioinformatics"},{"key":"2022092013200408900_ref27","first-page":"21","article-title":"ToxiPred: a server for prediction of aqueous toxicity of small chemical molecules in T","volume":"1","author":"Mishra","year":"2014","journal-title":"Pyriformis. J Transl Toxicol"},{"key":"2022092013200408900_ref28","doi-asserted-by":"crossref","first-page":"22843","DOI":"10.1038\/srep22843","article-title":"A web server and mobile app for computing hemolytic potency of peptides","volume":"6","author":"Chaudhary","year":"2016","journal-title":"Sci Rep"},{"key":"2022092013200408900_ref29","doi-asserted-by":"crossref","first-page":"275","DOI":"10.4155\/fmc-2016-0188","article-title":"HemoPred: a web server for predicting the hemolytic activity of peptides","volume":"9","author":"Win","year":"2017","journal-title":"Future Med Chem"},{"key":"2022092013200408900_ref30","doi-asserted-by":"crossref","first-page":"880","DOI":"10.3389\/fphar.2017.00880","article-title":"ToxiM: a toxicity prediction tool for small molecules developed using machine learning and chemoinformatics approaches","volume":"8","author":"Sharma","year":"2017","journal-title":"Front Pharmacol"},{"key":"2022092013200408900_ref31","doi-asserted-by":"crossref","first-page":"e0191838","DOI":"10.1371\/journal.pone.0191838","article-title":"CLC-Pred: a freely available web-service for in silico prediction of human cell line cytotoxicity for drug-like compounds","volume":"13","author":"Lagunin","year":"2018","journal-title":"PLoS One"},{"key":"2022092013200408900_ref32","doi-asserted-by":"crossref","first-page":"3350","DOI":"10.1093\/bioinformatics\/btaa160","article-title":"HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation","volume":"36","author":"Hasan","year":"2020","journal-title":"Bioinformatics"},{"key":"2022092013200408900_ref33","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"UniProt Consortium","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2022092013200408900_ref34","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1093\/nar\/28.1.45","article-title":"The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000","volume":"28","author":"Bairoch","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2022092013200408900_ref35","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2022092013200408900_ref36","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2022092013200408900_ref37","doi-asserted-by":"crossref","first-page":"W202","DOI":"10.1093\/nar\/gkl343","article-title":"AlgPred: prediction of allergenic proteins and mapping of IgE epitopes","volume":"34","author":"Saha","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2022092013200408900_ref38","doi-asserted-by":"crossref","first-page":"bbaa294","DOI":"10.1093\/bib\/bbaa294","article-title":"AlgPred 2.0: an improved method for predicting allergenic proteins and mapping of IgE epitopes","volume":"22","author":"Sharma","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022092013200408900_ref39","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1093\/bioinformatics\/btr110","article-title":"Identifying discriminative classification-based motifs in biological sequences","volume":"27","author":"Vens","year":"2011","journal-title":"Bioinformatics"},{"key":"2022092013200408900_ref40","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1016\/j.gpb.2019.04.004","article-title":"iLBE for computational identification of linear B-cell epitopes by integrating sequence and evolutionary features","volume":"18","author":"Hasan","year":"2020","journal-title":"Genom Proteom Bioinform"},{"key":"2022092013200408900_ref41","doi-asserted-by":"crossref","first-page":"1229","DOI":"10.1007\/s10822-020-00343-9","article-title":"ProIn-Fuse: improved and robust prediction of proinflammatory peptides by fusing of multiple feature representations","volume":"34","author":"Khatun","year":"2020","journal-title":"J Comput Aided Mol Des"},{"key":"2022092013200408900_ref42","first-page":"599126","article-title":"Computing wide range of protein\/peptide features from their sequence and structure","author":"Pande","year":"2019","journal-title":"bioRxiv"},{"key":"2022092013200408900_ref43","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2022092013200408900_ref44","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1186\/1471-2105-8-463","article-title":"Identification of DNA-binding proteins using support vector machines and evolutionary profiles","volume":"8","author":"Kumar","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2022092013200408900_ref45","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2022092013200408900_ref46","first-page":"3146","article-title":"Lightgbm: a highly efficient gradient boosting decision tree","volume":"30","author":"Ke","year":"2017","journal-title":"Adv Neural Inf Process Syst"},{"key":"2022092013200408900_ref47","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s10994-006-6226-1","article-title":"Extremely randomized trees","volume":"63","author":"Geurts","year":"2006","journal-title":"Mach Learn"},{"key":"2022092013200408900_ref48","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1001\/jama.2016.7653","article-title":"Logistic regression: relating patient characteristics to outcomes","volume":"316","author":"Tolles","year":"2016","journal-title":"J Am Med Assoc"},{"key":"2022092013200408900_ref49","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1142\/S0218001405003983","article-title":"Exploring conditions for the optimality of Naive Bayes","volume":"19","author":"Zhang","year":"2005","journal-title":"Int J Pattern Recognit Artif Intell"},{"key":"2022092013200408900_ref50","first-page":"263","article-title":"Decision tree","volume":"63","author":"F\u00fcrnkranz","year":"2011","journal-title":"Encycl Mach Learn"},{"key":"2022092013200408900_ref51","first-page":"83","volume-title":"k-nearest neighbor classification. In: Data Mining in Agriculture","author":"Mucherino","year":"2009"},{"key":"2022092013200408900_ref52","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1145\/2939672.2939785","article-title":"XGBoost: a scalable tree boosting system. Proc. 22nd ACM SIGKDD","author":"Chen","year":"2016","journal-title":"Int Conf Knowl Discov Data Min"},{"key":"2022092013200408900_ref53","doi-asserted-by":"crossref","first-page":"1083","DOI":"10.1016\/j.procs.2013.05.137","article-title":"Knowledge-based support vector classification based on C-SVC","volume":"17","author":"Zhang","year":"2013","journal-title":"Proc Comput Sci"},{"key":"2022092013200408900_ref54","doi-asserted-by":"crossref","first-page":"bbaa153","DOI":"10.1093\/bib\/bbaa153","article-title":"AntiCP 2.0: an updated model for predicting anticancer peptides","volume":"22","author":"Agrawal","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022092013200408900_ref55","doi-asserted-by":"crossref","first-page":"104746","DOI":"10.1016\/j.compbiomed.2021.104746","article-title":"ChAlPred: a web server for prediction of allergenicity of chemical compounds","volume":"136","author":"Sharma","year":"2021","journal-title":"Comput Biol Med"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac174\/45937235\/bbac174.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac174\/45937235\/bbac174.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T06:49:50Z","timestamp":1700462990000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac174\/6590152"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,21]]},"references-count":55,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac174","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9]]},"published":{"date-parts":[[2022,5,21]]},"article-number":"bbac174"}}