{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,22]],"date-time":"2026-06-22T10:21:05Z","timestamp":1782123665078,"version":"3.54.5"},"reference-count":116,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2018,9,4]],"date-time":"2018-09-04T00:00:00Z","timestamp":1536019200000},"content-version":"vor","delay-in-days":6,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100000925","name":"National Health and Medical Research Council of Australia","doi-asserted-by":"publisher","award":["1092262"],"award-info":[{"award-number":["1092262"]}],"id":[{"id":"10.13039\/501100000925","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"publisher","award":["LP110200333"],"award-info":[{"award-number":["LP110200333"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"publisher","award":["DP120104460"],"award-info":[{"award-number":["DP120104460"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000060","name":"National Institute of Allergy and Infectious Diseases of the National Institutes of Health","doi-asserted-by":"publisher","award":["AI111965"],"award-info":[{"award-number":["AI111965"]}],"id":[{"id":"10.13039\/100000060","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001779","name":"Monash University","doi-asserted-by":"publisher","award":["2018-28"],"award-info":[{"award-number":["2018-28"]}],"id":[{"id":"10.13039\/501100001779","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005683","name":"Collaborative Research Program of Institute for Chemical Research, Kyoto 
University","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100005683","id-type":"DOI","asserted-by":"publisher"}]},{"name":"NHMRC CJ Martin Early Career Research Fellowship","award":["1143366"],"award-info":[{"award-number":["1143366"]}]},{"name":"ARC Discovery Outstanding Research Award"},{"DOI":"10.13039\/100008333","name":"Informatics Institute of the School of Medicine at University of Alabama at Birmingham","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100008333","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,11,27]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. 
Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.<\/jats:p>","DOI":"10.1093\/bib\/bby077","type":"journal-article","created":{"date-parts":[[2018,8,3]],"date-time":"2018-08-03T11:11:41Z","timestamp":1533294701000},"page":"2150-2166","source":"Crossref","is-referenced-by-count":77,"title":["Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods"],"prefix":"10.1093","volume":"20","author":[{"given":"Fuyi","family":"Li","sequence":"first","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yanan","family":"Wang","sequence":"additional","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia"},{"name":"Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chen","family":"Li","sequence":"additional","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia"},{"name":"Department of Biology, Institute of Molecular Systems Biology, ETH Z\u00fcrich, Z\u00fcrich 8093, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tatiana T","family":"Marquez-Lago","sequence":"additional","affiliation":[{"name":"Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andr\u00e9","family":"Leier","sequence":"additional","affiliation":[{"name":"Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Neil D","family":"Rawlings","sequence":"additional","affiliation":[{"name":"EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gholamreza","family":"Haffari","sequence":"additional","affiliation":[{"name":"Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jerico","family":"Revote","sequence":"additional","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tatsuya","family":"Akutsu","sequence":"additional","affiliation":[{"name":"Bioinformatics Center, Institute for Chemical Research, Kyoto 
University, Kyoto 611-0011, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kuo-Chen","family":"Chou","sequence":"additional","affiliation":[{"name":"Gordon Life Science Institute, Boston, MA 02478, USA"},{"name":"Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anthony W","family":"Purcell","sequence":"additional","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Robert N","family":"Pike","sequence":"additional","affiliation":[{"name":"La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia"},{"name":"ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Geoffrey I","family":"Webb","sequence":"additional","affiliation":[{"name":"Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"A Ian","family":"Smith","sequence":"additional","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia"},{"name":"ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Trevor","family":"Lithgow","sequence":"additional","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Roger 
J","family":"Daly","sequence":"additional","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"James C","family":"Whisstock","sequence":"additional","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia"},{"name":"ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8031-9086","authenticated-orcid":false,"given":"Jiangning","family":"Song","sequence":"additional","affiliation":[{"name":"Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia"},{"name":"Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia"},{"name":"ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2018,8,29]]},"reference":[{"key":"2020011102354510800_ref1","doi-asserted-by":"crossref","first-page":"3532","DOI":"10.1074\/mcp.M113.031310","article-title":"Proteolytic post-translational modification of proteins: proteomic tools and methodology","volume":"12","author":"Rogers","year":"2013","journal-title":"Mol Cell Proteomics"},{"key":"2020011102354510800_ref2","doi-asserted-by":"crossref","first-page":"20745","DOI":"10.1074\/jbc.274.30.20745","article-title":"Proteolytic processing in the secretory pathway","volume":"274","author":"Zhou","year":"1999","journal-title":"J Biol 
Chem"},{"key":"2020011102354510800_ref3","doi-asserted-by":"crossref","first-page":"233","DOI":"10.4161\/cc.1.4.129","article-title":"Proteolysis and the cell cycle","volume":"1","author":"Clarke","year":"2002","journal-title":"Cell Cycle"},{"key":"2020011102354510800_ref4","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1016\/j.biochi.2013.10.007","article-title":"The effect of proteolysis on the induction of cell death by monomeric alpha-lactalbumin","volume":"97","author":"Bruck","year":"2014","journal-title":"Biochimie"},{"key":"2020011102354510800_ref5","first-page":"34","article-title":"Regulated intramembrane proteolysis: signaling pathways and biological functions","volume":"26","author":"Lal","year":"2011","journal-title":"Physiology (Bethesda)"},{"key":"2020011102354510800_ref6","doi-asserted-by":"crossref","first-page":"1298","DOI":"10.1002\/pro.666","article-title":"The N-end rule pathway and regulation by proteolysis","volume":"20","author":"Varshavsky","year":"2011","journal-title":"Protein Sci"},{"key":"2020011102354510800_ref7","doi-asserted-by":"crossref","first-page":"1807","DOI":"10.1681\/ASN.2006010083","article-title":"Protein degradation by the ubiquitin-proteasome pathway in normal and disease states","volume":"17","author":"Lecker","year":"2006","journal-title":"J Am Soc Nephrol"},{"key":"2020011102354510800_ref8","doi-asserted-by":"crossref","first-page":"927","DOI":"10.1021\/acscentsci.6b00280","article-title":"Protein degradation by in-cell self-assembly of proteolysis targeting chimeras","volume":"2","author":"Lebraud","year":"2016","journal-title":"ACS Cent Sci"},{"key":"2020011102354510800_ref9","doi-asserted-by":"crossref","first-page":"892","DOI":"10.1021\/acschembio.6b01068","article-title":"Proteolysis-targeting chimeras: induced protein degradation as a therapeutic strategy","volume":"12","author":"Ottis","year":"2017","journal-title":"ACS Chem 
Biol"},{"key":"2020011102354510800_ref10","doi-asserted-by":"crossref","first-page":"2115","DOI":"10.1161\/01.CIR.96.7.2115","article-title":"Inflammation, metalloproteinases, and increased proteolysis: an emerging pathophysiological paradigm in aortic aneurysm","volume":"96","author":"Shah","year":"1997","journal-title":"Circulation"},{"key":"2020011102354510800_ref11","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1002\/jat.901","article-title":"Putative role of proteolysis and inflammatory response in the toxicity of nerve and blister chemical warfare agents: implications for multi-threat medical countermeasures","volume":"23","author":"Cowan","year":"2003","journal-title":"J Appl Toxicol"},{"key":"2020011102354510800_ref12","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1016\/j.jcf.2004.07.003","article-title":"Cellular proteolysis and systemic inflammation during exacerbation in cystic fibrosis","volume":"3","author":"Ionescu","year":"2004","journal-title":"J Cyst Fibros"},{"key":"2020011102354510800_ref13","first-page":"rs2","article-title":"Systems-level analysis of proteolytic events in increased vascular permeability and complement activation in skin inflammation","volume":"6","author":"Keller","year":"2013","journal-title":"Sci Signal"},{"key":"2020011102354510800_ref14","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1002\/(SICI)1098-1004(1999)13:2<87::AID-HUMU1>3.0.CO;2-K","article-title":"Human genetic diseases of proteolysis","volume":"13","author":"Kato","year":"1999","journal-title":"Hum Mutat"},{"key":"2020011102354510800_ref15","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1152\/physrev.00023.2009","article-title":"Proteases and proteolysis in Alzheimer disease: a multifactorial view on the disease process","volume":"90","author":"De Strooper","year":"2010","journal-title":"Physiol 
Rev"},{"key":"2020011102354510800_ref16","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.neuron.2010.11.006","article-title":"Deconstruction for reconstruction: the role of proteolysis in neural plasticity and disease","volume":"69","author":"Bingol","year":"2011","journal-title":"Neuron"},{"key":"2020011102354510800_ref17","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1186\/s40478-018-0518-0","article-title":"Preventing mutant huntingtin proteolysis and intermittent fasting promote autophagy in models of Huntington disease","volume":"6","author":"Ehrnhoefer","year":"2018","journal-title":"Acta Neuropathol Commun"},{"key":"2020011102354510800_ref18","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1016\/j.ceb.2004.08.005","article-title":"Cell cycle, proteolysis and cancer","volume":"16","author":"Yamasaki","year":"2004","journal-title":"Curr Opin Cell Biol"},{"key":"2020011102354510800_ref19","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1016\/j.tcb.2010.12.002","article-title":"Proteolytic networks in cancer","volume":"21","author":"Mason","year":"2011","journal-title":"Trends Cell Biol"},{"key":"2020011102354510800_ref20","doi-asserted-by":"crossref","first-page":"2331","DOI":"10.1101\/gad.250647.114","article-title":"Pericellular proteolysis in cancer","volume":"28","author":"Sevenich","year":"2014","journal-title":"Genes Dev"},{"key":"2020011102354510800_ref21","doi-asserted-by":"crossref","first-page":"58244","DOI":"10.18632\/oncotarget.11309","article-title":"Proteolysis-a characteristic of tumor-initiating cells in murine metastatic breast cancer","volume":"7","author":"Hillebrand","year":"2016","journal-title":"Oncotarget"},{"key":"2020011102354510800_ref22","doi-asserted-by":"crossref","first-page":"D239","DOI":"10.1093\/nar\/gkn570","article-title":"The Degradome database: mammalian proteases and diseases of proteolysis","volume":"37","author":"Quesada","year":"2009","journal-title":"Nucleic Acids 
Res"},{"key":"2020011102354510800_ref23","first-page":"2210","article-title":"Overview of transcriptomic analysis of all human proteases, non-proteolytic homologs and inhibitors: organ, tissue and ovarian cancer cell line expression profiling of the human protease degradome by the CLIP-CHIP (TM) DNA microarray","volume":"1864","author":"Kappelhoff","year":"2017","journal-title":"Biochim Biophys Acta"},{"key":"2020011102354510800_ref24","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0142658","article-title":"Characterizing protease specificity: how many substrates do we need?","volume":"10","author":"Schauperl","year":"2015","journal-title":"PLoS One"},{"key":"2020011102354510800_ref25","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1016\/j.cbpa.2006.11.021","article-title":"Methods for mapping protease specificity","volume":"11","author":"Diamond","year":"2007","journal-title":"Curr Opin Chem Biol"},{"key":"2020011102354510800_ref26","doi-asserted-by":"crossref","first-page":"7583","DOI":"10.1073\/pnas.0511108103","article-title":"Protease specificity determination by using cellular libraries of peptide substrates (CLiPS)","volume":"103","author":"Boulware","year":"2006","journal-title":"Proc Natl Acad Sci USA"},{"key":"2020011102354510800_ref27","doi-asserted-by":"crossref","first-page":"7754","DOI":"10.1073\/pnas.140132697","article-title":"Rapid and general profiling of protease specificity by using combinatorial fluorogenic substrate libraries","volume":"97","author":"Harris","year":"2000","journal-title":"Proc Natl Acad Sci USA"},{"key":"2020011102354510800_ref28","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1016\/j.cbpa.2009.07.026","article-title":"Methods for the proteomic identification of protease substrates","volume":"13","author":"Agard","year":"2009","journal-title":"Curr Opin Chem 
Biol"},{"key":"2020011102354510800_ref29","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1016\/j.cell.2008.06.038","article-title":"Global mapping of the topography and magnitude of proteolytic events in apoptosis","volume":"134","author":"Dix","year":"2008","journal-title":"Cell"},{"key":"2020011102354510800_ref30","doi-asserted-by":"crossref","first-page":"3642","DOI":"10.1021\/pr200271w","article-title":"Structural determinants of limited proteolysis","volume":"10","author":"Kazanov","year":"2011","journal-title":"J Proteome Res"},{"key":"2020011102354510800_ref31","doi-asserted-by":"crossref","first-page":"D546","DOI":"10.1093\/nar\/gkl813","article-title":"CutDB: a proteolytic event database","volume":"35","author":"Igarashi","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2020011102354510800_ref32","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1002\/pmic.201300416","article-title":"Sequence-derived structural features driving proteolytic processing","volume":"14","author":"Belushkin","year":"2014","journal-title":"Proteomics"},{"key":"2020011102354510800_ref33","doi-asserted-by":"crossref","first-page":"1101","DOI":"10.1038\/nsmb.1668","article-title":"Structural and kinetic determinants of protease substrates","volume":"16","author":"Timmer","year":"2009","journal-title":"Nat Struct Mol Biol"},{"key":"2020011102354510800_ref34","first-page":"531","article-title":"Protein identification and analysis tools in the ExPASy server","volume":"112","author":"Wilkins","year":"1999","journal-title":"Methods Mol Biol"},{"key":"2020011102354510800_ref35","doi-asserted-by":"crossref","first-page":"899","DOI":"10.1515\/BC.2003.101","article-title":"Toward computer-based cleavage site prediction of cysteine endopeptidases","volume":"384","author":"Lohmuller","year":"2003","journal-title":"Biol Chem"},{"key":"2020011102354510800_ref36","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1142\/S021972000500117X","article-title":"PoPS: a 
computational tool for modeling and predicting protease specificity","volume":"3","author":"Boyd","year":"2005","journal-title":"J Bioinform Comput Biol"},{"key":"2020011102354510800_ref37","doi-asserted-by":"crossref","first-page":"W208","DOI":"10.1093\/nar\/gki433","article-title":"GraBCas: a bioinformatics tool for score-based prediction of Caspase- and Granzyme B-cleavage sites in protein sequences","volume":"33","author":"Backes","year":"2005","journal-title":"Nucleic Acids Res"},{"issue":"Suppl 1","key":"2020011102354510800_ref38","doi-asserted-by":"crossref","first-page":"i169","DOI":"10.1093\/bioinformatics\/bti1034","article-title":"CaSPredictor: a new computer-based tool for caspase substrate prediction","volume":"21","author":"Garay-Malpartida","year":"2005","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref39","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1016\/j.tibs.2009.04.001","article-title":"SitePredicting the cleavage of proteinase substrates","volume":"34","author":"Verspurten","year":"2009","journal-title":"Trends Biochem Sci"},{"key":"2020011102354510800_ref40","article-title":"GPS-CCD: a novel computational program for the prediction of calpain cleavage sites","volume":"6","author":"Liu","year":"2011","journal-title":"PLoS One"},{"key":"2020011102354510800_ref41","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1186\/1471-2105-13-14","article-title":"Developing a powerful in silico tool for the discovery of novel caspase-3 substrates: a preliminary screening of the human proteome","volume":"13","author":"Ayyash","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2020011102354510800_ref42","doi-asserted-by":"crossref","first-page":"3241","DOI":"10.1093\/bioinformatics\/btm334","article-title":"CASVM: web server for SVM-based prediction of caspase substrates cleavage 
sites","volume":"23","author":"Wee","year":"2007","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref43","doi-asserted-by":"crossref","first-page":"1714","DOI":"10.1093\/bioinformatics\/btq267","article-title":"Prediction of protease substrates using sequence and structure features","volume":"26","author":"Barkan","year":"2010","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref44","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1186\/1471-2105-11-320","article-title":"Pripper: prediction of caspase cleavage sites from whole proteomes","volume":"11","author":"Piippo","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2020011102354510800_ref45","doi-asserted-by":"crossref","first-page":"752","DOI":"10.1093\/bioinformatics\/btq043","article-title":"Cascleave: towards more accurate prediction of caspase substrate cleavage sites","volume":"26","author":"Song","year":"2010","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref46","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0019035","article-title":"Calpain cleavage prediction using multiple kernel learning","volume":"6","author":"DuVerle","year":"2011","journal-title":"PLoS One"},{"key":"2020011102354510800_ref47","doi-asserted-by":"crossref","first-page":"622","DOI":"10.1002\/prot.24217","article-title":"LabCaS: labeling calpain substrate cleavage sites from amino acid sequence using conditional random fields","volume":"81","author":"Fan","year":"2013","journal-title":"Proteins"},{"key":"2020011102354510800_ref48","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0050300","article-title":"PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites","volume":"7","author":"Song","year":"2012","journal-title":"PLoS One"},{"key":"2020011102354510800_ref49","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1093\/bioinformatics\/btt603","article-title":"Cascleave 2.0, a new approach for predicting caspase 
and granzyme cleavage targets","volume":"30","author":"Wang","year":"2014","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref50","doi-asserted-by":"crossref","first-page":"2042","DOI":"10.1002\/pmic.201400002","article-title":"ScreenCap3: improving prediction of caspase-3 cleavage sites using experimentally verified noncleavage sites","volume":"14","author":"Fu","year":"2014","journal-title":"Proteomics"},{"key":"2020011102354510800_ref51","doi-asserted-by":"crossref","first-page":"5755","DOI":"10.1038\/s41598-017-06219-7","article-title":"Knowledge-transfer learning for prediction of matrix metalloprotease substrate-cleavage sites","volume":"7","author":"Wang","year":"2017","journal-title":"Sci Rep"},{"key":"2020011102354510800_ref52","doi-asserted-by":"crossref","first-page":"684","DOI":"10.1093\/bioinformatics\/btx670","article-title":"PROSPERous: high-throughput prediction of substrate cleavage sites for 90 proteases with improved accuracy","volume":"34","author":"Song","year":"2018","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref53","first-page":"bby028","article-title":"iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites","author":"Song","year":"2018","journal-title":"Brief Bioinform"},{"key":"2020011102354510800_ref54","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1142\/S0219720011005288","article-title":"Bioinformatic approaches for predicting substrates of proteases","volume":"9","author":"Song","year":"2011","journal-title":"J Bioinform Comput Biol"},{"key":"2020011102354510800_ref55","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1093\/bib\/bbr059","article-title":"A review of statistical methods for prediction of proteolytic cleavage","volume":"13","author":"duVerle","year":"2012","journal-title":"Brief 
Bioinform"},{"key":"2020011102354510800_ref56","doi-asserted-by":"crossref","first-page":"415","DOI":"10.2174\/157489312803901018","article-title":"Machine learning sequence classification techniques: application to cysteine protease cleavage prediction","volume":"7","author":"duVerle","year":"2012","journal-title":"Curr Bioinform"},{"key":"2020011102354510800_ref57","first-page":"bby041","article-title":"Toward more accurate prediction of caspase cleavage sites: a comprehensive review of current methods, tools and features","author":"Bao","year":"2018","journal-title":"Brief Bioinform"},{"key":"2020011102354510800_ref58","doi-asserted-by":"crossref","first-page":"D624","DOI":"10.1093\/nar\/gkx1134","article-title":"The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database","volume":"46","author":"Rawlings","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2020011102354510800_ref59","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1016\/S0006-291X(67)80055-X","article-title":"On the size of the active site in proteases. I. 
Papain","volume":"27","author":"Schechter","year":"1967","journal-title":"Biochem Biophys Res Commun"},{"issue":"Suppl 5","key":"2020011102354510800_ref60","first-page":"S14","article-title":"SVM-based prediction of caspase substrate cleavage sites","volume":"7","author":"Wee","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2020011102354510800_ref61","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1126\/science.2876518","article-title":"Amino acid sequences common to rapidly degraded proteins: the PEST hypothesis","volume":"234","author":"Rogers","year":"1986","journal-title":"Science"},{"key":"2020011102354510800_ref62","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1016\/S0968-0004(96)10031-1","article-title":"PEST sequences and regulation by proteolysis","volume":"21","author":"Rechsteiner","year":"1996","journal-title":"Trends Biochem Sci"},{"key":"2020011102354510800_ref63","doi-asserted-by":"crossref","first-page":"2499","DOI":"10.1093\/bioinformatics\/bty140","article-title":"iFeature: a python package and web server for features extraction and selection from protein and peptide sequences","volume":"34","author":"Chen","year":"2018","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref64","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2020011102354510800_ref65","author":"Quinlan","year":"2014","journal-title":"C4. 
5: programs for machine learning"},{"key":"2020011102354510800_ref66","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"key":"2020011102354510800_ref67","doi-asserted-by":"crossref","first-page":"1411","DOI":"10.1093\/bioinformatics\/btu852","article-title":"GlycoMine: a machine learning-based approach for predicting N-, C- and O-linked glycosylation in the human proteome","volume":"31","author":"Li","year":"2015","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref68","doi-asserted-by":"crossref","first-page":"34595","DOI":"10.1038\/srep34595","article-title":"GlycoMine(struct): a new bioinformatics tool for highly accurate mapping of the human N-linked and O-linked glycoproteomes by incorporating structural features","volume":"6","author":"Li","year":"2016","journal-title":"Sci Rep"},{"key":"2020011102354510800_ref69","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/j.jtbi.2018.01.023","article-title":"PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework","volume":"443","author":"Song","year":"2018","journal-title":"J Theor Biol"},{"key":"2020011102354510800_ref70","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1109\/TNB.2017.2661756","article-title":"PhosPred-RF: a novel sequence-based predictor for phosphorylation sites using sequential information only","volume":"16","author":"Wei","year":"2017","journal-title":"IEEE Trans Nanobioscience"},{"key":"2020011102354510800_ref71","doi-asserted-by":"crossref","DOI":"10.1201\/9781315139470","volume-title":"Classification and Regression Trees","author":"Breiman","year":"2017"},{"key":"2020011102354510800_ref72","volume-title":"Conditional random fields: Probabilistic models for segmenting and labeling sequence 
data","author":"Lafferty","year":"2001"},{"key":"2020011102354510800_ref73","article-title":"Semi-Markov conditional random fields for information extraction","volume-title":"Advances in Neural Information Processing Systems","author":"Sarawagi","year":"2005"},{"key":"2020011102354510800_ref74","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1561\/2200000013","article-title":"An introduction to conditional random fields","volume":"4","author":"Sutton","year":"2012","journal-title":"Foundations and Trends\u00ae in Machine Learning"},{"key":"2020011102354510800_ref75","first-page":"bty522","article-title":"Quokka: a comprehensive tool for rapid and accurate prediction of kinase family-specific phosphorylation sites in the human proteome","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref76","doi-asserted-by":"crossref","first-page":"468","DOI":"10.1016\/j.omtn.2018.03.012","article-title":"iRNA-3typeA: identifying three types of modification at RNA\u2019s adenosine sites","volume":"11","author":"Chen","year":"2018","journal-title":"Mol Ther Nucleic Acids"},{"key":"2020011102354510800_ref77","article-title":"iDNA6mA-PseKNC: Identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. 
Genomics","author":"Feng","year":"2018"},{"key":"2020011102354510800_ref78","doi-asserted-by":"crossref","first-page":"883","DOI":"10.7150\/ijbs.24616","article-title":"iRSpot-Pse6NC: identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC","volume":"14","author":"Yang","year":"2018","journal-title":"Int J Biol Sci"},{"key":"2020011102354510800_ref79","doi-asserted-by":"crossref","first-page":"3518","DOI":"10.1093\/bioinformatics\/btx479","article-title":"iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties","volume":"33","author":"Chen","year":"2017","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref80","doi-asserted-by":"crossref","first-page":"1211","DOI":"10.1038\/nmeth.2646","article-title":"pLogo: a probabilistic approach to visualizing sequence motifs","volume":"10","author":"O'Shea","year":"2013","journal-title":"Nat Methods"},{"key":"2020011102354510800_ref81","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1003007","article-title":"Cleavage entropy as quantitative measure of protease specificity","volume":"9","author":"Fuchs","year":"2013","journal-title":"PLoS Comput Biol"},{"key":"2020011102354510800_ref82","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/j.inffus.2004.04.008","article-title":"Classifier selection for majority voting","volume":"6","author":"Ruta","year":"2005","journal-title":"Inf Fusion"},{"key":"2020011102354510800_ref83","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1109\/34.58871","article-title":"Neural network ensembles","volume":"12","author":"Hansen","year":"1990","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2020011102354510800_ref84","doi-asserted-by":"crossref","DOI":"10.1201\/b12207","volume-title":"Ensemble Methods: Foundations and 
Algorithms","author":"Zhou","year":"2012"},{"key":"2020011102354510800_ref85","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/pmic.201700262","article-title":"HPSLPred: an ensemble multi-label classifier for human protein subcellular location prediction with imbalanced source","volume":"17","author":"Wan","year":"2017","journal-title":"Proteomics"},{"key":"2020011102354510800_ref86","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1016\/j.omtn.2017.04.008","article-title":"2L-piRNA: a two-layer ensemble classifier for identifying Piwi-interacting RNAs and their function","volume":"7","author":"Liu","year":"2017","journal-title":"Mol Ther Nucleic Acids"},{"key":"2020011102354510800_ref87","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1093\/bioinformatics\/btw539","article-title":"iRSpot-EL: identify recombination spots with an ensemble learning approach","volume":"33","author":"Liu","year":"2017","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref88","doi-asserted-by":"crossref","first-page":"123","DOI":"10.4238\/2015.January.15.15","article-title":"imDC: an ensemble learning method for imbalanced classification with miRNA data","volume":"14","author":"Wang","year":"2015","journal-title":"Genet Mol Res"},{"key":"2020011102354510800_ref89","doi-asserted-by":"crossref","first-page":"40242","DOI":"10.1038\/srep40242","article-title":"Detecting N 6-methyladenosine sites from RNA transcriptomes using ensemble Support Vector Machines","volume":"7","author":"Chen","year":"2017","journal-title":"Sci Rep"},{"key":"2020011102354510800_ref90","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1613\/jair.614","article-title":"Popular ensemble methods: an empirical study","volume":"11","author":"Opitz","year":"1999","journal-title":"J Artif Intell Res"},{"key":"2020011102354510800_ref91","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2200\/S00240ED1V01Y200912DMK002","article-title":"Ensemble methods in data mining: 
improving accuracy through combining predictions","volume":"2","author":"Seni","year":"2010","journal-title":"Synth Lect Data Mining Knowledge Discov"},{"key":"2020011102354510800_ref92","first-page":"851","article-title":"Deep learning in bioinformatics","volume":"18","author":"Min","year":"2017","journal-title":"Brief Bioinform"},{"key":"2020011102354510800_ref93","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1016\/j.ymeth.2016.06.024","article-title":"Boosting compound-protein interaction prediction by deep learning","volume":"110","author":"Tian","year":"2016","journal-title":"Methods"},{"key":"2020011102354510800_ref94","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1016\/j.jpdc.2017.08.009","article-title":"Prediction of human protein subcellular localization using deep learning","volume":"117","author":"Wei","year":"2017","journal-title":"J Parallel Distributed Comput"},{"key":"2020011102354510800_ref95","doi-asserted-by":"crossref","first-page":"3909","DOI":"10.1093\/bioinformatics\/btx496","article-title":"MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction","volume":"33","author":"Wang","year":"2017","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref96","doi-asserted-by":"crossref","first-page":"2605","DOI":"10.1093\/bioinformatics\/bty166","article-title":"DeepSol: a deep learning framework for sequence-based protein solubility prediction","volume":"34","author":"Khurana","year":"2018","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref97","article-title":"Protein remote homology detection and fold recognition based on Sequence-Order Frequency Matrix","author":"Liu","year":"2017"},{"key":"2020011102354510800_ref98","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","article-title":"Deep learning in neural networks: an overview","volume":"61","author":"Schmidhuber","year":"2015","journal-title":"Neural 
Netw"},{"key":"2020011102354510800_ref99","doi-asserted-by":"crossref","first-page":"5947","DOI":"10.4249\/scholarpedia.5947","article-title":"Deep belief networks","volume":"4","author":"Hinton","year":"2009","journal-title":"Scholarpedia"},{"key":"2020011102354510800_ref100","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2020011102354510800_ref101","doi-asserted-by":"crossref","first-page":"2640","DOI":"10.1093\/bioinformatics\/bts504","article-title":"Positive-unlabeled learning for disease gene identification","volume":"28","author":"Yang","year":"2012","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref102","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1093\/bioinformatics\/btv550","article-title":"Positive-unlabeled ensemble learning for kinase substrate prediction from dynamic phosphoproteomics data","volume":"32","author":"Yang","year":"2016","journal-title":"Bioinformatics"},{"key":"2020011102354510800_ref103","first-page":"1287","article-title":"PAnDE: averaged n-dependence estimators for positive unlabeled learning. 
ICIC Expr Lett","volume":"8","author":"Li","year":"2017","journal-title":"Part B, Appl: An Int J Res Surv"},{"key":"2020011102354510800_ref104","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1186\/s12859-017-1546-7","article-title":"Positive-unlabeled learning for inferring drug interactions based on heterogeneous attributes","volume":"18","author":"Hameed","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2020011102354510800_ref105","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1145\/1401890.1401920","volume-title":"Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Elkan","year":"2008"},{"key":"2020011102354510800_ref106","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1109\/ISIP.2008.79","volume-title":"Information Processing (ISIP), 2008 International Symposiums on","author":"Zhang","year":"2008"},{"key":"2020011102354510800_ref107","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/bty508","article-title":"iLoc-lncRNA: predict the subcellular location of lncRNAs by incorporating octamer composition into general PseKNC","author":"Su","year":"2018"},{"key":"2020011102354510800_ref108","article-title":"Identifying sigma70 promoters with novel pseudo nucleotide composition","author":"Lin","year":"2017"},{"key":"2020011102354510800_ref109","first-page":"774","author":"O'Driscoll","year":"2013"},{"key":"2020011102354510800_ref110","first-page":"bbx086","article-title":"Big data management challenges in health research\u2014a literature review","author":"Wang","year":"2017","journal-title":"Brief Bioinform"},{"key":"2020011102354510800_ref111","first-page":"530","article-title":"A review of bioinformatic pipeline frameworks","volume":"18","author":"Leipzig","year":"2017","journal-title":"Brief Bioinform"},{"key":"2020011102354510800_ref112","first-page":"53","article-title":"HPTree: reconstructing phylogenetic trees for ultra-large unaligned DNA sequences 
via NJ model and Hadoop","author":"Zou","year":"2016"},{"key":"2020011102354510800_ref113","first-page":"1438","article-title":"Multiple sequence alignment and reconstructing phylogenetic trees with Hadoop","author":"Zou","year":"2016","journal-title":"Bioinformatics and Biomedicine (BIBM), 2016 IEEE International Conference"},{"key":"2020011102354510800_ref114","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1093\/bib\/bbs088","article-title":"Survey of MapReduce frame operation in bioinformatics","volume":"15","author":"Zou","year":"2014","journal-title":"Brief Bioinform"},{"key":"2020011102354510800_ref115","first-page":"bbx039","article-title":"Improving data workflow systems with cloud services and use of open data for bioinformatics research","author":"Karim","year":"2017","journal-title":"Brief Bioinform"},{"key":"2020011102354510800_ref116","doi-asserted-by":"crossref","first-page":"1230","DOI":"10.1089\/cmb.2017.0040","article-title":"Multiple sequence alignment based on a suffix tree and center-star strategy: a linear method for multiple nucleotide sequence alignment on spark parallel framework","volume":"24","author":"Su","year":"2017","journal-title":"J Comput Biol"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/20\/6\/2150\/31789333\/bby077.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/20\/6\/2150\/31789333\/bby077.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,4]],"date-time":"2023-09-04T04:28:27Z","timestamp":1693801707000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/20\/6\/2150\/5078120"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,29]]},"references-count":116,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2018,8,29]]},"published-print":{"date-parts":[[2019,11,27]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bby077","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,11]]},"published":{"date-parts":[[2018,8,29]]}}}