{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T21:41:04Z","timestamp":1776289264205,"version":"3.50.1"},"reference-count":36,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2022,3,8]],"date-time":"2022-03-08T00:00:00Z","timestamp":1646697600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Prediction of intrinsic disordered proteins is a hot area in the field of bio-information. Due to the high cost of evaluating the disordered regions of protein sequences using experimental methods, we used a low-complexity prediction scheme. Sequence complexity is used in this scheme to calculate five features for each residue of the protein sequence, including the Shannon entropy, the Topological entropy, the Permutation entropy and the weighted average values of two propensities. Particularly, this is the first time that permutation entropy has been applied to the field of protein sequencing. In addition, in the data preprocessing stage, an appropriately sized sliding window and a comprehensive oversampling scheme can be used to improve the prediction performance of our scheme, and two ensemble learning algorithms are also used to verify the prediction results before and after. The results show that adding permutation entropy improves the performance of the prediction algorithm, in which the MCC value can be improved from the original 0.465 to 0.526 in our scheme, proving its universality. 
Finally, we compare the simulation results of our scheme with those of some existing schemes to demonstrate its effectiveness.<\/jats:p>","DOI":"10.3390\/a15030086","type":"journal-article","created":{"date-parts":[[2022,3,8]],"date-time":"2022-03-08T12:35:37Z","timestamp":1646742937000},"page":"86","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Prediction of Intrinsically Disordered Proteins Using Machine Learning Based on Low Complexity Methods"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6402-8198","authenticated-orcid":false,"given":"Xingming","family":"Zeng","sequence":"first","affiliation":[{"name":"Tianjin Key Laboratory of Optoelectronic Sensor and Sensing Network Technology, School of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, China"}]},{"given":"Haiyuan","family":"Liu","sequence":"additional","affiliation":[{"name":"Tianjin Key Laboratory of Optoelectronic Sensor and Sensing Network Technology, School of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, China"}]},{"given":"Hao","family":"He","sequence":"additional","affiliation":[{"name":"Department of Communication Engineering, School of Electronic Information, Hebei University of Technology, Tianjin 300400, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/S1093-3263(00)00138-8","article-title":"Intrinsically Disordered Protein","volume":"19","author":"Dunker","year":"2001","journal-title":"J. Mol. Graph. Model."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1146\/annurev.biophys.37.032807.125924","article-title":"Intrinsically Disordered Proteins in Human Diseases: Introducing the D2 Concept","volume":"37","author":"Uversky","year":"2008","journal-title":"Annu. Rev. 
Biophys."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1038\/nrm1589","article-title":"Intrinsically Unstructured Proteins and Their Functions","volume":"6","author":"Dyson","year":"2005","journal-title":"Nat. Rev. Mol. Cell Biol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"10448","DOI":"10.1021\/bi060981d","article-title":"Abundance of Intrinsic Disorder in Protein Associated with Cardiovascular Disease","volume":"45","author":"Cheng","year":"2006","journal-title":"Biochemistry"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"6844","DOI":"10.1021\/cr400713r","article-title":"Pathological Unfoldomics of Uncontrolled Chaos: Intrinsically Disordered Proteins and Human Diseases","volume":"114","author":"Uversky","year":"2014","journal-title":"Chem. Rev."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2164-9-S2-S4","article-title":"Protein Intrinsic Disorder Toolbox for Comparative Analysis of Viral Proteins","volume":"9","author":"Goh","year":"2008","journal-title":"BMC Genom."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1829","DOI":"10.1021\/pr0602388","article-title":"Protein Intrinsic Disorder and Human Papillomaviruses: Increased Amount of Disorder in E6 and E7 Oncoproteins from High Risk HPVs","volume":"5","author":"Uversky","year":"2006","journal-title":"J. Proteome Res."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"932","DOI":"10.2174\/092986610791498984","article-title":"Viral Disorder or Disordered Viruses: Do Viral Proteins Possess Unique Features?","volume":"17","author":"Xue","year":"2010","journal-title":"Protein Pept. 
Lett."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41531-021-00203-9","article-title":"Alpha-Synuclein Research: Defining Strategic Moves in the Battle Against Parkinson\u2019s Disease","volume":"7","author":"Oliveira","year":"2021","journal-title":"NPJ Parkinson Dis."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"140767","DOI":"10.1016\/j.bbapap.2022.140767","article-title":"A Unifying Framework for Amyloid-Mediated Membrane Damage: The Lipid-Chaperon Hypothesis","volume":"1870","author":"Tempra","year":"2021","journal-title":"Biochim. Biophys. Acta BBA Proteins Proteom."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1845","DOI":"10.1021\/acs.chemrev.0c00981","article-title":"Proteostasis of Islet Amyloid Polypeptide: A Molecular Perspective of Risk Factors and Protective Strategies for Type II Diabetes","volume":"121","author":"Milardi","year":"2021","journal-title":"Chem. Rev."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3701","DOI":"10.1093\/nar\/gkg519","article-title":"GlobPlot: Exploring Protein Sequences for Globularity and Disorder","volume":"31","author":"Linding","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3433","DOI":"10.1093\/bioinformatics\/bti541","article-title":"IUPred: Web Server for the Prediction of Intrinsically Unstructured Regions of Proteins Based on Estimated Energy Content","volume":"21","author":"Dosztanyi","year":"2005","journal-title":"Bioinformatics"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3435","DOI":"10.1093\/bioinformatics\/bti537","article-title":"FoldIndex: A Simple Tool to Predict Whether a given Protein Sequence Is Intrinsically Unfolded","volume":"21","author":"Prilusky","year":"2005","journal-title":"Bioinformatics"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"035004","DOI":"10.1088\/1478-3975\/8\/3\/035004","article-title":"The Ising Model for 
Prediction of Disordered Residues from Protein Sequence Alone","volume":"8","author":"Lobanov","year":"2011","journal-title":"Phys. Biol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1016\/j.jmb.2004.02.002","article-title":"Prediction and Functional Analysis of Native Disorder in Proteins from the Three Kingdoms of Life","volume":"337","author":"Ward","year":"2004","journal-title":"J. Mol. Biol."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1080\/073911012010525022","article-title":"SPINE-D: Accurate Prediction of Short and Long Disordered Regions by a Single Neural-network based Method","volume":"29","author":"Zhang","year":"2012","journal-title":"J. Biomol. Struct. Dyn."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1093\/bioinformatics\/btr682","article-title":"ESpritz: Accurate and Fast Prediction of Protein Disorder","volume":"28","author":"Tosatto","year":"2012","journal-title":"Bioinformatics"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Kozlowski, L.P., and Bujnicki, J.M. (2012). MetaDisorder: A Meta-Server for the Prediction of Intrinsic Disorder in Proteins. 
BMC Bioinform., 13.","DOI":"10.1186\/1471-2105-13-111"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2138","DOI":"10.1093\/bioinformatics\/bth195","article-title":"The DISOPRED Server for the Prediction of Protein Disorder","volume":"20","author":"Ward","year":"2004","journal-title":"Bioinformatics"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"W460","DOI":"10.1093\/nar\/gkm363","article-title":"PrDOS: Prediction of Disordered Protein Regions from Amino Acid Sequence","volume":"35","author":"Ishida","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2337","DOI":"10.1093\/bioinformatics\/btm330","article-title":"POODLE-S: Web Application for Predicting Protein Disorder by Using Physicochemical Features and Reduced Amino Acid Set of a Position-Specific Scoring Matrix","volume":"23","author":"Shimizu","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Medina, M.W., Gao, F., Naidoo, D., Rudel, L.L., Temel, R.E., McDaniel, A.L., Marshall, S.M., and Krauss, R.M. (2011). Coordinately Regulated Alternative Splicing of Genes Involved in Cholesterol Biosynthesis and Uptake. PLoS ONE, 6.","DOI":"10.1371\/journal.pone.0019420"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"3369","DOI":"10.1093\/bioinformatics\/bti534","article-title":"RONN: The Bio-Basis Function Neural Network Technique Applied to the Detection of Natively Disordered Regions in Proteins","volume":"21","author":"Yang","year":"2005","journal-title":"Bioinformatics"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1002\/prot.10528","article-title":"Prediction of Disordered Regions in Proteins from Position Specific Score Matrices","volume":"53","author":"Jones","year":"2003","journal-title":"Proteins"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Priti\u0161anac, I., Vernon, R.M., Moses, A.M., and Forman Kay, J.D. 
(2019). Entropy and Information within Intrinsically Disordered Protein Regions. Entropy, 21.","DOI":"10.3390\/e21070662"},{"key":"ref_27","first-page":"1","article-title":"A Low Computational Complexity Scheme for the Prediction of Intrinsically Disordered Protein Regions","volume":"2018","author":"Hao","year":"2018","journal-title":"Math. Probl. Eng."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Jin, S., Tan, R., Jiang, Q., Xu, L., Peng, J., Wang, Y., and Wang, Y. (2014). A Generalized Topological Entropy for Analyzing the Complexity of DNA Sequences. PLoS ONE, 9.","DOI":"10.1371\/journal.pone.0088519"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1093\/bioinformatics\/btr077","article-title":"Topological Entropy of DNA Sequences","volume":"27","author":"Koslicki","year":"2011","journal-title":"Bioinformatics"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"46","DOI":"10.3390\/a12020046","article-title":"The Prediction of Intrinsically Disordered Proteins Based on Feature Selection","volume":"12","author":"Hao","year":"2019","journal-title":"Algorithms"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1449","DOI":"10.1093\/bioinformatics\/btr175","article-title":"Proteins without 3D Structure: Definition, Detection and Beyond","volume":"27","author":"Orosz","year":"2011","journal-title":"Bioinformatics"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"174102","DOI":"10.1103\/PhysRevLett.88.174102","article-title":"Permutation Entropy: A Natural Complexity Measure for Time Series","volume":"88","author":"Bandt","year":"2002","journal-title":"Phys. Rev. Lett."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Le, N.Q.K., Do, D.T., Hung, T.N.K., Lam, L.H.T., Huynh, T.T., and Nguyen, N.T.K. (2020). A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification. Int. J. Mol. 
Sci., 21.","DOI":"10.3390\/ijms21239070"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Ho Thanh Lam, L., Le, N.H., Van Tuan, L., Tran Ban, H., Nguyen Khanh Hung, T., Nguyen, N.T.K., Huu Dang, L., and Le, N.Q.K. (2020). Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences. Biology, 9.","DOI":"10.3390\/biology9100325"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"14451","DOI":"10.1016\/j.eswa.2011.04.160","article-title":"Prediction of Disorder with New Computational Tool: BVDEA","volume":"38","author":"Kaya","year":"2011","journal-title":"Expert Syst. Appl."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-7-319","article-title":"Protein Disorder Prediction by Condensed PSSM Considering Propensity for Order or Disorder","volume":"7","author":"Su","year":"2006","journal-title":"BMC Bioinform."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/3\/86\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:33:38Z","timestamp":1760135618000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/3\/86"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,8]]},"references-count":36,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["a15030086"],"URL":"https:\/\/doi.org\/10.3390\/a15030086","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,8]]}}}