{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T20:12:22Z","timestamp":1773087142246,"version":"3.50.1"},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"S12","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>RNA-protein interaction plays an essential role in several biological processes, such as protein synthesis, gene expression, posttranscriptional regulation and viral infectivity. Identification of RNA-binding sites in proteins provides valuable insights for biologists. However, experimental determination of RNA-protein interaction remains time-consuming and labor-intensive. Thus, computational approaches for prediction of RNA-binding sites in proteins have become highly desirable. Extensive studies of RNA-binding site prediction have led to the development of several methods. However, they could yield low sensitivities in trade-off for high specificities.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We propose a method, RNAProB, which incorporates a new smoothed position-specific scoring matrix (PSSM) encoding scheme with a support vector machine model to predict RNA-binding sites in proteins. Besides the incorporation of evolutionary information from standard PSSM profiles, the proposed smoothed PSSM encoding scheme also considers the correlation and dependency from the neighboring residues for each amino acid in a protein. Experimental results show that smoothed PSSM encoding significantly enhances the prediction performance, especially for sensitivity. 
Using five-fold cross-validation, our method performs better than the state-of-the-art systems by 4.90%~6.83%, 0.88%~5.33%, and 0.10~0.23 in terms of overall accuracy, specificity, and Matthews correlation coefficient, respectively. Most notably, compared to other approaches, RNAProB significantly improves sensitivity by 7.0%~26.9% over the benchmark data sets. To prevent data overfitting, a three-way data split procedure is incorporated to estimate the prediction performance. Moreover, physicochemical properties and amino acid preferences of RNA-binding proteins are examined and analyzed.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Our results demonstrate that the smoothed PSSM encoding scheme significantly enhances the performance of RNA-binding site prediction in proteins. This also supports our assumption that smoothed PSSM encoding can better resolve the ambiguity of discriminating between interacting and non-interacting residues by modelling the dependency from surrounding residues. 
The proposed method can be used in other research areas, such as DNA-binding site prediction, protein-protein interaction, and prediction of posttranslational modification sites.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-s12-s6","type":"journal-article","created":{"date-parts":[[2008,12,12]],"date-time":"2008-12-12T19:14:15Z","timestamp":1229109255000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":103,"title":["Predicting RNA-binding sites of proteins using support vector machines and evolutionary information"],"prefix":"10.1186","volume":"9","author":[{"given":"Cheng-Wei","family":"Cheng","sequence":"first","affiliation":[]},{"given":"Emily Chia-Yu","family":"Su","sequence":"additional","affiliation":[]},{"given":"Jenn-Kang","family":"Hwang","sequence":"additional","affiliation":[]},{"given":"Ting-Yi","family":"Sung","sequence":"additional","affiliation":[]},{"given":"Wen-Lian","family":"Hsu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,12,12]]},"reference":[{"issue":"13","key":"2711_CR1","doi-asserted-by":"publisher","first-page":"4264","DOI":"10.1093\/nar\/gkm411","volume":"35","author":"S Sunita","year":"2007","unstructured":"Sunita S, Purta E, Durawa M, Tkaczuk KL, Swaathi J, Bujnicki JM, Sivaraman J: Functional specialization of domains tandemly duplicated within 16S rRNA methyltransferase RsmC. Nucleic Acids Res. 2007, 35 (13): 4264-4274. 10.1093\/nar\/gkm411.","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2711_CR2","doi-asserted-by":"publisher","first-page":"299","DOI":"10.1093\/nar\/gkl1021","volume":"35","author":"E Bechara","year":"2007","unstructured":"Bechara E, Davidovic L, Melko M, Bensaid M, Tremblay S, Grosgeorge J, Khandjian EW, Lalli E, Bardoni B: Fragile X related protein 1 isoforms differentially modulate the affinity of fragile X mental retardation protein for G-quartet RNA structure. 
Nucleic Acids Res. 2007, 35 (1): 299-306. 10.1093\/nar\/gkl1021.","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"2711_CR3","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1177\/095632020301400201","volume":"14","author":"KL McKnight","year":"2003","unstructured":"McKnight KL, Heinz BA: RNA as a target for developing antivirals. Antivir Chem Chemother. 2003, 14 (2): 61-73.","journal-title":"Antivir Chem Chemother"},{"issue":"Pt 6 No 1","key":"2711_CR4","doi-asserted-by":"publisher","first-page":"899","DOI":"10.1107\/S0907444902003451","volume":"58","author":"HM Berman","year":"2002","unstructured":"Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S: The Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 2002, 58 (Pt 6 No 1): 899-907. 10.1107\/S0907444902003451.","journal-title":"Acta Crystallogr D Biol Crystallogr"},{"issue":"1","key":"2711_CR5","first-page":"105","volume":"15","author":"E Jeong","year":"2004","unstructured":"Jeong E, Chung IF, Miyano S: A neural network method for identification of RNA-interacting residues in protein. Genome Inform. 2004, 15 (1): 105-116.","journal-title":"Genome Inform"},{"issue":"8","key":"2711_CR6","doi-asserted-by":"publisher","first-page":"1450","DOI":"10.1261\/rna.2197306","volume":"12","author":"M Terribilini","year":"2006","unstructured":"Terribilini M, Lee JH, Yan C, Jernigan RL, Honavar V, Dobbs D: Prediction of RNA binding sites in proteins from amino acid sequence. RNA. 2006, 12 (8): 1450-1462. 10.1261\/rna.2197306.","journal-title":"RNA"},{"key":"2711_CR7","first-page":"123","volume-title":"Transactions on Computational Systems Biology","author":"E Jeong","year":"2006","unstructured":"Jeong E, Miyano S: A Weighted Profile Based Method for Protein-RNA Interacting Residue Prediction. Transactions on Computational Systems Biology. 
2006, 123-139."},{"issue":"1","key":"2711_CR8","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1002\/prot.21677","volume":"71","author":"M Kumar","year":"2008","unstructured":"Kumar M, Gromiha MM, Raghava GP: Prediction of RNA binding sites in a protein using SVM and PSSM profile. Proteins. 2008, 71 (1): 189-194. 10.1002\/prot.21677.","journal-title":"Proteins"},{"key":"2711_CR9","doi-asserted-by":"crossref","unstructured":"Wang L, Brown SJ: BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences. Nucleic Acids Res. 2006, W243-248. 10.1093\/nar\/gkl298. 34 Web Server","DOI":"10.1093\/nar\/gkl298"},{"key":"2711_CR10","volume-title":"Digital Image Processing","author":"RC Gonzalez","year":"2002","unstructured":"Gonzalez RC, Woods RE: Digital Image Processing. 2002, Prentice Hall"},{"key":"2711_CR11","doi-asserted-by":"crossref","unstructured":"Terribilini M, Sander JD, Lee JH, Zaback P, Jernigan RL, Honavar V, Dobbs D: RNABindR: a server for analyzing and predicting RNA-binding sites in proteins. Nucleic Acids Res. 2007, W578-584. 10.1093\/nar\/gkm294. 35 Web Server","DOI":"10.1093\/nar\/gkm294"},{"key":"2711_CR12","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The Nature of Statistical Learning Theory","author":"VN Vapnik","year":"1995","unstructured":"Vapnik VN: The Nature of Statistical Learning Theory. 1995, Springer"},{"key":"2711_CR13","unstructured":"LIBSVM: a library for support vector machines. [http:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvm\/]"},{"issue":"17","key":"2711_CR14","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 
10.1093\/nar\/25.17.3389.","journal-title":"Nucleic Acids Res"},{"issue":"22","key":"2711_CR15","doi-asserted-by":"publisher","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","volume":"89","author":"S Henikoff","year":"1992","unstructured":"Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA. 1992, 89 (22): 10915-10919. 10.1073\/pnas.89.22.10915.","journal-title":"Proc Natl Acad Sci USA"},{"key":"2711_CR16","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1186\/1471-2105-4-28","volume":"4","author":"MD Ritchie","year":"2003","unstructured":"Ritchie MD, White BC, Parker JS, Hahn LW, Moore JH: Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases. BMC Bioinformatics. 2003, 4: 28-10.1186\/1471-2105-4-28.","journal-title":"BMC Bioinformatics"},{"key":"2711_CR17","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1186\/1471-2105-8-330","volume":"8","author":"EC Su","year":"2007","unstructured":"Su EC, Chiu HS, Lo A, Hwang JK, Sung TY, Hsu WL: Protein subcellular localization prediction based on compartment-specific features and structure conservation. BMC Bioinformatics. 2007, 8: 330-10.1186\/1471-2105-8-330.","journal-title":"BMC Bioinformatics"},{"key":"2711_CR18","doi-asserted-by":"publisher","first-page":"5830","DOI":"10.1109\/IEMBS.2006.260025","volume":"1","author":"L Wang","year":"2006","unstructured":"Wang L, Brown SJ: Prediction of RNA-binding residues in protein sequences using support vector machines. Conf Proc IEEE Eng Med Biol Soc. 2006, 1: 5830-5833.","journal-title":"Conf Proc IEEE Eng Med Biol Soc"},{"issue":"2","key":"2711_CR19","doi-asserted-by":"publisher","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","volume":"405","author":"BW Matthews","year":"1975","unstructured":"Matthews BW: Comparison of the predicted and observed secondary structure of T4 phage lysozyme. 
Biochim Biophys Acta. 1975, 405 (2): 442-451.","journal-title":"Biochim Biophys Acta"},{"issue":"4857","key":"2711_CR20","doi-asserted-by":"publisher","first-page":"1285","DOI":"10.1126\/science.3287615","volume":"240","author":"JA Swets","year":"1988","unstructured":"Swets JA: Measuring the accuracy of diagnostic systems. Science. 1988, 240 (4857): 1285-1293. 10.1126\/science.3287615.","journal-title":"Science"},{"issue":"7","key":"2711_CR21","doi-asserted-by":"publisher","first-page":"1145","DOI":"10.1016\/S0031-3203(96)00142-2","volume":"30","author":"AP Bradley","year":"1997","unstructured":"Bradley AP: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition. 1997, 30 (7): 1145-1159. 10.1016\/S0031-3203(96)00142-2.","journal-title":"Pattern Recognition"},{"issue":"3","key":"2711_CR22","doi-asserted-by":"publisher","first-page":"643","DOI":"10.1002\/prot.21018","volume":"64","author":"CS Yu","year":"2006","unstructured":"Yu CS, Chen YC, Lu CH, Hwang JK: Prediction of protein subcellular localization. Proteins. 2006, 64 (3): 643-651. 10.1002\/prot.21018.","journal-title":"Proteins"},{"issue":"2","key":"2711_CR23","doi-asserted-by":"publisher","first-page":"693","DOI":"10.1002\/prot.21944","volume":"72","author":"JM Chang","year":"2008","unstructured":"Chang JM, Su EC, Lo A, Chiu HS, Sung TY, Hsu WL: PSLDoc: Protein subcellular localization prediction based on gapped-dipeptides and probabilistic latent semantic analysis. Proteins. 2008, 72 (2): 693-710. 
10.1002\/prot.21944.","journal-title":"Proteins"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-S12-S6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T09:37:11Z","timestamp":1630489031000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-S12-S6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,12]]},"references-count":23,"journal-issue":{"issue":"S12","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["2711"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-s12-s6","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,12]]},"assertion":[{"value":"12 December 2008","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S6"}}