{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T22:16:12Z","timestamp":1761862572678},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"21","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2006,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: A new representation for protein secondary structure prediction based on frequent amino acid patterns is described and evaluated. We discuss in detail how to identify frequent patterns in a protein sequence database using a level-wise search technique, how to define a set of features from those patterns and how to use those features in the prediction of the secondary structure of a protein sequence using support vector machines (SVMs).<\/jats:p>\n               <jats:p>Results: Three different sets of features based on frequent patterns are evaluated in a blind testing setup using 150 targets from the EVA contest and compared to predictions of PSI-PRED, PHD and PROFsec. Despite being trained on only 940 proteins, a simple SVM classifier based on this new representation yields results comparable to PSI-PRED and PROFsec. 
Finally, we show that the method contributes significant information to consensus predictions.<\/jats:p>\n               <jats:p>Availability: The method is available from the authors upon request.<\/jats:p>\n               <jats:p>Contact: kramer@in.tum.de<\/jats:p>","DOI":"10.1093\/bioinformatics\/btl453","type":"journal-article","created":{"date-parts":[[2006,8,30]],"date-time":"2006-08-30T08:21:24Z","timestamp":1156926084000},"page":"2628-2634","source":"Crossref","is-referenced-by-count":45,"title":["A new representation for protein secondary structure prediction based on frequent patterns"],"prefix":"10.1093","volume":"22","author":[{"given":"Fabian","family":"Birzele","sequence":"first","affiliation":[{"name":"Practical Informatics and Bioinformatics Group, Department of Informatics, Ludwig-Maximilians-University, Amalienstrasse 17, D-80333 M\u00fcnchen, Germany"}]},{"given":"Stefan","family":"Kramer","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t M\u00fcnchen, Institut f\u00fcr Informatik, Boltzmannstrasse 3, D-85748 Garching b. 
M\u00fcnchen, Germany"}]}],"member":"286","published-online":{"date-parts":[[2006,8,29]]},"reference":[{"key":"2023012408433673200_b1","first-page":"94","article-title":"Fast algorithms for mining association rules","volume-title":"Proceedings ACM SIGMOD International Conference on Management of Data (SIGMOD'94)","author":"Agrawal","year":"1994"},{"key":"2023012408433673200_b2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012408433673200_b3","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1021\/bi00699a002","article-title":"Prediction of protein conformation","volume":"13","author":"Chou","year":"1974","journal-title":"Biochemistry"},{"key":"2023012408433673200_b4","doi-asserted-by":"crossref","first-page":"1603","DOI":"10.1093\/bioinformatics\/bth132","article-title":"Protein secondary structure: entropy, correlations and prediction","volume":"20","author":"Crooks","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012408433673200_b5","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1002\/(SICI)1097-0134(19990301)34:4<508::AID-PROT10>3.0.CO;2-4","article-title":"Evaluation and improvement of multiple sequence methods for protein secondary structure prediction","volume":"34","author":"Cuff","year":"1999","journal-title":"Proteins"},{"key":"2023012408433673200_b6","doi-asserted-by":"crossref","DOI":"10.1007\/11871637_17","article-title":"Optimal string mining under frequency constraints","volume-title":"Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2006).","author":"Fischer","year":"2006"},{"key":"2023012408433673200_b7","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-540-30109-7_11","article-title":"Learning ensembles of first-order 
clauses for recall-precision curves: a case study in biomedical information extraction","volume-title":"Proceedings of the 14th International Conference on Inductive Logic Programming (ILP) (2004)","author":"Goadrich","year":"2004"},{"key":"2023012408433673200_b8","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1214\/aos\/1028144844","article-title":"Classification by pairwise coupling","volume":"26","author":"Hastie","year":"1998","journal-title":"Ann. Stat."},{"key":"2023012408433673200_b9","first-page":"169","article-title":"Making large-scale SVM learning practical","volume-title":"Advances in Kernel Methods\u2014Support Vector Learning.","author":"Joachims","year":"1999"},{"key":"2023012408433673200_b10","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1006\/jmbi.1999.3091","article-title":"Protein secondary structure prediction based on position-specific scoring matrices","volume":"292","author":"Jones","year":"1999","journal-title":"J. Mol. Biol."},{"key":"2023012408433673200_b11","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1002\/bip.360221211","article-title":"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolymers"},{"key":"2023012408433673200_b12","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1080\/10629360290023340","article-title":"Fragment generation and support vector machines for inducing SARs","volume":"13","author":"Kramer","year":"2002","journal-title":"SAR QSAR Environ. 
Res."},{"key":"2023012408433673200_b13","doi-asserted-by":"crossref","DOI":"10.1145\/1102351.1102416","article-title":"Predicting protein folds with structural repeats using a chain graph model","volume-title":"Machine Learning, Proceedings of the Twenty-Second International Conference (ICML 2005)","author":"Liu","year":"2005"},{"key":"2023012408433673200_b14","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1023\/A:1009796218281","article-title":"Levelwise search and borders of theories in knowledge discovery","volume":"3","author":"Mannila","year":"1997","journal-title":"Data Mining and Knowledge Discovery"},{"key":"2023012408433673200_b15","first-page":"442","article-title":"Comparison of the predicted and observed secondary structure of t4 phage lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"Biochem. Biophys. Acta"},{"key":"2023012408433673200_b16","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1002\/prot.10408","article-title":"Benchmarking secondary structure prediction for fold recognition","volume":"52","author":"McGuffin","year":"2003","journal-title":"Proteins"},{"key":"2023012408433673200_b17","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1093\/bioinformatics\/17.1.63","article-title":"What are the baselines for protein fold recognition?","volume":"17","author":"McGuffin","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012408433673200_b18","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. 
Biol."},{"key":"2023012408433673200_b19","first-page":"61","article-title":"Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods","volume-title":"Advances in Large Margin Classifiers","author":"Platt","year":"1999"},{"key":"2023012408433673200_b20","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1002\/prot.10082","article-title":"Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles","volume":"47","author":"Pollastri","year":"2002","journal-title":"Proteins"},{"key":"2023012408433673200_b21","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1002\/prot.340090108","article-title":"Weak correlation between predictive power of individual sequence patterns and overall prediction accuracy in proteins","volume":"9","author":"Rooman","year":"1991","journal-title":"Proteins"},{"key":"2023012408433673200_b22","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1002\/prot.10051","article-title":"EVA: large-scale analysis of secondary structure prediction","volume":"5","author":"Rost","year":"2001","journal-title":"Proteins"},{"key":"2023012408433673200_b23","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1006\/jmbi.1993.1413","article-title":"Prediction of protein secondary structure at better than 70% accuracy","volume":"232","author":"Rost","year":"1993","journal-title":"J. Mol. Biol."},{"key":"2023012408433673200_b24","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1016\/0306-4573(88)90021-0","article-title":"Term weighting approaches in automatic text retrieval","volume":"24","author":"Salton","year":"1988","journal-title":"Information Processing and Management"},{"key":"2023012408433673200_b25","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1016\/S0022-5193(86)80075-3","article-title":"The classification of amino acid conservation","volume":"119","author":"Taylor","year":"1986","journal-title":"J. 
Theor. Biol."},{"key":"2023012408433673200_b26","volume-title":"Statistical Learning Theory","author":"Vapnik","year":"1998"},{"key":"2023012408433673200_b27","doi-asserted-by":"crossref","first-page":"1650","DOI":"10.1093\/bioinformatics\/btg223","article-title":"Secondary structure prediction with support vector machines","volume":"19","author":"Ward","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012408433673200_b28","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1007\/s10994-005-4258-6","article-title":"Not so Naive Bayes: aggregating one-dependence estimators","volume":"58","author":"Webb","year":"2005","journal-title":"Machine Learning"},{"key":"2023012408433673200_b29","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1002\/(SICI)1097-0134(19990201)34:2<220::AID-PROT7>3.0.CO;2-K","article-title":"A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment","volume":"34","author":"Zemla","year":"1999","journal-title":"Proteins"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/21\/2628\/48839582\/bioinformatics_22_21_2628.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/21\/2628\/48839582\/bioinformatics_22_21_2628.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T09:07:58Z","timestamp":1674551278000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/22\/21\/2628\/251897"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,8,29]]},"references-count":29,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2006,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btl453","relatio
n":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2006,11,1]]},"published":{"date-parts":[[2006,8,29]]}}}