{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T04:55:55Z","timestamp":1761540955964},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The precise prediction of one-dimensional (1D) protein structure as represented by the protein secondary structure and 1D string of discrete state of dihedral angles (i.e. Shape Strings) is a prerequisite for the successful prediction of three-dimensional (3D) structure as well as protein\u2013protein interaction. We have developed a novel 1D structure prediction method, called Frag1D, based on a straightforward fragment matching algorithm and demonstrated its success in the prediction of three sets of 1D structural alphabets, i.e. the classical three-state secondary structure, three- and eight-state Shape Strings.<\/jats:p>\n               <jats:p>Results: By exploiting the vast protein sequence and protein structure data available, we have brought secondary-structure prediction closer to the expected theoretical limit. When tested by a leave-one-out cross validation on a non-redundant set of PDB cutting at 30% sequence identity containing 5860 protein chains, the overall per-residue accuracy for secondary-structure prediction, i.e. Q3 is 82.9%. The overall per-residue accuracy for three- and eight-state Shape Strings are 85.1 and 71.5%, respectively. We have also benchmarked our program with the latest version of PSIPRED for secondary structure prediction and our program predicted 0.3% better in Q3 when tested on 2241 chains with the same training set. For Shape Strings, we compared our method with a recently published method with the same dataset and definition as used by that method. Our program predicted at 2.2% better in accuracy for three-state Shape Strings. By quantitatively investigating the effect of data base size on 1D structure prediction we show that the accuracy increases by \u223c1% with every doubling of the database size.<\/jats:p>\n               <jats:p>Availability: The program is available for download at http:\/\/www.fos.su.se\/\u223cnanjiang\/Frag1D\/download. Supplementary data are available at http:\/\/www.fos.su.se\/\u223cnanjiang\/Frag1D\/supplement\/suppl.html<\/jats:p>\n               <jats:p>Contact: \u00a0svenh@struc.su.se<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp679","type":"journal-article","created":{"date-parts":[[2009,12,10]],"date-time":"2009-12-10T01:46:35Z","timestamp":1260409595000},"page":"470-477","source":"Crossref","is-referenced-by-count":23,"title":["A novel method for accurate one-dimensional protein structure prediction based on fragment matching"],"prefix":"10.1093","volume":"26","author":[{"given":"Tuping","family":"Zhou","sequence":"first","affiliation":[{"name":"Division of Structural Chemistry, Stockholm University, Stockholm SE-106 91, Sweden"}]},{"given":"Nanjiang","family":"Shu","sequence":"additional","affiliation":[{"name":"Division of Structural Chemistry, Stockholm University, Stockholm SE-106 91, Sweden"}]},{"given":"Sven","family":"Hovm\u00f6ller","sequence":"additional","affiliation":[{"name":"Division of Structural Chemistry, Stockholm University, Stockholm SE-106 91, Sweden"}]}],"member":"286","published-online":{"date-parts":[[2009,12,9]]},"reference":[{"key":"2023012508022474000_B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012508022474000_B2","doi-asserted-by":"crossref","first-page":"D226","DOI":"10.1093\/nar\/gkh039","article-title":"SCOP database in 2004: refinements integrate structure and sequence family data","volume":"32","author":"Andreeva","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012508022474000_B3","doi-asserted-by":"crossref","first-page":"899","DOI":"10.1107\/S0907444902003451","article-title":"The Protein Data Bank","volume":"58","author":"Berman","year":"2002","journal-title":"Acta Crystallogr. D Biol. Crystallogr."},{"key":"2023012508022474000_B4","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1126\/science.1853201","article-title":"A method to identify protein sequences that fold into a known three-dimensional structure","volume":"253","author":"Bowie","year":"1991","journal-title":"Science"},{"issue":"Suppl. 6","key":"2023012508022474000_B5","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1002\/prot.10552","article-title":"Rosetta predictions in CASP5: successes, failures, and prospects for complete automation","volume":"53","author":"Bradley","year":"2003","journal-title":"Proteins"},{"key":"2023012508022474000_B6","doi-asserted-by":"crossref","first-page":"W36","DOI":"10.1093\/nar\/gki410","article-title":"Protein structure prediction servers at University College London","volume":"33","author":"Bryson","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012508022474000_B7","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1006\/jmbi.2000.3837","article-title":"HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins","volume":"301","author":"Bystroff","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023012508022474000_B8","doi-asserted-by":"crossref","first-page":"2628","DOI":"10.1093\/bioinformatics\/btm379","article-title":"Consensus Data Mining (CDM) Protein secondary structure prediction server: combining GOR V and fragment database mining (FDM)","volume":"23","author":"Cheng","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508022474000_B9","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1021\/bi00699a002","article-title":"Prediction of protein conformation","volume":"13","author":"Chou","year":"1974","journal-title":"Biochemistry"},{"key":"2023012508022474000_B10","doi-asserted-by":"crossref","first-page":"W197","DOI":"10.1093\/nar\/gkn238","article-title":"The Jpred 3 secondary structure prediction server","volume":"36","author":"Cole","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012508022474000_B11","author":"DeLano","year":"2002","journal-title":"The PyMOL Molecular Graphics System on World Wide Web."},{"key":"2023012508022474000_B12","doi-asserted-by":"crossref","first-page":"838","DOI":"10.1002\/prot.21298","article-title":"Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training","volume":"66","author":"Dor","year":"2007","journal-title":"Proteins"},{"key":"2023012508022474000_B13","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1016\/S0959-440X(96)80056-X","article-title":"Hidden Markov models","volume":"6","author":"Eddy","year":"1996","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012508022474000_B14","doi-asserted-by":"crossref","first-page":"16227","DOI":"10.1073\/pnas.0508415102","article-title":"Building native protein conformation from highly approximate backbone torsion angles","volume":"102","author":"Gong","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508022474000_B15","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1002\/prot.21527","article-title":"Prediction of protein secondary structure content for the twilight zone sequences","volume":"69","author":"Homaeian","year":"2007","journal-title":"Proteins"},{"key":"2023012508022474000_B16","doi-asserted-by":"crossref","first-page":"768","DOI":"10.1107\/S0907444902003359","article-title":"Conformations of amino acids in proteins","volume":"58","author":"Hovm\u00f6ller","year":"2002","journal-title":"Acta Crystallogr. D Biol. Crystallogr."},{"key":"2023012508022474000_B17","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1109\/MEMB.2005.1436459","article-title":"Proteins and their shape strings. An exemplary computer representation of protein structure","volume":"24","author":"Ison","year":"2005","journal-title":"IEEE Eng. Med. Biol. Mag."},{"key":"2023012508022474000_B18","doi-asserted-by":"crossref","first-page":"797","DOI":"10.1006\/jmbi.1999.2583","article-title":"GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences","volume":"287","author":"Jones","year":"1999","journal-title":"J. Mol. Biol."},{"key":"2023012508022474000_B19","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1006\/jmbi.1999.3091","article-title":"Protein secondary structure prediction based on position-specific scoring matrices","volume":"292","author":"Jones","year":"1999","journal-title":"J. Mol. Biol."},{"key":"2023012508022474000_B20","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1002\/bip.360221211","article-title":"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolymers"},{"key":"2023012508022474000_B21","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1016\/0022-2836(90)90154-E","article-title":"Improvements in protein secondary structure prediction by an enhanced neural network","volume":"214","author":"Kneller","year":"1990","journal-title":"J. Mol. Biol."},{"key":"2023012508022474000_B22","doi-asserted-by":"crossref","first-page":"1612","DOI":"10.1093\/bioinformatics\/bth136","article-title":"Protein backbone angle prediction with machine learning approaches","volume":"20","author":"Kuang","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012508022474000_B23","doi-asserted-by":"crossref","first-page":"4321","DOI":"10.1093\/nar\/gkf544","article-title":"A comparison of profile hidden Markov model procedures for remote homology detection","volume":"30","author":"Madera","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023012508022474000_B24","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1146\/annurev.biophys.29.1.291","article-title":"Comparative protein structure modeling of genes and genomes","volume":"29","author":"Marti-Renom","year":"2000","journal-title":"Annu. Rev. Biophys. Biomol. Struct."},{"key":"2023012508022474000_B25","doi-asserted-by":"crossref","first-page":"1531","DOI":"10.1093\/bioinformatics\/btg185","article-title":"Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments","volume":"19","author":"Mittelman","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508022474000_B26","doi-asserted-by":"crossref","first-page":"4239","DOI":"10.1093\/bioinformatics\/bti687","article-title":"Profile-based direct kernels for remote homology detection and fold recognition","volume":"21","author":"Rangwala","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012508022474000_B27","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1006\/jmbi.1993.1413","article-title":"Prediction of protein secondary structure at better than 70% accuracy","volume":"232","author":"Rost","year":"1993","journal-title":"J. Mol. Biol."},{"key":"2023012508022474000_B28","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1002\/prot.340190108","article-title":"Combining evolutionary information and neural networks to predict protein secondary structure","volume":"19","author":"Rost","year":"1994","journal-title":"Proteins"},{"key":"2023012508022474000_B29","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1016\/S0022-2836(05)80007-5","article-title":"Redefining the goals of protein secondary structure prediction","volume":"235","author":"Rost","year":"1994","journal-title":"J. Mol. Biol."},{"key":"2023012508022474000_B30","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1016\/S0022-2836(02)01371-2","article-title":"COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance","volume":"326","author":"Sadreyev","year":"2003","journal-title":"J. Mol. Biol."},{"key":"2023012508022474000_B31","doi-asserted-by":"crossref","first-page":"310","DOI":"10.2174\/138920308785132703","article-title":"Describing and comparing protein structures using shape strings","volume":"9","author":"Shu","year":"2008","journal-title":"Curr. Protein Pept. Sci."},{"key":"2023012508022474000_B32","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1093\/bioinformatics\/btm618","article-title":"Prediction of zinc-binding sites in proteins from sequence","volume":"24","author":"Shu","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508022474000_B33","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1002\/(SICI)1097-0134(19990101)34:1<82::AID-PROT7>3.0.CO;2-A","article-title":"Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins","volume":"34","author":"Simons","year":"1999","journal-title":"Proteins"},{"key":"2023012508022474000_B34","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","article-title":"Protein homology detection by HMM-HMM comparison","volume":"21","author":"Soding","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012508022474000_B35","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1002\/prot.10474","article-title":"Enriching the sequence substitution matrix by structural information","volume":"54","author":"Teodorescu","year":"2004","journal-title":"Proteins"},{"key":"2023012508022474000_B36","doi-asserted-by":"crossref","first-page":"1589","DOI":"10.1093\/bioinformatics\/btg224","article-title":"PISCES: a protein sequence culling server","volume":"19","author":"Wang","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508022474000_B37","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1002\/prot.20435","article-title":"Protein secondary structure prediction with dihedral angles","volume":"59","author":"Wood","year":"2005","journal-title":"Proteins-Struct. Funct. & Bioinformatics"},{"key":"2023012508022474000_B38","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1006\/jmbi.1993.1464","article-title":"Protein secondary structure prediction using nearest-neighbor methods","volume":"232","author":"Yi","year":"1993","journal-title":"J. Mol. Biol."},{"key":"2023012508022474000_B39","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1002\/(SICI)1097-0134(19990201)34:2<220::AID-PROT7>3.0.CO;2-K","article-title":"A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment","volume":"34","author":"Zemla","year":"1999","journal-title":"Proteins-Struct. Funct. Genet."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/4\/470\/48854403\/bioinformatics_26_4_470.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/4\/470\/48854403\/bioinformatics_26_4_470.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:03:28Z","timestamp":1674633808000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/4\/470\/243021"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,12,9]]},"references-count":39,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2010,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp679","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,2,15]]},"published":{"date-parts":[[2009,12,9]]}}}