{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T06:27:33Z","timestamp":1774592853371,"version":"3.50.1"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"20","license":[{"start":{"date-parts":[[2021,5,13]],"date-time":"2021-05-13T00:00:00Z","timestamp":1620864000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"publisher","award":["DP180102060"],"award-info":[{"award-number":["DP180102060"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"publisher","award":["DP210101875"],"award-info":[{"award-number":["DP210101875"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,10,25]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Knowing protein secondary and other one-dimensional structural properties are essential for accurate protein structure and function prediction. As a result, many methods have been developed for predicting these one-dimensional structural properties. However, most methods relied on evolutionary information that may not exist for many proteins due to a lack of sequence homologs. Moreover, it is computationally intensive for obtaining evolutionary information as the library of protein sequences continues to expand exponentially. 
Here, we developed a new single-sequence method called SPOT-1D-Single based on a large training dataset of 39 120 proteins deposited prior to 2016 and an ensemble of hybrid long-short-term-memory bidirectional neural network and convolutional neural network.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We showed that SPOT-1D-Single consistently improves over SPIDER3-Single and ProteinUnet for secondary structure, solvent accessibility, contact number and backbone angles prediction for all seven independent test sets (TEST2018, SPOT-2016, SPOT-2016-HQ, SPOT-2018, SPOT-2018-HQ, CASP12 and CASP13 free-modeling targets). For example, the predicted three-state secondary structure\u2019s accuracy ranges from 72.12% to 74.28% by SPOT-1D-Single, compared to 69.1\u201372.6% by SPIDER3-Single and 70.6\u201373% by ProteinUnet. SPOT-1D-Single also predicts SS3 and SS8 with 6.24% and 6.98% better accuracy than SPOT-1D on SPOT-2018 proteins with no homologs (Neff\u2009=\u20091), respectively. The new method\u2019s improvement over existing techniques is due to a larger training set combined with ensembled learning.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Standalone-version of SPOT-1D-Single is available at https:\/\/github.com\/jas-preet\/SPOT-1D-Single. Direct prediction can also be made at https:\/\/sparks-lab.org\/server\/spot-1d-single. 
The datasets used in this research can also be downloaded from GitHub.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab316","type":"journal-article","created":{"date-parts":[[2021,4,26]],"date-time":"2021-04-26T20:46:37Z","timestamp":1619469997000},"page":"3464-3472","source":"Crossref","is-referenced-by-count":40,"title":["SPOT-1D-Single: improving the single-sequence-based prediction of protein secondary structure, backbone angles, solvent accessibility and half-sphere exposures using a large training set and ensembled deep learning"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4974-5188","authenticated-orcid":false,"given":"Jaspreet","family":"Singh","sequence":"first","affiliation":[{"name":"Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University , Brisbane, QLD 4111, Australia"}]},{"given":"Thomas","family":"Litfin","sequence":"additional","affiliation":[{"name":"School of Information and Communication Technology, Griffith University , Southport, QLD 4222, Australia"}]},{"given":"Kuldip","family":"Paliwal","sequence":"additional","affiliation":[{"name":"Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University , Brisbane, QLD 4111, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0478-5533","authenticated-orcid":false,"given":"Jaswinder","family":"Singh","sequence":"additional","affiliation":[{"name":"Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University , Brisbane, QLD 4111, Australia"}]},{"given":"Anil Kumar","family":"Hanumanthappa","sequence":"additional","affiliation":[{"name":"Signal Processing Laboratory, School of Engineering and Built 
Environment, Griffith University , Brisbane, QLD 4111, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9958-5699","authenticated-orcid":false,"given":"Yaoqi","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Information and Communication Technology, Griffith University , Southport, QLD 4222, Australia"},{"name":"Institute for Glycomics, Griffith University , Southport, QLD 4222, Australia"},{"name":"Institute for Systems and Physical Biology, Shenzhen Bay Laboratory , Shenzhen 518055, China"}]}],"member":"286","published-online":{"date-parts":[[2021,5,13]]},"reference":[{"key":"2023051609024297100_btab316-B1","author":"Agarap","year":"2018"},{"key":"2023051609024297100_btab316-B2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-019-2932-0","article-title":"ProteinNet: a standardized data set for machine learning of protein structure","volume":"20","author":"AlQuraishi","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023051609024297100_btab316-B3","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023051609024297100_btab316-B4","first-page":"1","volume-title":"Noise Reduction in Speech Processing","author":"Benesty","year":"2009"},{"key":"2023051609024297100_btab316-B5","doi-asserted-by":"crossref","first-page":"e1003926","DOI":"10.1371\/journal.pcbi.1003926","article-title":"ECOD: an evolutionary classification of protein domains","volume":"10","author":"Cheng","year":"2014","journal-title":"PLoS Comput. 
Biol"},{"key":"2023051609024297100_btab316-B6","doi-asserted-by":"crossref","first-page":"1361","DOI":"10.1002\/prot.25767","article-title":"Estimation of model accuracy in CASP13","volume":"87","author":"Cheng","year":"2019","journal-title":"Proteins"},{"key":"2023051609024297100_btab316-B7","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1038\/248338a0","article-title":"Hydrophobic bonding and accessible surface area in proteins","volume":"248","author":"Chothia","year":"1974","journal-title":"Nature"},{"key":"2023051609024297100_btab316-B8","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1023\/A:1008392405740","article-title":"Protein backbone angle restraints from searching a database for chemical shift and sequence homology","volume":"13","author":"Cornilescu","year":"1999","journal-title":"J. Biomol. NMR"},{"key":"2023051609024297100_btab316-B9","doi-asserted-by":"crossref","first-page":"502","DOI":"10.1002\/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-Q","article-title":"Application of multiple sequence alignment profiles to improve protein secondary structure prediction","volume":"40","author":"Cuff","year":"2000","journal-title":"Proteins"},{"key":"2023051609024297100_btab316-B10","doi-asserted-by":"crossref","first-page":"592","DOI":"10.1002\/prot.25487","article-title":"MUFOLD-SS: new deep inception-inside-inception networks for protein secondary structure prediction","volume":"86","author":"Fang","year":"2018","journal-title":"Proteins"},{"key":"2023051609024297100_btab316-B11","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1007\/978-1-4939-6406-2_10","volume-title":"Prediction of Protein Secondary Structure","author":"Faraggi","year":"2017"},{"key":"2023051609024297100_btab316-B12","doi-asserted-by":"crossref","first-page":"4039","DOI":"10.1093\/bioinformatics\/bty481","article-title":"Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with 
convolutional neural networks","volume":"34","author":"Hanson","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051609024297100_btab316-B13","doi-asserted-by":"crossref","first-page":"2403","DOI":"10.1093\/bioinformatics\/bty1006","article-title":"Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks","volume":"35","author":"Hanson","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051609024297100_btab316-B14","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1093\/bioinformatics\/btv665","article-title":"Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins","volume":"32","author":"Heffernan","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051609024297100_btab316-B15","doi-asserted-by":"crossref","first-page":"2210","DOI":"10.1002\/jcc.25534","article-title":"Single-sequence-based prediction of protein secondary structures and solvent accessibility by deep whole-sequence learning","volume":"39","author":"Heffernan","year":"2018","journal-title":"J. Comput. 
Chem"},{"key":"2023051609024297100_btab316-B16","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1186\/s12859-019-3220-8","article-title":"Modeling aspects of the language of life through transfer-learning protein sequences","volume":"20","author":"Heinzinger","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023051609024297100_btab316-B17","author":"Ioffe","year":"2015"},{"key":"2023051609024297100_btab316-B18","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1002\/bip.360221211","article-title":"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolymers"},{"key":"2023051609024297100_btab316-B19","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1002\/prot.25674","article-title":"NetSurfP-2.0: improved prediction of protein structural features by integrated deep learning","volume":"87","author":"Klausen","year":"2019","journal-title":"Proteins"},{"key":"2023051609024297100_btab316-B20","article-title":"ProteinUnet-An efficient alternative to SPIDER3-single for sequence-based prediction of protein secondary structures","volume":"42","author":"Kotowski","year":"2020","journal-title":"J. Comput. 
Chem"},{"key":"2023051609024297100_btab316-B21","doi-asserted-by":"crossref","first-page":"1011","DOI":"10.1002\/prot.25823","article-title":"Critical assessment of methods of protein structure prediction (CASP)-Round XIII","volume":"87","author":"Kryshtafovych","year":"2019","journal-title":"Proteins"},{"key":"2023051609024297100_btab316-B22","doi-asserted-by":"crossref","first-page":"1082","DOI":"10.1002\/prot.25798","article-title":"Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13","volume":"87","author":"Li","year":"2019","journal-title":"Proteins"},{"key":"2023051609024297100_btab316-B23","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-04898-2","volume-title":"International Encyclopedia of Statistical Science","author":"Lovric","year":"2011"},{"key":"2023051609024297100_btab316-B24","doi-asserted-by":"crossref","first-page":"2040","DOI":"10.1002\/jcc.23718","article-title":"Predicting backbone C\u03b1 angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network","volume":"35","author":"Lyons","year":"2014","journal-title":"J. Comput. 
Chem"},{"key":"2023051609024297100_btab316-B25","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1093\/bioinformatics\/16.4.404","article-title":"The PSIPRED protein structure prediction server","volume":"16","author":"McGuffin","year":"2000","journal-title":"Bioinformatics"},{"key":"2023051609024297100_btab316-B26","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1126\/science.aah4043","article-title":"Protein structure determination using metagenome sequence data","volume":"355","author":"Ovchinnikov","year":"2017","journal-title":"Science"},{"key":"2023051609024297100_btab316-B27","first-page":"9689","article-title":"Evaluating protein transfer learning with TAPE","volume-title":"Advances in Neural Information Processing Systems","author":"Rao","year":"2019"},{"key":"2023051609024297100_btab316-B28","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1038\/nmeth.1818","article-title":"HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment","volume":"9","author":"Remmert","year":"2012","journal-title":"Nat. 
Methods"},{"key":"2023051609024297100_btab316-B29","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","author":"Rives","year":"2021"},{"key":"2023051609024297100_btab316-B30","first-page":"234","author":"Ronneberger","year":"2015"},{"key":"2023051609024297100_btab316-B31","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1002\/prot.25407","article-title":"Assessment of contact predictions in CASP12: co-evolution and deep learning coming of age","volume":"86","author":"Schaarschmidt","year":"2018","journal-title":"Proteins"},{"key":"2023051609024297100_btab316-B32","doi-asserted-by":"crossref","first-page":"2673","DOI":"10.1109\/78.650093","article-title":"Bidirectional recurrent neural networks","volume":"45","author":"Schuster","year":"1997","journal-title":"IEEE Trans. Signal Process"},{"key":"2023051609024297100_btab316-B33","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res"},{"key":"2023051609024297100_btab316-B34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-019-3019-7","article-title":"HH-suite3 for fast remote homology detection and deep protein annotation","volume":"20","author":"Steinegger","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023051609024297100_btab316-B35","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat. 
Biotechnol"},{"key":"2023051609024297100_btab316-B36","doi-asserted-by":"crossref","first-page":"18962","DOI":"10.1038\/srep18962","article-title":"Protein secondary structure prediction using deep convolutional neural fields","volume":"6","author":"Wang","year":"2016","journal-title":"Sci. Rep"},{"key":"2023051609024297100_btab316-B37","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1093\/bioinformatics\/btz477","article-title":"Protein contact prediction using metagenome sequence data and residual neural networks","volume":"36","author":"Wu","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051609024297100_btab316-B38","doi-asserted-by":"crossref","first-page":"5021","DOI":"10.1093\/bioinformatics\/btaa629","article-title":"OPUS-TASS: a protein backbone torsion angles and secondary structure predictor based on ensemble neural networks","volume":"36","author":"Xu","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051609024297100_btab316-B39","first-page":"482","article-title":"Sixty-five years of the long march in protein secondary structure prediction: the final stretch?","volume":"19","author":"Yang","year":"2018","journal-title":"Brief. 
Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab316\/38391268\/btab316.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/20\/3464\/50338313\/btab316.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/20\/3464\/50338313\/btab316.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T09:03:53Z","timestamp":1684227833000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/20\/3464\/6275257"}},"subtitle":[],"editor":[{"given":"Dr. Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,5,13]]},"references-count":39,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2021,10,25]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab316","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,10,15]]},"published":{"date-parts":[[2021,5,13]]}}}