{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T18:02:26Z","timestamp":1775325746650,"version":"3.50.1"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2020,3,18]],"date-time":"2020-03-18T00:00:00Z","timestamp":1584489600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100012579","name":"ISF","doi-asserted-by":"publisher","award":["1122\/14"],"award-info":[{"award-number":["1122\/14"]}],"id":[{"id":"10.13039\/100012579","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The Protein Data Bank (PDB), the ultimate source for data in structural biology, is inherently imbalanced. To alleviate biases, virtually all structural biology studies use nonredundant (NR) subsets of the PDB, which include only a fraction of the available data. An alternative approach, dubbed redundancy-weighting (RW), down-weights redundant entries rather than discarding them. This approach may be particularly helpful for machine-learning (ML) methods that use the PDB as their source for data. Methods for secondary structure prediction (SSP) have greatly improved over the years with recent studies achieving above 70% accuracy for eight-class (DSSP) prediction. 
As these methods typically incorporate ML techniques, training on RW datasets might improve accuracy, as well as pave the way toward larger and more informative secondary structure classes.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>This study compares the SSP performances of deep-learning models trained on either RW or NR datasets. We show that training on RW sets consistently results in better prediction of 3- (HCE), 8- (DSSP) and 13-class (STR2) secondary structures.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The ML models, the datasets used for their derivation and testing, and a stand-alone SSP program for DSSP and STR2 predictions, are freely available under LGPL license in http:\/\/meshi1.cs.bgu.ac.il\/rw.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa196","type":"journal-article","created":{"date-parts":[[2020,3,16]],"date-time":"2020-03-16T12:35:41Z","timestamp":1584362141000},"page":"3733-3738","source":"Crossref","is-referenced-by-count":10,"title":["Redundancy-weighting the PDB for detailed secondary structure prediction using deep-learning models"],"prefix":"10.1093","volume":"36","author":[{"given":"Tomer","family":"Sidi","sequence":"first","affiliation":[{"name":"Department of Computer Science , Ben-Gurion University, P.O.B 653, Be'er Sheva 84105, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chen","family":"Keasar","sequence":"additional","affiliation":[{"name":"Department of Computer Science , Ben-Gurion University, P.O.B 653, Be'er Sheva 84105, 
Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,3,18]]},"reference":[{"key":"2023063011071120500_btaa196-B1","first-page":"19","article-title":"TensorFlow: large-scale machine learning on heterogeneous distributed systems","author":"Abadi","year":"2016"},{"key":"2023063011071120500_btaa196-B2","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1107\/97809553602060000722","volume-title":"International Tables for Crystallography Volume F: Crystallography of Biological Macromolecules","author":"Berman","year":"2006"},{"key":"2023063011071120500_btaa196-B3","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1016\/S0022-2836(77)80200-3","article-title":"The Protein Data Bank: a computer-based archival file for macromolecular structures","volume":"112","author":"Bernstein","year":"1977","journal-title":"J. Mol. Biol"},{"key":"2023063011071120500_btaa196-B4","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: architecture and applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023063011071120500_btaa196-B5","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1186\/1471-2105-8-113","article-title":"Improved residue contact prediction using support vector machines and a large feature set","volume":"8","author":"Cheng","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023063011071120500_btaa196-B6","author":"Chollet","year":"2015"},{"key":"2023063011071120500_btaa196-B7","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1021\/bi00699a002","article-title":"Prediction of protein conformation","volume":"13","author":"Chou","year":"1974","journal-title":"Biochemistry"},{"key":"2023063011071120500_btaa196-B8","doi-asserted-by":"crossref","first-page":"838","DOI":"10.1002\/prot.21298","article-title":"Achieving 80% ten-fold cross-validated accuracy for secondary 
structure prediction by large-scale training","volume":"66","author":"Dor","year":"2007","journal-title":"Proteins"},{"key":"2023063011071120500_btaa196-B9","article-title":"High quality prediction of protein Q8 secondary structure by diverse neural network architectures","volume":"1811","author":"Drori","year":"2018","journal-title":"ArXiv"},{"key":"2023063011071120500_btaa196-B10","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1002\/prot.1173","article-title":"Progress in predicting inter-residue contacts of proteins with neural networks and correlated mutations","volume":"45","author":"Fariselli","year":"2001","journal-title":"Proteins"},{"key":"2023063011071120500_btaa196-B11","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/0022-2836(78)90297-8","article-title":"Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins","volume":"120","author":"Garnier","year":"1978","journal-title":"J. Mol. Biol"},{"key":"2023063011071120500_btaa196-B12","doi-asserted-by":"crossref","first-page":"3804","DOI":"10.1093\/nar\/gkg504","article-title":"ORFeus: detection of distant homology using sequence profiles and predicted secondary structure","volume":"31","author":"Ginalski","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023063011071120500_btaa196-B13","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1016\/j.sbi.2008.01.006","article-title":"The structure of protein evolution and the evolution of protein structure","volume":"18","author":"Goldstein","year":"2008","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023063011071120500_btaa196-B14","doi-asserted-by":"crossref","first-page":"11476","DOI":"10.1038\/srep11476","article-title":"Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning","volume":"5","author":"Heffernan","year":"2015","journal-title":"Sci. 
Rep"},{"key":"2023063011071120500_btaa196-B15","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023063011071120500_btaa196-B16","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1006\/jmbi.1999.3091","article-title":"Protein secondary structure prediction based on position-specific scoring matrices","volume":"292","author":"Jones","year":"1999","journal-title":"J. Mol. Biol"},{"key":"2023063011071120500_btaa196-B17","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1002\/bip.360221211","article-title":"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolymers"},{"key":"2023063011071120500_btaa196-B18","doi-asserted-by":"crossref","first-page":"3931","DOI":"10.1093\/bioinformatics\/bti630","article-title":"MESHI: a new library of java classes for molecular modeling","volume":"21","author":"Kalisman","year":"2005","journal-title":"Bioinformatics"},{"key":"2023063011071120500_btaa196-B19","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1002\/prot.10369","article-title":"Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry","volume":"51","author":"Karchin","year":"2003","journal-title":"Proteins"},{"key":"2023063011071120500_btaa196-B20","doi-asserted-by":"crossref","first-page":"2453","DOI":"10.1093\/bioinformatics\/btn438","article-title":"Predict-2nd: a tool for generalized protein local structure prediction","volume":"24","author":"Katzman","year":"2008","journal-title":"Bioinformatics"},{"key":"2023063011071120500_btaa196-B21","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1002\/prot.25674","article-title":"NetSurfP-2.0: improved prediction of protein 
structural features by integrated deep learning","volume":"87","author":"Klausen","year":"2019","journal-title":"Proteins"},{"key":"2023063011071120500_btaa196-B22","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-Hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023063011071120500_btaa196-B23","doi-asserted-by":"crossref","first-page":"596","DOI":"10.1093\/bioinformatics\/btq020","article-title":"Improving protein secondary structure prediction using a simple k-mer model","volume":"26","author":"Madera","year":"2010","journal-title":"Bioinformatics (Oxford)"},{"key":"2023063011071120500_btaa196-B24","doi-asserted-by":"crossref","first-page":"2496","DOI":"10.1093\/bioinformatics\/btx222","article-title":"SVMQA: support-vector-machine-based protein single-model quality assessment","volume":"33","author":"Manavalan","year":"2017","journal-title":"Bioinformatics"},{"key":"2023063011071120500_btaa196-B25","doi-asserted-by":"crossref","first-page":"3789","DOI":"10.1093\/nar\/gkg620","article-title":"UniqueProt: creating representative protein sequence sets","volume":"31","author":"Mika","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023063011071120500_btaa196-B26","doi-asserted-by":"crossref","first-page":"534","DOI":"10.1021\/ma00145a039","article-title":"Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation","volume":"18","author":"Miyazawa","year":"1985","journal-title":"Macromolecules"},{"key":"2023063011071120500_btaa196-B27","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1006\/jmbi.1996.0114","article-title":"Residue\u2014residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and 
threading","volume":"256","author":"Miyazawa","year":"1996","journal-title":"J. Mol. Biol"},{"key":"2023063011071120500_btaa196-B28","article-title":"An introduction to convolutional neural networks","author":"O\u2019Shea","year":"2015","journal-title":"ArXiv"},{"key":"2023063011071120500_btaa196-B29","doi-asserted-by":"crossref","first-page":"1021","DOI":"10.1002\/prot.24787","article-title":"MQAPsingle: a quasi single-model approach for estimation of the quality of individual protein structure models","volume":"84","author":"Pawlowski","year":"2016","journal-title":"Proteins"},{"key":"2023063011071120500_btaa196-B30","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1186\/1471-2105-13-224","article-title":"Improved model quality assessment using ProQ2","volume":"13","author":"Ray","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023063011071120500_btaa196-B31","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1016\/S0076-6879(04)83004-0","article-title":"Protein structure prediction using Rosetta","volume":"383","author":"Rohl","year":"2004","journal-title":"Methods Enzymol"},{"key":"2023063011071120500_btaa196-B32","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1006\/jmbi.1993.1413","article-title":"Prediction of protein secondary structure at better than 70% accuracy","volume":"232","author":"Rost","year":"1993","journal-title":"J. Mol. 
Biol"},{"key":"2023063011071120500_btaa196-B33","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","article-title":"Protein homology detection by HMM\u2013HMM comparison","volume":"21","author":"S\u00f6ding","year":"2005","journal-title":"Bioinformatics"},{"key":"2023063011071120500_btaa196-B34","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/TCBB.2014.2343960","article-title":"A deep learning network approach to ab initio protein secondary structure prediction","volume":"12","author":"Spencer","year":"2015","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023063011071120500_btaa196-B35","doi-asserted-by":"crossref","first-page":"D506","DOI":"10.1093\/nar\/gky1049","article-title":"UniProt: a worldwide hub of protein knowledge","volume":"47","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023063011071120500_btaa196-B36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-019-48786-x","article-title":"Deeper profiles and cascaded recurrent and convolutional neural networks for state-of-the-art protein secondary structure prediction","volume":"9","author":"Torrisi","year":"2019","journal-title":"Sci. Rep"},{"key":"2023063011071120500_btaa196-B37","doi-asserted-by":"crossref","first-page":"W430","DOI":"10.1093\/nar\/gkw306","article-title":"RaptorX-Property: a web server for protein structure property prediction","volume":"44","author":"Wang","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023063011071120500_btaa196-B38","doi-asserted-by":"crossref","first-page":"e1005324","DOI":"10.1371\/journal.pcbi.1005324","article-title":"Accurate de novo prediction of protein contact map by ultra-deep learning model","volume":"13","author":"Wang","year":"2017","journal-title":"PLoS Comput. 
Biol"},{"key":"2023063011071120500_btaa196-B39","doi-asserted-by":"crossref","first-page":"2295","DOI":"10.1093\/bioinformatics\/btu242","article-title":"Redundancy-weighting for better inference of protein structural features","volume":"30","author":"","year":"2014","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa196\/33294374\/btaa196.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/12\/3733\/50747206\/bioinformatics_36_12_3733.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/12\/3733\/50747206\/bioinformatics_36_12_3733.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T11:08:18Z","timestamp":1688123298000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/12\/3733\/5809527"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2020,3,18]]},"references-count":39,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2020,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa196","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,6,15]]},"published":{"date-parts":[[2020,3,18]]}}}