{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T22:45:58Z","timestamp":1776379558044,"version":"3.51.2"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2017,12,8]],"date-time":"2017-12-08T00:00:00Z","timestamp":1512691200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R01GM093123"],"award-info":[{"award-number":["R01GM093123"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,4,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to indirectly predict the fold of a target protein based on the fold of a template protein with known structure, which cannot explain the relationship between sequence and fold. Only a few methods had been developed to classify protein sequences into a small number of folds due to methodological limitations, which are not generally useful in practice.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We develop a deep 1D-convolution neural network (DeepSF) to directly classify any protein sequence into one of 1195 known folds, which is useful for both fold recognition and the study of sequence\u2013structure relationship. 
Different from traditional sequence alignment (comparison) based methods, our method automatically extracts fold-related features from a protein sequence of any length and maps it to the fold space. We train and test our method on the datasets curated from SCOP1.75, yielding an average classification accuracy of 75.3%. On the independent testing dataset curated from SCOP2.06, the classification accuracy is 73.0%. We compare our method with a top profile\u2013profile alignment method\u2014HHSearch on hard template-based and template-free modeling targets of CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is 12.63\u201326.32% higher than HHSearch on template-free modeling targets and 3.39\u201317.09% higher on hard template-based modeling targets for top 1, 5 and 10 predicted folds. The hidden features extracted from sequence by our method are robust against sequence mutation, insertion, deletion and truncation, and can be used for other protein pattern recognition problems such as protein clustering, comparison and ranking.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The DeepSF server is publicly available at: http:\/\/iris.rnet.missouri.edu\/DeepSF\/.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx780","type":"journal-article","created":{"date-parts":[[2017,12,7]],"date-time":"2017-12-07T20:13:50Z","timestamp":1512677630000},"page":"1295-1303","source":"Crossref","is-referenced-by-count":165,"title":["DeepSF: deep convolutional neural network for mapping protein sequences to folds"],"prefix":"10.1093","volume":"34","author":[{"given":"Jie","family":"Hou","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 
USA"}]},{"given":"Badri","family":"Adhikari","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO, USA"}]},{"given":"Jianlin","family":"Cheng","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA"},{"name":"Informatics Institute, University of Missouri, Columbia, MO, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,12,8]]},"reference":[{"key":"2023012713004287800_btx780-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023012713004287800_btx780-B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023012713004287800_btx780-B3","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023012713004287800_btx780-B4","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.ymeth.2015.09.011","article-title":"Integrated protein function prediction by mining function associations, sequences, and protein\u2013protein and gene\u2013gene interaction networks","volume":"93","author":"Cao","year":"2016","journal-title":"Methods"},{"key":"2023012713004287800_btx780-B5","article-title":"SCOPe: manual Curation and artifact removal in the structural classification of proteinsextended database","volume":"429","author":"Chandonia","year":"2016","journal-title":"J. Mol. 
Biol"},{"key":"2023012713004287800_btx780-B6","doi-asserted-by":"crossref","first-page":"e1003926","DOI":"10.1371\/journal.pcbi.1003926","article-title":"ECOD: an evolutionary classification of protein domains","volume":"10","author":"Cheng","year":"2014","journal-title":"PLoS Computat. Biol"},{"key":"2023012713004287800_btx780-B7","doi-asserted-by":"crossref","first-page":"1456","DOI":"10.1093\/bioinformatics\/btl102","article-title":"A machine learning information retrieval approach to protein fold recognition","volume":"22","author":"Cheng","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B8","first-page":"179","author":"Chung","year":"2003"},{"key":"2023012713004287800_btx780-B9","doi-asserted-by":"crossref","first-page":"i332","DOI":"10.1093\/bioinformatics\/btw271","article-title":"CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction","volume":"32","author":"Cui","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B10","doi-asserted-by":"crossref","first-page":"1264","DOI":"10.1093\/bioinformatics\/btn112","article-title":"Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection","volume":"24","author":"Damoulas","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B11","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1146\/annurev.biophys.37.092707.153558","article-title":"The protein folding problem","volume":"37","author":"Dill","year":"2008","journal-title":"Annu. Rev. 
Biophys"},{"key":"2023012713004287800_btx780-B12","doi-asserted-by":"crossref","first-page":"2655","DOI":"10.1093\/bioinformatics\/btp500","article-title":"A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation","volume":"25","author":"Dong","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B13","doi-asserted-by":"crossref","first-page":"3066","DOI":"10.1093\/bioinformatics\/bts598","article-title":"Predicting protein residue\u2013residue contacts using deep networks and boosting","volume":"28","author":"Eickholt","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B14","doi-asserted-by":"crossref","first-page":"D291","DOI":"10.1093\/nar\/gkl959","article-title":"The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution","volume":"35","author":"Greene","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023012713004287800_btx780-B15","doi-asserted-by":"crossref","first-page":"1099","DOI":"10.1016\/S0969-2126(99)80177-4","article-title":"A systematic comparison of protein structure classifications: SCOP, CATH and FSSP","volume":"7","author":"Hadley","year":"1999","journal-title":"Structure"},{"key":"2023012713004287800_btx780-B16","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012713004287800_btx780-B17","first-page":"3600","article-title":"The FSSP database of structurally aligned protein fold families","volume":"22","author":"Holm","year":"1994","journal-title":"Nucleic Acids Res"},{"key":"2023012713004287800_btx780-B18","doi-asserted-by":"crossref","first-page":"10428","DOI":"10.1021\/bi00107a010","article-title":"Folding of chymotrypsin inhibitor 2. 1. 
Evidence for a two-state transition","volume":"30","author":"Jackson","year":"1991","journal-title":"Biochemistry"},{"key":"2023012713004287800_btx780-B19","doi-asserted-by":"crossref","first-page":"S14","DOI":"10.1186\/1471-2105-15-S11-S14","article-title":"Improving protein fold recognition by random forest","volume":"15","author":"Jo","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023012713004287800_btx780-B20","doi-asserted-by":"crossref","first-page":"17573","DOI":"10.1038\/srep17573","article-title":"Improving protein fold recognition by deep learning networks","volume":"5","author":"Jo","year":"2015","journal-title":"Sci. Rep"},{"key":"2023012713004287800_btx780-B21","author":"Kalchbrenner","year":"2014"},{"key":"2023012713004287800_btx780-B22","author":"Kim","year":"2014"},{"key":"2023012713004287800_btx780-B23","doi-asserted-by":"crossref","DOI":"10.1002\/prot.24982","article-title":"CASP 11 target classification","volume":"84","author":"Kinch","year":"2016","journal-title":"Proteins Struct. Funct. Bioinform"},{"key":"2023012713004287800_btx780-B24","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1002\/prot.23190","article-title":"CASP9 target classification","volume":"79","author":"Kinch","year":"2011","journal-title":"Proteins Struct. Funct. 
Bioinform"},{"key":"2023012713004287800_btx780-B25","first-page":"1097","author":"Krizhevsky","year":"2012"},{"key":"2023012713004287800_btx780-B26","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B27","doi-asserted-by":"crossref","first-page":"e1003500","DOI":"10.1371\/journal.pcbi.1003500","article-title":"MRFalign: protein homology detection through alignment of Markov random fields","volume":"10","author":"Ma","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023012713004287800_btx780-B28","doi-asserted-by":"crossref","first-page":"2592","DOI":"10.1093\/bioinformatics\/btu352","article-title":"SSpro\/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity","volume":"30","author":"Magnan","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B29","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1093\/bioinformatics\/16.4.404","article-title":"The PSIPRED protein structure prediction server","volume":"16","author":"McGuffin","year":"2000","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B30","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. 
Biol"},{"key":"2023012713004287800_btx780-B31","doi-asserted-by":"crossref","first-page":"1717","DOI":"10.1093\/bioinformatics\/btl170","article-title":"Ensemble classifier for protein fold pattern recognition","volume":"22","author":"Shen","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B32","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","article-title":"Protein homology detection by HMM\u2013HMM comparison","volume":"21","author":"S\u00f6ding","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B33","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/TCBB.2014.2343960","article-title":"A deep learning network approach to ab initio protein secondary structure prediction","volume":"12","author":"Spencer","year":"2015","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023012713004287800_btx780-B34","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res"},{"key":"2023012713004287800_btx780-B35","article-title":"Protein secondary structure prediction using deep convolutional neural fields","volume":"6","author":"Wang","year":"2016","journal-title":"Sci. Rep"},{"key":"2023012713004287800_btx780-B36","doi-asserted-by":"crossref","first-page":"e1005324","DOI":"10.1371\/journal.pcbi.1005324","article-title":"Accurate de novo prediction of protein contact map by ultra-deep learning model","volume":"13","author":"Wang","year":"2017","journal-title":"PLOS Comput. Biol"},{"key":"2023012713004287800_btx780-B37","doi-asserted-by":"crossref","first-page":"17315","DOI":"10.3390\/ijms160817315","article-title":"DeepCNF-D: predicting protein order\/disorder regions by weighted deep convolutional neural fields","volume":"16","author":"Wang","year":"2015","journal-title":"Int. J. Mol. 
Sci"},{"key":"2023012713004287800_btx780-B38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-1-4939-0366-5_1","article-title":"Protein structure modeling with MODELLER","volume":"1137","author":"Webb","year":"2014","journal-title":"Methods Mol Biol"},{"key":"2023012713004287800_btx780-B39","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1109\/TNB.2015.2450233","article-title":"Enhanced protein fold prediction method through a novel feature extraction technique","volume":"14","author":"Wei","year":"2015","journal-title":"IEEE Trans. Nanobiosci"},{"key":"2023012713004287800_btx780-B40","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1093\/bioinformatics\/btw768","article-title":"An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier","volume":"33","author":"Xia","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B41","doi-asserted-by":"crossref","first-page":"889","DOI":"10.1093\/bioinformatics\/btq066","article-title":"How significant is a protein structure similarity with TM-score= 0.5?","volume":"26","author":"Xu","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012713004287800_btx780-B42","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids 
Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/8\/1295\/48914742\/bioinformatics_34_8_1295.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/8\/1295\/48914742\/bioinformatics_34_8_1295.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,29]],"date-time":"2023-08-29T19:21:33Z","timestamp":1693336893000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/8\/1295\/4708302"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,12,8]]},"references-count":42,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2018,4,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx780","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,4,15]]},"published":{"date-parts":[[2017,12,8]]}}}