{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,8]],"date-time":"2026-02-08T05:36:14Z","timestamp":1770528974521,"version":"3.49.0"},"reference-count":55,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2018,4,26]],"date-time":"2018-04-26T00:00:00Z","timestamp":1524700800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100010438","name":"Francis Crick Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100010438","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000289","name":"Cancer Research UK","doi-asserted-by":"publisher","award":["FC001002"],"award-info":[{"award-number":["FC001002"]}],"id":[{"id":"10.13039\/501100000289","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000265","name":"UK Medical Research Council","doi-asserted-by":"crossref","award":["FC001002"],"award-info":[{"award-number":["FC001002"]}],"id":[{"id":"10.13039\/501100000265","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100004440","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["FC001002"],"award-info":[{"award-number":["FC001002"]}],"id":[{"id":"10.13039\/100004440","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Research Council Advanced Grant","award":["695558"],"award-info":[{"award-number":["695558"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue\u2013residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>DeepCov is freely available at https:\/\/github.com\/psipred\/DeepCov.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty341","type":"journal-article","created":{"date-parts":[[2018,4,25]],"date-time":"2018-04-25T19:59:04Z","timestamp":1524686344000},"page":"3308-3315","source":"Crossref","is-referenced-by-count":172,"title":["High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features"],"prefix":"10.1093","volume":"34","author":[{"given":"David T","family":"Jones","sequence":"first","affiliation":[{"name":"Department of Computer Science, University College London, London, UK"},{"name":"Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2671-2140","authenticated-orcid":false,"given":"Shaun M","family":"Kandathil","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University College London, London, UK"},{"name":"Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK"}]}],"member":"286","published-online":{"date-parts":[[2018,4,26]]},"reference":[{"key":"2023012712484199000_bty341-B1","author":"Al-Rfou","year":"2016"},{"key":"2023012712484199000_bty341-B2","doi-asserted-by":"crossref","first-page":"9122","DOI":"10.1073\/pnas.1702664114","article-title":"Origins of coevolution between residues distant in protein 3D structures","volume":"114","author":"Anishchenko","year":"2017","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712484199000_bty341-B3","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1002\/prot.22934","article-title":"Learning generative models for protein fold families","volume":"79","author":"Balakrishnan","year":"2011","journal-title":"Proteins Struct. Funct. Bioinf"},{"key":"2023012712484199000_bty341-B4","first-page":"78","author":"Buchan","year":"2017"},{"key":"2023012712484199000_bty341-B5","doi-asserted-by":"crossref","first-page":"2684","DOI":"10.1093\/bioinformatics\/btx217","article-title":"EigenTHREADER: analogous protein fold recognition by efficient contact map threading","volume":"33","author":"Buchan","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012712484199000_bty341-B6","doi-asserted-by":"crossref","first-page":"e1000633.","DOI":"10.1371\/journal.pcbi.1000633","article-title":"Disentangling direct from indirect co-evolution of residues in protein alignments","volume":"6","author":"Burger","year":"2010","journal-title":"PLOS Comput. Biol"},{"key":"2023012712484199000_bty341-B7","doi-asserted-by":"crossref","first-page":"e1003926","DOI":"10.1371\/journal.pcbi.1003926","article-title":"ECOD: an Evolutionary Classification of Protein Domains","volume":"10","author":"Cheng","year":"2014","journal-title":"PLOS Comput. Biol"},{"key":"2023012712484199000_bty341-B8","doi-asserted-by":"crossref","first-page":"113.","DOI":"10.1186\/1471-2105-8-113","article-title":"Improved residue contact prediction using support vector machines and a large feature set","volume":"8","author":"Cheng","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023012712484199000_bty341-B9","doi-asserted-by":"crossref","first-page":"1224.","DOI":"10.12688\/f1000research.11543.1","article-title":"Co-evolution techniques are reshaping the way we do structural bioinformatics","volume":"6","author":"de Oliveira","year":"2017","journal-title":"F1000Research"},{"key":"2023012712484199000_bty341-B10","doi-asserted-by":"crossref","first-page":"2449","DOI":"10.1093\/bioinformatics\/bts475","article-title":"Deep architectures for protein contact map prediction","volume":"28","author":"Di Lena","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712484199000_bty341-B11","author":"Dieleman","year":"2015"},{"key":"2023012712484199000_bty341-B12","author":"Dumoulin","year":"2016"},{"key":"2023012712484199000_bty341-B13","doi-asserted-by":"crossref","first-page":"S12.","DOI":"10.1186\/1471-2105-14-S14-S12","article-title":"A study and benchmark of DNcon: a method for protein residue\u2013residue contact prediction using deep networks","volume":"14","author":"Eickholt","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023012712484199000_bty341-B14","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.jcp.2014.07.024","article-title":"Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences","volume":"276","author":"Ekeberg","year":"2014","journal-title":"J. Comput. Phys"},{"key":"2023012712484199000_bty341-B15","doi-asserted-by":"crossref","first-page":"012707","DOI":"10.1103\/PhysRevE.87.012707","article-title":"Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models","volume":"87","author":"Ekeberg","year":"2013","journal-title":"Phys. Rev. E"},{"key":"2023012712484199000_bty341-B16","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712484199000_bty341-B17","first-page":"249","volume-title":"Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics","author":"Glorot","year":"2010"},{"key":"2023012712484199000_bty341-B18","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1002\/prot.340180402","article-title":"Correlated mutations and residue contacts in proteins","volume":"18","author":"G\u00f6bel","year":"1994","journal-title":"Proteins Struct. Funct. Bioinf"},{"key":"2023012712484199000_bty341-B19","first-page":"4222","volume-title":"Advances in Neural Information Processing Systems 29","author":"Golkov","year":"2016"},{"key":"2023012712484199000_bty341-B20","first-page":"1319","volume-title":"Proceedings of the 30th International Conference on Machine Learning","author":"Goodfellow","year":"2013"},{"key":"2023012712484199000_bty341-B21","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1002\/prot.20739","article-title":"CASP6 assessment of contact prediction","volume":"61","author":"Gra\u00f1a","year":"2005","journal-title":"Proteins Struct. Funct. Bioinf"},{"key":"2023012712484199000_bty341-B22","first-page":"770","author":"He","year":"2016"},{"key":"2023012712484199000_bty341-B23","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","article-title":"Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups","volume":"29","author":"Hinton","year":"2012","journal-title":"IEEE Signal Process. Mag"},{"key":"2023012712484199000_bty341-B24","author":"Hinton","year":"2012"},{"key":"2023012712484199000_bty341-B25","author":"Ioffe","year":"2015"},{"key":"2023012712484199000_bty341-B26","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1093\/bioinformatics\/btr638","article-title":"PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments","volume":"28","author":"Jones","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712484199000_bty341-B27","doi-asserted-by":"crossref","first-page":"999","DOI":"10.1093\/bioinformatics\/btu791","article-title":"MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins","volume":"31","author":"Jones","year":"2015","journal-title":"Bioinformatics"},{"key":"2023012712484199000_bty341-B28","doi-asserted-by":"crossref","first-page":"15674","DOI":"10.1073\/pnas.1314045110","article-title":"Assessing the utility of coevolution-based residue\u2013residue contact predictions in a sequence- and structure-rich era","volume":"110","author":"Kamisetty","year":"2013","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712484199000_bty341-B29","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1002\/prot.24374","article-title":"One contact for every twelve residues allows robust and accurate topology-level protein structure modeling","volume":"82","author":"Kim","year":"2014","journal-title":"Proteins Struct. Funct. Bioinf"},{"key":"2023012712484199000_bty341-B30","author":"Kingma","year":"2014"},{"key":"2023012712484199000_bty341-B31","doi-asserted-by":"crossref","first-page":"e92197.","DOI":"10.1371\/journal.pone.0092197","article-title":"De novo structure prediction of globular proteins aided by sequence variation-derived contacts","volume":"9","author":"Kosciolek","year":"2014","journal-title":"Plos One"},{"key":"2023012712484199000_bty341-B32","first-page":"1097","volume-title":"Advances in Neural Information Processing Systems 25","author":"Krizhevsky","year":"2012"},{"key":"2023012712484199000_bty341-B33","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2023012712484199000_bty341-B34","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012712484199000_bty341-B35","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.cels.2017.11.014","article-title":"Enhancing evolutionary couplings with deep convolutional neural networks","volume":"6","author":"Liu","year":"2018","journal-title":"Cell Syst"},{"key":"2023012712484199000_bty341-B36","doi-asserted-by":"crossref","first-page":"e28766","DOI":"10.1371\/journal.pone.0028766","article-title":"Protein 3D structure computed from evolutionary sequence variation","volume":"6","author":"Marks","year":"2011","journal-title":"Plos One"},{"key":"2023012712484199000_bty341-B37","doi-asserted-by":"crossref","first-page":"2859","DOI":"10.1093\/bioinformatics\/btx332","article-title":"Predicting accurate contacts in thousands of Pfam domain families using PconsC3","volume":"33","author":"Michel","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012712484199000_bty341-B38","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1002\/prot.24943","article-title":"New encouraging developments in contact prediction: assessment of the CASP11 results","volume":"84","author":"Monastyrskyy","year":"2016","journal-title":"Proteins Struct. Funct. Bioinf"},{"key":"2023012712484199000_bty341-B39","doi-asserted-by":"crossref","first-page":"E1293","DOI":"10.1073\/pnas.1111471108","article-title":"Direct-coupling analysis of residue coevolution captures native contacts across many protein families","volume":"108","author":"Morcos","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712484199000_bty341-B40","doi-asserted-by":"crossref","first-page":"E1540","DOI":"10.1073\/pnas.1120036109","article-title":"Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis","volume":"109","author":"Nugent","year":"2012","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712484199000_bty341-B41","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1002\/prot.24974","article-title":"Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta","volume":"84","author":"Ovchinnikov","year":"2016","journal-title":"Proteins Struct. Funct. Bioinf"},{"key":"2023012712484199000_bty341-B42","doi-asserted-by":"crossref","first-page":"873","DOI":"10.1145\/1553374.1553486","volume-title":"Proceedings of the 26th Annual International Conference on Machine Learning","author":"Raina","year":"2009"},{"key":"2023012712484199000_bty341-B43","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"2023012712484199000_bty341-B44","doi-asserted-by":"crossref","first-page":"3128","DOI":"10.1093\/bioinformatics\/btu500","article-title":"CCMpred\u2014fast and precise prediction of protein residue\u2013residue contacts from correlated mutations","volume":"30","author":"Seemayer","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012712484199000_bty341-B45","author":"Shelhamer","year":"2016"},{"key":"2023012712484199000_bty341-B46","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res"},{"key":"2023012712484199000_bty341-B47","doi-asserted-by":"crossref","first-page":"303.","DOI":"10.1186\/s12859-017-1713-x","article-title":"EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction","volume":"18","author":"Stahl","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023012712484199000_bty341-B48","doi-asserted-by":"crossref","first-page":"1115","DOI":"10.1109\/ICDAR.2005.251","volume-title":"Eighth International Conference on Document Analysis and Recognition (ICDAR'05)","author":"Steinkraus","year":"2005"},{"key":"2023012712484199000_bty341-B49","author":"Sutskever","year":"2013"},{"key":"2023012712484199000_bty341-B50","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1016\/j.sbi.2013.04.001","article-title":"Prediction of contacts from correlated sequence substitutions","volume":"23","author":"Taylor","year":"2013","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023012712484199000_bty341-B51","doi-asserted-by":"crossref","first-page":"e1005324","DOI":"10.1371\/journal.pcbi.1005324","article-title":"Accurate de novo prediction of protein contact map by ultra-deep learning model","volume":"13","author":"Wang","year":"2017","journal-title":"PLOS Comput. Biol"},{"key":"2023012712484199000_bty341-B52","first-page":"67","author":"Wang","year":"2017"},{"key":"2023012712484199000_bty341-B53","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1073\/pnas.0805923106","article-title":"Identification of direct residue contacts in protein\u2013protein interaction by message passing","volume":"106","author":"Weigt","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712484199000_bty341-B54","doi-asserted-by":"crossref","first-page":"2675","DOI":"10.1093\/bioinformatics\/btx296","article-title":"A deep learning framework for improving long-range residue\u2013residue contact prediction using a hierarchical strategy","volume":"33","author":"Xiong","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012712484199000_bty341-B55","author":"Xiong","year":"2016"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/19\/3308\/48918856\/bioinformatics_34_19_3308.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/19\/3308\/48918856\/bioinformatics_34_19_3308.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T13:41:47Z","timestamp":1674826907000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/19\/3308\/4987145"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,4,26]]},"references-count":55,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2018,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty341","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,10,1]]},"published":{"date-parts":[[2018,4,26]]}}}