{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T01:58:09Z","timestamp":1775872689027,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1009818","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,2,3]],"date-time":"2022-02-03T00:00:00Z","timestamp":1643846400000}}],"reference-count":54,"publisher":"Public Library of Science (PLoS)","issue":"1","license":[{"start":{"date-parts":[[2022,1,24]],"date-time":"2022-01-24T00:00:00Z","timestamp":1642982400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010434","name":"\u201cla Caixa\u201d Foundation","doi-asserted-by":"publisher","award":["LCF\/BQ\/PI18\/11630003"],"award-info":[{"award-number":["LCF\/BQ\/PI18\/11630003"]}],"id":[{"id":"10.13039\/100010434","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Spanish Ministry of Science","award":["RYC2019-026415-I"],"award-info":[{"award-number":["RYC2019-026415-I"]}]},{"DOI":"10.13039\/501100003741","name":"Instituci\u00f3 Catalana de Recerca i Estudis Avan\u00e7ats","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003741","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>The protein structure field is experiencing a revolution. From the increased throughput of techniques to determine experimental structures, to developments such as cryo-EM that allow us to find the structures of large protein complexes or, more recently, the development of artificial intelligence tools, such as AlphaFold, that can predict with high accuracy the folding of proteins for which the availability of homology templates is limited. Here we quantify the effect of the recently released AlphaFold database of protein structural models in our knowledge on human proteins. Our results indicate that our current baseline for structural coverage of 48%, considering experimentally-derived or template-based homology models, elevates up to 76% when including AlphaFold predictions. At the same time the fraction of dark proteome is reduced from 26% to just 10% when AlphaFold models are considered. Furthermore, although the coverage of disease-associated genes and mutations was near complete before AlphaFold release (69% of Clinvar pathogenic mutations and 88% of oncogenic mutations), AlphaFold models still provide an additional coverage of 3% to 13% of these critically important sets of biomedical genes and mutations. Finally, we show how the contribution of AlphaFold models to the structural coverage of non-human organisms, including important pathogenic bacteria, is significantly larger than that of the human proteome. Overall, our results show that the sequence-structure gap of human proteins has almost disappeared, an outstanding success of direct consequences for the knowledge on the human genome and the derived medical applications.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1009818","type":"journal-article","created":{"date-parts":[[2022,1,24]],"date-time":"2022-01-24T13:40:41Z","timestamp":1643031641000},"page":"e1009818","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":124,"title":["The structural coverage of the human proteome before and after AlphaFold"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7900-4239","authenticated-orcid":true,"given":"Eduard","family":"Porta-Pardo","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3991-0514","authenticated-orcid":true,"given":"Victoria","family":"Ruiz-Serra","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3656-8274","authenticated-orcid":true,"given":"Samuel","family":"Valentini","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8937-6789","authenticated-orcid":true,"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,1,24]]},"reference":[{"key":"pcbi.1009818.ref001","doi-asserted-by":"crossref","first-page":"662","DOI":"10.1038\/181662a0","article-title":"A three-dimensional model of the myoglobin molecule obtained by x-ray analysis","volume":"181","author":"JC Kendrew","year":"1958","journal-title":"Nature"},{"key":"pcbi.1009818.ref002","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"HM Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009818.ref003","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1002\/j.1460-2075.1986.tb04288.x","article-title":"The relation between the divergence of sequence and structure in proteins","volume":"5","author":"C Chothia","year":"1986","journal-title":"EMBO J"},{"key":"pcbi.1009818.ref004","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1006\/jmbi.1993.1626","article-title":"Comparative protein modelling by satisfaction of spatial restraints","volume":"234","author":"A Sali","year":"1993","journal-title":"J Mol Biol"},{"key":"pcbi.1009818.ref005","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/0022-2836(92)90693-E","article-title":"Topology fingerprint approach to the inverse protein folding problem","volume":"227","author":"A Godzik","year":"1992","journal-title":"J Mol Biol"},{"key":"pcbi.1009818.ref006","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1002\/prot.340180402","article-title":"Correlated mutations and residue contacts in proteins","volume":"18","author":"U G\u00f6bel","year":"1994","journal-title":"Proteins"},{"key":"pcbi.1009818.ref007","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1126\/science.1853201","article-title":"A method to identify protein sequences that fold into a known three-dimensional structure","volume":"253","author":"JU Bowie","year":"1991","journal-title":"Science"},{"key":"pcbi.1009818.ref008","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1038\/358086a0","article-title":"A new approach to protein fold recognition","volume":"358","author":"DT Jones","year":"1992","journal-title":"Nature"},{"key":"pcbi.1009818.ref009","doi-asserted-by":"crossref","first-page":"ii","DOI":"10.1002\/prot.340230303","article-title":"A large-scale experiment to assess protein structure prediction methods","volume":"23","author":"J Moult","year":"1995","journal-title":"Proteins"},{"key":"pcbi.1009818.ref010","doi-asserted-by":"crossref","first-page":"1011","DOI":"10.1002\/prot.25823","article-title":"Critical assessment of methods of protein structure prediction (CASP)-Round XIII.","volume":"87","author":"A Kryshtafovych","year":"2019","journal-title":"Proteins"},{"key":"pcbi.1009818.ref011","doi-asserted-by":"crossref","first-page":"857","DOI":"10.1126\/science.1107387","article-title":"Computational thermostabilization of an enzyme","volume":"308","author":"A Korkegian","year":"2005","journal-title":"Science"},{"key":"pcbi.1009818.ref012","doi-asserted-by":"crossref","first-page":"146","DOI":"10.2174\/157340911795677602","article-title":"Molecular docking: a powerful approach for structure-based drug discovery.","volume":"7","author":"X-Y Meng","year":"2011","journal-title":"Curr Comput Aided Drug Des"},{"key":"pcbi.1009818.ref013","doi-asserted-by":"crossref","first-page":"3719","DOI":"10.1158\/0008-5472.CAN-15-3190","article-title":"Exome-Scale Discovery of Hotspot Mutation Regions in Human Cancer Using 3D Protein Structure.","volume":"76","author":"C Tokheim","year":"2016","journal-title":"Cancer Res"},{"key":"pcbi.1009818.ref014","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1038\/nmeth.3289","article-title":"dSysMap: exploring the edgetic role of disease mutations","volume":"12","author":"R Mosca","year":"2015","journal-title":"Nat Methods"},{"key":"pcbi.1009818.ref015","doi-asserted-by":"crossref","first-page":"1034","DOI":"10.1016\/j.cell.2018.07.034","article-title":"Comprehensive Characterization of Cancer Driver Genes and Mutations","volume":"174","author":"MH Bailey","year":"2018","journal-title":"Cell"},{"key":"pcbi.1009818.ref016","doi-asserted-by":"crossref","article-title":"CHASMplus reveals the scope of somatic missense mutations driving human cancers.","author":"C Tokheim","DOI":"10.1016\/j.cels.2019.05.005"},{"key":"pcbi.1009818.ref017","article-title":"Predicting functional effect of human missense mutations using PolyPhen-2","author":"I Adzhubei","year":"2013","journal-title":"Curr Protoc Hum Genet"},{"key":"pcbi.1009818.ref018","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1186\/s13059-020-01954-z","article-title":"Comprehensive assessment of computational algorithms in predicting cancer driver mutations","volume":"21","author":"H Chen","year":"2020","journal-title":"Genome Biol"},{"key":"pcbi.1009818.ref019","article-title":"Highly accurate protein structure prediction with AlphaFold","author":"J Jumper","year":"2021","journal-title":"Nature"},{"key":"pcbi.1009818.ref020","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/d41586-020-03348-4","article-title":"It will change everything\u201d: DeepMind\u2019s AI makes gigantic leap in solving protein structures.","volume":"588","author":"E. Callaway","year":"2020","journal-title":"Nature"},{"key":"pcbi.1009818.ref021","article-title":"Highly accurate protein structure prediction for the human proteome","author":"K Tunyasuvunakool","year":"2021","journal-title":"Nature"},{"key":"pcbi.1009818.ref022","doi-asserted-by":"crossref","first-page":"D884","DOI":"10.1093\/nar\/gkaa942","article-title":"Ensembl 2021.","volume":"49","author":"KL Howe","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009818.ref023","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"UniProt Consortium","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009818.ref024","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","author":"SF Altschul","year":"1990","journal-title":"Journal of Molecular Biology"},{"key":"pcbi.1009818.ref025","first-page":"2021","article-title":"A structural biology community assessment of AlphaFold 2 applications.","author":"M Akdel","year":"2021","journal-title":"bioRxiv"},{"key":"pcbi.1009818.ref026","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/S1093-3263(00)00138-8","article-title":"Intrinsically disordered protein","volume":"19","author":"AK Dunker","year":"2001","journal-title":"J Mol Graph Model"},{"key":"pcbi.1009818.ref027","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: The protein families database in 2021","volume":"49","author":"J Mistry","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009818.ref028","doi-asserted-by":"crossref","first-page":"15898","DOI":"10.1073\/pnas.1508380112","article-title":"Unexpected features of the dark proteome","volume":"112","author":"N Perdig\u00e3o","year":"2015","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1009818.ref029","doi-asserted-by":"crossref","first-page":"1551","DOI":"10.1038\/nprot.2013.092","article-title":"Large-scale gene function analysis with the PANTHER classification system","volume":"8","author":"H Mi","year":"2013","journal-title":"Nat Protoc"},{"key":"pcbi.1009818.ref030","doi-asserted-by":"crossref","DOI":"10.3390\/life11020088","article-title":"The Emerging Physiological Role of AGMO 10 Years after Its Gene Identification.","volume":"11","author":"S Sailer","year":"2021","journal-title":"Life"},{"key":"pcbi.1009818.ref031","doi-asserted-by":"crossref","first-page":"1229","DOI":"10.1172\/JCI124159","article-title":"DEGS1-associated aberrant sphingolipid metabolism impairs nervous system function in humans","volume":"129","author":"G Karsai","year":"2019","journal-title":"J Clin Invest"},{"key":"pcbi.1009818.ref032","doi-asserted-by":"crossref","first-page":"1266","DOI":"10.1096\/fj.04-3580com","article-title":"Polymorphism of the PEMT gene and susceptibility to nonalcoholic fatty liver disease (NAFLD).","volume":"19","author":"J Song","year":"2005","journal-title":"FASEB J"},{"key":"pcbi.1009818.ref033","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1016\/j.drudis.2017.08.004","article-title":"Structural coverage of the proteome for pharmaceutical applications.","volume":"22","author":"JC Somody","year":"2017","journal-title":"Drug Discov Today"},{"key":"pcbi.1009818.ref034","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1002\/humu.22","article-title":"SNPs, protein structure, and disease","volume":"17","author":"Z Wang","year":"2001","journal-title":"Hum Mutat"},{"key":"pcbi.1009818.ref035","doi-asserted-by":"crossref","first-page":"34490","DOI":"10.1038\/srep34490","article-title":"Insights into cancer severity from biomolecular interaction mechanisms","volume":"6","author":"F Raimondi","year":"2016","journal-title":"Sci Rep"},{"key":"pcbi.1009818.ref036","first-page":"D845","article-title":"The DisGeNET knowledge platform for disease genomics: 2019 update","volume":"48","author":"J Pi\u00f1ero","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009818.ref037","article-title":"OncoKB: A Precision Oncology Knowledge Base.","volume":"2017","author":"D Chakravarty","year":"2017","journal-title":"JCO Precis Oncol"},{"key":"pcbi.1009818.ref038","doi-asserted-by":"crossref","first-page":"D835","DOI":"10.1093\/nar\/gkz972","article-title":"ClinVar: improvements to accessing data","volume":"48","author":"MJ Landrum","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009818.ref039","article-title":"In silico saturation mutagenesis of cancer genes","author":"F Mui\u00f1os","year":"2021","journal-title":"Nature"},{"key":"pcbi.1009818.ref040","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1002\/humu.22963","article-title":"mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome","volume":"37","author":"MJ Meyer","year":"2016","journal-title":"Hum Mutat"},{"key":"pcbi.1009818.ref041","article-title":"Accurate prediction of protein structures and interactions using a three-track neural network","author":"M Baek","year":"2021","journal-title":"Science"},{"key":"pcbi.1009818.ref042","doi-asserted-by":"crossref","first-page":"7070","DOI":"10.1093\/nar\/gky587","article-title":"Loose ends: almost one in five human genes still have unresolved coding status","volume":"46","author":"F Abascal","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009818.ref043","doi-asserted-by":"crossref","first-page":"e1004518","DOI":"10.1371\/journal.pcbi.1004518","article-title":"A Pan-Cancer Catalogue of Cancer Driver Protein Interaction Interfaces.","volume":"11","author":"E Porta-Pardo","year":"2015","journal-title":"PLoS Comput Biol"},{"key":"pcbi.1009818.ref044","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1038\/nbt.2106","article-title":"Three-dimensional reconstruction of protein networks provides insight into human genetic disease","volume":"30","author":"X Wang","year":"2012","journal-title":"Nat Biotechnol"},{"key":"pcbi.1009818.ref045","first-page":"2021","article-title":"Protein complex prediction with AlphaFold-Multimer.","author":"R Evans","year":"2021","journal-title":"bioRxiv"},{"key":"pcbi.1009818.ref046","doi-asserted-by":"crossref","first-page":"2098","DOI":"10.1093\/bioinformatics\/btv092","article-title":"AIDA: ab initio domain assembly for automated multi-domain protein structure prediction and domain-domain interaction prediction","volume":"31","author":"D Xu","year":"2015","journal-title":"Bioinformatics"},{"key":"pcbi.1009818.ref047","doi-asserted-by":"crossref","first-page":"978","DOI":"10.1016\/j.annonc.2020.05.008","article-title":"Neoantigen prediction and computational perspectives towards clinical benefit: recommendations from the ESMO Precision Medicine Working Group.","volume":"31","author":"L De Mattos-Arruda","year":"2020","journal-title":"Ann Oncol"},{"key":"pcbi.1009818.ref048","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1186\/1471-2105-8-298","article-title":"Predicting active site residue annotations in the Pfam database","volume":"8","author":"J Mistry","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1009818.ref049","doi-asserted-by":"crossref","first-page":"W329","DOI":"10.1093\/nar\/gky384","article-title":"IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding","volume":"46","author":"B M\u00e9sz\u00e1ros","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009818.ref050","doi-asserted-by":"crossref","first-page":"472","DOI":"10.1038\/s41592-021-01117-3","article-title":"Critical assessment of protein intrinsic disorder prediction.","volume":"18","author":"CAID Predictors, DisProt Curators","year":"2021","journal-title":"Nat Methods."},{"key":"pcbi.1009818.ref051","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1038\/nature19057","article-title":"Analysis of protein-coding genetic variation in 60,706 humans","volume":"536","author":"M Lek","year":"2016","journal-title":"Nature"},{"key":"pcbi.1009818.ref052","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1186\/s13059-016-0974-4","article-title":"The Ensembl Variant Effect Predictor","volume":"17","author":"W McLaren","year":"2016","journal-title":"Genome Biol"},{"key":"pcbi.1009818.ref053","doi-asserted-by":"crossref","first-page":"678","DOI":"10.1111\/j.1541-0420.2011.01616.x","article-title":"ggplot2: Elegant Graphics for Data Analysis by WICKHAM, H.","author":"L. Wilkinson","year":"2011","journal-title":"Biometrics"},{"key":"pcbi.1009818.ref054","doi-asserted-by":"crossref","first-page":"1605","DOI":"10.1002\/jcc.20084","article-title":"UCSF Chimera\u2014a visualization system for exploratory research and analysis","volume":"25","author":"EF Pettersen","year":"2004","journal-title":"J Comput Chem"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1009818","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,2,3]],"date-time":"2022-02-03T00:00:00Z","timestamp":1643846400000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009818","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,3]],"date-time":"2022-02-03T13:56:24Z","timestamp":1643896584000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009818"}},"subtitle":[],"editor":[{"given":"Silvio C. E.","family":"Tosatto","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,1,24]]},"references-count":54,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,1,24]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1009818","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.08.03.454980","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,24]]}}}