{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T05:29:16Z","timestamp":1775280556761,"version":"3.50.1"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2024,11,22]],"date-time":"2024-11-22T00:00:00Z","timestamp":1732233600000},"content-version":"vor","delay-in-days":21,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"German Ministry for Research and Education"},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["DFG-GZ: RO1320\/4-1"],"award-info":[{"award-number":["DFG-GZ: RO1320\/4-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Exhaustive experimental annotation of the effect of all known protein variants remains daunting and expensive, stressing the need for scalable effect predictions. We introduce VespaG, a blazingly fast missense amino acid variant effect predictor, leveraging protein language model (pLM) embeddings as input to a minimal deep learning model.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>To overcome the sparsity of experimental training data, we created a dataset of 39 million single amino acid variants from the human proteome applying the multiple sequence alignment-based effect predictor GEMME as a pseudo standard-of-truth. This setup increases interpretability compared to the baseline pLM and is easily retrainable with novel or updated pLMs. 
Assessed against the ProteinGym benchmark (217 multiplex assays of variant effect\u2014MAVE\u2014with 2.5 million variants), VespaG achieved a mean Spearman correlation of 0.48\u2009\u00b1\u20090.02, matching top-performing methods evaluated on the same data. VespaG has the advantage of being orders of magnitude faster, predicting all mutational landscapes of all proteins in proteomes such as Homo sapiens or Drosophila melanogaster in under 30\u2009min on a consumer laptop (12-core CPU, 16 GB RAM).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>VespaG is available freely at https:\/\/github.com\/jschlensok\/vespag. The associated training data and predictions are available at https:\/\/doi.org\/10.5281\/zenodo.11085958.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae621","type":"journal-article","created":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T23:24:05Z","timestamp":1732145045000},"source":"Crossref","is-referenced-by-count":33,"title":["Expert-guided protein language models enable accurate and blazingly fast fitness prediction"],"prefix":"10.1093","volume":"40","author":[{"given":"C\u00e9line","family":"Marquet","sequence":"first","affiliation":[{"name":"School of Computation, Information, and Technology, Department of Informatics, Bioinformatics and Computational Biology, Technical University of Munich , Garching\/Munich 85748,","place":["Germany"]}]},{"given":"Julius","family":"Schlensok","sequence":"additional","affiliation":[{"name":"School of Computation, Information, and Technology, Department of Informatics, Bioinformatics and Computational Biology, Technical University of Munich , Garching\/Munich 85748,","place":["Germany"]}]},{"given":"Marina","family":"Abakarova","sequence":"additional","affiliation":[{"name":"Laboratory of Computational and Quantitative Biology, UMR 7238, 
Sorbonne Universit\u00e9, CNRS, IBPS , Paris 75005,","place":["France"]},{"name":"UMR U1284, Universit\u00e9 Paris Cit\u00e9, INSERM , Paris 75004,","place":["France"]}]},{"given":"Burkhard","family":"Rost","sequence":"additional","affiliation":[{"name":"School of Computation, Information, and Technology, Department of Informatics, Bioinformatics and Computational Biology, Technical University of Munich , Garching\/Munich 85748,","place":["Germany"]},{"name":"School of Life Sciences Weihenstephan, Technical University of Munich , Freising,","place":["Germany"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4870-6304","authenticated-orcid":false,"given":"Elodie","family":"Laine","sequence":"additional","affiliation":[{"name":"Laboratory of Computational and Quantitative Biology, UMR 7238, Sorbonne Universit\u00e9, CNRS, IBPS , Paris 75005,","place":["France"]},{"name":"Institut Universitaire de France , Paris 75005,","place":["France"]}]}],"member":"286","published-online":{"date-parts":[[2024,11,22]]},"reference":[{"key":"2024112518323894700_btae621-B1","doi-asserted-by":"crossref","first-page":"evad201","DOI":"10.1093\/gbe\/evad201","article-title":"Alignment-based protein mutational landscape prediction: doing more with less","volume":"15","author":"Abakarova","year":"2023","journal-title":"Genome Biol Evol"},{"key":"2024112518323894700_btae621-B2","doi-asserted-by":"crossref","first-page":"726","DOI":"10.1038\/s41416-022-02059-z","article-title":"Lynch syndrome, molecular mechanisms and variant classification","volume":"128","author":"Abildgaard","year":"2023","journal-title":"Br J Cancer"},{"key":"2024112518323894700_btae621-B3","first-page":"7","article-title":"Predicting functional effect of human missense mutations using PolyPhen-2","volume":"76","author":"Adzhubei","year":"2013","journal-title":"Curr Protocols Human Genet"},{"key":"2024112518323894700_btae621-B6","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1038\/s41586-021-04184-w","article-title":"De 
novo protein design by deep network hallucination","volume":"600","author":"Anishchenko","year":"2021","journal-title":"Nature"},{"key":"2024112518323894700_btae621-B7133206","doi-asserted-by":"crossref","first-page":"1075570","DOI":"10.3389\/fmolb.2022.1075570","article-title":"Challenges in predicting stabilizing variations: An exploration","volume":"9","author":"Benevenuta","year":"2023","journal-title":"Front Mol Biosci"},{"key":"2024112518323894700_btae621-B3397396","doi-asserted-by":"crossref","first-page":"e82593","DOI":"10.7554\/eLife.82593","article-title":"Rapid protein stability prediction using deep learning representations","volume":"12","author":"Blaabjerg","year":"2023","journal-title":"eLife"},{"key":"2024112518323894700_btae621-B7","doi-asserted-by":"crossref","first-page":"4175","DOI":"10.1038\/s41467-023-39909-0","article-title":"Discovering functionally important sites in proteins","volume":"14","author":"Cagiada","year":"2023","journal-title":"Nat Commun"},{"key":"2024112518323894700_btae621-B8","doi-asserted-by":"crossref","first-page":"eadg7492","DOI":"10.1126\/science.adg7492","article-title":"Accurate proteome-wide missense variant effect prediction with AlphaMissense","volume":"381","author":"Cheng","year":"2023","journal-title":"Science"},{"key":"2024112518323894700_btae621-B9","author":"Devlin","year":"2019"},{"key":"2024112518323894700_btae621-B10","doi-asserted-by":"publisher","author":"Ding","year":"2024","DOI":"10.1101\/2024.03.07.584001"},{"key":"2024112518323894700_btae621-B11","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"ProtTrans: Toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2022","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2024112518323894700_btae621-B12","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1038\/nmeth.3027","article-title":"Deep mutational scanning: a new 
style of protein science","volume":"11","author":"Fowler","year":"2014","journal-title":"Nat Methods"},{"key":"2024112518323894700_btae621-B13","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1038\/s41586-021-04043-8","article-title":"Disease variant prediction with deep generative models of evolutionary data","volume":"599","author":"Frazer","year":"2021","journal-title":"Nature"},{"key":"2024112518323894700_btae621-B14","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1186\/s13059-023-02935-8","article-title":"A comprehensive map of human glucokinase variant activity","volume":"24","author":"Gersing","year":"2023","journal-title":"Genome Biol"},{"key":"2024112518323894700_btae621-B15","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1016\/j.cels.2017.11.003","article-title":"Quantitative missense variant effect prediction using large-scale mutagenesis data","volume":"6","author":"Gray","year":"2018","journal-title":"Cell Syst"},{"key":"2024112518323894700_btae621-B16","doi-asserted-by":"crossref","first-page":"3937","DOI":"10.1016\/j.jmb.2013.07.028","article-title":"News from the protein mutability landscape","volume":"425","author":"Hecht","year":"2013","journal-title":"J Mol Biol"},{"key":"2024112518323894700_btae621-B17","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2164-16-S8-S1","article-title":"Better prediction of functional effects for sequence variants","volume":"16","author":"Hecht","year":"2015","journal-title":"BMC Genomics"},{"key":"2024112518323894700_btae621-B18","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1038\/nbt.3769","article-title":"Mutation effects predicted from sequence co-variation","volume":"35","author":"Hopf","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2024112518323894700_btae621-B19","doi-asserted-by":"crossref","first-page":"e24109","DOI":"10.1371\/journal.pone.0024109","article-title":"RosettaRemodel: a generalized framework for flexible backbone protein 
design","volume":"6","author":"Huang","year":"2011","journal-title":"PLoS One"},{"key":"2024112518323894700_btae621-B20","doi-asserted-by":"crossref","first-page":"877","DOI":"10.1016\/j.ajhg.2016.08.016","article-title":"REVEL: an ensemble method for predicting the pathogenicity of rare missense variants","volume":"99","author":"Ioannidis","year":"2016","journal-title":"Am J Hum Genet"},{"key":"2024112518323894700_btae621-B21","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2024112518323894700_btae621-B22","doi-asserted-by":"crossref","first-page":"e2122676119","DOI":"10.1073\/pnas.2122676119","article-title":"Dissecting the stability determinants of a challenging de novo protein fold using massively parallel design and experimentation","volume":"119","author":"Kim","year":"2022","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024112518323894700_btae621-B24","doi-asserted-by":"crossref","first-page":"917","DOI":"10.1016\/j.chom.2022.06.008","article-title":"The logic of virus evolution","volume":"30","author":"Koonin","year":"2022","journal-title":"Cell Host Microbe"},{"key":"2024112518323894700_btae621-B25","doi-asserted-by":"crossref","first-page":"2604","DOI":"10.1093\/molbev\/msz179","article-title":"GEMME: a simple and fast global epistatic model predicting mutational effects","volume":"36","author":"Laine","year":"2019","journal-title":"Mol Biol Evol"},{"key":"2024112518323894700_btae621-B26","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2024112518323894700_btae621-B27","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein 
structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2024112518323894700_btae621-B28","doi-asserted-by":"crossref","first-page":"e11474","DOI":"10.15252\/msb.202211474","article-title":"Updated benchmarking of variant effect predictors using deep mutational scanning","volume":"19","author":"Livesey","year":"2023","journal-title":"Mol Syst Biol"},{"key":"2024112518323894700_btae621-B29","doi-asserted-by":"publisher","author":"Livesey","year":"2024","DOI":"10.48550\/arXiv.2404.10807"},{"key":"2024112518323894700_btae621-B31","doi-asserted-by":"crossref","first-page":"1608","DOI":"10.1038\/s41598-017-01054-2","article-title":"Common sequence variants affect molecular function more than rare variants?","volume":"7","author":"Mahlich","year":"2017","journal-title":"Sci Rep"},{"key":"2024112518323894700_btae621-B32","doi-asserted-by":"crossref","first-page":"1629","DOI":"10.1007\/s00439-021-02411-y","article-title":"Embeddings from protein language models predict conservation and variant effects","volume":"141","author":"Marquet","year":"2022","journal-title":"Hum Genet"},{"key":"2024112518323894700_btae621-B33","doi-asserted-by":"crossref","first-page":"bio036103","DOI":"10.1242\/bio.036103","article-title":"Extending chemical perturbations of the ubiquitin fitness landscape in a classroom setting reveals new constraints on sequence tolerance","volume":"7","author":"Mavor","year":"2018","journal-title":"Biol Open"},{"key":"2024112518323894700_btae621-B34","doi-asserted-by":"crossref","first-page":"e15802","DOI":"10.7554\/eLife.15802","article-title":"Determination of ubiquitin fitness landscapes under different chemical stresses in a classroom 
setting","volume":"5","author":"Mavor","year":"2016","journal-title":"eLife"},{"key":"2024112518323894700_btae621-B35","doi-asserted-by":"publisher","author":"Meier","year":"2021","DOI":"10.1101\/2021.07.09.450648"},{"key":"2024112518323894700_btae621-B37","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1038\/s41592-022-01488-1","article-title":"ColabFold: making protein folding accessible to all","volume":"19","author":"Mirdita","year":"2022","journal-title":"Nat Methods"},{"key":"2024112518323894700_btae621-B38","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1016\/B978-0-12-802104-0.00024-X","volume-title":"Pharmacognosy","author":"Murray","year":"2017"},{"key":"2024112518323894700_btae621-B39","doi-asserted-by":"crossref","first-page":"3812","DOI":"10.1093\/nar\/gkg509","article-title":"SIFT: predicting amino acid changes that affect protein function","volume":"31","author":"Ng","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2024112518323894700_btae621-B40","doi-asserted-by":"crossref","first-page":"968","DOI":"10.1016\/j.cels.2023.10.002","article-title":"ProGen2: Exploring the boundaries of protein language models","volume":"14","author":"Nijkamp","year":"2023","journal-title":"Cell Systems"},{"key":"2024112518323894700_btae621-B41","doi-asserted-by":"publisher","first-page":"e2017228118","DOI":"10.1073\/pnas.2017228118","article-title":"Protein sequence design by conformational landscape optimization","volume":"118","author":"Norn","year":"2021","journal-title":"Proc Natl Acad Sci 
USA"},{"key":"2024112518323894700_btae621-B42","doi-asserted-by":"publisher","author":"Notin","year":"2022","DOI":"10.48550\/ARXIV.2205.13760"},{"key":"2024112518323894700_btae621-B43","doi-asserted-by":"publisher","author":"Notin","year":"2022","DOI":"10.1101\/2022.12.07.519495"},{"key":"2024112518323894700_btae621-B44","doi-asserted-by":"publisher","author":"Notin","year":"2023","DOI":"10.1101\/2023.12.07.570727"},{"key":"2024112518323894700_btae621-B46","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1186\/s12859-020-3439-4","article-title":"Variant effect predictions capture some aspects of deep mutational scanning experiments","volume":"21","author":"Reeb","year":"2020","journal-title":"BMC Bioinform"},{"key":"2024112518323894700_btae621-B47","doi-asserted-by":"crossref","first-page":"1363","DOI":"10.1016\/j.jmb.2013.01.032","article-title":"Analyses of the effects of all ubiquitin point mutants on yeast growth rate","volume":"425","author":"Roscoe","year":"2013","journal-title":"J Mol Biol"},{"key":"2024112518323894700_btae621-B50","doi-asserted-by":"crossref","first-page":"1776","DOI":"10.1038\/s41587-023-01714-x","article-title":"Global detection of human variants and isoforms by deep proteome sequencing","volume":"41","author":"Sinitcyn","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2024112518323894700_btae621-B52","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nature Biotechnol"},{"key":"2024112518323894700_btae621-B53","doi-asserted-by":"publisher","author":"Su","year":"2024","DOI":"10.1101\/2023.10.01.560349"},{"key":"2024112518323894700_btae621-B54","doi-asserted-by":"crossref","first-page":"D523","DOI":"10.1093\/nar\/gkac1052","article-title":"UniProt: the universal protein knowledgebase in 2023","volume":"51","author":"The UniProt 
Consortium","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2024112518323894700_btae621-B55","doi-asserted-by":"crossref","first-page":"2176","DOI":"10.1016\/j.bpj.2022.12.031","article-title":"Interpreting the molecular mechanisms of disease variants in human transmembrane proteins","volume":"122","author":"Tiemann","year":"2023","journal-title":"Biophys J"},{"key":"2024112518323894700_btae621-B56","doi-asserted-by":"publisher","author":"Truong","year":"2023","DOI":"10.48550\/arXiv.2306.06156"},{"key":"2024112518323894700_btae621-B57","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1038\/s41586-023-06328-6","article-title":"Mega-scale experimental analysis of protein folding stability in biology and design","volume":"620","author":"Tsuboyama","year":"2023","journal-title":"Nature"},{"key":"2024112518323894700_btae621-B58","doi-asserted-by":"publisher","author":"Wolf","year":"2020","DOI":"10.48550\/arXiv.1910.03771"},{"key":"2024112518323894700_btae621-B60","doi-asserted-by":"crossref","first-page":"1496","DOI":"10.1073\/pnas.1914677117","article-title":"Improved protein structure prediction using predicted interresidue orientations","volume":"117","author":"Yang","year":"2020","journal-title":"Proc Natl Acad Sci 
USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae621\/60791521\/btae621.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/11\/btae621\/60811415\/btae621.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/11\/btae621\/60811415\/btae621.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,25]],"date-time":"2024-11-25T13:33:02Z","timestamp":1732541582000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae621\/7907184"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,11,1]]},"references-count":52,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae621","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.04.24.590982","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,11]]},"published":{"date-parts":[[2024,11,1]]},"article-number":"btae621"}}