{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T12:17:52Z","timestamp":1772799472722,"version":"3.50.1"},"reference-count":67,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,11,23]],"date-time":"2023-11-23T00:00:00Z","timestamp":1700697600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>The recent breakthroughs of Large Language Models (LLMs) in the context of natural language processing have opened the way to significant advances in protein research. Indeed, the relationships between human natural language and the \u201clanguage of proteins\u201d invite the application and adaptation of LLMs to protein modeling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating, and translating human languages, we anticipate analogous results with the language of proteins. Protein language models have already been trained to accurately predict protein properties and to generate novel, functionally characterized proteins, achieving state-of-the-art results. 
In this paper we discuss the promises and the open challenges raised by this novel and exciting research area, and we propose our perspective on how LLMs will affect protein modeling and design.<\/jats:p>","DOI":"10.3389\/fbinf.2023.1304099","type":"journal-article","created":{"date-parts":[[2023,11,23]],"date-time":"2023-11-23T11:10:58Z","timestamp":1700737858000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":30,"title":["The promises of large language models for protein design and modeling"],"prefix":"10.3389","volume":"3","author":[{"given":"Giorgio","family":"Valentini","sequence":"first","affiliation":[]},{"given":"Dario","family":"Malchiodi","sequence":"additional","affiliation":[]},{"given":"Jessica","family":"Gliozzo","sequence":"additional","affiliation":[]},{"given":"Marco","family":"Mesiti","sequence":"additional","affiliation":[]},{"given":"Mauricio","family":"Soto-Gomez","sequence":"additional","affiliation":[]},{"given":"Alberto","family":"Cabri","sequence":"additional","affiliation":[]},{"given":"Justin","family":"Reese","sequence":"additional","affiliation":[]},{"given":"Elena","family":"Casiraghi","sequence":"additional","affiliation":[]},{"given":"Peter N.","family":"Robinson","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,11,23]]},"reference":[{"key":"B1","volume-title":"Layer normalization","author":"Ba","year":"2016"},{"key":"B2","article-title":"Neural machine translation by jointly learning to align and translate","volume-title":"3rd international conference on learning representations","author":"Bahdanau","year":"2015"},{"key":"B3","doi-asserted-by":"crossref","first-page":"610","DOI":"10.1145\/3442188.3445922","article-title":"On the dangers of stochastic parrots: can language models be too big?","volume-title":"Proceedings of the 2021 ACM conference on fairness, accountability, and 
transparency","author":"Bender","year":"2021"},{"key":"B4","first-page":"1137","article-title":"A neural probabilistic language model","volume":"3","author":"Bengio","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"B5","doi-asserted-by":"crossref","first-page":"3889","DOI":"10.18653\/v1\/2022.acl-long.269","article-title":"Is attention explanation? an introduction to the debate","volume-title":"Proceedings of the 60th annual Meeting of the Association for computational linguistics (volume 1: long papers)","author":"Bibal","year":"2022"},{"key":"B6","article-title":"Language models can explain neurons in language models","author":"Bills","year":"2023","journal-title":"OpenAI"},{"key":"B7","first-page":"07258","article-title":"On the opportunities and risks of foundation models","author":"Bommasani","year":"2021","journal-title":"ArXiv abs\/2108"},{"key":"B8","doi-asserted-by":"publisher","first-page":"2102","DOI":"10.1093\/bioinformatics\/btac020","article-title":"ProteinBERT: a universal deep-learning model of protein sequence and function","volume":"38","author":"Brandes","year":"2022","journal-title":"Bioinformatics"},{"key":"B9","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B10","first-page":"04380","article-title":"Model compression as constrained optimization, with application to neural nets. part V: combining compressions","author":"Carreira-Perpi\u00f1\u00e1n","year":"2021","journal-title":"Corr. abs\/2107"},{"key":"B11","doi-asserted-by":"publisher","first-page":"840","DOI":"10.1038\/s42256-022-00532-1","article-title":"Transformer-based protein generation with regularized latent space optimization","volume":"4","author":"Castro","year":"2022","journal-title":"Nat. Mach. 
Intell."},{"key":"B12","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1145\/1390156.1390177","article-title":"A unified architecture for natural language processing: deep neural networks with multitask learning","volume-title":"Proceedings of the 25th international conference on machine learning","author":"Collobert","year":"2008"},{"key":"B13","first-page":"4171","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics: human language technologie","author":"Devlin","year":"2019"},{"key":"B14","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1186\/1471-2105-10-323","article-title":"A stochastic context free grammar based framework for analysis of protein sequences","volume":"10","author":"Dyrka","year":"2009","journal-title":"BMC Bioinforma."},{"key":"B15","doi-asserted-by":"publisher","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"Prottrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"B16","doi-asserted-by":"publisher","first-page":"521","DOI":"10.1038\/s42256-022-00499-z","article-title":"Controllable protein design with language models","volume":"4","author":"Ferruz","year":"2022","journal-title":"Nat. Mach. Intell."},{"key":"B17","doi-asserted-by":"publisher","first-page":"4348","DOI":"10.1038\/s41467-022-32007-7","article-title":"Protgpt2 is a deep unsupervised language model for protein design","volume":"13","author":"Ferruz","year":"2022","journal-title":"Nat. 
Commun."},{"key":"B18","doi-asserted-by":"publisher","first-page":"1061","DOI":"10.1162\/tacl_a_00413","article-title":"Compressing large-scale transformer-based models: a case study on BERT","volume":"9","author":"Ganesh","year":"2021","journal-title":"Trans. Assoc. Comput. Linguistics"},{"key":"B19","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1038\/s41598-020-79682-4","article-title":"Transformer neural network for protein-specific de novo drug generation as a machine translation problem","volume":"11","author":"Grechishnikova","year":"2021","journal-title":"Sci. Rep."},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.1101\/2023.07.23.550085","article-title":"ProstT5: bilingual Language Model for protein sequence and structure","author":"Heinzinger","year":"2023","journal-title":"bioRxiv"},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.1038\/s41587-023-01763-2","article-title":"Efficient evolution of human antibodies from general protein language models","author":"Hie","year":"2023","journal-title":"Nat. 
Biotechnol"},{"key":"B22","doi-asserted-by":"crossref","first-page":"187","DOI":"10.18653\/v1\/2020.acl-demos.22","article-title":"exBERT: a visual analysis tool to explore learned representations in transformer models","volume-title":"Proceedings of the 58th annual meeting of the association for computational linguistics: system demonstrations","author":"Hoover","year":"2020"},{"key":"B23","doi-asserted-by":"publisher","first-page":"1295","DOI":"10.1093\/bioinformatics\/btx780","article-title":"DeepSF: deep convolutional neural network for mapping protein sequences to folds","volume":"34","author":"Hou","year":"2017","journal-title":"Bioinformatics"},{"key":"B24","doi-asserted-by":"crossref","first-page":"4198","DOI":"10.18653\/v1\/2020.acl-main.386","article-title":"Towards faithfully interpretable NLP systems: how should we define and evaluate faithfulness?","volume-title":"Proceedings of the 58th annual meeting of the association for computational linguistics","author":"Jacovi","year":"2020"},{"key":"B25","article-title":"Residual connections encourage iterative inference","volume-title":"International conference on learning representations","author":"Jastrzebski","year":"2018"},{"key":"B26","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"B27","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1909.05858","article-title":"CTRL: a conditional transformer Language Model for controllable generation","author":"Keskar","year":"2019","journal-title":"arXiv"},{"key":"B28","first-page":"16","article-title":"Bert meets shapley: extending shap explanations to transformer-based classifiers","author":"Kokalj","year":"2021","journal-title":"Proc. EACL Hackashop News Media Content Analysis Automated Rep. 
Generation"},{"key":"B29","volume-title":"Multiplicative LSTM for sequence modelling. ICLR Workshop track","author":"Krause","year":"2017"},{"key":"B30","doi-asserted-by":"publisher","first-page":"1346","DOI":"10.1038\/s41551-022-00914-1","article-title":"Self-supervised learning in medicine and healthcare","volume":"6","author":"Krishnan","year":"2022","journal-title":"Nat. Biomed. Eng."},{"key":"B31","doi-asserted-by":"publisher","first-page":"1501","DOI":"10.1006\/jmbi.1994.1104","article-title":"Hidden markov models in computational biology: applications to protein modeling","volume":"235","author":"Krogh","year":"1994","journal-title":"J. Mol. Biol."},{"key":"B32","doi-asserted-by":"publisher","first-page":"729","DOI":"10.1016\/j.tibtech.2019.12.008","article-title":"Protein engineering for improving and diversifying natural product biosynthesis","volume":"38","author":"Li","year":"2020","journal-title":"Trends Biotechnol."},{"key":"B33","article-title":"A unified approach to interpreting model predictions","volume":"30","author":"Lundberg","year":"2017","journal-title":"Adv. neural Inf. Process. Syst."},{"key":"B34","doi-asserted-by":"publisher","first-page":"1099","DOI":"10.1038\/s41587-022-01618-2","article-title":"Large language models generate functional protein sequences across diverse families","volume":"26","author":"Madani","year":"2023","journal-title":"Nat. Biotechnol."},{"key":"B35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3546577","article-title":"Post-hoc interpretability for neural nlp: a survey","volume":"55","author":"Madsen","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"B36","doi-asserted-by":"publisher","first-page":"701","DOI":"10.1162\/COLI_a_00239","article-title":"Computational linguistics and deep learning","volume":"41","author":"Manning","year":"2015","journal-title":"Comput. 
Linguist."},{"key":"B37","doi-asserted-by":"publisher","first-page":"D523","DOI":"10.1093\/nar\/gkac1052","article-title":"UniProt: the universal protein knowledgebase in 2023","volume":"51","author":"Martin","year":"2022","journal-title":"Nucleic Acids Res."},{"key":"B38","article-title":"Efficient estimation of word representations in vector space","author":"Mikolov","year":"2013"},{"key":"B39","doi-asserted-by":"publisher","first-page":"e2215907120","DOI":"10.1073\/pnas.2215907120","article-title":"The debate over understanding in ai\u2019s large language models","volume":"120","author":"Mitchell","year":"2023","journal-title":"Proc. Natl. Acad. Sci."},{"key":"B40","doi-asserted-by":"publisher","DOI":"10.1101\/2022.01.27.478087","article-title":"Design in the dark: learning deep generative models for de novo protein design","author":"Moffat","year":"2022","journal-title":"bioRxiv"},{"key":"B41","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1038\/s41586-023-05881-4","article-title":"Foundation models for generalist medical artificial intelligence","volume":"616","author":"Moor","year":"2023","journal-title":"Nature"},{"key":"B42","doi-asserted-by":"publisher","first-page":"1750","DOI":"10.1016\/j.csbj.2021.03.022","article-title":"The language of proteins: nlp, machine learning & protein sequences","volume":"19","author":"Ofer","year":"2021","journal-title":"Comput. Struct. Biotechnol. 
J."},{"key":"B43","doi-asserted-by":"publisher","first-page":"e4524","DOI":"10.1002\/pro.4524","article-title":"LambdaPP: Fast and accessible protein-specific phenotype predictions","volume":"32","author":"Olenyi","year":"2023","journal-title":"Protein Sci."},{"key":"B44","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2303.08774","article-title":"GPT-4 technical Report","year":"2023","journal-title":"arXiv"},{"key":"B45","article-title":"Improving language understanding by generative pre-training","author":"Radford","year":"2018","journal-title":"OpenAI blog"},{"key":"B46","article-title":"Language models are unsupervised multitask learners","author":"Radford","year":"2019","journal-title":"OpenAI blog"},{"key":"B47","first-page":"1","article-title":"Evaluating protein transfer learning with tape","volume-title":"Proceedings of the 33rd international conference on neural information processing systems","author":"Rao","year":"2019"},{"key":"B48","first-page":"8844","article-title":"MSA transformer","volume-title":"Proceedings of the 38th international Conference on machine learning","author":"Rao","year":"2021"},{"key":"B49","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1145\/2939672.2939778","article-title":"Why should i trust you? explaining the predictions of any classifier","author":"Ribeiro","year":"2016","journal-title":"Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. data Min."},{"key":"B50","doi-asserted-by":"publisher","first-page":"1527","DOI":"10.1609\/aaai.v32i1.11491","article-title":"Anchors: high-precision model-agnostic explanations","volume":"32","author":"Ribeiro","year":"2018","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"B51","doi-asserted-by":"publisher","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc. Natl. 
Acad. Sci."},{"key":"B52","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1038\/s42256-019-0048-x","article-title":"Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead","volume":"1","author":"Rudin","year":"2019","journal-title":"Nat. Mach. Intell."},{"key":"B53","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1910.01108","article-title":"DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter","author":"Sanh","year":"2019","journal-title":"arXiv"},{"key":"B54","doi-asserted-by":"publisher","first-page":"3316","DOI":"10.1039\/C9SC05704H","article-title":"Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy","volume":"11","author":"Schwaller","year":"2020","journal-title":"Chem. Sci."},{"key":"B55","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1038\/s42256-020-00284-w","article-title":"Mapping the space of chemical reactions using attention-based neural networks","volume":"3","author":"Schwaller","year":"2021","journal-title":"Nat. Mach. Intell."},{"key":"B56","doi-asserted-by":"publisher","DOI":"10.1101\/2021.12.13.472419","article-title":"Generative language modeling for antibody design","author":"Shuai","year":"2022","journal-title":"bioRxiv"},{"key":"B57","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2304.09355","article-title":"To compress or not to compress-self-supervised learning and information theory: a review","author":"Shwartz-Ziv","year":"2023","journal-title":"arXiv"},{"key":"B58","first-page":"129","article-title":"Parsing natural scenes and natural language with recursive neural networks","volume-title":"Proc. 28th Int. Conf. Mach. 
Learn.","author":"Socher","year":"2011"},{"key":"B59","doi-asserted-by":"publisher","first-page":"23705","DOI":"10.1038\/s41598-021-03100-6","article-title":"New explainability method for bert-based model in fake news detection","volume":"11","author":"Szczepa\u0144ski","year":"2021","journal-title":"Sci. Rep."},{"key":"B60","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1016\/j.aiopen.2020.11.001","article-title":"Neural machine translation: a review of methods, resources, and tools","volume":"1","author":"Tan","year":"2020","journal-title":"AI Open"},{"key":"B61","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1038\/s42256-022-00457-9","article-title":"Learning functional properties of proteins with language models","volume":"4","author":"Unsal","year":"2022","journal-title":"Nat. Mach. Intell."},{"key":"B62","first-page":"6000","article-title":"Attention is all you need","volume-title":"Proceedings of the 31st international conference on neural information processing systems","author":"Vaswani","year":"2017"},{"key":"B63","doi-asserted-by":"crossref","first-page":"37","DOI":"10.18653\/v1\/P19-3007","article-title":"A multiscale visualization of attention in the transformer model","volume-title":"Proceedings of the 57th annual meeting of the association for computational linguistics: system demonstrations","author":"Vig","year":"2019"},{"key":"B64","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules","volume":"28","author":"Weininger","year":"1988","journal-title":"J. Chem. Inf. Comput. 
Sci."},{"key":"B65","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2309.03631","article-title":"Insights into the inner workings of transformer models for protein function prediction","author":"Wenzel","year":"2023","journal-title":"CoRR"},{"key":"B66","first-page":"473","article-title":"Named entity recognition using an hmm-based chunk tagger","volume-title":"Proceedings of the 40th annual meeting on association for computational linguistics","author":"Zhou","year":"2002"},{"key":"B67","doi-asserted-by":"publisher","first-page":"btad046","DOI":"10.1093\/bioinformatics\/btad046","article-title":"Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions","volume":"39","author":"Zhou","year":"2023","journal-title":"Bioinformatics"}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2023.1304099\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,23]],"date-time":"2023-11-23T11:11:20Z","timestamp":1700737880000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2023.1304099\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,23]]},"references-count":67,"alternative-id":["10.3389\/fbinf.2023.1304099"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2023.1304099","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,23]]},"article-number":"1304099"}}