{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T16:49:08Z","timestamp":1781714948086,"version":"3.54.5"},"reference-count":80,"publisher":"National Academy of Sciences","issue":"15","license":[{"start":{"date-parts":[[2021,4,5]],"date-time":"2021-04-05T00:00:00Z","timestamp":1617580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1339362"],"award-info":[{"award-number":["1339362"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.pnas.org"],"crossmark-restriction":true},"short-container-title":["Proc. Natl. Acad. Sci. U.S.A."],"published-print":{"date-parts":[[2021,4,13]]},"abstract":"<jats:title>Significance<\/jats:title>\n                  <jats:p>Learning biological properties from sequence data is a logical step toward generative and predictive artificial intelligence for biology. Here, we propose scaling a deep contextual language model with unsupervised learning to sequences spanning evolutionary diversity. We find that without prior knowledge, information emerges in the learned representations on fundamental properties of proteins such as secondary structure, contacts, and biological activity. We show the learned representations are useful across benchmarks for remote homology detection, prediction of secondary structure, long-range residue\u2013residue contacts, and mutational effect. Unsupervised representation learning enables state-of-the-art supervised prediction of mutational effect and secondary structure and improves state-of-the-art features for long-range contact prediction.<\/jats:p>","DOI":"10.1073\/pnas.2016239118","type":"journal-article","created":{"date-parts":[[2021,4,5]],"date-time":"2021-04-05T16:37:18Z","timestamp":1617640638000},"update-policy":"https:\/\/doi.org\/10.1073\/pnas.cm10313","source":"Crossref","is-referenced-by-count":2869,"title":["Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences"],"prefix":"10.1073","volume":"118","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2208-0796","authenticated-orcid":false,"given":"Alexander","family":"Rives","sequence":"first","affiliation":[{"name":"Facebook AI Research, New York, NY 10003;"},{"name":"Department of Computer Science, New York University, New York, NY 10012;"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Joshua","family":"Meier","sequence":"additional","affiliation":[{"name":"Facebook AI Research, New York, NY 10003;"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2947-6064","authenticated-orcid":false,"given":"Tom","family":"Sercu","sequence":"additional","affiliation":[{"name":"Facebook AI Research, New York, NY 10003;"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Siddharth","family":"Goyal","sequence":"additional","affiliation":[{"name":"Facebook AI Research, New York, NY 10003;"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zeming","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, New York University, New York, NY 10012;"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jason","family":"Liu","sequence":"additional","affiliation":[{"name":"Facebook AI Research, New York, NY 10003;"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Demi","family":"Guo","sequence":"additional","affiliation":[{"name":"Harvard University, Cambridge, MA 02138;"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Myle","family":"Ott","sequence":"additional","affiliation":[{"name":"Facebook AI Research, New York, NY 10003;"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"C. Lawrence","family":"Zitnick","sequence":"additional","affiliation":[{"name":"Facebook AI Research, New York, NY 10003;"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jerry","family":"Ma","sequence":"additional","affiliation":[{"name":"Booth School of Business, University of Chicago, Chicago, IL 60637;"},{"name":"Yale Law School, New Haven, CT 06511"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rob","family":"Fergus","sequence":"additional","affiliation":[{"name":"Department of Computer Science, New York University, New York, NY 10012;"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"341","published-online":{"date-parts":[[2021,4,5]]},"reference":[{"key":"e_1_3_4_1_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.146.3651.1593"},{"key":"e_1_3_4_2_2","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(87)90352-4"},{"key":"e_1_3_4_3_2","doi-asserted-by":"publisher","DOI":"10.1093\/protein\/2.3.193"},{"key":"e_1_3_4_4_2","doi-asserted-by":"publisher","DOI":"10.1002\/prot.340180402"},{"key":"e_1_3_4_5_2","doi-asserted-by":"publisher","DOI":"10.1080\/00437956.1954.11659520"},{"key":"e_1_3_4_6_2","unstructured":"J. Devlin M.-W. Chang K. Lee K. Toutanova BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv [Preprint] (2018). arXiv:1810.04805 (Accessed 6 August 2020)."},{"key":"e_1_3_4_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390177"},{"key":"e_1_3_4_8_2","first-page":"3079","volume-title":"Advances in Neural Information Processing Systems","author":"Dai A. M.","year":"2015","unstructured":"A. M. Dai, Q. V. Le, \u201cSemi-supervised sequence learning\u201d in Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, R. Garnett, Eds. (Curran Associates, Inc., Red Hook, NY, 2015), pp. 3079\u20133087."},{"key":"e_1_3_4_9_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_3_4_10_2","doi-asserted-by":"crossref","unstructured":"A. Baevski S. Edunov Y. Liu L. Zettlemoyer M. Auli Cloze-driven pretraining of self-attention networks. arXiv [Preprint] (2019). arXiv:1903.07785 (Accessed 6 August 2020).","DOI":"10.18653\/v1\/D19-1539"},{"key":"e_1_3_4_11_2","unstructured":"A. Radford . Language models are unsupervised multitask learners. OpenAI Blog [Preprint] (2019). https:\/\/openai.com\/blog\/better-language-models (Accessed 6 August 2020)."},{"key":"e_1_3_4_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-2836(05)80360-2"},{"key":"e_1_3_4_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0968-0004(98)01298-5"},{"key":"e_1_3_4_14_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/14.9.755"},{"key":"e_1_3_4_15_2","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth.1818"},{"key":"e_1_3_4_16_2","doi-asserted-by":"publisher","DOI":"10.1021\/bi00613a026"},{"key":"e_1_3_4_17_2","doi-asserted-by":"crossref","unstructured":"A. S. Lapedes B. G. Giraud L. Liu G. D. Stormo Correlated mutations in models of protein sequences: Phylogenetic and structural effects. Lecture Notes-Monograph Series 236\u2013256 (1999).","DOI":"10.1214\/lnms\/1215455556"},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2007.70225"},{"key":"e_1_3_4_19_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0805923106"},{"key":"e_1_3_4_20_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1111471108"},{"key":"e_1_3_4_21_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btr638"},{"key":"e_1_3_4_22_2","doi-asserted-by":"publisher","DOI":"10.1002\/prot.22934"},{"key":"e_1_3_4_23_2","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.87.012707"},{"key":"e_1_3_4_24_2","doi-asserted-by":"publisher","DOI":"10.1093\/molbev\/msv211"},{"key":"e_1_3_4_25_2","doi-asserted-by":"publisher","DOI":"10.1038\/nbt.3769"},{"key":"e_1_3_4_26_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-018-0138-4"},{"key":"e_1_3_4_27_2","first-page":"1137","article-title":"A neural probabilistic language model","volume":"3","author":"Bengio Y.","year":"2003","unstructured":"Y. Bengio, R. Ducharme, P. Vincent, C. Jauvin, A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137\u20131155 (2003).","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_4_28_2","unstructured":"A. Radford K. Narasimhan T. Salimans I. Sutskever Improving language understanding by generative pre-training. OpenAI Blog [Preprint] (2018). https:\/\/openai.com\/blog\/language-unsupervised (Accessed 6 August 2020)."},{"key":"e_1_3_4_29_2","unstructured":"T. Mikolov K. Chen G. Corrado J. Dean Efficient estimation of word representations in vector space. arXiv [Preprint] (2013). https:\/\/arxiv.org\/abs\/1301.3781 (Accessed 6 August 2020)."},{"key":"e_1_3_4_30_2","unstructured":"T. Mikolov . Subword language modeling with neural networks. The website of T. Mikolov [Preprint] (2012). http:\/\/www.fit.vutbr.cz\/\u223cimikolov\/rnnlm\/char.pdf (Accessed 14 March 2021)."},{"key":"e_1_3_4_31_2","first-page":"2741","volume-title":"Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016","author":"Kim Y.","year":"2016","unstructured":"Y. Kim, Y. Jernite, D. Sontag, A. M. Rush, \u201cCharacter-aware neural language models\u201d in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, D. Schuurmans, M. Wellman, Eds. (AAAI Press, Palo Alto, CA, 2016), pp. 2741\u20132749."},{"key":"e_1_3_4_32_2","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani A.","year":"2017","unstructured":"A. Vaswani ., \u201cAttention is all you need\u201d in Advances in Neural Information Processing Systems, I. Guyon, Ed. . (Curran Associates, Inc., Red Hook, NY, 2017), pp. 5998\u20136008."},{"key":"e_1_3_4_33_2","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkm895"},{"key":"e_1_3_4_34_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btu739"},{"key":"e_1_3_4_35_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm098"},{"key":"e_1_3_4_36_2","doi-asserted-by":"publisher","DOI":"10.1038\/srep02919"},{"key":"e_1_3_4_37_2","first-page":"51","article-title":"Evolution of proteins and proteomes: A phylogenetics approach","volume":"1","author":"Gabald\u00f3n T.","year":"2007","unstructured":"T. Gabald\u00f3n, Evolution of proteins and proteomes: A phylogenetics approach. Evol. Bioinform. Online 1, 51\u201361 (2007).","journal-title":"Evol. Bioinform. Online"},{"key":"e_1_3_4_38_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1007010"},{"key":"e_1_3_4_39_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.1059128"},{"key":"e_1_3_4_40_2","doi-asserted-by":"publisher","DOI":"10.1093\/hmg\/ddg359"},{"key":"e_1_3_4_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459469"},{"key":"e_1_3_4_42_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten L.","year":"2008","unstructured":"L. van der Maaten, G. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579\u20132605 (2008).","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_4_43_2","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gky1085"},{"key":"e_1_3_4_44_2","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkt1240"},{"key":"e_1_3_4_45_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sbi.2011.03.005"},{"key":"e_1_3_4_46_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1003500"},{"key":"e_1_3_4_47_2","unstructured":"J. Johnson M. Douze H. J\u00e9gou Billion-scale similarity search with GPUs. arXiv [Preprint] (2017). arXiv:1702.08734 (Accessed 6 August 2020)."},{"key":"e_1_3_4_48_2","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkt1223"},{"key":"e_1_3_4_49_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0028766"},{"key":"e_1_3_4_50_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1702664114"},{"key":"e_1_3_4_51_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btu500"},{"key":"e_1_3_4_52_2","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkw1081"},{"key":"e_1_3_4_53_2","doi-asserted-by":"publisher","DOI":"10.1002\/prot.25674"},{"key":"e_1_3_4_54_2","doi-asserted-by":"crossref","unstructured":"J. Xu Distance-based protein folding powered by deep learning. arXiv [Preprint] (2018). arXiv:1811.03481 (Accessed 6 August 2020).","DOI":"10.1101\/465955"},{"key":"e_1_3_4_55_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bty341"},{"key":"e_1_3_4_56_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-019-1923-7"},{"key":"e_1_3_4_57_2","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-0134(19990301)34:4<508::AID-PROT10>3.0.CO;2-4"},{"key":"e_1_3_4_58_2","doi-asserted-by":"publisher","DOI":"10.1002\/prot.25823"},{"key":"e_1_3_4_59_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1005324"},{"key":"e_1_3_4_60_2","doi-asserted-by":"publisher","DOI":"10.1002\/prot.25064"},{"key":"e_1_3_4_61_2","doi-asserted-by":"publisher","DOI":"10.1002\/prot.25415"},{"key":"e_1_3_4_62_2","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth.3027"},{"key":"e_1_3_4_63_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-019-0496-6"},{"key":"e_1_3_4_64_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cels.2017.11.003"},{"key":"e_1_3_4_65_2","doi-asserted-by":"crossref","unstructured":"A. Rives . Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. bioRxiv [Preprint] (2019). https:\/\/doi.org\/10.1101\/622803 (Accessed 6 August 2020).","DOI":"10.1101\/622803"},{"key":"e_1_3_4_66_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-019-0598-1"},{"key":"e_1_3_4_67_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-019-3220-8"},{"key":"e_1_3_4_68_2","doi-asserted-by":"publisher","DOI":"10.1101\/676825"},{"key":"e_1_3_4_69_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bty178"},{"key":"e_1_3_4_70_2","unstructured":"T. Bepler B. Berger \u201cLearning protein sequence embeddings using information from structure\u201d in International Conference on Learning Representations (OpenReview.net 2019)."},{"key":"e_1_3_4_71_2","doi-asserted-by":"crossref","unstructured":"A. J. Riesselman . Accelerating protein design using autoregressive generative models. bioRxiv [Preprint] (2019). https:\/\/doi.org\/10.1101\/757252 (Accessed 6 August 2020).","DOI":"10.1101\/757252"},{"key":"e_1_3_4_72_2","doi-asserted-by":"crossref","unstructured":"A. Madani . ProGen: Language modeling for protein generation. arXiv [Preprint] (2020). arXiv:2004.03497 (Accessed 6 August 2020).","DOI":"10.1101\/2020.03.07.982272"},{"key":"e_1_3_4_73_2","doi-asserted-by":"crossref","unstructured":"J. Vig . BERTology meets biology: Interpreting attention in protein language models. arXiv [Preprint] (2020). arXiv:2006.15222 (Accessed 6 August 2020).","DOI":"10.1101\/2020.06.26.174417"},{"key":"e_1_3_4_74_2","doi-asserted-by":"crossref","unstructured":"A. Elnaggar M. Heinzinger C. Dallago B. Rost End-to-end multitask learning from protein language to protein features without alignments. bioRxiv [Preprint] (2019). https:\/\/doi.org\/10.1101\/864405 (Accessed 6 August 2020).","DOI":"10.1101\/864405"},{"key":"e_1_3_4_75_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btaa003"},{"key":"e_1_3_4_76_2","doi-asserted-by":"crossref","unstructured":"D. Repecka . Expanding functional protein sequence space using generative adversarial networks. bioRxiv [Preprint] (2019). https:\/\/doi.org\/10.1101\/789719 (Accessed 6 August 2020).","DOI":"10.1101\/789719"},{"key":"e_1_3_4_77_2","doi-asserted-by":"crossref","unstructured":"A. Hawkins-Hooker . Generating functional protein variants with variational autoencoders. bioRxiv [Preprint] (2019). https:\/\/doi.org\/10.1101\/2020.04.07.029264 (Accessed 6 August 2020).","DOI":"10.1101\/2020.04.07.029264"},{"key":"e_1_3_4_78_2","doi-asserted-by":"crossref","unstructured":"T. Amimeur . Designing feature-controlled humanoid antibody discovery libraries using generative adversarial networks. bioRxiv [Preprint] (2019). https:\/\/doi.org\/10.1101\/2020.04.12.024844 (Accessed 6 August 2020).","DOI":"10.1101\/2020.04.12.024844"},{"key":"e_1_3_4_79_2","unstructured":"A. Wang K. Cho BERT has a mouth and it must speak: BERT as a markov random field language model. arXiv [Preprint] (2019). arXiv:1902.04094 (Accessed 6 August 2020)."},{"key":"e_1_3_4_80_2","doi-asserted-by":"crossref","unstructured":"Y. Luo . Evolutionary context-integrated deep sequence modeling for protein engineering. bioRxiv [Preprint] (2020). https:\/\/doi.org\/10.1101\/2020.01.16.908509 (Accessed 6 August 2020).","DOI":"10.1101\/2020.01.16.908509"}],"container-title":["Proceedings of the National Academy of Sciences"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.pnas.org\/syndication\/doi\/10.1073\/pnas.2016239118","content-type":"unspecified","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/pnas.org\/doi\/pdf\/10.1073\/pnas.2016239118","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,13]],"date-time":"2022-04-13T07:18:03Z","timestamp":1649834283000},"score":1,"resource":{"primary":{"URL":"https:\/\/pnas.org\/doi\/full\/10.1073\/pnas.2016239118"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,5]]},"references-count":80,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2021,4,13]]}},"alternative-id":["10.1073\/pnas.2016239118"],"URL":"https:\/\/doi.org\/10.1073\/pnas.2016239118","relation":{"has-review":[{"id-type":"doi","id":"10.3410\/f.739876259.793593441","asserted-by":"object"},{"id-type":"doi","id":"10.3410\/f.739876259.793585293","asserted-by":"object"}],"has-preprint":[{"id-type":"doi","id":"10.1101\/622803","asserted-by":"object"}]},"ISSN":["0027-8424","1091-6490"],"issn-type":[{"value":"0027-8424","type":"print"},{"value":"1091-6490","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,5]]},"assertion":[{"value":"2021-04-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e2016239118"}}