{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T15:18:59Z","timestamp":1778339939328,"version":"3.51.4"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T00:00:00Z","timestamp":1696809600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T00:00:00Z","timestamp":1696809600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Mach Intell"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Protein structure prediction pipelines based on artificial intelligence, such as AlphaFold2, have achieved near-experimental accuracy. These advanced pipelines mainly rely on multiple sequence alignments (MSAs) as inputs to learn the co-evolution information from the homologous sequences. Nonetheless, searching MSAs from protein databases is time consuming, usually taking tens of minutes. Consequently, we attempt to explore the limits of fast protein structure prediction by using only primary structures of proteins. Our proposed method, HelixFold-Single, combines a large-scale protein language model with the superior geometric learning capability of AlphaFold2. HelixFold-Single first pre-trains a large-scale protein language model with thousands of millions of primary structures utilizing the self-supervised learning paradigm, which will be used as an alternative to MSAs for learning the co-evolution information. Then, by combining the pre-trained protein language model and the essential components of AlphaFold2, we obtain an end-to-end differentiable model to predict the three-dimensional coordinates of atoms from only the primary structure. HelixFold-Single is validated on datasets CASP14 and CAMEO, achieving competitive accuracy with the MSA-based methods on targets with large homologous families. Furthermore, HelixFold-Single consumes much less time than the mainstream pipelines for protein structure prediction, demonstrating its potential in tasks requiring many predictions.<\/jats:p>","DOI":"10.1038\/s42256-023-00721-6","type":"journal-article","created":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T12:01:56Z","timestamp":1696852916000},"page":"1087-1096","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":89,"title":["A method for multiple-sequence-alignment-free protein structure prediction using a protein language model"],"prefix":"10.1038","volume":"5","author":[{"given":"Xiaomin","family":"Fang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5373-4302","authenticated-orcid":false,"given":"Fan","family":"Wang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0003-5815-1047","authenticated-orcid":false,"given":"Lihang","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Jingzhou","family":"He","sequence":"additional","affiliation":[]},{"given":"Dayong","family":"Lin","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4505-7735","authenticated-orcid":false,"given":"Yingfei","family":"Xiang","sequence":"additional","affiliation":[]},{"given":"Kunrui","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Xiaonan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Hua","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Hui","family":"Li","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9655-2787","authenticated-orcid":false,"given":"Le","family":"Song","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,10,9]]},"reference":[{"key":"721_CR1","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","volume":"596","author":"J Jumper","year":"2021","unstructured":"Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583\u2013589 (2021).","journal-title":"Nature"},{"key":"721_CR2","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1016\/j.sbi.2005.05.011","volume":"15","author":"J Moult","year":"2005","unstructured":"Moult, J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr. Opin. Struct. Biol. 15, 285\u2013289 (2005).","journal-title":"Curr. Opin. Struct. Biol."},{"key":"721_CR3","doi-asserted-by":"publisher","unstructured":"Petroni, F. et al. Language models as knowledge bases? In Proc. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) https:\/\/doi.org\/10.18653\/v1\/D19-1250 (ACL, 2019).","DOI":"10.18653\/v1\/D19-1250"},{"key":"721_CR4","unstructured":"Vaswani, A. et al. Attention is all you need. In NIPS'17: Proc. 31st International Conference on Neural Information Processing Systems Vol. 30 (eds von Luxburg, U. et al.) 6000\u20136010 (Curran, 2017)."},{"key":"721_CR5","unstructured":"Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (eds Burstein, J. et al.) 4171\u20134186 (Association for Computational Linguistics, 2019)."},{"key":"721_CR6","first-page":"1877","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877\u20131901 (2020).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"721_CR7","unstructured":"Rao, R. et al. Evaluating protein transfer learning with TAPE. In NIPS'19: Proc. 33rd International Conference on Neural Information Processing Systems Vol. 32 (eds Wallach, H. M. et al.) 9689\u20139701 (2019)."},{"key":"721_CR8","doi-asserted-by":"publisher","unstructured":"Elnaggar, A. et al. ProtTrans: towards cracking the language of life\u2019s code through self-supervised deep learning and high performance computing. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2007.06225 (2021).","DOI":"10.48550\/arXiv.2007.06225"},{"key":"721_CR9","doi-asserted-by":"crossref","unstructured":"Rao, R., Meier, J., Sercu, T., Ovchinnikov, S. & Rives, A. Transformer protein language models are unsupervised structure learners. In 9th International Conference on Learning Representations (ICLR, 2021).","DOI":"10.1101\/2020.12.15.422761"},{"key":"721_CR10","doi-asserted-by":"publisher","unstructured":"Xiao, Y., Qiu, J., Li, Z., Hsieh, C.-Y. & Tang, J. Modeling protein using large-scale pretrain language model. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2108.07435 (2021).","DOI":"10.48550\/arXiv.2108.07435"},{"key":"721_CR11","doi-asserted-by":"publisher","unstructured":"Chowdhury, R. et al. Single-sequence protein structure prediction using language models from deep learning. Preprint at bioRxiv https:\/\/doi.org\/10.1101\/2021.08.02.454840 (2021).","DOI":"10.1101\/2021.08.02.454840"},{"key":"721_CR12","doi-asserted-by":"crossref","unstructured":"Wei\u00dfenow, K., Heinzinger, M. & Rost, B. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction. Structure 30, 1169\u20131177.E4 (2022).","DOI":"10.1016\/j.str.2022.05.001"},{"key":"721_CR13","doi-asserted-by":"crossref","unstructured":"Wang, W., Peng, Z. & Yang, J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat. Comput. Sci. 2, 804\u2013814 (2022).","DOI":"10.1038\/s43588-022-00373-3"},{"key":"721_CR14","doi-asserted-by":"publisher","first-page":"1618","DOI":"10.1002\/prot.26202","volume":"89","author":"LN Kinch","year":"2021","unstructured":"Kinch, L. N., Schaeffer, R. D., Kryshtafovych, A. & Grishin, N. V. Target classification in the 14th round of the critical assessment of protein structure prediction (CASP14). Proteins 89, 1618\u20131632 (2021).","journal-title":"Proteins"},{"key":"721_CR15","doi-asserted-by":"publisher","first-page":"1607","DOI":"10.1002\/prot.26237","volume":"89","author":"A Kryshtafovych","year":"2021","unstructured":"Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP)\u2013Round XIV. Proteins 89, 1607\u20131617 (2021).","journal-title":"Proteins"},{"key":"721_CR16","doi-asserted-by":"publisher","first-page":"1977","DOI":"10.1002\/prot.26213","volume":"89","author":"X Robin","year":"2021","unstructured":"Robin, X. et al. Continuous Automated Model EvaluatiOn (CAMEO)\u2014perspectives on the future of fully automated evaluation of structure prediction methods. Proteins 89, 1977\u20131986 (2021).","journal-title":"Proteins"},{"key":"721_CR17","doi-asserted-by":"publisher","first-page":"871","DOI":"10.1126\/science.abj8754","volume":"373","author":"M Baek","year":"2021","unstructured":"Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871\u2013876 (2021).","journal-title":"Science"},{"key":"721_CR18","doi-asserted-by":"publisher","first-page":"702","DOI":"10.1002\/prot.20264","volume":"57","author":"Y Zhang","year":"2004","unstructured":"Zhang, Y. & Skolnick, J. Scoring function for automated assessment of protein structure template quality. Proteins 57, 702\u2013710 (2004).","journal-title":"Proteins"},{"key":"721_CR19","first-page":"31","volume":"18","author":"PF Brown","year":"1992","unstructured":"Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., Lai, J. C. & Mercer, R. L. An estimate of an upper bound for the entropy of English. Comput. Linguist. 18, 31\u201340 (1992).","journal-title":"Comput. Linguist."},{"key":"721_CR20","doi-asserted-by":"crossref","unstructured":"Rao, R. M. et al. MSA Transformer. Proc. Mach. Learning Res. 139, 8844\u20138856 (2021).","DOI":"10.1101\/2021.02.12.430858"},{"key":"721_CR21","unstructured":"Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. Improving language understanding by generative pre-training OpenAI (2018); https:\/\/openai.com\/research\/language-unsupervised"},{"key":"721_CR22","doi-asserted-by":"publisher","first-page":"1496","DOI":"10.1073\/pnas.1914677117","volume":"117","author":"J Yang","year":"2020","unstructured":"Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496\u20131503 (2020).","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"721_CR23","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1038\/nmeth.3213","volume":"12","author":"J Yang","year":"2015","unstructured":"Yang, J. et al. The I-TASSER Suite: protein structure and function prediction. Nat. Methods 12, 7\u20138 (2015).","journal-title":"Nat. Methods"},{"key":"721_CR24","doi-asserted-by":"publisher","first-page":"5634","DOI":"10.1038\/s41596-021-00628-9","volume":"16","author":"Z Du","year":"2021","unstructured":"Du, Z. et al. The trRosetta server for fast and accurate protein structure prediction. Nat. Protoc. 16, 5634\u20135651 (2021).","journal-title":"Nat. Protoc."},{"key":"721_CR25","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1002\/prot.23175","volume":"79","author":"J Peng","year":"2011","unstructured":"Peng, J. & Xu, J. RaptorX: exploiting structure information for protein alignment by statistical inference. Proteins 79, 161\u2013171 (2011).","journal-title":"Proteins"},{"key":"721_CR26","doi-asserted-by":"publisher","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","volume":"118","author":"A Rives","year":"2021","unstructured":"Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"721_CR27","unstructured":"He, P., Liu, X., Gao, J. & Chen, W. DeBERTa: decoding-enhanced BERT with disentangled attention. In 9th International Conference on Learning Representations (ICLR, 2021)."},{"key":"721_CR28","doi-asserted-by":"publisher","first-page":"D170","DOI":"10.1093\/nar\/gkw1081","volume":"45","author":"M Mirdita","year":"2017","unstructured":"Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 45, D170\u2013D176 (2017).","journal-title":"Nucleic Acids Res."},{"key":"721_CR29","doi-asserted-by":"publisher","first-page":"926","DOI":"10.1093\/bioinformatics\/btu739","volume":"31","author":"BE Suzek","year":"2014","unstructured":"Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926\u2013932 (2014).","journal-title":"Bioinformatics"},{"key":"721_CR30","unstructured":"The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523\u2013D531 (2023)."},{"key":"721_CR31","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1093\/nar\/28.1.235","volume":"28","author":"HM Berman","year":"2000","unstructured":"Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235\u2013242 (2000).","journal-title":"Nucleic Acids Res."},{"key":"721_CR32","doi-asserted-by":"publisher","first-page":"D437","DOI":"10.1093\/nar\/gkaa1038","volume":"49","author":"SK Burley","year":"2020","unstructured":"Burley, S. K. et al. RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 49, D437\u2013D451 (2020).","journal-title":"Nucleic Acids Res."},{"key":"721_CR33","doi-asserted-by":"publisher","first-page":"D439","DOI":"10.1093\/nar\/gkab1061","volume":"50","author":"M Varadi","year":"2021","unstructured":"Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439\u2013D444 (2021).","journal-title":"Nucleic Acids Res."},{"key":"721_CR34","doi-asserted-by":"publisher","unstructured":"xiaoyao4573 et al. Paddlepaddle\/paddlehelix: v1.2.2. Zenodo https:\/\/doi.org\/10.5281\/zenodo.8202943 (2023).","DOI":"10.5281\/zenodo.8202943"}],"container-title":["Nature Machine Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00721-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00721-6","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00721-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,18]],"date-time":"2023-10-18T19:03:00Z","timestamp":1697655780000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00721-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,9]]},"references-count":34,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2023,10]]}},"alternative-id":["721"],"URL":"https:\/\/doi.org\/10.1038\/s42256-023-00721-6","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-1969991\/v1","asserted-by":"object"}]},"ISSN":["2522-5839"],"issn-type":[{"value":"2522-5839","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,9]]},"assertion":[{"value":"17 August 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 August 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 October 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}