{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:43Z","timestamp":1772138023281,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2024,4,11]],"date-time":"2024-04-11T00:00:00Z","timestamp":1712793600000},"content-version":"vor","delay-in-days":15,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100018537","name":"National Science and Technology Major Project","doi-asserted-by":"publisher","award":["2022ZD0115103"],"award-info":[{"award-number":["2022ZD0115103"]}],"id":[{"id":"10.13039\/501100018537","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Nature Science Foundation of China","doi-asserted-by":"publisher","award":["62173304"],"award-info":[{"award-number":["62173304"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Key Project of Zhejiang Provincial Natural Science Foundation of China","award":["LZ20F030002"],"award-info":[{"award-number":["LZ20F030002"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,3,27]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in the protein structure prediction, we explored whether the representation of structural sequence profile can be used for protein sequence design. 
In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign uses ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts a sequence profile through structure alignment. This profile, combined with pre-trained structural knowledge and geometric features, is then fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms state-of-the-art methods such as ProteinMPNN, PiFold and LM-Design, with sequence recovery rate gains of 21.89%, 15.54% and 11.4%, respectively, on the CATH 4.2 benchmark. Encouraging results have also been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis with the PDBench tool suggests that SPDesign performs well across structural subcategories. More interestingly, we found that SPDesign can accurately reconstruct the sequences of some proteins that have similar structures but different sequences. 
Finally, the structural modeling verification experiment indicates that the sequences designed by SPDesign can fold into the native structures more accurately.<\/jats:p>","DOI":"10.1093\/bib\/bbae146","type":"journal-article","created":{"date-parts":[[2024,3,20]],"date-time":"2024-03-20T05:38:18Z","timestamp":1710913098000},"source":"Crossref","is-referenced-by-count":5,"title":["SPDesign: protein sequence designer based on structural sequence profile using ultrafast shape recognition"],"prefix":"10.1093","volume":"25","author":[{"given":"Hui","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dong","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kailong","family":"Zhao","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yajun","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7815-5884","authenticated-orcid":false,"given":"Guijun","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,4,9]]},"reference":[{"key":"2024041103174848300_ref1","doi-asserted-by":"crossref","DOI":"10.3390\/ijms222111741","article-title":"Protein design with deep learning","volume":"22","author":"Defresne","year":"2021","journal-title":"Int J Mol Sci"},{"key":"2024041103174848300_ref2","doi-asserted-by":"crossref","first-page":"9265","DOI":"10.1021\/acscatal.9b02509","article-title":"Minimalist de novo Design of Protein Catalysts","volume":"9","author":"Marshall","year":"2019","journal-title":"ACS Catal"},{"key":"2024041103174848300_ref3","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1093\/bib\/bbac102","article-title":"Protein design via deep 
learning","volume":"23","author":"Ding","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024041103174848300_ref4","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.cbpa.2021.08.004","article-title":"Structure-based protein design with deep learning","volume":"65","author":"Ovchinnikov","year":"2021","journal-title":"Curr Opin Chem Biol"},{"key":"2024041103174848300_ref5","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1016\/j.sbi.2021.01.007","article-title":"Deep learning techniques have significantly impacted protein structure prediction and protein design","volume":"68","author":"Pearce","year":"2021","journal-title":"Curr Opin Struct Biol"},{"key":"2024041103174848300_ref6","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2024041103174848300_ref7","doi-asserted-by":"crossref","first-page":"7363","DOI":"10.1021\/acs.jcim.3c01527","article-title":"A multimodal deep learning framework for predicting PPI-modulator interactions","volume":"63","author":"Sun","year":"2023","journal-title":"J Chem Inf Model"},{"key":"2024041103174848300_ref8","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btad718","article-title":"DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model","volume":"39","author":"Fang","year":"2023","journal-title":"Bioinformatics"},{"key":"2024041103174848300_ref9","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1126\/science.add2187","article-title":"Robust deep learning-based protein sequence design using ProteinMPNN","volume":"378","author":"Dauparas","year":"2022","journal-title":"Science"},{"key":"2024041103174848300_ref10","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1021\/acs.jcim.9b00438","article-title":"To improve protein sequence 
profile prediction through image captioning on pairwise residue distance map","volume":"60","author":"Chen","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2024041103174848300_ref11","article-title":"Accurate and efficient protein sequence design through learning concise local environment of residues","volume":"39","author":"Huang","year":"2022","journal-title":"Bioinformatics"},{"key":"2024041103174848300_ref12","doi-asserted-by":"crossref","first-page":"5667","DOI":"10.1021\/acs.jcim.0c00593","article-title":"De novo protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks","volume":"60","author":"Karimi","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2024041103174848300_ref13","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1016\/j.sbi.2021.11.008","article-title":"Deep generative modeling for protein design","volume":"72","author":"Strokach","year":"2022","journal-title":"Curr Opin Struct Biol"},{"key":"2024041103174848300_ref14","article-title":"Generative De novo protein design with global context","volume":"1","author":"Tan","journal-title":"IEEE International Conference on Acoustics, Speech and Signal Processing"},{"key":"2024041103174848300_ref15","doi-asserted-by":"crossref","first-page":"6349","DOI":"10.1038\/s41598-018-24760-x","article-title":"Computational protein design with deep learning neural networks","volume":"8","author":"Wang","year":"2018","journal-title":"Sci Rep"},{"key":"2024041103174848300_ref16","doi-asserted-by":"crossref","first-page":"2565","DOI":"10.1002\/prot.24620","article-title":"Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles","volume":"82","author":"Li","year":"2014","journal-title":"Proteins"},{"key":"2024041103174848300_ref17","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1002\/prot.25489","article-title":"SPIN2: predicting sequence 
profiles from protein structures using deep neural networks","volume":"86","author":"O'Connell","year":"2018","journal-title":"Proteins"},{"key":"2024041103174848300_ref18","doi-asserted-by":"crossref","first-page":"1245","DOI":"10.1021\/acs.jcim.0c00043","article-title":"DenseCPD: improving the accuracy of neural-network-based computational protein sequence design with DenseNet","volume":"60","author":"Qi","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2024041103174848300_ref19","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1002\/prot.25868","article-title":"ProDCoNN: protein design using a convolutional neural network","volume":"88","author":"Zhang","year":"2020","journal-title":"Proteins"},{"key":"2024041103174848300_ref20","first-page":"abs\/2209.12643","article-title":"PiFold: toward effective and efficient protein inverse folding","author":"Gao","year":"2022","journal-title":"ArXiv"},{"key":"2024041103174848300_ref21","first-page":"abs\/2201.08821","article-title":"Representing long-range context for graph neural networks with global attention","volume-title":"ArXiv","author":"Wu","year":"2022"},{"key":"2024041103174848300_ref22","article-title":"Learning from protein structure with geometric vector Perceptrons","author":"Jing","year":"2020","journal-title":"ArXiv"},{"key":"2024041103174848300_ref23","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1011330","article-title":"SPIN-CGNN: improved fixed backbone protein design with contact map-based graph construction and contact graph neural network","volume":"19","author":"Zhang","year":"2023","journal-title":"PLoS Comput Biol"},{"key":"2024041103174848300_ref24","doi-asserted-by":"crossref","first-page":"2102","DOI":"10.1093\/bioinformatics\/btac020","article-title":"ProteinBERT: a universal deep-learning model of protein sequence and 
function","volume":"38","author":"Brandes","year":"2022","journal-title":"Bioinformatics"},{"key":"2024041103174848300_ref25","doi-asserted-by":"crossref","DOI":"10.1101\/2022.04.10.487779","article-title":"Learning inverse folding from millions of predicted structures","volume-title":"International Conference on MachineLearning","author":"Hsu"},{"key":"2024041103174848300_ref26","doi-asserted-by":"crossref","DOI":"10.1101\/2023.02.03.526917","article-title":"Structure-informed language models are protein designers","volume-title":"International Conference on MachineLearning","author":"Zheng"},{"key":"2024041103174848300_ref27","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1126\/science.abj8754","article-title":"Accurate prediction of protein structures and interactions using a three-track neural network","volume":"373","author":"Baek","year":"2021","journal-title":"Science"},{"key":"2024041103174848300_ref28","doi-asserted-by":"crossref","first-page":"1617","DOI":"10.1038\/s41587-022-01432-w","article-title":"Single-sequence protein structure prediction using a language model and deep learning","volume":"40","author":"Chowdhury","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2024041103174848300_ref29","article-title":"Language models of protein sequences at the scale of evolution enable accurate structure prediction","author":"Zeming","year":"2022","journal-title":"bioRxiv"},{"key":"2024041103174848300_ref30","doi-asserted-by":"crossref","first-page":"4350","DOI":"10.1093\/bioinformatics\/btab484","article-title":"MMpred: a distance-assisted multimodal conformation sampling for de novo protein structure prediction","volume":"37","author":"Zhao","year":"2021","journal-title":"Bioinformatics"},{"key":"2024041103174848300_ref31","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1038\/s42003-023-04605-8","article-title":"Protein structure and folding pathway prediction based on remote homologs recognition using 
PAthreader","volume":"6","author":"Zhao","year":"2023","journal-title":"Commun Biol"},{"key":"2024041103174848300_ref32","doi-asserted-by":"crossref","first-page":"1895","DOI":"10.1093\/bioinformatics\/btac056","article-title":"DeepUMQA: ultrafast shape recognition-based protein model quality assessment using deep learning","volume":"38","author":"Guo","year":"2022","journal-title":"Bioinformatics"},{"key":"2024041103174848300_ref33","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btad027","article-title":"PDBench: evaluating computational methods for protein-sequence design","volume":"39","author":"Castorina","year":"2023","journal-title":"Bioinformatics"},{"key":"2024041103174848300_ref34","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1007\/978-1-4939-7000-1_26","article-title":"Protein data Bank (PDB): the single global macromolecular structure archive","volume":"1607","author":"Burley","year":"2017","journal-title":"Methods Mol Biol"},{"key":"2024041103174848300_ref35","doi-asserted-by":"crossref","first-page":"D439","DOI":"10.1093\/nar\/gkab1061","article-title":"AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models","volume":"50","author":"Varadi","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024041103174848300_ref36","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2024041103174848300_ref37","volume-title":"International Conference on Machine Learning","author":"Gilmer","year":"2017"},{"key":"2024041103174848300_ref38","volume-title":"Neural Information Processing 
Systems","author":"Vaswani","year":"2017"},{"key":"2024041103174848300_ref39","doi-asserted-by":"crossref","first-page":"D266","DOI":"10.1093\/nar\/gkaa1079","article-title":"CATH: increased structural coverage of functional space","volume":"49","author":"Sillitoe","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024041103174848300_ref40","first-page":"abs\/2302.02277","article-title":"SE(3) diffusion model with application to protein backbone generation","author":"Yim","year":"2023","journal-title":"ArXiv"},{"key":"2024041103174848300_ref41","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1006\/jmbi.2000.4427","article-title":"Crystal structure of unligated guanylate kinase from yeast reveals GMP-induced conformational changes","volume":"307","author":"Blaszczyk","year":"2001","journal-title":"J Mol Biol"},{"key":"2024041103174848300_ref42","article-title":"High-level expression of Palmitoylated MPP1 recombinant protein in mammalian cells","volume":"11","author":"Chytla","year":"2021","journal-title":"Membranes (Basel)"},{"key":"2024041103174848300_ref43","doi-asserted-by":"crossref","first-page":"4159","DOI":"10.1074\/jbc.M110792200","article-title":"Structural basis for nucleotide-dependent regulation of membrane-associated guanylate kinase-like domains","volume":"277","author":"Li","year":"2002","journal-title":"J Biol Chem"},{"key":"2024041103174848300_ref44","doi-asserted-by":"crossref","first-page":"4986","DOI":"10.1038\/emboj.2011.428","article-title":"Guanylate kinase domains of the MAGUK family scaffold proteins as specific phospho-protein-binding modules","volume":"30","author":"Zhu","year":"2011","journal-title":"EMBO J"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/3\/bbae146\/57188966\/bbae146.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/3\/bbae146\/57188966\/bbae146.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T23:18:45Z","timestamp":1712791125000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae146\/7642672"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,27]]},"references-count":44,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,27]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae146","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.12.14.571651","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,5]]},"published":{"date-parts":[[2024,3,27]]},"article-number":"bbae146"}}