{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T22:39:38Z","timestamp":1779230378183,"version":"3.51.4"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2025,4,10]],"date-time":"2025-04-10T00:00:00Z","timestamp":1744243200000},"content-version":"vor","delay-in-days":40,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>In the realm of protein design, the efficient construction of protein sequences that accurately fold into predefined structures has become an important area of research. Although advancements have been made in the study of long-chain proteins, the design of short-chain proteins requires equal consideration. The structural information inherent in short and single chains is typically less comprehensive than that of full-length chains, which can negatively impact their performance. To address this challenge, we introduce ScFold, a novel model that incorporates an innovative node module. This module utilizes spatial dimensionality reduction and positional encoding mechanisms to enhance the extraction of structural features. Experimental results indicate that ScFold achieves a recovery rate of 52.22$\\%$ on the CATH4.2 dataset, demonstrating notable efficacy for short-chain proteins, with a recovery rate of 41.6$\\%$. Additionally, ScFold further exhibits enhanced recovery rates of 59.32$\\%$ and 61.59$\\%$ on the TS50 and TS500 datasets, respectively, demonstrating its effectiveness across diverse protein types. Additionally, we performed protein length stratification on the TS500 and CATH4.2 datasets and tested ScFold on length-specific sub-datasets. The results confirm the model\u2019s superiority in handling short-chain proteins. Finally, we selected several protein sequence groups from the CATH4.2 dataset for structural visualization analysis and provided comparisons between the model-generated sequences and the target sequences.<\/jats:p>","DOI":"10.1093\/bib\/bbaf156","type":"journal-article","created":{"date-parts":[[2025,4,10]],"date-time":"2025-04-10T05:09:36Z","timestamp":1744261776000},"source":"Crossref","is-referenced-by-count":2,"title":["ScFold: a GNN-based model for efficient inverse folding of short-chain proteins via spatial reduction"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6334-9304","authenticated-orcid":false,"given":"Jiancheng","family":"Zhong","sequence":"first","affiliation":[{"name":"College of Information Science and Engineering, Hunan Normal University , 36 Lushan Road, Yuelu District, Changsha 410081, Hunan ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiwei","family":"Zou","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Hunan Normal University , 36 Lushan Road, Yuelu District, Changsha 410081, Hunan ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jie","family":"Qiu","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Hunan Normal University , 36 Lushan Road, Yuelu District, Changsha 410081, Hunan ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2741-7914","authenticated-orcid":false,"given":"Shaokai","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Hong Kong University of Science and Technology , Clear Water Bay, Hong Kong SAR ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,4,10]]},"reference":[{"key":"2025041005093006500_ref1","doi-asserted-by":"publisher","first-page":"441","DOI":"10.1007\/978-3-031-20053-3_26","article-title":"Automix: Unveiling the power of mixup for stronger classifiers","volume-title":"European Conference on Computer Vision","author":"Liu","year":"2022"},{"key":"2025041005093006500_ref2","first-page":"18770","article-title":"Temporal attention unit: Towards efficient spatiotemporal predictive learning","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Cheng","year":"2023"},{"key":"2025041005093006500_ref3","doi-asserted-by":"publisher","first-page":"451","DOI":"10.1038\/s43588-022-00273-6","article-title":"Rotamer-free protein sequence design based on deep learning and self-consistency","volume":"2","author":"Yufeng Liu","year":"2022","journal-title":"Nat Comput Sci"},{"key":"2025041005093006500_ref4","doi-asserted-by":"publisher","first-page":"2022","DOI":"10.1101\/2022.04.15.488492","article-title":"A deep se (3)-equivariant model for learning inverse protein folding","author":"McPartlon","year":"2022","journal-title":"BioRxiv"},{"key":"2025041005093006500_ref5","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btad122","article-title":"Accurate and efficient protein sequence design through learning concise local environment of residues","volume":"39","author":"Huang","year":"2023","journal-title":"Bioinformatics"},{"key":"2025041005093006500_ref6","doi-asserted-by":"publisher","first-page":"2022","DOI":"10.1101\/2022.08.10.503344","article-title":"Petribert: Augmenting bert with tridimensional encoding for inverse protein folding and design","author":"Dumortier","year":"2022","journal-title":"BioRxiv"},{"key":"2025041005093006500_ref7","doi-asserted-by":"publisher","first-page":"e4554","DOI":"10.1002\/pro.4554","article-title":"Neural network-derived potts models for structure-based protein design using backbone atomic coordinates and tertiary motifs","volume":"32","author":"Li","year":"2023","journal-title":"Protein Sci"},{"key":"2025041005093006500_ref8","doi-asserted-by":"publisher","first-page":"e1009037","DOI":"10.1371\/journal.pcbi.1009037","article-title":"Xenet: Using a new graph convolution to accelerate the timeline for protein design on quantum computers","volume":"17","author":"Maguire","year":"2021","journal-title":"PLoS Comput Biol"},{"key":"2025041005093006500_ref9","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2204.13048","article-title":"Terminator: A neural framework for structure-based protein design using tertiary repeating motifs.","author":"Li","year":"2022"},{"key":"2025041005093006500_ref10","first-page":"32349","article-title":"Importance weighted expectation-maximization for protein sequence design","volume-title":"International Conference on Machine Learning","author":"Song","year":"2023"},{"key":"2025041005093006500_ref11","doi-asserted-by":"publisher","first-page":"1099","DOI":"10.1038\/s41587-022-01618-2","article-title":"Large language models generate functional protein sequences across diverse families","volume":"41","author":"Madani","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2025041005093006500_ref12","first-page":"2022","article-title":"Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models","author":"Watson","year":"2022","journal-title":"BioRxiv"},{"key":"2025041005093006500_ref13","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1016\/j.cbpa.2021.04.004","article-title":"Protein sequence design with deep generative models","volume":"65","author":"Zachary","year":"2021","journal-title":"Curr Opin Chem Biol"},{"key":"2025041005093006500_ref14","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1016\/j.sbi.2021.01.007","article-title":"Deep learning techniques have significantly impacted protein structure prediction and protein design","volume":"68","author":"Pearce","year":"2021","journal-title":"Curr Opin Struct Biol"},{"key":"2025041005093006500_ref15","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1016\/j.cbpa.2021.08.004","article-title":"Structure-based protein design with deep learning","volume":"65","author":"Ovchinnikov","year":"2021","journal-title":"Curr Opin Chem Biol"},{"key":"2025041005093006500_ref16","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbac102","article-title":"Protein design via deep learning","volume":"23","author":"Ding","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025041005093006500_ref17","doi-asserted-by":"publisher","first-page":"100142","DOI":"10.1016\/j.patter.2020.100142","article-title":"Deep learning in protein structural modeling and design","volume":"1","author":"Gao","year":"2020","journal-title":"Patterns"},{"key":"2025041005093006500_ref18","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1016\/j.sbi.2021.11.008","article-title":"Deep generative modeling for protein design","volume":"72","author":"Strokach","year":"2022","journal-title":"Curr Opin Struct Biol"},{"key":"2025041005093006500_ref19","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1016\/j.csbj.2014.09.001","article-title":"3d representations of amino acids\u2013applications to protein sequence comparison and classification","volume":"11","author":"Li","year":"2014","journal-title":"Comput Struct Biotechnol J"},{"key":"2025041005093006500_ref20","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-018-34533-1","article-title":"Design of metalloproteins and novel protein folds using variational autoencoders","volume":"8","author":"Greener","year":"2018","journal-title":"Sci Rep"},{"key":"2025041005093006500_ref21","doi-asserted-by":"publisher","first-page":"5667","DOI":"10.1021\/acs.jcim.0c00593","article-title":"De novo protein design for novel folds using guided conditional wasserstein generative adversarial networks","volume":"60","author":"Karimi","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2025041005093006500_ref22","doi-asserted-by":"publisher","first-page":"547","DOI":"10.1038\/s41586-021-04184-w","article-title":"De novo protein design by deep network hallucination","volume":"600","author":"Anishchenko","year":"2021","journal-title":"Nature"},{"key":"2025041005093006500_ref23","first-page":"1261","article-title":"Fold2seq: A joint sequence (1d)-fold (3d) embedding-based generative model for protein design","volume-title":"International Conference on Machine Learning","author":"Cao","year":"2021"},{"key":"2025041005093006500_ref24","doi-asserted-by":"publisher","first-page":"2565","DOI":"10.1002\/prot.24620","article-title":"Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles","volume":"82","author":"Li","year":"2014","journal-title":"Proteins"},{"key":"2025041005093006500_ref25","doi-asserted-by":"publisher","first-page":"629","DOI":"10.1002\/prot.25489","article-title":"Spin2: Predicting sequence profiles from protein structures using deep neural networks","volume":"86","author":"O\u2019Connell","year":"2018","journal-title":"Proteins"},{"key":"2025041005093006500_ref26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-018-24760-x","article-title":"Computational protein design with deep learning neural networks","volume":"8","author":"Wang","year":"2018","journal-title":"Sci Rep"},{"key":"2025041005093006500_ref27","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1021\/acs.jcim.9b00438","article-title":"To improve protein sequence profile prediction through image captioning on pairwise residue distance map","volume":"60","author":"Chen","year":"2019","journal-title":"J Chem Inf Model"},{"key":"2025041005093006500_ref28","doi-asserted-by":"publisher","first-page":"43a","DOI":"10.1016\/j.bpj.2019.11.419","article-title":"Prodconn-protein design using a convolutional neural network","volume":"118","author":"Zhang","year":"2020","journal-title":"Biophys J"},{"key":"2025041005093006500_ref29","doi-asserted-by":"publisher","first-page":"1245","DOI":"10.1021\/acs.jcim.0c00043","article-title":"Densecpd: Improving the accuracy of neural-network-based computational protein sequence design with densenet","volume":"60","author":"Qi","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2025041005093006500_ref30","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-022-28313-9","article-title":"Protein sequence design with a learned potential","volume":"13","author":"Anand","year":"2022","journal-title":"Nat Commun"},{"key":"2025041005093006500_ref31","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2209.12643","article-title":"Pifold: Toward effective and efficient protein inverse folding","author":"Gao","year":"2022"},{"key":"2025041005093006500_ref32","article-title":"Convolutional neural networks on graphs with fast localized spectral filtering","volume":"29","author":"Defferrard","year":"2016","journal-title":"Advances in neural information processing systems"},{"key":"2025041005093006500_ref33","doi-asserted-by":"publisher","first-page":"10","DOI":"10.48550\/arXiv.1710.10903","article-title":"Graph attention networks","volume":"1050","author":"Velickovic","year":"2017","journal-title":"Stat"},{"key":"2025041005093006500_ref34","article-title":"Generative models for graph-based protein design","volume":"32","author":"Ingraham","year":"2019","journal-title":"Advances in neural information processing systems"},{"key":"2025041005093006500_ref35","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2204.10673","article-title":"Generative de novo protein design with global context.","author":"Cheng","year":"2022"},{"key":"2025041005093006500_ref36","article-title":"Learning from protein structure with geometric vector perceptrons","volume-title":"International Conference on Learning Representations","author":"Jing","year":"2020"},{"key":"2025041005093006500_ref37","first-page":"8946","article-title":"Learning inverse folding from millions of predicted structures","volume-title":"International conference on machine learning","author":"Hsu","year":"2022"},{"key":"2025041005093006500_ref38","doi-asserted-by":"publisher","first-page":"402","DOI":"10.1016\/j.cels.2020.08.016","article-title":"Fast and flexible protein design using deep graph neural networks","volume":"11","author":"Strokach","year":"2020","journal-title":"Cell Syst"},{"key":"2025041005093006500_ref39","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2202.01079","article-title":"Alphadesign: A graph protein design method and benchmark on AlphaFoldDB","volume-title":"Nature Communications","author":"Gao","year":"2023"},{"key":"2025041005093006500_ref40","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1126\/science.add2187","article-title":"Robust deep learning\u2013based protein sequence design using proteinmpnn","volume":"378","author":"Dauparas","year":"2022","journal-title":"Science"},{"key":"2025041005093006500_ref41","first-page":"42317","article-title":"Structure-informed language models are protein designers","volume-title":"International conference on machine learning","author":"Zheng","year":"2023"},{"key":"2025041005093006500_ref42","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1109\/TKDE.2020.2981333","article-title":"Deep learning on graphs: A survey","volume":"34","author":"Zhang","year":"2020","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2025041005093006500_ref43","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/j.aiopen.2021.01.001","article-title":"Graph neural networks: A review of methods and applications","volume":"1","author":"Zhou","year":"2020","journal-title":"AI open"},{"key":"2025041005093006500_ref44","doi-asserted-by":"crossref","first-page":"1873","DOI":"10.1145\/3447548.3467323","article-title":"Geometric graph representation learning on protein structure prediction","volume-title":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","author":"Xia","year":"2021"},{"key":"2025041005093006500_ref45","first-page":"4054","article-title":"Topology optimization based graph convolutional network","volume-title":"IJCAI","author":"Liang","year":"2019"},{"key":"2025041005093006500_ref46","doi-asserted-by":"crossref","DOI":"10.1109\/JBHI.2024.3390092","article-title":"Subgraph-aware graph kernel neural network for link prediction in biological networks","volume":"28","author":"Li","year":"2024","journal-title":"IEEE J Biomed Health Inform"},{"key":"2025041005093006500_ref47","doi-asserted-by":"publisher","first-page":"15966","DOI":"10.1002\/anie.201609977","article-title":"Functional proteins from short peptides: Dayhoff\u2019s hypothesis turns 50","volume":"55","author":"Luisa Romero Romero","year":"2016","journal-title":"Angew Chem Int Ed"},{"key":"2025041005093006500_ref48","first-page":"568","article-title":"Pyramid vision transformer: A versatile backbone for dense prediction without convolutions","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision","author":"Wang","year":"2021"},{"key":"2025041005093006500_ref49","first-page":"770","article-title":"Deep residual learning for image recognition","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"He","year":"2016"},{"key":"2025041005093006500_ref50","doi-asserted-by":"publisher","first-page":"1093","DOI":"10.1016\/S0969-2126(97)00260-8","article-title":"Cath\u2013a hierarchic classification of protein domain structures","volume":"5","author":"Orengo","year":"1997","journal-title":"Structure"},{"key":"2025041005093006500_ref51","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1609.02907","article-title":"Semi-supervised classification with graph convolutional networks","author":"Kipf","year":"2016"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/2\/bbaf156\/62900891\/bbaf156.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/2\/bbaf156\/62900891\/bbaf156.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,10]],"date-time":"2025-04-10T05:09:41Z","timestamp":1744261781000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf156\/8109671"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3]]},"references-count":51,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf156","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,3]]},"published":{"date-parts":[[2025,3]]},"article-number":"bbaf156"}}