{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T03:10:22Z","timestamp":1769829022176,"version":"3.49.0"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T00:00:00Z","timestamp":1751414400000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022YFC3400300"],"award-info":[{"award-number":["2022YFC3400300"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62472445"],"award-info":[{"award-number":["62472445"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004735","name":"Hunan Provincial Natural Science Foundation of China","doi-asserted-by":"crossref","award":["2023JJ40763"],"award-info":[{"award-number":["2023JJ40763"]}],"id":[{"id":"10.13039\/501100004735","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Accurate prediction of single-guide RNA (sgRNA) activity is crucial for optimizing the CRISPR\/Cas9 gene-editing system, as it directly influences the efficiency and accuracy of genome modifications. 
However, existing prediction methods mainly rely on large-scale experimental data of a single Cas9 variant to construct Cas9 protein (variants)-specific sgRNA activity prediction models, which limits their generalization ability and prediction performance across different Cas9 protein (variants), as well as their scalability to the continuously discovered new variants.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this study, we proposed PLM-CRISPR, a novel deep learning-based model that leverages protein language models to capture Cas9 protein (variants) representations for cross-variant sgRNA activity prediction. PLM-CRISPR uses tailored feature extraction modules for both sgRNA and protein sequences, incorporating a cross-variant training strategy and a dynamic feature fusion mechanism to effectively model their interactions. Extensive experiments demonstrate that PLM-CRISPR outperforms existing methods across datasets spanning seven Cas9 protein (variants) in three real-world scenarios, demonstrating its superior performance in handling data-scarce situations, including cases with few or no samples for novel variants. Comparative analyses with traditional machine learning and deep learning models further confirm the effectiveness of PLM-CRISPR. Additionally, motif analysis reveals that PLM-CRISPR accurately identifies high-activity sgRNA sequence patterns across diverse Cas9 protein (variants). 
Overall, PLM-CRISPR provides a robust, scalable, and generalizable solution for sgRNA activity prediction across diverse Cas9 protein (variants).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code can be obtained from https:\/\/github.com\/CSUBioGroup\/PLM-CRISPR.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf385","type":"journal-article","created":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T07:39:25Z","timestamp":1751355565000},"source":"Crossref","is-referenced-by-count":5,"title":["Leveraging protein language models for cross-variant CRISPR\/Cas9 sgRNA activity prediction"],"prefix":"10.1093","volume":"41","author":[{"given":"Yalin","family":"Hou","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083,","place":["China"]}]},{"given":"Yiming","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6372-6798","authenticated-orcid":false,"given":"Ruiqing","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083,","place":["China"]}]},{"given":"Fuhao","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Information Engineering, Northwest A&F University , Yangling, Shaanxi 712100,","place":["China"]}]},{"given":"Fei","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0188-1394","authenticated-orcid":false,"given":"Min","family":"Li","sequence":"additional","affiliation":[{"name":"School of 
Computer Science and Engineering, Central South University , Changsha 410083,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1726-0955","authenticated-orcid":false,"given":"Min","family":"Zeng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,7,2]]},"reference":[{"key":"2025071200573033900_btaf385-B1","doi-asserted-by":"crossref","first-page":"1257","DOI":"10.1002\/cac2.12366","article-title":"Current updates of CRISPR\/Cas9-mediated genome editing and targeting within tumor cells: an innovative strategy of cancer management","volume":"42","author":"Allemailem","year":"2022","journal-title":"Cancer Commun"},{"key":"2025071200573033900_btaf385-B2","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1038\/s41586-019-1711-4","article-title":"Search-and-replace genome editing without double-strand breaks or donor DNA","volume":"576","author":"Anzalone","year":"2019","journal-title":"Nature"},{"key":"2025071200573033900_btaf385-B3","doi-asserted-by":"crossref","first-page":"1473","DOI":"10.1093\/bioinformatics\/btu048","article-title":"Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases","volume":"30","author":"Bae","year":"2014","journal-title":"Bioinformatics"},{"key":"2025071200573033900_btaf385-B4","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1038\/nbt.4066","article-title":"A highly specific SpCas9 variant is identified by in vivo screening in yeast","volume":"36","author":"Casini","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2025071200573033900_btaf385-B5","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1038\/nature24268","article-title":"Enhanced proofreading governs CRISPR\u2013Cas9 targeting 
accuracy","volume":"550","author":"Chen","year":"2017","journal-title":"Nature"},{"key":"2025071200573033900_btaf385-B6","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1186\/s13059-018-1459-4","article-title":"DeepCRISPR: optimized CRISPR guide RNA design by deep learning","volume":"19","author":"Chuai","year":"2018","journal-title":"Genome Biol"},{"key":"2025071200573033900_btaf385-B7","doi-asserted-by":"crossref","first-page":"D190","DOI":"10.1093\/nar\/gkm895","article-title":"The universal protein resource (UniProt)","volume":"36","author":"Consortium U","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2025071200573033900_btaf385-B8","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1038\/nbt.3437","article-title":"Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR\u2013Cas9","volume":"34","author":"Doench","year":"2016","journal-title":"Nat Biotechnol"},{"key":"2025071200573033900_btaf385-B9","first-page":"57350","article-title":"Enhanced genome editing with Cas9 ribonucleoprotein in diverse cells and organisms","volume":"135","author":"Farboud","year":"2018","journal-title":"J Vis Exp"},{"key":"2025071200573033900_btaf385-B10","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1038\/nature24644","article-title":"Programmable base editing of a\u2022 T to G\u2022 C in genomic DNA without DNA cleavage","volume":"551","author":"Gaudelli","year":"2017","journal-title":"Nature"},{"key":"2025071200573033900_btaf385-B11","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1038\/nmeth.2812","article-title":"E-CRISP: fast CRISPR target site identification","volume":"11","author":"Heigwer","year":"2014","journal-title":"Nat Methods"},{"key":"2025071200573033900_btaf385-B12","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature26155","article-title":"Evolved Cas9 variants with broad PAM compatibility and high DNA 
specificity","volume":"556","author":"Hu","year":"2018","journal-title":"Nature"},{"key":"2025071200573033900_btaf385-B13","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1126\/science.1225829","article-title":"A programmable dual-RNA\u2013guided DNA endonuclease in adaptive bacterial immunity","volume":"337","author":"Jinek","year":"2012","journal-title":"Science"},{"key":"2025071200573033900_btaf385-B14","doi-asserted-by":"crossref","first-page":"D1152","DOI":"10.1093\/nar\/gku893","article-title":"The Addgene repository: an international nonprofit plasmid and data resource","volume":"43","author":"Kamens","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2025071200573033900_btaf385-B15","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1038\/nrg3686","article-title":"A guide to genome engineering with programmable nucleases","volume":"15","author":"Kim","year":"2014","journal-title":"Nat Rev Genet"},{"key":"2025071200573033900_btaf385-B16","doi-asserted-by":"crossref","first-page":"eaax9249","DOI":"10.1126\/sciadv.aax9249","article-title":"SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance","volume":"5","author":"Kim","year":"2019","journal-title":"Sci Adv"},{"key":"2025071200573033900_btaf385-B17","doi-asserted-by":"crossref","first-page":"1328","DOI":"10.1038\/s41587-020-0537-9","article-title":"Prediction of the sequence-specific cleavage activity of Cas9 variants","volume":"38","author":"Kim","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2025071200573033900_btaf385-B18","doi-asserted-by":"crossref","first-page":"490","DOI":"10.1038\/nature16526","article-title":"High-fidelity CRISPR\u2013Cas9 nucleases with no detectable genome-wide off-target 
effects","volume":"529","author":"Kleinstiver","year":"2016","journal-title":"Nature"},{"key":"2025071200573033900_btaf385-B19","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1038\/nature17946","article-title":"Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage","volume":"533","author":"Komor","year":"2016","journal-title":"Nature"},{"key":"2025071200573033900_btaf385-B20","doi-asserted-by":"crossref","first-page":"3048","DOI":"10.1038\/s41467-018-05477-x","article-title":"Directed evolution of CRISPR\u2013Cas9 to increase its specificity","volume":"9","author":"Lee","year":"2018","journal-title":"Nat Commun"},{"key":"2025071200573033900_btaf385-B21","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1038\/s41392-023-01309-7","article-title":"CRISPR\/Cas9 therapeutics: progress and prospects","volume":"8","author":"Li","year":"2023","journal-title":"Signal Transduct Target Ther"},{"key":"2025071200573033900_btaf385-B22","doi-asserted-by":"crossref","first-page":"10024","DOI":"10.1038\/s41467-024-54365-0","article-title":"Discovering CRISPR\u2013Cas system with self-processing pre-crRNA capability by foundation models","volume":"15","author":"Li","year":"2024","journal-title":"Nat Commun"},{"key":"2025071200573033900_btaf385-B23","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025071200573033900_btaf385-B24","doi-asserted-by":"crossref","first-page":"3676","DOI":"10.1093\/bioinformatics\/btv423","article-title":"CRISPR-ERA: a comprehensive design tool for CRISPR-mediated gene editing, repression and 
activation","volume":"31","author":"Liu","year":"2015","journal-title":"Bioinformatics"},{"key":"2025071200573033900_btaf385-B25","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/BIBM62325.2024.10822205","volume-title":"2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","author":"Liu","year":"2024"},{"key":"2025071200573033900_btaf385-B26","first-page":"25","article-title":"Consistent individualized feature attribution for tree ensembles","volume":"5","author":"Lundberg","year":"2019","journal-title":"Methods"},{"key":"2025071200573033900_btaf385-B27","doi-asserted-by":"crossref","first-page":"W401","DOI":"10.1093\/nar\/gku410","article-title":"CHOPCHOP: a CRISPR\/Cas9 and TALEN web tool for genome editing","volume":"42","author":"Montague","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2025071200573033900_btaf385-B28","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1007\/s11103-020-01102-y","article-title":"sgRNACNN: identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks","volume":"105","author":"Niu","year":"2021","journal-title":"Plant Mol Biol"},{"key":"2025071200573033900_btaf385-B29","doi-asserted-by":"crossref","first-page":"1211","DOI":"10.1038\/nmeth.2646","article-title":"pLogo: a probabilistic approach to visualizing sequence motifs","volume":"10","author":"O\u2019shea","year":"2013","journal-title":"Nat Methods"},{"key":"2025071200573033900_btaf385-B30","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1002\/(SICI)1097-0282(1997)44:3<309::AID-BIP8>3.0.CO;2-Z","article-title":"Measuring the thermodynamics of RNA secondary structure formation","volume":"44","author":"SantaLucia","year":"1997","journal-title":"Biopolymers"},{"key":"2025071200573033900_btaf385-B31","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1126\/science.aad5227","article-title":"Rationally engineered Cas9 nucleases with improved 
specificity","volume":"351","author":"Slaymaker","year":"2016","journal-title":"Science"},{"key":"2025071200573033900_btaf385-B32","doi-asserted-by":"crossref","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","article-title":"CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice","volume":"22","author":"Thompson","year":"1994","journal-title":"Nucleic Acids Res"},{"key":"2025071200573033900_btaf385-B33","doi-asserted-by":"publisher","author":"Verkuil","year":"2022","DOI":"10.1101\/2022.12.21.521521"},{"key":"2025071200573033900_btaf385-B34","doi-asserted-by":"crossref","first-page":"1518","DOI":"10.1109\/TCBB.2022.3201631","article-title":"TransCrispr: transformer based hybrid model for predicting CRISPR\/Cas9 single guide RNA cleavage efficiency","volume":"20","author":"Wan","year":"2023","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2025071200573033900_btaf385-B35","doi-asserted-by":"crossref","first-page":"4284","DOI":"10.1038\/s41467-019-12281-8","article-title":"Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning","volume":"10","author":"Wang","year":"2019","journal-title":"Nat Commun"},{"key":"2025071200573033900_btaf385-B36","doi-asserted-by":"crossref","first-page":"eadd8643","DOI":"10.1126\/science.add8643","article-title":"CRISPR technology: a decade of genome editing is only the beginning","volume":"379","author":"Wang","year":"2023","journal-title":"Science"},{"key":"2025071200573033900_btaf385-B37","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1186\/s13059-015-0784-0","article-title":"WU-CRISPR: characteristics of functional guide RNAs for the CRISPR\/Cas9 system","volume":"16","author":"Wong","year":"2015","journal-title":"Genome 
Biol"},{"key":"2025071200573033900_btaf385-B38","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1016\/j.jgg.2017.03.004","article-title":"Temperature effect on CRISPR\u2013Cas9 mediated genome editing","volume":"44","author":"Xiang","year":"2017","journal-title":"J Genet Genomics"},{"key":"2025071200573033900_btaf385-B39","doi-asserted-by":"crossref","first-page":"3238","DOI":"10.1038\/s41467-021-23576-0","article-title":"Enhancing CRISPR\u2013Cas9 gRNA efficiency prediction by data integration and deep learning","volume":"12","author":"Xiang","year":"2021","journal-title":"Nat Commun"},{"key":"2025071200573033900_btaf385-B40","doi-asserted-by":"publisher","author":"Xiao","year":"2025","DOI":"10.48550\/arXiv.2502.1750,"},{"key":"2025071200573033900_btaf385-B41","doi-asserted-by":"crossref","first-page":"142309","DOI":"10.1016\/j.ijbiomac.2025.142309","article-title":"CasPro-ESM2: accurate identification of Cas proteins integrating pre-trained protein language model and multi-scale convolutional neural network","volume":"308","author":"Yan","year":"2025","journal-title":"Int J Biol Macromol"},{"key":"2025071200573033900_btaf385-B42","doi-asserted-by":"crossref","first-page":"btaf127","DOI":"10.1093\/bioinformatics\/btaf127","article-title":"RNALoc-LM: RNA subcellular localization prediction using pre-trained RNA language model","volume":"41","author":"Zeng","year":"2025","journal-title":"Bioinformatics"},{"key":"2025071200573033900_btaf385-B43","doi-asserted-by":"crossref","first-page":"344","DOI":"10.1016\/j.csbj.2020.01.013","article-title":"C-RNNCrispr: prediction of CRISPR\/Cas9 sgRNA activity using convolutional and recurrent neural networks","volume":"18","author":"Zhang","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"key":"2025071200573033900_btaf385-B44","doi-asserted-by":"crossref","first-page":"1445","DOI":"10.1016\/j.csbj.2021.03.001","article-title":"Prediction of CRISPR\/Cas9 single guide RNA cleavage efficiency and specificity 
by attention-based convolutional neural networks","volume":"19","author":"Zhang","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2025071200573033900_btaf385-B45","doi-asserted-by":"crossref","first-page":"7320","DOI":"10.1021\/acs.jcim.3c01339","article-title":"Unified model to predict gRNA efficiency across diverse cell lines and CRISPR\u2013Cas9 systems","volume":"63","author":"Zhong","year":"2023","journal-title":"J Chem Inf Model"},{"key":"2025071200573033900_btaf385-B46","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1038\/s41580-020-00288-9","article-title":"Applications of CRISPR\u2013Cas in agriculture and plant biotechnology","volume":"21","author":"Zhu","year":"2020","journal-title":"Nat Rev Mol Cell Biol"},{"key":"2025071200573033900_btaf385-B47","doi-asserted-by":"crossref","first-page":"e108424","DOI":"10.1371\/journal.pone.0108424","article-title":"CRISPRseek: a bioconductor package to identify target-specific guide RNAs for CRISPR\u2013Cas9 genome-editing systems","volume":"9","author":"Zhu","year":"2014","journal-title":"PLoS One"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf385\/63646110\/btaf385.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf385\/63646110\/btaf385.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf385\/63646110\/btaf385.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,12]],"date-time":"2025-07-12T04:57:45Z","timestamp":1752296265000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf385\/8182142"}},"subtitle":[],"editor":[{"given":"Jianlin","family":"Cheng","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,7,1]]},"references-count":47,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf385","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,7,1]]},"article-number":"btaf385"}}