{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T11:04:38Z","timestamp":1768993478242,"version":"3.49.0"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,6,1]],"date-time":"2020-06-01T00:00:00Z","timestamp":1590969600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,6,1]],"date-time":"2020-06-01T00:00:00Z","timestamp":1590969600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>The latest works on CRISPR genome editing tools mainly employs deep learning techniques. However, deep learning models lack explainability and they are harder to reproduce. We were motivated to build an accurate genome editing tool using sequence-based features and traditional machine learning that can compete with deep learning models.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In this paper, we present CRISPRpred(SEQ), a method for sgRNA on-target activity prediction that leverages only traditional machine learning techniques and hand-crafted features extracted from sgRNA sequences. We compare the results of CRISPRpred(SEQ) with that of DeepCRISPR, the current state-of-the-art, which uses a deep learning pipeline. Despite using only traditional machine learning methods, we have been able to beat DeepCRISPR for the three out of four cell lines in the benchmark dataset convincingly (2.174%, 6.905% and 8.119% improvement for the three cell lines).<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>CRISPRpred(SEQ) has been able to convincingly beat DeepCRISPR in 3 out of 4 cell lines. We believe that by exploring further, one can design better features only using the sgRNA sequences and can come up with a better method leveraging only traditional machine learning algorithms that can fully beat the deep learning models.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-020-3531-9","type":"journal-article","created":{"date-parts":[[2020,6,2]],"date-time":"2020-06-02T18:05:26Z","timestamp":1591121126000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning"],"prefix":"10.1186","volume":"21","author":[{"given":"Ali Haisam","family":"Muhammad Rafid","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Md.","family":"Toufikuzzaman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammad Saifur","family":"Rahman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9419-6478","authenticated-orcid":false,"given":"M. Sohel","family":"Rahman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,6,1]]},"reference":[{"issue":"2","key":"3531_CR1","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1007\/s41649-018-0056-x","volume":"10","author":"G Rubeis","year":"2018","unstructured":"Rubeis G, Steger F. Risks and benefits of human germline genome editing: An ethical analysis. Asian Bioeth Rev. 2018; 10(2):133\u201341. https:\/\/doi.org\/10.1007\/s41649-018-0056-x.","journal-title":"Asian Bioeth Rev"},{"key":"3531_CR2","volume-title":"Sequence based computational methods for protein attribute prediction and phylogeny reconstruction. PhD thesis","author":"MS Rahman","year":"2018","unstructured":"Rahman MS. Sequence based computational methods for protein attribute prediction and phylogeny reconstruction. PhD thesis. Dhaka: Bangladesh University of Engineering and Technology; 2018."},{"key":"3531_CR3","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1016\/j.artmed.2018.12.010","volume":"94","author":"MS Rahman","year":"2019","unstructured":"Rahman MS, Rahman MK, Saha S, Kaykobad M, Rahman MS. Antigenic: An improved prediction model of protective antigens. Artif Intell Med. 2019; 94:28\u201341. https:\/\/doi.org\/10.1016\/j.artmed.2018.12.010.","journal-title":"Artif Intell Med"},{"key":"3531_CR4","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1016\/j.artmed.2017.11.003","volume":"84","author":"MS Rahman","year":"2018","unstructured":"Rahman MS, Rahman MK, Kaykobad M, Rahman MS. isgpt: An optimized model to identify sub-golgi protein types using SVM and random forest based feature selection. Artif Intell Med. 2018; 84:90\u2013100. https:\/\/doi.org\/10.1016\/j.artmed.2017.11.003.","journal-title":"Artif Intell Med"},{"key":"3531_CR5","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1016\/j.jtbi.2018.05.006","volume":"452","author":"MS Rahman","year":"2018","unstructured":"Rahman MS, Shatabda S, Saha S, Kaykobad M, Rahman MS. Dpp-pseaac: A dna-binding protein prediction model using chou\u2019s general pseaac. J Theor Biol. 2018; 452:22\u201334. https:\/\/doi.org\/10.1016\/j.jtbi.2018.05.006.","journal-title":"J Theor Biol"},{"key":"3531_CR6","doi-asserted-by":"publisher","unstructured":"Dacrema M. F., Cremonesi P., Jannach D.Are we really making much progress? a worrying analysis of recent neural recommendation approaches. In: Proceedings of the 13th ACM Conference on Recommender Systems. ACM: 2019. https:\/\/doi.org\/10.1145\/3298689.3347058.","DOI":"10.1145\/3298689.3347058"},{"issue":"6096","key":"3531_CR7","doi-asserted-by":"publisher","first-page":"816","DOI":"10.1126\/science.1225829","volume":"337","author":"M Jinek","year":"2012","unstructured":"Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E. A programmable dual-rna\u2013guided dna endonuclease in adaptive bacterial immunity. Science. 2012; 337(6096):816\u201321.","journal-title":"Science"},{"issue":"6166","key":"3531_CR8","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1126\/science.1247005","volume":"343","author":"O Shalem","year":"2014","unstructured":"Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelsen TS, Heckl D, Ebert BL, Root DE, Doench JG, et al.Genome-scale crispr-cas9 knockout screening in human cells. Science. 2014; 343(6166):84\u20137.","journal-title":"Science"},{"issue":"6166","key":"3531_CR9","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1126\/science.1246981","volume":"343","author":"T Wang","year":"2014","unstructured":"Wang T, Wei JJ, Sabatini DM, Lander ES. Genetic screens in human cells using the crispr-cas9 system. Science. 2014; 343(6166):80\u20134.","journal-title":"Science"},{"issue":"2","key":"3531_CR10","doi-asserted-by":"publisher","first-page":"184","DOI":"10.1038\/nbt.3437","volume":"34","author":"JG Doench","year":"2016","unstructured":"Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, Orchard R, et al.Optimized sgrna design to maximize activity and minimize off-target effects of crispr-cas9. Nat Biotechnol. 2016; 34(2):184.","journal-title":"Nat Biotechnol"},{"issue":"2","key":"3531_CR11","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1007\/s12539-018-0298-z","volume":"10","author":"Y Cui","year":"2018","unstructured":"Cui Y, Xu J, Cheng M, Liao X, Peng S. Review of crispr\/cas9 sgrna design tools. Interdiscip Sci Comput Life Sci. 2018; 10(2):455\u201365.","journal-title":"Interdiscip Sci Comput Life Sci"},{"issue":"3","key":"3531_CR12","first-page":"273","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995; 20(3):273\u201397.","journal-title":"Mach Learn"},{"issue":"1","key":"3531_CR13","doi-asserted-by":"publisher","first-page":"126","DOI":"10.1007\/s12539-017-0271-2","volume":"10","author":"Z Pei","year":"2018","unstructured":"Pei Z, Liu J, Liu M, Zhou W, Yan P, Wen S, Chen Y. Risk-predicting model for incident of essential hypertension based on environmental and genetic factors with support vector machine. Interdiscip Sci Comput Life Sci. 2018; 10(1):126\u201330.","journal-title":"Interdiscip Sci Comput Life Sci"},{"issue":"8","key":"3531_CR14","doi-asserted-by":"publisher","first-page":"0181943","DOI":"10.1371\/journal.pone.0181943","volume":"12","author":"MK Rahman","year":"2017","unstructured":"Rahman MK, Rahman MS. Crisprpred: A flexible and efficient tool for sgrnas on-target activity prediction in crispr\/cas9 systems. PloS one. 2017; 12(8):0181943.","journal-title":"PloS one"},{"issue":"2","key":"3531_CR15","doi-asserted-by":"publisher","first-page":"122","DOI":"10.1038\/nmeth.2812","volume":"11","author":"F Heigwer","year":"2014","unstructured":"Heigwer F, Kerr G, Boutros M. E-crisp: fast crispr target site identification. Nat Methods. 2014; 11(2):122.","journal-title":"Nat Methods"},{"issue":"8","key":"3531_CR16","doi-asserted-by":"publisher","first-page":"805","DOI":"10.1038\/nbt.3291","volume":"33","author":"CR MacPherson","year":"2015","unstructured":"MacPherson CR, Scherf A. Flexible guide-rna design for crispr applications using protospacer workbench. Nat Biotechnol. 2015; 33(8):805.","journal-title":"Nat Biotechnol"},{"issue":"W1","key":"3531_CR17","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1093\/nar\/gkw398","volume":"44","author":"K Labun","year":"2016","unstructured":"Labun K, Montague TG, Gagnon JA, Thyme SB, Valen E. Chopchop v2: a web tool for the next generation of crispr genome engineering. Nucleic Acids Res. 2016; 44(W1):272\u20136.","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"3531_CR18","doi-asserted-by":"publisher","first-page":"218","DOI":"10.1186\/s13059-015-0784-0","volume":"16","author":"N Wong","year":"2015","unstructured":"Wong N, Liu W, Wang X. Wu-crispr: characteristics of functional guide rnas for the crispr\/cas9 system. Genome Biol. 2015; 16(1):218.","journal-title":"Genome Biol"},{"key":"3531_CR19","volume-title":"Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 1","author":"T. K. Ho","year":"1995","unstructured":"Ho T. K.Random decision forests. In: Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 1. Montreal: IEEE: 1995. p. 278\u2013282."},{"issue":"8","key":"3531_CR20","doi-asserted-by":"publisher","first-page":"832","DOI":"10.1109\/34.709601","volume":"20","author":"T. K. Ho","year":"1998","unstructured":"Ho T. K.The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell. 1998; 20(8):832\u201344.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"1","key":"3531_CR21","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1186\/s13059-018-1459-4","volume":"19","author":"G Chuai","year":"2018","unstructured":"Chuai G, Ma H, Yan J, Chen M, Hong N, Xue D, Zhou C, Zhu C, Chen K, Duan B, et al.Deepcrispr: optimized crispr guide rna design by deep learning. Genome Biol. 2018; 19(1):80.","journal-title":"Genome Biol"},{"key":"3531_CR22","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","volume":"61","author":"J Schmidhuber","year":"2015","unstructured":"Schmidhuber J. Deep learning in neural networks: An overview. Neural Netw. 2015; 61:85\u2013117.","journal-title":"Neural Netw"},{"issue":"1","key":"3531_CR23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-018-07882-8","volume":"10","author":"D Wang","year":"2019","unstructured":"Wang D, Zhang C, Wang B, Li B, Wang Q, Liu D, Wang H, Zhou Y, Shi L, Lan F, et al.Optimized crispr guide rna design for two high-fidelity cas9 variants by deep learning. Nat Commun. 2019; 10(1):1\u201314.","journal-title":"Nat Commun"},{"issue":"1","key":"3531_CR24","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1186\/s13059-016-1012-2","volume":"17","author":"M Haeussler","year":"2016","unstructured":"Haeussler M, Sch\u00f6nig K, Eckert H, Eschstruth A, Miann\u00e9 J, Renaud J-B, Schneider-Maunoury S, Shkumatava A, Teboul L, Kent J, et al.Evaluation of off-target and on-target scoring algorithms and integration into the guide rna selection tool crispor. Genome Biol. 2016; 17(1):148.","journal-title":"Genome Biol"},{"key":"3531_CR25","unstructured":"Gini C. In: Pizetti E, Salvemini T, (eds).Variabilit\u00e0 e mutabilit\u00e0 (variability and mutability). 1955 ed. Bologna, Reprinted in Memorie di metodologica statistica. Rome: Libreria Eredi Virgilio Veschi ; 1912."},{"key":"3531_CR26","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011; 12:2825\u201330.","journal-title":"J Mach Learn Res"},{"issue":"8","key":"3531_CR27","doi-asserted-by":"publisher","first-page":"1147","DOI":"10.1101\/gr.191452.115","volume":"25","author":"H Xu","year":"2015","unstructured":"Xu H, Xiao T, Chen C-H, Li W, Meyer CA, Wu Q, Wu D, Cong L, Zhang F, Liu JS, et al.Sequence determinants of improved crispr sgrna design. Genome Res. 2015; 25(8):1147\u201357.","journal-title":"Genome Res"},{"issue":"3","key":"3531_CR28","doi-asserted-by":"publisher","first-page":"0119372","DOI":"10.1371\/journal.pone.0119372","volume":"10","author":"SV Prykhozhij","year":"2015","unstructured":"Prykhozhij SV, Rajan V, Gaston D, Berman JN. Crispr multitargeter: a web tool to find common and unique crispr single guide rna targets in a set of similar sequences. PloS one. 2015; 10(3):0119372.","journal-title":"PloS one"},{"issue":"9","key":"3531_CR29","doi-asserted-by":"publisher","first-page":"823","DOI":"10.1038\/nmeth.3473","volume":"12","author":"R Chari","year":"2015","unstructured":"Chari R, Mali P, Moosburner M, Church GM. Unraveling crispr-cas9 genome engineering parameters via a library-on-library approach. Nat Methods. 2015; 12(9):823.","journal-title":"Nat Methods"},{"issue":"24","key":"3531_CR30","doi-asserted-by":"crossref","first-page":"4014","DOI":"10.1093\/bioinformatics\/btv537","volume":"31","author":"J Park","year":"2015","unstructured":"Park J, Bae S, Kim J-S. Cas-designer: a web-based tool for choice of crispr-cas9 target sites. Bioinformatics. 2015; 31(24):4014\u20136.","journal-title":"Bioinformatics"},{"issue":"1","key":"3531_CR31","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/s10994-006-6226-1","volume":"63","author":"P Geurts","year":"2006","unstructured":"Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn. 2006; 63(1):3\u201342.","journal-title":"Mach Learn"},{"key":"3531_CR32","first-page":"797","volume":"19","author":"Z Wen","year":"2018","unstructured":"Wen Z, Shi J, Li Q, He B, Chen J. ThunderSVM: A fast SVM library on GPUs and CPUs. J Mach Learn Res. 2018; 19:797\u2013801.","journal-title":"J Mach Learn Res"},{"key":"3531_CR33","volume-title":"Artificial Intelligence: A Modern Approach, 3rd edn.","author":"S Russell","year":"2009","unstructured":"Russell S, Norvig P. Artificial Intelligence: A Modern Approach, 3rd edn.USA: Prentice Hall Press; 2009."},{"key":"3531_CR34","unstructured":"Chuai G.Private Communication. 2019."},{"key":"3531_CR35","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1016\/j.artmed.2017.11.003","volume":"84","author":"MS Rahman","year":"2018","unstructured":"Rahman MS, Rahman MK, Kaykobad M, Rahman MS. isgpt: An optimized model to identify sub-golgi protein types using svm and random forest based feature selection. Artif Intell Med. 2018; 84:90\u2013100.","journal-title":"Artif Intell Med"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3531-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-020-3531-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3531-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,1]],"date-time":"2023-10-01T21:14:47Z","timestamp":1696194887000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-3531-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,1]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["3531"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-3531-9","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,1]]},"assertion":[{"value":"5 November 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 May 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 June 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"223"}}