{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T02:17:46Z","timestamp":1775873866869,"version":"3.50.1"},"reference-count":47,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2022,7,31]],"date-time":"2022-07-31T00:00:00Z","timestamp":1659225600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior (CAPES)","award":["001"],"award-info":[{"award-number":["001"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>COVID-19, the illness caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus, which belongs to the Coronaviridae family and has a single-stranded positive-sense RNA genome, has been spreading around the world and has been declared a pandemic by the World Health Organization. On 17 January 2022, there were more than 329 million cases, with more than 5.5 million deaths. Although COVID-19 has a low mortality rate, its high capacities for contamination, spread, and mutation worry the authorities, especially after the emergence of the Omicron variant, which has a high transmission capacity and can more easily contaminate even vaccinated people. Such outbreaks require elucidation of the taxonomic classification and origin of the virus (SARS-CoV-2) from the genomic sequence for strategic planning, containment, and treatment of the disease. Thus, this work proposes a high-accuracy technique to classify viruses and other organisms from a genome sequence using a deep learning convolutional neural network (CNN). Unlike other approaches in the literature, the proposed approach does not limit the length of the genome sequence. The results show that the novel proposal accurately distinguishes SARS-CoV-2 from the sequences of other viruses. 
The results were obtained from 1557 instances of SARS-CoV-2 from the National Center for Biotechnology Information (NCBI) and 14,684 different viruses from the Virus-Host DB. As a CNN has several changeable parameters, the tests were performed with forty-eight different architectures; the best of these had an accuracy of 91.94 \u00b1 2.62% in classifying viruses into their realms correctly, in addition to 100% accuracy in classifying SARS-CoV-2 into its respective realm, Riboviria. For the subsequent classifications (family, genera, and subgenus), this accuracy increased, which shows that the proposed architecture may be viable in the classification of the virus that causes COVID-19.<\/jats:p>","DOI":"10.3390\/s22155730","type":"journal-article","created":{"date-parts":[[2022,8,1]],"date-time":"2022-08-01T23:49:27Z","timestamp":1659397767000},"page":"5730","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["Convolutional Neural Network Applied to SARS-CoV-2 Sequence Classification"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5736-0782","authenticated-orcid":false,"given":"Gabriel B. M.","family":"C\u00e2mara","sequence":"first","affiliation":[{"name":"Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil"},{"name":"Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8167-5568","authenticated-orcid":false,"given":"Maria G. 
F.","family":"Coutinho","sequence":"additional","affiliation":[{"name":"Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6282-9744","authenticated-orcid":false,"given":"Lucileide M. D. da","family":"Silva","sequence":"additional","affiliation":[{"name":"Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil"},{"name":"Federal Institute of Education, Science and Technology of Rio Grande do Norte, Paraiso, Santa Cruz 59200-000, RN, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8428-2583","authenticated-orcid":false,"given":"Walter V. do N.","family":"Gadelha","sequence":"additional","affiliation":[{"name":"Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6356-3538","authenticated-orcid":false,"given":"Matheus F.","family":"Torquato","sequence":"additional","affiliation":[{"name":"Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3798-5512","authenticated-orcid":false,"given":"Raquel de M.","family":"Barbosa","sequence":"additional","affiliation":[{"name":"Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil"},{"name":"Department of Pharmacy and Pharmaceutical Technology, University of Granada, 18071 Granada, 
Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7536-2506","authenticated-orcid":false,"given":"Marcelo A. C.","family":"Fernandes","sequence":"additional","affiliation":[{"name":"Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil"},{"name":"Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil"},{"name":"Department of Computer Engineering and Automation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1804","DOI":"10.3390\/v2081803","article-title":"Coronavirus Genomics and Bioinformatics Analysis","volume":"2","author":"Woo","year":"2010","journal-title":"Viruses"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1038\/s41579-018-0118-9","article-title":"Origin and evolution of pathogenic coronaviruses","volume":"17","author":"Cui","year":"2019","journal-title":"Nat. Rev. Microbiol."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1038\/s41586-020-2012-7","article-title":"A pneumonia outbreak associated with a new coronavirus of probable bat origin","volume":"579","author":"Zhou","year":"2020","journal-title":"Nature"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1038\/s41586-020-2008-3","article-title":"A new coronavirus associated with human respiratory disease in China","volume":"579","author":"Wu","year":"2020","journal-title":"Nature"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"450","DOI":"10.1038\/s41591-020-0820-9","article-title":"The proximal origin of SARS-CoV-2","volume":"26","author":"Andersen","year":"2020","journal-title":"Nat. 
Med."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1016\/j.cell.2020.02.058","article-title":"Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein","volume":"181","author":"Walls","year":"2020","journal-title":"Cell"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1016\/j.virol.2021.02.013","article-title":"Conflicting and ambiguous names of overlapping ORFs in the SARS-CoV-2 genome: A homology-based resolution","volume":"558","author":"Jungreis","year":"2021","journal-title":"Virology"},{"key":"ref_8","first-page":"11","article-title":"The origin, transmission and clinical therapies on coronavirus disease 2019 (COVID-19) outbreak\u2014An update on the status","volume":"7","author":"Guo","year":"2020","journal-title":"Mil. Med. Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1346","DOI":"10.1016\/j.cub.2020.03.022","article-title":"Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak","volume":"6","author":"Zhang","year":"2020","journal-title":"Curr. Biol."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Randhawa, G.S., Soltysiak, M.P.M., Roz, H.E., de Souza, C.P.E., Hill, K.A., and Kari, L. (2020). Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study. PLoS ONE, 15.","DOI":"10.1101\/2020.02.03.932350"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"(1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443\u2013453.","DOI":"10.1016\/0022-2836(70)90057-4"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. 
Biol."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"(1991). Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. Methods, 3, 66\u201370.","DOI":"10.1016\/S1046-2023(05)80165-3"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"(1990). Basic local alignment search tool. J. Mol. Biol., 215, 403\u2013410.","DOI":"10.1006\/jmbi.1990.9999"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1093\/bib\/bbt078","article-title":"Applications of alignment-free methods in epigenomics","volume":"15","author":"Pinello","year":"2013","journal-title":"Briefings Bioinform."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison\u2014A review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1186\/s13059-017-1319-7","article-title":"Alignment-free sequence comparison: Benefits, applications, and tools","volume":"18","author":"Zielezinski","year":"2017","journal-title":"Genome Biol."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Morgenstern, B. (2019). Sequence Comparison without Alignment: The SpaM approaches. 
bioRxiv.","DOI":"10.1101\/2019.12.16.878314"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1186\/s13059-019-1755-7","article-title":"Benchmarking of alignment-free sequence comparison methods","volume":"20","author":"Zielezinski","year":"2019","journal-title":"Genome Biol."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"105618","DOI":"10.1016\/j.dib.2020.105618","article-title":"Chaos game representation dataset of SARS-CoV-2 genome","volume":"30","author":"Barbosa","year":"2020","journal-title":"Data Brief"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2163","DOI":"10.1093\/nar\/18.8.2163","article-title":"Chaos game representation of gene structure","volume":"18","author":"Jeffrey","year":"1990","journal-title":"Nucleic Acids Res."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1093\/bioinformatics\/btz493","article-title":"Deep learning on chaos game representation for proteins","volume":"36","author":"Eger","year":"2020","journal-title":"Bioinformatics"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"105829","DOI":"10.1016\/j.dib.2020.105829","article-title":"Data stream dataset of SARS-CoV-2 genome","volume":"31","author":"Barbosa","year":"2020","journal-title":"Data Brief"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Randhawa, G.S., Hill, K.A., and Kari, L. (2019). ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels. BMC Genom., 20.","DOI":"10.1186\/s12864-019-5571-y"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1038\/nrg3920","article-title":"Machine learning applications in genetics and genomics","volume":"16","author":"Libbrecht","year":"2015","journal-title":"Nat. Rev. 
Genet."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1280","DOI":"10.1093\/bib\/bbx165","article-title":"BioSeq-Analysis: A platform for DNA, RNA and protein sequence analysis based on machine learning approaches","volume":"20","author":"Liu","year":"2019","journal-title":"Briefings Bioinform."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Fiannaca, A., La Paglia, L., La Rosa, M., Bosco, L., Renda, G., Rizzo, R., Gaglio, S., and Urso, A. (2018). Deep learning models for bacteria taxonomic classification of metagenomic data. BMC Bioinform., 19.","DOI":"10.1186\/s12859-018-2182-6"},{"key":"ref_29","unstructured":"Randhawa, G.S., Soltysiak, M.P., Roz, H.E., de Souza, C.P., Hill, K.A., and Kari, L. (2020). Machine learning analysis of genomic signatures provides evidence of associations between Wuhan 2019-nCoV and bat betacoronaviruses. bioRxiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Remita, M.A., Halioui, A., Diouara, A.A.M., Daigle, B., Kiani, G., and Diallo, A.B. (2017). A machine learning approach for viral genome classification. BMC Bioinform., 18.","DOI":"10.1186\/s12859-017-1602-3"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Mock, F., Viehweger, A., Barth, E., and Marz, M. (2019). Viral host prediction with Deep Learning. bioRxiv.","DOI":"10.1101\/575571"},{"key":"ref_32","unstructured":"Zhu, H., Guo, Q., Li, M., Wang, C., Fang, Z., Wang, P., Tan, J., Wu, S., and Xiao, Y. (2020). Host and infectivity prediction of Wuhan 2019 novel coronavirus using deep learning algorithm. BioRxiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1089\/cmb.2019.0436","article-title":"Comparative Study Using Neural Networks for 16S Ribosomal Gene Classification","volume":"27","author":"Desai","year":"2020","journal-title":"J. Comput. 
Biol."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1038\/s41588-018-0295-5","article-title":"A primer on deep learning in genomics","volume":"51","author":"Zou","year":"2019","journal-title":"Nat. Genet."},{"key":"ref_35","first-page":"851","article-title":"Deep learning in bioinformatics","volume":"18","author":"Min","year":"2017","journal-title":"Briefings Bioinform."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1038\/s41576-019-0122-6","article-title":"Deep learning: New computational modelling techniques for genomics","volume":"20","author":"Eraslan","year":"2019","journal-title":"Nat. Rev. Genet."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Angelini, C., Rancoita, P.M., and Rovetta, S. (2015, January 10\u201312). A Deep Learning Approach to DNA Sequence Classification. Proceedings of the Computational Intelligence Methods for Bioinformatics and Biostatistics, Naples, Italy.","DOI":"10.1007\/978-3-319-44332-4"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"280","DOI":"10.4236\/jbise.2016.95021","article-title":"DNA sequence classification by convolutional neural network","volume":"9","author":"Nguyen","year":"2016","journal-title":"J. Biomed. Sci. Eng."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Tampuu, A., Bzhalava, Z., Dillner, J., and Vicente, R. (2019). ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples. PLoS ONE, 14.","DOI":"10.1101\/602656"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1007\/s40484-019-0187-4","article-title":"Identifying viruses from metagenomic data using deep learning","volume":"8","author":"Ren","year":"2020","journal-title":"Quant. Biol."},{"key":"ref_41","unstructured":"Lopez-Rincon, A., Tonda, A., Mendoza-Maldonado, L., Claassen, E., Garssen, J., and Kraneveld, A.D. (2020). 
Accurate Identification of SARS-CoV-2 from Viral Genome Sequences using Deep Learning. bioRxiv."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1016\/j.ymeth.2020.05.018","article-title":"CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning","volume":"189","author":"Shang","year":"2021","journal-title":"Methods"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Coutinho, M.G.F., C\u00e2mara, G.B.M., Barbosa, R.d.M., and Fernandes, M.A.C. (2021). Deep learning based on stacked sparse autoencoder applied to viral genome classification of SARS-CoV-2 virus. bioRxiv.","DOI":"10.1101\/2021.10.14.464414"},{"key":"ref_44","unstructured":"Fernandes, M.A.C. (2020). k-mers 1D and 2D representation dataset of SARS-CoV-2 nucleotide sequences. Mendeley Data."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"2063","DOI":"10.1109\/TNNLS.2018.2790388","article-title":"Applications of Deep Learning and Reinforcement Learning to Biological Data","volume":"29","author":"Mahmud","year":"2018","journal-title":"IEEE Trans. Neural Networks Learn. Syst."},{"key":"ref_46","unstructured":"Acheson, N.H. (2007). 
Fundamentals of Molecular Virology, Wiley."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"81297","DOI":"10.1109\/ACCESS.2019.2923687","article-title":"Viral genome deep classifier","volume":"7","author":"Grabowski","year":"2019","journal-title":"IEEE Access"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/15\/5730\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:00:14Z","timestamp":1760140814000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/15\/5730"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,31]]},"references-count":47,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["s22155730"],"URL":"https:\/\/doi.org\/10.3390\/s22155730","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,31]]}}}